Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why JSON logs cannot be directly parsed? #20649

Closed
trustnote-wang opened this issue Jun 12, 2024 · 7 comments
Closed

Why JSON logs cannot be directly parsed? #20649

trustnote-wang opened this issue Jun 12, 2024 · 7 comments
Labels
type: bug A code related bug.

Comments

@trustnote-wang
Copy link

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

log:
{"Timestamp":1718160882,"TraceId":"857d735c-c094-4132-ad3b-984b937692a4","SpanId":"857d735c-c094-4132-ad3b-984b937692a4","SeverityText":"DEBUG", "SeverityNumber": 5, "Name":"log","Body":"key:answer_d8aeae3a-78c3-484a-a2a4-4607c76cb526 finish","Resource":{"service.name":"qcms-service","service.ver":"v1.0"}, "Attributes": {"file.name":"chat.go", "file.line":"253"}}
{"Timestamp":1718160882,"TraceId":"857d735c-c094-4132-ad3b-984b937692a4","SpanId":"857d735c-c094-4132-ad3b-984b937692a4","SeverityText":"INFO", "SeverityNumber": 9, "Name":"log","Body":"completion success, time_cost:28644","Resource":{"service.name":"qcms-service","service.ver":"v1.0"}, "Attributes": {"file.name":"sse.go", "file.line":"58"}}

vector:
[transforms.parse_chat-service_logs]
type = "remap"
inputs = [ "chat-service_logs" ]
source = '''
. = parse_json!(.message)
del(.message)

'''

output:

{"file":"/var/log/qcms-service/qcms-service_2024-06-12-03.log","host":"gpt01v.dev.bjat.qianxin-inc.cn","message":"{"Timestamp":1718162535,"TraceId":"f1b37b6d-fa07-432f-a114-8116e905f9c4","SpanId":"f1b37b6d-fa07-432f-a114-8116e905f9c4","SeverityText":"INFO", "SeverityNumber": 9, "Name":"log","Body":"get message status, req:user.CompletionsStatus_Request{MessageId:"d6a0cefa-c358-426a-97df-ae866dec27bb", XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}","Resource":{"service.name":"qcms-service","service.ver":"v1.0"}, "Attributes": {"file.name":"message.go", "file.line":"55"}}","source_type":"file","timestamp":"2024-06-12T03:22:15.666538843Z"}
{"Attributes":{"file.line":"204","file.name":"message.go"},"Body":"workflow id:answer-workflow-id2-e290e878-9f25-4cb1-a85b-e8d66d7ea8ea","Name":"log","Resource":{"service.name":"qcms-service","service.ver":"v1.0"},"SeverityNumber":9,"SeverityText":"INFO","SpanId":"f1b37b6d-fa07-432f-a114-8116e905f9c4","Timestamp":1718162535,"TraceId":"f1b37b6d-fa07-432f-a114-8116e905f9c4"}

Configuration

No response

Version

0.38

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

@trustnote-wang trustnote-wang added the type: bug A code related bug. label Jun 12, 2024
@trustnote-wang
Copy link
Author

The source log has not been filtered

@jszwedko
Copy link
Member

Hi @trustnote-wang !

Apologies, I'm not seeing the issue here. The input is JSON and would be parsed correctly with the remap transform you have. What issue are you running into?

@trustnote-wang
Copy link
Author

Hi @trustnote-wang !

Apologies, I'm not seeing the issue here. The input is JSON and would be parsed correctly with the remap transform you have. What issue are you running into?

Hi @jszwedko

My question is: I don't think the source logs should still exist in the output of the vector, for example:

{"Timestamp":1718160882,"TraceId":"857d735c-c094-4132-ad3b-984b937692a4","SpanId":"857d735c-c094-4132-ad3b-984b937692a4","SeverityText":"INFO", "SeverityNumber": 9, "Name":"log","Body":"completion success, time_cost:28644","Resource":{"service.name":"qcms-service","service.ver":"v1.0"}, "Attributes": {"file.name":"sse.go", "file.line":"58"}}

It should be parsed into:

{"Attributes":{"file.line":"204","file.name":"message.go"},"Body":"workflow id:answer-workflow-id2-e290e878-9f25-4cb1-a85b-e8d66d7ea8ea","Name":"log","Resource":{"service.name":"qcms-service","service.ver":"v1.0"},"SeverityNumber":9,"SeverityText":"INFO","SpanId":"f1b37b6d-fa07-432f-a114-8116e905f9c4","Timestamp":1718162535,"TraceId":"f1b37b6d-fa07-432f-a114-8116e905f9c4"}

Instead, it should not be parsed as:

{"file":"/var/log/qcms-service/qcms-service_2024-06-12-03.log","host":"gpt01v.dev.bjat.qianxin-inc.cn","message":"{"Timestamp":1718162535,"TraceId":"f1b37b6d-fa07-432f-a114-8116e905f9c4","SpanId":"f1b37b6d-fa07-432f-a114-8116e905f9c4","SeverityText":"INFO", "SeverityNumber": 9, "Name":"log","Body":"get message status, req:user.CompletionsStatus_Request{MessageId:"d6a0cefa-c358-426a-97df-ae866dec27bb", XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}","Resource":{"service.name":"qcms-service","service.ver":"v1.0"}, "Attributes": {"file.name":"message.go", "file.line":"55"}}","source_type":"file","timestamp":"2024-06-12T03:22:15.666538843Z"}

{"Attributes":{"file.line":"204","file.name":"message.go"},"Body":"workflow id:answer-workflow-id2-e290e878-9f25-4cb1-a85b-e8d66d7ea8ea","Name":"log","Resource":{"service.name":"qcms-service","service.ver":"v1.0"},"SeverityNumber":9,"SeverityText":"INFO","SpanId":"f1b37b6d-fa07-432f-a114-8116e905f9c4","Timestamp":1718162535,"TraceId":"f1b37b6d-fa07-432f-a114-8116e905f9c4"}

It looks like the vector parsed this json log, but it also outputs the source log together. This is not what we need. I don't know why this happened. Maybe it's because the json format of the source log is incorrect? I don't know

@jszwedko
Copy link
Member

Oh I see. Yes, sources do add metadata (e.g. the source_type, file, host, fields) but the remap transform you have there should drop them because it is just setting . to the result of parsing the message as JSON. Are you sure you have the inputs to each component configured correctly? Can you share the full configuration?

Relatedly there is a beta feature you might be interested in that avoids sources adding metadata to the event itself. See: https://vector.dev/blog/log-namespacing/

@trustnote-wang
Copy link
Author

Oh I see. Yes, sources do add metadata (e.g. the source_type, file, host, fields) but the remap transform you have there should drop them because it is just setting . to the result of parsing the message as JSON. Are you sure you have the inputs to each component configured correctly? Can you share the full configuration?

Relatedly there is a beta feature you might be interested in that avoids sources adding metadata to the event itself. See: https://vector.dev/blog/log-namespacing/

ok, this is the full configuration:

[sources.chat-service_logs]
type = "file"
ignore_older_secs = 600
include = ["/var/log/qcms-service/qcms-service_*.log"]
read_from = "end"

[transforms.parse_chat-service_logs]
type = "remap"
inputs = [ "chat-service_logs" ]
source = '''
. = parse_json!(.message)
del(.message)
'''

[sinks.print]
type = "console"
inputs = ["parse_chat-service_logs"]
encoding.codec = "json"

This configuration should be expected to work without any issues. I have not encountered any problems when processing other json logs, so my current question is: Is there an error in the json format of these logs, which is not in standard json format, and therefore vector cannot process them?

Thank you for showing me this. I need to learn it. https://vector.dev/blog/log-namespacing/

@jszwedko
Copy link
Member

I used your config and fed in:

{"Timestamp":1718160882,"TraceId":"857d735c-c094-4132-ad3b-984b937692a4","SpanId":"857d735c-c094-4132-ad3b-984b937692a4","SeverityText":"DEBUG", "SeverityNumber": 5, "Name":"log","Body":"key:answer_d8aeae3a-78c3-484a-a2a4-4607c76cb526 finish","Resource":{"service.name":"qcms-service","service.ver":"v1.0"}, "Attributes": {"file.name":"chat.go", "file.line":"253"}}
{"Timestamp":1718160882,"TraceId":"857d735c-c094-4132-ad3b-984b937692a4","SpanId":"857d735c-c094-4132-ad3b-984b937692a4","SeverityText":"INFO", "SeverityNumber": 9, "Name":"log","Body":"completion success, time_cost:28644","Resource":{"service.name":"qcms-service","service.ver":"v1.0"}, "Attributes": {"file.name":"sse.go", "file.line":"58"}}

as the input. I got:

{"Attributes":{"file.line":"253","file.name":"chat.go"},"Body":"key:answer_d8aeae3a-78c3-484a-a2a4-4607c76cb526 finish","Name":"log","Resource":{"service.name":"qcms-service","service.ver":"v1.0"},"SeverityNumber":5,"SeverityText":"DEBUG","SpanId":"857d735c-c094-4132-ad3b-984b937692a4","Timestamp":1718160882,"TraceId":"857d735c-c094-4132-ad3b-984b937692a4"}
{"Attributes":{"file.line":"58","file.name":"sse.go"},"Body":"completion success, time_cost:28644","Name":"log","Resource":{"service.name":"qcms-service","service.ver":"v1.0"},"SeverityNumber":9,"SeverityText":"INFO","SpanId":"857d735c-c094-4132-ad3b-984b937692a4","Timestamp":1718160882,"TraceId":"857d735c-c094-4132-ad3b-984b937692a4"}

as the output. This seems to be what you expect?

@trustnote-wang
Copy link
Author

I used your config and fed in:

{"Timestamp":1718160882,"TraceId":"857d735c-c094-4132-ad3b-984b937692a4","SpanId":"857d735c-c094-4132-ad3b-984b937692a4","SeverityText":"DEBUG", "SeverityNumber": 5, "Name":"log","Body":"key:answer_d8aeae3a-78c3-484a-a2a4-4607c76cb526 finish","Resource":{"service.name":"qcms-service","service.ver":"v1.0"}, "Attributes": {"file.name":"chat.go", "file.line":"253"}}
{"Timestamp":1718160882,"TraceId":"857d735c-c094-4132-ad3b-984b937692a4","SpanId":"857d735c-c094-4132-ad3b-984b937692a4","SeverityText":"INFO", "SeverityNumber": 9, "Name":"log","Body":"completion success, time_cost:28644","Resource":{"service.name":"qcms-service","service.ver":"v1.0"}, "Attributes": {"file.name":"sse.go", "file.line":"58"}}

as the input. I got:

{"Attributes":{"file.line":"253","file.name":"chat.go"},"Body":"key:answer_d8aeae3a-78c3-484a-a2a4-4607c76cb526 finish","Name":"log","Resource":{"service.name":"qcms-service","service.ver":"v1.0"},"SeverityNumber":5,"SeverityText":"DEBUG","SpanId":"857d735c-c094-4132-ad3b-984b937692a4","Timestamp":1718160882,"TraceId":"857d735c-c094-4132-ad3b-984b937692a4"}
{"Attributes":{"file.line":"58","file.name":"sse.go"},"Body":"completion success, time_cost:28644","Name":"log","Resource":{"service.name":"qcms-service","service.ver":"v1.0"},"SeverityNumber":9,"SeverityText":"INFO","SpanId":"857d735c-c094-4132-ad3b-984b937692a4","Timestamp":1718160882,"TraceId":"857d735c-c094-4132-ad3b-984b937692a4"}

as the output. This seems to be what you expect?

hi @jszwedko

I think I have found the problem. The problem is that the format of some source JSON logs does have problems, not in the standard JSON format. So there is no problem with the vector. Thank you for your patient help. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug A code related bug.
Projects
None yet
Development

No branches or pull requests

2 participants