
Negative numbers for gl2_processing_duration_ms #19696

Open
damianharouff opened this issue Jun 20, 2024 · 10 comments

@damianharouff

While reviewing gl2_processing_duration_ms, it was noted that some messages have a negative processing time associated with them:

[screenshots]

There doesn't seem to be any specific case or reason; it happens across multiple sources, some processed by Illuminate, some not:

[screenshot]

Happy to give a Graylogger access to my cluster for further troubleshooting if you're not able to replicate this in a dev environment.

Your Environment

  • Graylog Version: Graylog 6.0.3+eb761c5
  • Java Version: Embedded
  • OpenSearch Version: 2.14
  • MongoDB Version: 7.0
  • Operating System: Debian 12
  • Browser version: Arc, Chromium Engine Version 126.0.6478.57
@damianharouff
Author

damianharouff commented Jun 20, 2024

It also seems to be only "large" negative numbers; it's never just a few ms negative, the minimum is around -10000 ms:

[screenshot]

@drewmiranda-gl
Member

I just checked my environment and I also have negative values:
[screenshot]

Probably unrelated, but I see that gl2_processing_duration_ms is showing two different types:
[screenshot]

@bernd
Member

bernd commented Jun 26, 2024

@damianharouff @drewmiranda-gl Can you post the full message (ideally from OpenSearch) for an affected message so we can see all the timestamp and timing-related fields?

@bernd
Member

bernd commented Jun 26, 2024

@damianharouff Thank you! 🙏 That helped!

Looking at the first message. 👀
The problem is that the receive timestamp is after the processing timestamp. So we need to figure out how that happens.

"gl2_receive_timestamp":    "2024-06-26 13:25:30.462",
"gl2_processing_timestamp": "2024-06-26 13:25:18.192",
"gl2_processing_duration_ms": -12270,

@drewmiranda-gl
Member

For what it's worth, in my environment this only happened on one specific day, never before and never since.

[screenshot]

Sanitized messages:

{
  "gl2_accounted_message_size": 405,
  "gl2_receive_timestamp": "2024-06-17 01:36:00.129",
  "gl2_remote_ip": "192.168.0.1",
  "gl2_remote_port": 55834,
  "streams": [
    "63068dc98a735a37e8d535f9"
  ],
  "gl2_message_id": "01J0HXCE41012S9S3M40133D5X",
  "gl2_source_input": "6328e4593521fe2ae2ee110c",
  "gl2_processing_timestamp": "2024-06-17 01:36:00.128",
  "gl2_source_node": "7a72f5ea-ccb3-4b6f-858b-bab2d1aaf7b5",
  "gl2_forwarder_input": "62bb56a525abac169687b4ab",
  "gl2_processing_duration_ms": -1,
  "timestamp": "2024-06-17T01:36:00.129Z"
}
{
  "gl2_accounted_message_size": 819,
  "gl2_receive_timestamp": "2024-06-15 17:50:53.277",
  "gl2_remote_ip": "192.168.0.108",
  "gl2_remote_port": 45175,
  "streams": [
    "000000000000000000000001"
  ],
  "gl2_message_id": "01J0EGC28V000GFMPF9M1X5KG0",
  "gl2_source_input": "63112596964e770359618042",
  "gl2_processing_timestamp": "2024-06-15 17:50:53.246",
  "gl2_source_node": "7a72f5ea-ccb3-4b6f-858b-bab2d1aaf7b5",
  "gl2_forwarder_input": "62bb56a525abac169687b4ab",
  "gl2_processing_duration_ms": -31,
  "timestamp": "2024-06-15T17:50:53.211Z"
}
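
Notably, in both messages the reported value matches the processing timestamp minus the receive timestamp to the millisecond (01:36:00.128 - 01:36:00.129 = -1 ms, and 17:50:53.246 - 17:50:53.277 = -31 ms), which is consistent with the field simply being that difference and with the forwarder clock being slightly ahead at those moments.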

If it matters, all of my ingest comes from forwarders.

@damianharouff
Author

I was wondering about that, and thought it might be due to incorrect time on one of the nodes, but the GL and OSearch nodes are all showing a plausible time:

damian@Damians-MBP ansible % ansible -i hosts graylog -m shell -a "date"
192.168.69.201 | CHANGED | rc=0 >>
Wed Jun 26 02:14:15 PM UTC 2024
192.168.69.211 | CHANGED | rc=0 >>
Wed Jun 26 02:14:15 PM UTC 2024
192.168.69.210 | CHANGED | rc=0 >>
Wed Jun 26 02:14:15 PM UTC 2024
192.168.69.200 | CHANGED | rc=0 >>
Wed Jun 26 02:14:15 PM UTC 2024
192.168.69.202 | CHANGED | rc=0 >>
Wed Jun 26 02:14:16 PM UTC 2024
192.168.69.213 | CHANGED | rc=0 >>
Wed Jun 26 02:14:18 PM UTC 2024
192.168.69.212 | CHANGED | rc=0 >>
Wed Jun 26 02:14:18 PM UTC 2024
damian@Damians-MBP ansible %

The slight skew of a few seconds is due to Ansible running serially.

@bernd
Member

bernd commented Jun 26, 2024

> If it matters, all of my ingest comes from forwarders.

Yeah, I think it does. Damian's messages are also ingested via the Forwarder.

Thanks, folks! I think we provided enough context and info so the team can start debugging.

@drewmiranda-gl
Member

Posting some time skew info, since I suspect this is contributing to the problem, if not the cause.

Negative processing time as seen in Graylog:
[screenshot]

Time skew for the Graylog server:
[screenshot]

Time skew for the applicable forwarder:
[screenshot]

@damianharouff
Author

The statement about Forwarders prompted me to look at my forwarder box, and...

[screenshot]

...it turns out my forwarder box was 12 seconds in the future. I installed and enabled NTP on it, and there have been no more messages with a negative processing duration. It does make me wonder why only some of my messages were negative rather than all of them, since all of my ingest goes through this one forwarder.
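
Assuming the duration is just the processing timestamp minus the receive timestamp, a forwarder clock running 12 seconds fast would turn a message that actually took, say, 200 ms to process into roughly 200 - 12000 = -11800 ms, which lines up with the ~ -10000 ms minimum values seen earlier in the thread.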
