Indexer failures should produce more information for root cause analysis #19615

Open
mikkolehtisalo opened this issue Jun 12, 2024 · 3 comments

Comments

@mikkolehtisalo
Contributor

mikkolehtisalo commented Jun 12, 2024

What?

Indexer failure messages in the UI look something like this:

2 hours ago techlog_52 c5f5e982-287f-11ef-954a-00505687ab33 OpenSearchException[OpenSearch exception [type=mapper_parsing_exception, reason=failed to parse field [level] of type [long] in document with id 'c5f5e982-287f-11ef-954a-00505687ab33'. Preview of field's value: 'Information']]; nested: OpenSearchException[OpenSearch exception [type=illegal_argument_exception, reason=For input string: "Information"]];

This is not really helpful for resolving the issue. If you have a large number of servers, systems, and components, the problem could come from any of numerous log-generating components, each with a different responsible team. It is impossible to start diagnostics when you don't even know whom to start with.

It seems OpenSearch doesn't log the failure from the example message above at all. It would apparently appear only at the debug logging level, and enabling that is simply not feasible when you receive a huge volume of logs. Graylog should be the component that produces the extra information.
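To make concrete what the failure text does and does not contain, here is a minimal Python sketch that pulls the structured parts out of the example message. The regular expression is fitted only to the one message quoted above, and `parse_indexer_failure` is a hypothetical helper name, not part of Graylog or OpenSearch. Other failure types are probably worded differently and would simply not match.

```python
import re
from typing import Optional

# Pattern fitted to the single example message quoted in this issue.
# Assumption: other mapper_parsing_exception messages follow the same wording.
FAILURE_RE = re.compile(
    r"type=(?P<error_type>\w+), "
    r"reason=failed to parse field \[(?P<field>[^\]]+)\] "
    r"of type \[(?P<expected_type>[^\]]+)\] "
    r"in document with id '(?P<doc_id>[^']+)'\. "
    r"Preview of field's value: '(?P<value>[^']*)'"
)


def parse_indexer_failure(message: str) -> Optional[dict]:
    """Return the error type, field, expected type, document id and value
    preview from a failure message, or None if the text does not match."""
    match = FAILURE_RE.search(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    example = (
        "OpenSearchException[OpenSearch exception "
        "[type=mapper_parsing_exception, reason=failed to parse field [level] "
        "of type [long] in document with id "
        "'c5f5e982-287f-11ef-954a-00505687ab33'. "
        "Preview of field's value: 'Information']]"
    )
    print(parse_indexer_failure(example))
```

Everything that can be extracted this way describes the document and the mapping conflict. Nothing in it identifies the sender, the input, or the original log line, which is the gap this issue is about.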

Alternatives:

  • Add the sender's IP to the indexer failure messages and the UI (probably enough, somewhat easy to implement)
  • Log the offending message to the server logs (probably easy, and also enough for system admins)
  • Revive the dead letter implementation (complex, but most convenient for system admins)

See MessagesAdapterOS2 for clues. The offending message, at least, should be available there in most cases.
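As a rough illustration of the first two alternatives, the following Python sketch builds a single log line from a failed message. It is not Graylog code (Graylog is written in Java), and the function names are hypothetical. It also assumes the failed message is available as a dictionary and still carries origin fields such as `source`, `gl2_remote_ip`, and `gl2_source_input`; whether those are present at the point of failure would need to be checked in the actual code.

```python
import logging

logger = logging.getLogger("indexer_failures")

# Assumption: these origin fields are still attached to the failed message.
ORIGIN_FIELDS = ("source", "gl2_remote_ip", "gl2_source_input")


def describe_failure(index: str, error: str, message: dict, max_len: int = 500) -> str:
    """Build one log line naming the index, the message origin, the error,
    and a truncated copy of the offending message body."""
    origin = ", ".join(
        f"{name}={message.get(name, 'unknown')}" for name in ORIGIN_FIELDS
    )
    body = str(message.get("message", ""))
    if len(body) > max_len:
        # Truncate so that a single huge message cannot flood the server log.
        body = body[:max_len] + "..."
    return f"Indexing failed: index={index}, {origin}, error={error}, message={body!r}"


def log_failure(index: str, error: str, message: dict) -> None:
    """Write the failure description to the server log at WARNING level."""
    logger.warning(describe_failure(index, error, message))
```

A line like this would already tell an admin which host and input to look at, without needing the dead letter queue. Truncating the body keeps the cost bounded when failures arrive in bulk.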

Why?

The current indexer failures view doesn't provide the basic information required to resolve the failures. As a result, it is not possible to resolve indexer failures in more complex environments.

Your Environment

n/a

@tellistone

tellistone commented Jun 13, 2024

Hi mikkolehtisalo

I think the info you seek is already available via the "Processing and Indexing Failures" index.

If I navigate to System > Overview, go to the Indexing error section, and hit "show errors":

image

And look at the failed messages, I can see the cause, the source, the associated stream (and thus the index), etc. The only info missing is the associated input:

image

This is enabled via System > Configuration, here:

image

Does this provide the info you need?

@mikkolehtisalo
Contributor Author

The failure processing plugin doesn't seem to exist on my system.

image

@tellistone

tellistone commented Jun 20, 2024

May I ask which version of Graylog you are running?
Is it Open or Enterprise?
