-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LogStash fails to acquire lock causing LockException. #16173
Comments
Thanks for filing the issue. Could you please provide the information requested on the issue template? That is:
Thank you! |
Hi @roaksoax, Thank you for acknowledging the issue. Please find below the requested information. Logstash Version: Installation Source with Steps followed to install: Pipeline sample causing the issue:
Steps to reproduce the issue: Logs (providing the traceback is not enough). |
The backtrace from the issue description appears to be the jruby runtime hitting an NPE in the course of interpreting and running our code (as opposed to our code failing to acquire a lock). This is not a normal scenario, and is certainly caused by a bug in Jruby. The NPE appears to have occurred while starting the pipeline, after the Logstash code had acquired the PQ's lock (which is two-fold; when opening the queue, we first ensure that no other process has the queue open using an on-disk lock file, and then ensure that the current process only opens it once; it appears that both levels of locks had been acquired prior to jruby throwing the NPE). The Agent's config state converger prevented the exception from crashing the Logstash process, but the queue's lock was left in a locked state. Because the lock is supposed to live beyond the starting of the pipeline, there is no implicit auto-close handling. Since the lock had been acquired and was not released, subsequent reloads of the pipeline cannot acquire it. The only way to get the pipeline running again is to stop the process, manually remove the offending queue's lock file, and restart the process. Logstash 8.11.3's distribution from Elastic is bundled with Jruby 9.4.5.0 and Adoptium's JDK 17.0.9p9, but since you have built from source there are a number of additional variables at play. Since you built from source it would also be helpful to know:
|
Hi @yaauie , Thank you for sharing the detailed analysis. Please find the below info, which you have requested.
|
Hi @yaauie, Just to add, we observe LockException multiple times and the pattern we observe is always it is followed by some error. Some of them like
Proposal: Can we handle releasing of on-disk lock in this part of code https://github.com/elastic/logstash/blob/main/logstash-core/src/main/java/org/logstash/ackedqueue/Queue.java#L175-L177 by invoking method releaseLockAndSwallow(); |
Hi, |
We observe LockException when logstash process is running. Looking at the logs, before LockException has occurred logstash.agent is trying to fetch the pipelines count but it couldn't get casuing JavanNullPointerException.
[logstash.agent] Failed to execute action {:action=>LogStash::PipelineAction::Reload/pipeline_id:logstash, :exception=>'Java::JavaLang::NullPointerException', :message=>'', :backtrace=>['org.jruby.runtime.scopes.DynamicScope5.getValue(Unknown Source)', 'org.jruby.ir.interpreter.InterpreterEngine.retrieveOp(InterpreterEngine.java:594)', 'org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:348)', 'org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:66)', 'org.jruby.ir.interpreter.Interpreter.INTERPRET_BLOCK(Interpreter.java:116)', 'org.jruby.runtime.MixedModeIRBlockBody.commonYieldPath(MixedModeIRBlockBody.java:136)', 'org.jruby.runtime.IRBlockBody.yieldSpecific(IRBlockBody.java:76)', 'org.jruby.runtime.Block.yieldSpecific(Block.java:158)', 'org.jruby.ext.monitor.Monitor.synchronize(Monitor.java:82)', 'org.jruby.ext.monitor.Monitor$INVOKER$i$0$0$synchronize.call(Monitor$INVOKER$i$0$0$synchronize.gen)', 'org.jruby.internal.runtime.methods.JavaMethod$JavaMethodZeroBlock.call(JavaMethod.java:561)', 'org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:90)', 'org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:103)', 'org.jruby.ir.instructions.CallBase.interpret(CallBase.java:545)', 'org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:367)', 'org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:66)', 'org.jruby.ir.interpreter.InterpreterEngine.interpret(InterpreterEngine.java:82)', 'org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:201)', 'org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:188)', 'org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:220)', 'org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:242)', 'org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:318)', 'org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:66)', 'org.jruby.ir.interpreter.InterpreterEngine.interpret(InterpreterEngine.java:82)', 'org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:201)', 'org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:188)', 'org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:257)', 'org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:270)', 'org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:341)', 'org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:66)', 'org.jruby.ir.interpreter.InterpreterEngine.interpret(InterpreterEngine.java:88)', 'org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:238)', 'org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:225)', 'org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:228)', 'opt.logstash.logstash_minus_core.lib.logstash.agent.RUBY$block$converge_state$2(/opt/logstash/logstash-core/lib/logstash/agent.rb:386)', 'org.jruby.runtime.CompiledIRBlockBody.callDirect(CompiledIRBlockBody.java:141)', 'org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:64)', 'org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:58)', 'org.jruby.runtime.Block.call(Block.java:144)', 'org.jruby.RubyProc.call(RubyProc.java:352)', 'org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:111)', 'java.base/java.lang.Thread.run(Thread.java:829)']}
From the following traceback, logstash is trying to execute the reload pipeline
github.com
begin
logger.debug("Executing action", :action => action)
action_result = action.execute(self, @pipelines_registry)
converge_result.add(action, action_result)
unless action_result.successful?
logger.error("Failed to execute action",
:id => action.pipeline_id,
:action_type => action_result.class,
:message => action_result.message,
:backtrace => action_result.backtrace
)
end
Further tracing back, probably when the below lines of code is executed it is returning null and leaving the lock not getting removed.
github.com
elastic/logstash/blob/main/logstash-core/lib/logstash/pipeline_action/reload.rb#L39-L42
def execute(agent, pipelines_registry)
old_pipeline = pipelines_registry.get_pipeline(pipeline_id)
if old_pipeline.nil?
Could you please let us know on what all scenarios NullPointerException, LockException is occurred. In case if the transaction has failed then as a rescue should it clean the lock for upcoming transaction to complete successfully.
The text was updated successfully, but these errors were encountered: