Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A potential goleak in cluster.go #47859

Open
xuxiaofan1203 opened this issue May 25, 2024 · 1 comment
Open

A potential goleak in cluster.go #47859

xuxiaofan1203 opened this issue May 25, 2024 · 1 comment
Labels
area/networking kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. status/0-triage version/master

Comments

@xuxiaofan1203
Copy link

xuxiaofan1203 commented May 25, 2024

Description

Hello, I found in the function clusterLeave()

func (nDB *NetworkDB) clusterLeave() error {

func (nDB *NetworkDB) clusterLeave() error {
	mlist := nDB.memberlist


	if err := nDB.sendNodeEvent(NodeEventTypeLeave); err != nil {
		log.G(context.TODO()).Errorf("failed to send node leave: %v", err)
	}


	if err := mlist.Leave(time.Second); err != nil {
		return err
	}


	// cancel the context
	nDB.cancelCtx()


	for _, t := range nDB.tickers {
		t.Stop()
	}


	return mlist.Shutdown()
}

If the mlist.Leave() return err, the nDB.cancelCtx() below will not get executed.

if err := mlist.Leave(time.Second); err != nil {
return err
}
// cancel the context
nDB.cancelCtx()

And it will lead the <-nDB.ctx.Done in triggerFunc() blocked persistently, so the goroutine leak.

go nDB.triggerFunc(trigger.interval, t.C, trigger.fn)

blocking position:

for {
select {
case <-C:
f()
case <-nDB.ctx.Done():
return
}
}

Reproduce

I reproduce the bug by goleak.
Firstly, I modified the judge condition from err != nil to err == nil. Because I don't know how to let err != nil, the change only to make the return err can be executed easier. I'm not sure whether the change can lead other influences.
Normally:

	if err := mlist.Leave(time.Second); err != nil {
		return err
	}

After modified:

	if err := mlist.Leave(time.Second); err == nil {
		return err
	}

Then I used goleak to test in these test function related the funciton.

func TestNetworkDBSimple(t *testing.T) {

Like this:
O` O$Y@3FVFWM%V4H3AQ8EH
The result shows that there is a bug at the <-nDB.ctx.Done
DPNHW@V DSD))B4B5~TO5M

Expected behavior

No response

docker version

latest

docker info

latest

Additional Info

In short, I think the bug is caused by return but have not called the cancelFunc. I have tried to describe it in detail.

@xuxiaofan1203 xuxiaofan1203 added kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. status/0-triage labels May 25, 2024
@thaJeztah
Copy link
Member

cc @akerouanton @corhere

@xuxiaofan1203 xuxiaofan1203 changed the title A potential goleak in libnetwork/networkdb/cluster.go A potential goleak in cluster.go May 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/networking kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. status/0-triage version/master
Projects
None yet
Development

No branches or pull requests

2 participants