fjage and Fjage.jl startup stages don't agree #275

Open
ettersi opened this issue Mar 13, 2023 · 0 comments
ettersi commented Mar 13, 2023

The Fjåge MasterContainer sends out an "{\"alive\": true}" message once initialisation has completed.

protected void initComplete() {
    synchronized(slaves) {
        for (ConnectionHandler slave: slaves) {
            if (!slave.isAlive()) slave.start();
        }
    }

public void run() {
    BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
    out = new DataOutputStream(conn.getOutputStream());
    if (keepAlive) {
        if (closeOnDead) {
            (new Thread(getName()+":init") {
                @Override
                public void run() {
                    println(ALIVE);
                    try {
                        Thread.sleep(TIMEOUT);
                    } catch (InterruptedException ex) {
                        // do nothing
                    }
                    if (!alive) {
                        log.fine("Connection dead");
                        close();
                    }
                }
            }).start();
        } else {
            println(ALIVE);
        }
    }

However, Container.send() only forwards messages after the container has started.

public boolean send(Message m, boolean relay) {
    if (!running) return false;

This is a problem because Fjage.jl interprets the "{\"alive\": true}" message as a signal that the master container is ready to accept messages.

https://github.com/org-arl/Fjage.jl/blob/b109126034cbdc517d8003e783df890677657002/src/gw.jl#L145-L148
https://github.com/org-arl/Fjage.jl/blob/b109126034cbdc517d8003e783df890677657002/src/container.jl#L468-L476

The result is that messages sent by slave-container agents may not reach their destination if these messages happen to fall in the gap between init() and start() on the master container.
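
To make the gap concrete, here is a minimal toy model of the ordering problem. It does not use the real fjage classes: `ToyMaster`, `sendInGap()` and `sendAfterStart()` are hypothetical names made up for this sketch, and the only things taken from the code above are the "alive" message going out at the end of initialisation and the `if (!running) return false;` guard in `send()`.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the startup race described in this issue; NOT the real fjage classes.
public class StartupGapDemo {

    /** Hypothetical stand-in for the master container. */
    static class ToyMaster {
        private volatile boolean running = false;
        final List<String> wire = new ArrayList<>();      // what the master tells the slave
        final List<String> delivered = new ArrayList<>(); // messages actually forwarded

        // As in the issue: the alive message goes out once init has completed ...
        void initComplete() { wire.add("{\"alive\": true}"); }

        // ... but messages are only accepted once the container has started.
        void start() { running = true; }

        boolean send(String msg) {
            if (!running) return false; // same guard as quoted from Container.send()
            delivered.add(msg);
            return true;
        }
    }

    /** Slave sends as soon as it has seen "alive", before the master has started. */
    static boolean sendInGap() {
        ToyMaster master = new ToyMaster();
        master.initComplete();
        boolean slaveThinksReady = master.wire.contains("{\"alive\": true}");
        boolean accepted = slaveThinksReady && master.send("request");
        master.start(); // too late for the request above
        return accepted;
    }

    /** Same exchange, but the request is sent only after start(). */
    static boolean sendAfterStart() {
        ToyMaster master = new ToyMaster();
        master.initComplete();
        master.start();
        return master.send("request");
    }

    public static void main(String[] args) {
        System.out.println("sent in gap, accepted: " + sendInGap());
        System.out.println("sent after start, accepted: " + sendAfterStart());
    }
}
```

In this model the request sent in the gap is silently rejected even though the slave has already seen the "alive" message, which is the behaviour the assertion in the demonstration below trips over.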

Demonstration:

using Fjage

@agent struct Dummy; end
function Fjage.startup(agent::Dummy)
    node = agentforservice(agent, "org.arl.unet.Services.NODE_INFO")
    # Shouldn't trigger, but does
    @assert !isnothing(node.address)
end

simulator = run(`bin/unet samples/2-node-network.groovy`, wait = false)
try
    container = SlaveContainer("localhost", 1101, reconnect = false)
    add(container, Dummy())
    while true
        try
            start(container)
            break
        catch e
            sleep(0.01)
        end
    end
    sleep(3.0)
finally
    kill(simulator)
end
@mchitre mchitre self-assigned this Mar 13, 2023
@mchitre mchitre added the bug label Mar 13, 2023