

CI Failure (can't connect to agent in tsh) in HighThroughputTest.test_decommission_and_add #19922

Open
vbotbuildovich opened this issue Jun 20, 2024 · 1 comment
Labels: auto-triaged (used to know which issues have been opened from a CI job), ci-failure

Comments

@vbotbuildovich
Collaborator

vbotbuildovich commented Jun 20, 2024

https://buildkite.com/redpanda/vtools/builds/14657

Module: rptest.redpanda_cloud_tests.high_throughput_test
Class: HighThroughputTest
Method: test_decommission_and_add
test_id:    HighThroughputTest.test_decommission_and_add
status:     FAIL
run time:   1066.948 seconds

CalledProcessError(1, ['tsh', 'ssh', '--proxy=proxy.tp.redpanda.com:443', '--auth=okta', '--identity=/tmp/machine-id/identity', 'redpanda@cpl7uk6e2nclrjtsl4dg-agent', 'kubectl', '--context=arn:aws:eks:us-west-2:471112860801:cluster/redpanda-cpl7uk6e2nclrjtsl4dg', 'run', '--image docker.io/library/alpine', '--restart=Never', '--overrides=\'{"spec": {"nodeName": "ip-10-1-7-26.us-west-2.compute.internal", "hostPID": true, "hostNetwork": true, "containers": [{"securityContext": {"privileged": true}, "image": "docker.io/library/alpine", "name": "nsenter", "stdin": true, "stdinOnce": true, "tty": true, "command": ["nsenter", "--target", "1", "--mount", "--uts", "--ipc", "--net", "--pid", "bash", "-l"], "resources": {}, "volumeMounts": []}], "tolerations": [{"key": "CriticalAddonsOnly", "operator": "Exists"}, {"effect": "NoExecute", "operator": "Exists"}], "volumes": []}}\'', 'ip-10-1-7-26.us-west-2.compute.internal-pshell'], '', '\x1b[31mERROR: \x1b[0mfailed connecting to host cpl7uk6e2nclrjtsl4dg-agent:0: failed to receive cluster details response\n\tfailed to dial target host\n\tTeleport proxy failed to connect to "node" agent "@local-node" over reverse tunnel:\n\n  no tunnel connection found: no node reverse tunnel for 3f1f63b5-62ce-43de-b83f-a118c2b8dc56.proxy.tp.redpanda.com found\n\nThis usually means that the agent is offline or has disconnected. Check the\nagent logs and, if the issue persists, try restarting it or re-registering it\nwith the cluster.\n\n')
Traceback (most recent call last):
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 177, in _do_run
    self.test = self.test_context.cls(self.test_context)
  File "/home/ubuntu/redpanda/tests/rptest/redpanda_cloud_tests/high_throughput_test.py", line 242, in __init__
    super(HighThroughputTest, self).__init__(test_context=test_ctx,
  File "/home/ubuntu/redpanda/tests/rptest/utils/test_mixins.py", line 29, in __init__
    super().__init__(**kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/tests/redpanda_cloud_test.py", line 25, in __init__
    self.redpanda = make_redpanda_cloud_service(test_context)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 5082, in make_redpanda_cloud_service
    return RedpandaServiceCloud(context,
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1695, in __init__
    self.rebuild_pods_classes()
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1721, in rebuild_pods_classes
    self.pods = [
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1722, in <listcomp>
    CloudBroker(p, self.kubectl, self.logger)
  File "/home/ubuntu/redpanda/tests/rptest/services/cloud_broker.py", line 62, in __init__
    self.nodeshell.initialize_nodeshell()
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 504, in initialize_nodeshell
    _out = self.kubectl.cmd([
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 276, in cmd
    return self._ssh_cmd(cmd, capture=capture)
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 252, in _ssh_cmd
    return self._local_cmd(local_cmd)
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 232, in _local_cmd
    raise subprocess.CalledProcessError(process.returncode, cmd, s_out,
subprocess.CalledProcessError: Command '['tsh', 'ssh', '--proxy=proxy.tp.redpanda.com:443', '--auth=okta', '--identity=/tmp/machine-id/identity', 'redpanda@cpl7uk6e2nclrjtsl4dg-agent', 'kubectl', '--context=arn:aws:eks:us-west-2:471112860801:cluster/redpanda-cpl7uk6e2nclrjtsl4dg', 'run', '--image docker.io/library/alpine', '--restart=Never', '--overrides=\'{"spec": {"nodeName": "ip-10-1-7-26.us-west-2.compute.internal", "hostPID": true, "hostNetwork": true, "containers": [{"securityContext": {"privileged": true}, "image": "docker.io/library/alpine", "name": "nsenter", "stdin": true, "stdinOnce": true, "tty": true, "command": ["nsenter", "--target", "1", "--mount", "--uts", "--ipc", "--net", "--pid", "bash", "-l"], "resources": {}, "volumeMounts": []}], "tolerations": [{"key": "CriticalAddonsOnly", "operator": "Exists"}, {"effect": "NoExecute", "operator": "Exists"}], "volumes": []}}\'', 'ip-10-1-7-26.us-west-2.compute.internal-pshell']' returned non-zero exit status 1.
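For context, the traceback shows that `kubectl.cmd` forwards each kubectl call through `tsh ssh` to the agent host, and `_local_cmd` raises `subprocess.CalledProcessError` when that process exits non-zero. That is why the Teleport error ends up in the exception's stderr field. A minimal sketch of that pattern follows. `build_tsh_kubectl_cmd` and `run_local` are hypothetical names, and the argument layout is copied from the failing command above, not from the actual rptest source.

```python
import subprocess
import sys


def build_tsh_kubectl_cmd(proxy, identity, agent_host, kube_context, kubectl_args):
    """Wrap a kubectl invocation in `tsh ssh` so it runs on the agent host.

    Hypothetical helper: the argument order mirrors the failing command in
    this report, not necessarily rptest's real implementation.
    """
    return [
        "tsh", "ssh",
        f"--proxy={proxy}",
        "--auth=okta",
        f"--identity={identity}",
        f"redpanda@{agent_host}",
        "kubectl",
        f"--context={kube_context}",
        *kubectl_args,
    ]


def run_local(cmd):
    """Run cmd locally and return stdout.

    On a non-zero exit, raise CalledProcessError carrying both stdout and
    stderr, which is where tsh puts its "no tunnel connection found" message.
    """
    process = subprocess.run(cmd, capture_output=True, text=True)
    if process.returncode != 0:
        raise subprocess.CalledProcessError(process.returncode, cmd,
                                            process.stdout, process.stderr)
    return process.stdout
```

With the agent offline, `run_local(build_tsh_kubectl_cmd(...))` fails with exit status 1 and the Teleport message in `e.stderr`, matching the exception repr at the top of this report.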

JIRA Link: CORE-4246

@vbotbuildovich added the auto-triaged and ci-failure labels Jun 20, 2024
@travisdowns
Member

Can't connect to agent (no reverse tunnel):

ERROR: failed connecting to host cpl7uk6e2nclrjtsl4dg-agent:0: failed to receive cluster details response
    failed to dial target host
    Teleport proxy failed to connect to "node" agent "@local-node" over reverse tunnel:

  no tunnel connection found: no node reverse tunnel for 3f1f63b5-62ce-43de-b83f-a118c2b8dc56.proxy.tp.redpanda.com found

This usually means that the agent is offline or has disconnected. Check the
agent logs and, if the issue persists, try restarting it or re-registering it
with the cluster.
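Since the issue is now titled by this symptom, a helper along these lines could flag the failure as an agent connectivity problem rather than a test bug. This is a sketch under assumptions: `is_agent_tunnel_failure` and `TUNNEL_SYMPTOMS` are hypothetical names, and matching on these substrings assumes tsh keeps its current error wording.

```python
import subprocess

# Substrings taken from the tsh error quoted above. Treating them as markers
# of an unreachable Teleport agent is an assumption made for this sketch.
TUNNEL_SYMPTOMS = (
    "no tunnel connection found",
    "failed to dial target host",
)


def is_agent_tunnel_failure(err: subprocess.CalledProcessError) -> bool:
    """Return True if a failed tsh call looks like the agent was unreachable."""
    stderr = err.stderr or ""
    # stderr may be bytes if the process was run without text mode.
    if isinstance(stderr, bytes):
        stderr = stderr.decode(errors="replace")
    return any(symptom in stderr for symptom in TUNNEL_SYMPTOMS)
```

Matching on stderr rather than the exit status is deliberate: tsh returned status 1 here, which by itself does not distinguish a dead tunnel from a failing kubectl command.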

@travisdowns changed the title from "CI Failure (key symptom) in HighThroughputTest.test_decommission_and_add" to "CI Failure (can't connect to agent in tsh) in HighThroughputTest.test_decommission_and_add" Jun 23, 2024