Envoy Proxy for Terminating Gateway fails to configure dynamic cluster #21370

Open
cyclops23 opened this issue Apr 18, 2024 · 1 comment
Labels
theme/terminating-gw Track terminating gateway work type/bug Feature does not function as expected

Comments

@cyclops23

Nomad version

Nomad v1.7.6
BuildDate 2024-03-12T07:27:36Z
Revision 594fedbfbc4f0e532b65e8a69b28ff9403eb822e

Consul version

Consul v1.18.1
Revision 98cb473c
Build Date 2024-03-26T21:59:08Z

Operating system and Environment details

Linux ip-XX-XX-XXX-XXX 5.15.0-1056-aws #61~20.04.1-Ubuntu SMP Wed Mar 13 17:45:04 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

Issue

I'm attempting to set up a terminating gateway for DynamoDB. The Envoy proxy starts successfully, but the dynamic cluster representing the terminating gateway service is never added.

In the Consul / Nomad UIs everything looks good:

[Screenshots of the Consul and Nomad UIs, taken 2024-04-18 at 12:03:21 and 12:03:49]

However, the external service is not accessible through the service mesh.

Reproduction steps

  1. Create an external service in Consul
  2. Create a terminating gateway job in Nomad

I've uploaded the relevant configuration files to https://github.com/cyclops23/nomad-bug-tgw

Expected Result

The external service should be accessible through the terminating gateway (or some meaningful error message should be provided if there is a problem with the configuration).

I expect the dynamic cluster representing the external service to be added to Envoy, as in this example:

[2024-04-18 11:42:34.877][1][info][upstream] [source/common/upstream/cluster_manager_impl.cc:222] cm init: initializing cds
[2024-04-18 11:42:34.879][1][info][main] [source/server/server.cc:934] starting main dispatch loop
[2024-04-18 11:42:34.884][1][info][upstream] [source/common/upstream/cds_api_helper.cc:32] cds: add 1 cluster(s), remove 0 cluster(s)
[2024-04-18 11:42:34.919][1][info][upstream] [source/common/upstream/cds_api_helper.cc:71] cds: added/updated 1 cluster(s), skipped 0 unmodified cluster(s)
[2024-04-18 11:42:34.921][1][info][upstream] [source/common/upstream/cluster_manager_impl.cc:226] cm init: all clusters initialized

Actual Result

Requests to the external service are routed to the terminating gateway and fail.

Inspecting the Envoy logs shows that the cluster for the gateway is never added via xDS:

[info] cm init: initializing cds
[info] starting main dispatch loop
[debug] [Tags: \"ConnectionId\":\"0\"] connected
[debug] [Tags: \"ConnectionId\":\"0\"] connected
[debug] [Tags: \"ConnectionId\":\"0\"] attaching to next stream
[debug] [Tags: \"ConnectionId\":\"0\"] creating stream
[debug] [Tags: \"ConnectionId\":\"0\",\"StreamId\":\"10539488469085721247\"] pool ready
[debug] [Tags: \"ConnectionId\":\"0\",\"StreamId\":\"10539488469085721247\"] upstream headers complete: end_stream=false
[debug] async http request response headers (end_stream=false):\n':status', '200'\n'content-type', 'application/grpc'\n
[debug] Received DeltaDiscoveryResponse for type.googleapis.com/envoy.config.cluster.v3.Cluster at version 
[info] cds: add 0 cluster(s), remove 0 cluster(s)
[info] cds: added/updated 0 cluster(s), skipped 0 unmodified cluster(s)
[debug] maybe finish initialize state: 4
[debug] maybe finish initialize primary init clusters empty: true
[debug] maybe finish initialize secondary init clusters empty: true
[debug] maybe finish initialize cds api ready: true
[info] cm init: all clusters initialized
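
The difference between the expected and actual logs comes down to the count in the "cds: add N cluster(s)" line. A minimal sketch for pulling that count out of a saved or streamed Envoy log follows; the function names cds_added_counts and cds_added_any are hypothetical helpers, not part of Envoy, Consul, or Nomad.

```shell
#!/bin/sh
# Hypothetical helpers (not part of any tool) for reading Envoy logs on stdin.

# Print the N from every "cds: add N cluster(s)" line, one per line.
# The "cds: added/updated ..." lines are intentionally not matched.
cds_added_counts() {
  sed -n 's/.*cds: add \([0-9][0-9]*\) cluster(s).*/\1/p'
}

# Succeed (exit 0) if at least one CDS update added a non-zero number of
# clusters; fail if every update added 0 or no CDS line was seen at all.
cds_added_any() {
  cds_added_counts | grep -qv '^0$'
}
```

Something like `nomad alloc logs -stderr <alloc-id> <gateway-task> | cds_added_any || echo "no clusters added via CDS"` could then flag the failing case, where `<alloc-id>` and `<gateway-task>` are placeholders for the gateway allocation and its Envoy task.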

Additional config / debug info

# consul config read -kind terminating-gateway -name ext-dynamodb-tgw
{
    "Kind": "terminating-gateway",
    "Name": "ext-dynamodb-tgw",
    "Services": [
        {
            "Name": "ext-dynamodb",
            "CAFile": "/etc/ssl/certs/Amazon_Root_CA_1.pem",
            "SNI": "dynamodb.us-east-1.amazonaws.com"
        }
    ],
    "CreateIndex": 438709,
    "ModifyIndex": 438709
}
# curl -s -H "X-Consul-Token:${CONSUL_HTTP_TOKEN}" "${CONSUL_HTTP_ADDR}/v1/catalog/service/ext-dynamodb-tgw" | jq  '.[] | { ServiceKind, ServiceName, ServiceID }'
{
  "ServiceKind": "terminating-gateway",
  "ServiceName": "ext-dynamodb-tgw",
  "ServiceID": "_nomad-task-f0b1b6d5-ef0f-ec7c-14c6-3112685453aa-group-ext-dynamodb-tgw-ext-dynamodb-tgw-connect-terminating-ext-dynamodb-tgw"
}
# cat .envoy_bootstrap.cmd
connect envoy -grpc-addr unix://alloc/tmp/consul_grpc.sock -http-addr 127.0.0.1:8501 -admin-bind 127.0.0.2:19000 -address 127.0.0.1:19100 -proxy-id _nomad-task-f0b1b6d5-ef0f-ec7c-14c6-3112685453aa-group-ext-dynamodb-tgw-ext-dynamodb-tgw-connect-terminating-ext-dynamodb-tgw -bootstrap -gateway terminating -token <REDACTED> -grpc-ca-file /opt/consul/tls/ca.pem -ca-file /opt/consul/tls/ca.pem -client-cert /opt/nomad/tls/cert.pem -client-key /opt/nomad/tls/private-key.pem
# cat .envoy_bootstrap.env
[
    "LANG=C.UTF-8",
    "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin",
    "HOME=/root",
    "LOGNAME=root",
    "USER=root",
    "SHELL=/bin/sh",
    "INVOCATION_ID=92dfa72f396a42d69b4c3d62c526fcc5",
    "JOURNAL_STREAM=8:27232",
    "CONSUL_HTTP_SSL=true",
    "CONSUL_HTTP_SSL_VERIFY=false",
    "NOMAD_ALLOC_ID=f0b1b6d5-ef0f-ec7c-14c6-3112685453aa",
    "NOMAD_SHORT_ALLOC_ID=f0b1b6d5",
    "NOMAD_ALLOC_NAME=ext-dynamodb-tgw.ext-dynamodb-tgw[0]",
    "NOMAD_GROUP_NAME=ext-dynamodb-tgw",
    "NOMAD_JOB_NAME=ext-dynamodb-tgw",
    "NOMAD_JOB_ID=ext-dynamodb-tgw",
    "NOMAD_NAMESPACE=default",
    "NOMAD_REGION=global"
]
# cat envoy_bootstrap.json
{
  "admin": {
    "access_log": [
      {
        "name": "Consul Listener Filter Log",
        "typedConfig": {
          "@type": "type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog",
          "logFormat": {
            ...
          }
        }
      }
    ],
    "address": {
      "socket_address": {
        "address": "127.0.0.2",
        "port_value": 19000
      }
    }
  },
  "node": {
    "cluster": "terminating-gateway",
    "id": "_nomad-task-f0b1b6d5-ef0f-ec7c-14c6-3112685453aa-group-ext-dynamodb-tgw-ext-dynamodb-tgw-connect-terminating-ext-dynamodb-tgw",
    "metadata": {
      "namespace": "default",
      "partition": "default"
    }
  },
  "layered_runtime": {
    "layers": [
      {
        "name": "base",
        "static_layer": {
          "re2.max_program_size.error_level": 1048576
        }
      }
    ]
  },
  "static_resources": {
    "clusters": [
      {
        "name": "local_agent",
        "ignore_health_on_host_removal": false,
        "connect_timeout": "1s",
        "type": "STATIC",
        "transport_socket": {
          "name": "tls",
          "typed_config": {
            "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext",
            "common_tls_context": {
              "validation_context": {
                "trusted_ca": {
                  "inline_string": "-----BEGIN CERTIFICATE-----\n<REDACTED\n-----END CERTIFICATE-----\n"
                }
              }
            }
          }
        },
        "typed_extension_protocol_options": {
          "envoy.extensions.upstreams.http.v3.HttpProtocolOptions": {
            "@type": "type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions",
            "explicit_http_config": {
              "http2_protocol_options": {}
            }
          }
        },
        "loadAssignment": {
          "clusterName": "local_agent",
          "endpoints": [
            {
              "lbEndpoints": [
                {
                  "endpoint": {
                    "address": {
                      "pipe": {
                        "path": "alloc/tmp/consul_grpc.sock"
                      }
                    }
                  }
                }
              ]
            }
          ]
        }
      },
      {
        "name": "self_admin",
        "ignore_health_on_host_removal": false,
        "connect_timeout": "5s",
        "type": "STATIC",
        "typed_extension_protocol_options": {
          "envoy.extensions.upstreams.http.v3.HttpProtocolOptions": {
            "@type": "type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions",
            "explicit_http_config": {
              "http_protocol_options": {}
            }
          }
        },
        "loadAssignment": {
          "clusterName": "self_admin",
          "endpoints": [
            {
              "lbEndpoints": [
                {
                  "endpoint": {
                    "address": {
                      "socket_address": {
                        "address": "127.0.0.2",
                        "port_value": 19000
                      }
                    }
                  }
                }
              ]
            }
          ]
        }
      }
    ],
    "listeners": [
      {
        "name": "envoy_prometheus_metrics_listener",
        "address": {
          "socket_address": {
            "address": "127.0.0.1",
            "port_value": 9102
          }
        },
        "filter_chains": [
          {
            "filters": [
              {
                "name": "envoy.filters.network.http_connection_manager",
                "typedConfig": {
                  "@type": "type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager",
                  "stat_prefix": "envoy_prometheus_metrics",
                  "codec_type": "HTTP1",
                  "route_config": {
                    "name": "self_admin_route",
                    "virtual_hosts": [
                      {
                        "name": "self_admin",
                        "domains": [
                          "*"
                        ],
                        "routes": [
                          {
                            "match": {
                              "path": "/metrics"
                            },
                            "route": {
                              "cluster": "self_admin",
                              "prefix_rewrite": "/stats/prometheus"
                            }
                          },
                          {
                            "match": {
                              "prefix": "/"
                            },
                            "direct_response": {
                              "status": 404
                            }
                          }
                        ]
                      }
                    ]
                  },
                  "http_filters": [
                    {
                      "name": "envoy.filters.http.router",
                      "typedConfig": {
                        "@type": "type.googleapis.com/envoy.extensions.filters.http.router.v3.Router"
                      }
                    }
                  ]
                }
              }
            ]
          }
        ]
      }
    ]
  },
  "stats_config": {
    "stats_tags": [
      ...
    ],
    "use_all_default_tags": true
  },
  "dynamic_resources": {
    "lds_config": {
      "ads": {},
      "initial_fetch_timeout": "0s",
      "resource_api_version": "V3"
    },
    "cds_config": {
      "ads": {},
      "initial_fetch_timeout": "0s",
      "resource_api_version": "V3"
    },
    "ads_config": {
      "api_type": "DELTA_GRPC",
      "transport_api_version": "V3",
      "grpc_services": {
        "initial_metadata": [
          {
            "key": "x-consul-token",
            "value": "<REDACTED>"
          }
        ],
        "envoy_grpc": {
          "cluster_name": "local_agent"
        }
      }
    }
  }
}

From the agent where the terminating gateway is running:

# curl -s http://127.0.0.0:8500/v1/agent/services | jq '.["_nomad-task-f0b1b6d5-ef0f-ec7c-14c6-3112685453aa-group-ext-dynamodb-tgw-ext-dynamodb-tgw-connect-terminating-ext-dynamodb-tgw"]'
{
  "Kind": "terminating-gateway",
  "ID": "_nomad-task-f0b1b6d5-ef0f-ec7c-14c6-3112685453aa-group-ext-dynamodb-tgw-ext-dynamodb-tgw-connect-terminating-ext-dynamodb-tgw",
  "Service": "ext-dynamodb-tgw",
  "Tags": [],
  "Meta": {
    "external-source": "nomad"
  },
  "Port": 28117,
  "Address": "<REDACTED>",
  "TaggedAddresses": {
    "lan_ipv4": {
      "Address": "<REDACTED>",
      "Port": 28117
    },
    "wan_ipv4": {
      "Address": "<REDACTED>",
      "Port": 28117
    }
  },
  "Weights": {
    "Passing": 1,
    "Warning": 1
  },
  "EnableTagOverride": false,
  "Proxy": {
    "Config": {
      "component_log_level": "upstream:trace,http:trace,router:trace,config:trace",
      "connect_timeout_ms": 5000,
      "envoy_gateway_bind_addresses": {
        "default": {
          "Address": "0.0.0.0",
          "Port": 28117
        }
      },
      "envoy_gateway_no_default_bind": true,
      "envoy_prometheus_bind_addr": "127.0.0.1:9102",
      "log_level": "debug",
      "protocol": "tcp"
    },
    "MeshGateway": {},
    "Expose": {},
    "AccessLogs": {
      "Enabled": true
    }
  },
  "Datacenter": "aws-us-east-1"
}

Please let me know if there are additional debugging steps you can suggest, or if you need more information on the issue.
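
One further check that might help narrow this down is asking Envoy's admin endpoint which clusters it actually holds. The bootstrap above binds the admin listener to 127.0.0.2:19000 and defines two static clusters, local_agent and self_admin, so anything else listed would have arrived via xDS. This is a sketch, assuming the admin listener is reachable from wherever the command runs (for example, inside the allocation's network namespace); dynamic_clusters is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads the text output of Envoy's /clusters admin
# endpoint on stdin (lines of the form "<cluster>::<key>::<value>") and
# prints the names of clusters other than the two static bootstrap ones.
# Exits non-zero when no such cluster is present.
dynamic_clusters() {
  awk -F'::' 'NF > 1 { print $1 }' \
    | sort -u \
    | grep -vxE 'local_agent|self_admin'
}
```

Usage would be along the lines of `curl -s http://127.0.0.2:19000/clusters | dynamic_clusters`. Empty output with a non-zero exit status would match the "cds: add 0 cluster(s)" lines in the logs above.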

@cyclops23 cyclops23 added the type/bug Feature does not function as expected label Apr 18, 2024
@tgross
Member

tgross commented Jun 25, 2024

Hi @cyclops23! Apologies for the long delay in responding to this. Unfortunately, I wasn't able to reproduce what you're seeing. I cloned the repository you linked to, ran the deploy script, and got the following in the logs for the terminating gateway proxy:

[2024-06-25 19:31:44.406][1][info][main] [source/server/server.cc:934] starting main dispatch loop
[2024-06-25 19:31:44.409][1][info][upstream] [source/common/upstream/cds_api_helper.cc:32] cds: add 1 cluster(s), remove 0 cluster(s)
[2024-06-25 19:31:44.466][1][info][upstream] [source/common/upstream/cds_api_helper.cc:71] cds: added/updated 1 cluster(s), skipped 0 unmodified cluster(s)
[2024-06-25 19:31:44.467][1][info][upstream] [source/common/upstream/cluster_manager_impl.cc:226] cm init: all clusters initialized
[2024-06-25 19:31:44.467][1][info][main] [source/server/server.cc:915] all clusters initialized. initializing init manager
[2024-06-25 19:31:44.470][1][info][upstream] [source/extensions/listener_managers/listener_manager/lds_api.cc:99] lds: add/update listener 'default:0.0.0.0:24076'
[2024-06-25 19:31:44.470][1][info][config] [source/extensions/listener_managers/listener_manager/listener_manager_impl.cc:923] all dependencies initialized. starting workers

Then I ran the following job to act as a test client (I've skipped using transparent proxy here but that should work as well):

downstream jobspec
job "curl" {

  group "group" {

    network {
      mode = "bridge"
      port "www" {
        to = 8001
      }
    }

    service {
      name = "count-dashboard"
      port = "8001"

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "ext-dynamodb"
              local_bind_port  = 8080
            }
          }
        }
      }
    }

    task "task" {

      driver = "docker"

      config {
        image   = "curlimages/curl:latest"
        command = "tail"
        args    = ["-f"]
        ports   = ["www"]
      }

      resources {
        cpu    = 128
        memory = 256
      }

    }
  }
}

That allocation starts up just fine, and I'm able to curl DynamoDB via the upstream:

$ nomad alloc exec -task task 83fd /bin/sh
~ $ curl localhost:8080
healthy: dynamodb.us-east-1.amazonaws.com ~ $ ^C

I'd have you check the Nomad server logs to see what happened when it registered the gateway, but I can see from your consul config read output that everything looks as I'd expect. Here's what mine looks like (with Consul Enterprise):

$ consul config read -kind terminating-gateway -name ext-dynamodb-tgw
{
    "Kind": "terminating-gateway",
    "Name": "ext-dynamodb-tgw",
    "Services": [
        {
            "Namespace": "default",
            "Name": "ext-dynamodb",
            "CAFile": "/etc/ssl/certs/Amazon_Root_CA_1.pem",
            "SNI": "dynamodb.us-east-1.amazonaws.com"
        }
    ],
    "CreateIndex": 1168,
    "ModifyIndex": 1168,
    "Partition": "default",
    "Namespace": "default"
}

At this point I feel pretty confident that Nomad has configured the gateway as you've requested. I'm going to transfer this issue over to the Consul repository, in hopes that folks there will have a better handle on where to look next.
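
A possible next place to look on the Consul side is the catalog's view of which services are linked to the gateway, which Consul exposes at /v1/catalog/gateway-services/<gateway>. The sketch below assumes CONSUL_HTTP_ADDR and CONSUL_HTTP_TOKEN are set as in the curl command earlier in this issue; the function names are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helpers for querying Consul's gateway-services catalog endpoint.

# Build the catalog URL listing the services linked to a gateway.
# $1: Consul HTTP address (a trailing slash is tolerated), $2: gateway name.
gateway_services_url() {
  printf '%s/v1/catalog/gateway-services/%s\n' "${1%/}" "$2"
}

# Fetch the linked services for the gateway named in $1.
# Assumes CONSUL_HTTP_ADDR and CONSUL_HTTP_TOKEN are exported.
gateway_services() {
  curl -s -H "X-Consul-Token: ${CONSUL_HTTP_TOKEN}" \
    "$(gateway_services_url "${CONSUL_HTTP_ADDR}" "$1")"
}
```

For example, `gateway_services ext-dynamodb-tgw | jq '.[] | { Gateway: .Gateway.Name, Service: .Service.Name }'` should list ext-dynamodb if the catalog has linked it to the gateway. The field names in that jq filter are written from memory of the Consul API docs and should be treated as an assumption.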

@tgross tgross transferred this issue from hashicorp/nomad Jun 25, 2024
@tgross tgross added the theme/terminating-gw Track terminating gateway work label Jun 25, 2024