
fnserver container has no access to internal cluster network - cannot ping mysql database #1590

Open
christiancadieux opened this issue Mar 13, 2022 · 0 comments

christiancadieux commented Mar 13, 2022

Description
fnserver container has no access to internal cluster network - cannot ping mysql database

Steps to reproduce the issue:

  1. Start Fn in a Kubernetes 1.19.15 bare-metal cluster on Flatcar 5.4.

Describe the results you received:

WARNINGS WHEN FNSERVER STARTS:

time="2022-03-13T19:08:10.893860840Z" level=warning msg="[!] DON'T BIND ON ANY IP ADDRESS WITHOUT setting --tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING [!]"
time="2022-03-13T19:08:10.895399404Z" level=info msg="libcontainerd: started new docker-containerd process" pid=36
time="2022-03-13T19:08:10Z" level=info msg="starting containerd" module=containerd revision=89623f28b87a6004d4b785663257362d1658a729 version=v1.0.0 
time="2022-03-13T19:08:10Z" level=info msg="setting subreaper..." module=containerd 
time="2022-03-13T19:08:10Z" level=info msg="changing OOM score to -500" module=containerd 
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.content.v1.content"..." module=containerd type=io.containerd.content.v1 
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.btrfs"..." module=containerd type=io.containerd.snapshotter.v1 
time="2022-03-13T19:08:10Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.btrfs" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter" module=containerd 
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.overlayfs"..." module=containerd type=io.containerd.snapshotter.v1 
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.metadata.v1.bolt"..." module=containerd type=io.containerd.metadata.v1 
time="2022-03-13T19:08:10Z" level=warning msg="could not use snapshotter btrfs in metadata plugin" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter" module="containerd/io.containerd.metadata.v1.bolt" 
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.differ.v1.walking"..." module=containerd type=io.containerd.differ.v1 
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.gc.v1.scheduler"..." module=containerd type=io.containerd.gc.v1 
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.containers"..." module=containerd type=io.containerd.grpc.v1 
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.content"..." module=containerd type=io.containerd.grpc.v1 
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.diff"..." module=containerd type=io.containerd.grpc.v1 
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.events"..." module=containerd type=io.containerd.grpc.v1 
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.healthcheck"..." module=containerd type=io.containerd.grpc.v1 
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.images"..." module=containerd type=io.containerd.grpc.v1 
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.leases"..." module=containerd type=io.containerd.grpc.v1 
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.namespaces"..." module=containerd type=io.containerd.grpc.v1 
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.snapshots"..." module=containerd type=io.containerd.grpc.v1 
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.monitor.v1.cgroups"..." module=containerd type=io.containerd.monitor.v1 
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.runtime.v1.linux"..." module=containerd type=io.containerd.runtime.v1 
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.tasks"..." module=containerd type=io.containerd.grpc.v1 
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.version"..." module=containerd type=io.containerd.grpc.v1 
time="2022-03-13T19:08:10Z" level=info msg="loading plugin "io.containerd.grpc.v1.introspection"..." module=containerd type=io.containerd.grpc.v1 
time="2022-03-13T19:08:10Z" level=info msg=serving... address="/var/run/docker/containerd/docker-containerd-debug.sock" module="containerd/debug" 
time="2022-03-13T19:08:10Z" level=info msg=serving... address="/var/run/docker/containerd/docker-containerd.sock" module="containerd/grpc" 
time="2022-03-13T19:08:10Z" level=info msg="containerd successfully booted in 0.076981s" module=containerd 
time="2022-03-13T19:08:11.091836012Z" level=info msg="Graph migration to content-addressability took 0.00 seconds"
time="2022-03-13T19:08:11.092238813Z" level=warning msg="Your kernel does not support cgroup blkio weight"
time="2022-03-13T19:08:11.092270582Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
time="2022-03-13T19:08:11.093089636Z" level=info msg="Loading containers: start."
time="2022-03-13T19:08:11.098728143Z" level=warning msg="Running modprobe nf_nat failed with message: `ip: can't find device 'nf_nat'\nnf_nat                 45056  4 ip6table_nat,xt_nat,xt_MASQUERADE,iptable_nat\nnf_conntrack          135168  7 xt_CT,nf_conntrack_netlink,xt_nat,xt_MASQUERADE,xt_conntrack,ip_vs,nf_nat\nlibcrc32c              16384  3 ip_vs,nf_nat,nf_conntrack\nmodprobe: can't change directory to '/lib/modules': No such file or directory`, error: exit status 1"
time="2022-03-13T19:08:11.103532819Z" level=warning msg="Running modprobe xt_conntrack failed with message: `ip: can't find device 'xt_conntrack'\nxt_conntrack           16384 301 \nnf_conntrack          135168  7 xt_CT,nf_conntrack_netlink,xt_nat,xt_MASQUERADE,xt_conntrack,ip_vs,nf_nat\nmodprobe: can't change directory to '/lib/modules': No such file or directory`, error: exit status 1"
time="2022-03-13T19:08:11.320902701Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
time="2022-03-13T19:08:11.410758897Z" level=info msg="Loading containers: done."
time="2022-03-13T19:08:11.490784913Z" level=info msg="Docker daemon" commit=c97c6d6 graphdriver(s)=overlay2 version=17.12.0-ce
time="2022-03-13T19:08:11.491124348Z" level=info msg="Daemon has completed initialization"
time="2022-03-13T19:08:11.505286843Z" level=info msg="API listen on [::]:2375"
time="2022-03-13T19:08:11.505392148Z" level=info msg="API listen on /var/run/docker.sock"
time="2022-03-13T19:08:13Z" level=info msg="Setting log level to" fields.level=DEBUG
time="2022-03-13T19:08:13Z" level=info msg="Registering data store provider 'sql'"
server NewFromEnv
server.New
time="2022-03-13T19:08:13Z" level=info msg="using LB Base URL: 'http://rdeifn.lb.fn.internal:90'"
time="2022-03-13T19:08:13Z" level=debug msg="creating new datastore" db=mysql
time="2022-03-13T19:08:13Z" level=info msg="Connecting to DB" url="mysql://fnapp:boomsauce@tcp(rdeifn-mysql:3306)/fndb"

*** HANGS HERE - CANNOT PING DATABASE
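The hang at the DB-connect step can be narrowed down from inside the fnserver container. A minimal probe (a sketch in Python, assuming the `rdeifn-mysql` service name and port 3306 shown in the "Connecting to DB" log line above) distinguishes a TCP-level black hole, which matches a silent hang, from a fast refusal or a DNS failure:

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Attempt a TCP connect to host:port; False on timeout,
    refusal, or DNS failure (all raise OSError subclasses)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (service name/port taken from the log line above):
# tcp_reachable("rdeifn-mysql", 3306)   # per this issue, never succeeds
```

If the probe times out rather than getting "connection refused", packets to the service are being dropped, which points at pod networking/iptables rather than at MySQL itself.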

kernel

/app # uname -a
Linux rdeifn-fn-847478b4bc-76cgh 5.4.77-flatcar #1 SMP Wed Nov 18 17:29:43 -00 2020 x86_64 Linux

Describe the results you expected:
The fnserver needs to connect to the MySQL DB. Other pods in the same namespace can connect to the same MySQL DB.

Additional information you deem important (e.g. issue happens only occasionally):

Output of fn version (CLI command):

Client version is latest version: 0.6.17
Server version: ?   <<< fnserver is not ready.

Additional environment details (OSX, Linux, flags, etc.):

$ k logs -f rdeifn-fn-847478b4bc-76cgh  -c runner-lb
/usr/local/bin/preentry.sh: set: line 14: can't access tty; job control turned off
time="2022-03-13T18:46:17.094678038Z" level=warning msg="[!] DON'T BIND ON ANY IP ADDRESS WITHOUT setting --tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING [!]"
listen tcp 0.0.0.0:2375: bind: address already in use
time="2022-03-13T18:46:20Z" level=info msg="Setting log level to" fields.level=INFO
time="2022-03-13T18:46:20Z" level=info msg="Registering data store provider 'sql'"
server NewFromEnv
server.New
time="2022-03-13T18:46:20Z" level=info msg="Starting static runner pool" runners="[rdeifn-fn-runner.cadieux2.svc.cluster.local:9191]"
time="2022-03-13T18:46:20Z" level=info msg="Connected to runner" runner_addr="rdeifn-fn-runner.cadieux2.svc.cluster.local:9191"
time="2022-03-13T18:46:20Z" level=info msg="Creating new naive runnerpool placer with config=&{RetryAllDelay:10ms PlacerTimeout:6m0s DetachedPlacerTimeout:30s}"
time="2022-03-13T18:46:20Z" level=info msg="lb-agent starting cfg={MinDockerVersion:17.10.0-ce ContainerLabelTag: DockerNetworks: DockerLoadFile: DisableUnprivilegedContainers:false FreezeIdle:50ms HotPoll:200ms HotLauncherTimeout:1h0m0s HotPullTimeout:10m0s HotStartTimeout:5s DetachedHeadRoom:6m0s MaxResponseSize:0 MaxHdrResponseSize:0 MaxLogSize:1048576 MaxTotalCPU:0 MaxTotalMemory:0 MaxFsSize:0 MaxPIDs:50 MaxOpenFiles:0xc4201d4c00 MaxLockedMemory:0xc4201d4c08 MaxPendingSignals:0xc4201d4c10 MaxMessageQueue:0xc4201d4c18 PreForkPoolSize:0 PreForkImage:busybox PreForkCmd:tail -f /dev/null PreForkUseOnce:0 PreForkNetworks: EnableNBResourceTracker:false MaxTmpFsInodes:0 DisableReadOnlyRootFs:false DisableDebugUserLogs:false IOFSEnableTmpfs:false EnableFDKDebugInfo:false IOFSAgentPath: IOFSMountRoot: IOFSOpts: ImageCleanMaxSize:0 ImageCleanExemptTags: ImageEnableVolume:false}"
server.New completed
funcServer.Start
server.Start
time="2022-03-13T18:46:20Z" level=info msg="\n        ______\n       / ____/___\n      / /_  / __ \\\n     / __/ / / / /\n    /_/   /_/ /_/\n"
time="2022-03-13T18:46:20Z" level=info msg="Fn serving on `:90`" type=lb version=0.3.749
time="2022-03-13T18:46:25Z" level=warning msg="Created insecure grpc connection" grpc_addr="rdeifn-fn-runner.cadieux2.svc.cluster.local:9191" runner_addr="rdeifn-fn-runner.cadieux2.svc.cluster.local:9191"

All other pods work:

NAME                                        READY   STATUS    RESTARTS   AGE
ingress-nginx-controller-857c6b8d6c-vbtbj   1/1     Running   0          131m
r-dep-nginx-1-7ff774bfb5-mr76f              1/1     Running   0          19h
r-sts-tools-0                               1/1     Running   0          12h
rdeifn-fn-847478b4bc-76cgh                  1/2     Running   1          34m
rdeifn-fn-flow-depl-67db6765bc-cql2j        1/1     Running   0          12h
rdeifn-fn-runner-5cc448f875-fgbgv           1/1     Running   0          13h
rdeifn-fn-runner-5cc448f875-lb7b5           1/1     Running   0          13h
rdeifn-fn-runner-5cc448f875-q5dvs           1/1     Running   0          13h
rdeifn-fn-ui-7777796869-n76wz               1/1     Running   0          13h
rdeifn-mysql-765fb6dc7-8vlh9                1/1     Running   0          13h
rdeifn-redis-57fd48cf5b-zv5wz               1/1     Running   0          13h

Tried the IP of the mysql service directly - it also hangs.

networking

/app # ifconfig -a
docker0   Link encap:Ethernet  HWaddr 02:42:7C:9E:E8:76  
          inet addr:172.17.0.1  Bcast:172.17.255.255  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth0      Link encap:Ethernet  HWaddr A6:B9:7B:48:D6:E2  
          inet addr:192.168.92.200  Bcast:0.0.0.0  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:876 errors:0 dropped:0 overruns:0 frame:0
          TX packets:996 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:72298 (70.6 KiB)  TX bytes:60142 (58.7 KiB)

ip6tnl0   Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
          NOARP  MTU:1452  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

tunl0     Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-9A-19-00-00-00-00-00-00-00-00  
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

kube spec:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: rdeifn-fn
    chart: fn-0.1.0
    heritage: Helm
    iproject: oracle-fn
    release: rdeifn
  name: rdeifn-fn
  namespace: namespace-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rdeifn-fn
      chart: fn-0.1.0
      heritage: Helm
      iproject: oracle-fn
      release: rdeifn
      role: fn-service
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: rdeifn-fn
        chart: fn-0.1.0
        heritage: Helm
        iproject: oracle-fn
        release: rdeifn
        role: fn-service
    spec:
      affinity: {}
      containers:
      - env:
        - name: FN_DB_PASSWD
          valueFrom:
            secretKeyRef:
              key: mysql-password
              name: rdeifn-mysql
        - name: FN_DB_HOST
          value: rdeifn-mysql
        - name: FN_MQ_HOST
          value: rdeifn-redis
        - name: FN_PORT
          value: "80"
        - name: FN_NODE_TYPE
          value: api
        - name: FN_PUBLIC_LB_URL
          value: http://rdeifn.lb.fn.internal:90
        - name: FN_DB_URL
          value: mysql://fnapp:$(FN_DB_PASSWD)@tcp($(FN_DB_HOST):3306)/fndb
        - name: FN_LOG_LEVEL
          value: DEBUG
        - name: FN_MQ_URL
          value: redis://$(FN_MQ_HOST):6379/
        image: hub.comcast.net/k8s-eng/rdei-ide/fnproject/fnserver:cc
        imagePullPolicy: Always
        name: api
        ports:
        - containerPort: 80
          name: p80
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /v2/apps
            port: 80
            scheme: HTTP
          initialDelaySeconds: 3
          periodSeconds: 3
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 300m
            memory: 2Gi
          requests:
            cpu: 150m
            memory: 512Mi
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - env:
        - name: FN_NODE_TYPE
          value: lb
        - name: FN_GRPC_PORT
          value: "9191"
        - name: FN_PORT
          value: "90"
        - name: FN_RUNNER_API_URL
          value: http://rdeifn-fn.namespace-test.svc.cluster.local:80
        - name: FN_RUNNER_ADDRESSES
          value: rdeifn-fn-runner.namespace-test.svc.cluster.local:9191
        - name: FN_LOG_LEVEL
          value: INFO
        image: hub.comcast.net/k8s-eng/rdei-ide/fnproject/fnserver:cc
        imagePullPolicy: Always
        name: runner-lb
        ports:
        - containerPort: 90
          name: p90
          protocol: TCP
        resources:
          limits:
            cpu: 300m
            memory: 2Gi
          requests:
            cpu: 150m
            memory: 512Mi
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: default
      serviceAccountName: default
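
For reference, the FN_DB_URL above expands (via Kubernetes dependent-variable substitution of $(FN_DB_PASSWD) and $(FN_DB_HOST)) into exactly the DSN printed in the "Connecting to DB" log line. A small sketch that pulls the connect target out of that DSN (the regex is inferred from the log line in this issue, not taken from Fn's own parser):

```python
import re

def parse_fn_db_url(url: str) -> dict:
    """Parse a DSN of the form mysql://user:pass@tcp(host:port)/dbname,
    as seen in the fnserver log above."""
    m = re.match(r"mysql://([^:]+):([^@]*)@tcp\(([^:]+):(\d+)\)/(\w+)", url)
    if not m:
        raise ValueError(f"unrecognized DSN: {url}")
    user, _password, host, port, db = m.groups()
    return {"user": user, "host": host, "port": int(port), "db": db}

info = parse_fn_db_url("mysql://fnapp:boomsauce@tcp(rdeifn-mysql:3306)/fndb")
# -> {'user': 'fnapp', 'host': 'rdeifn-mysql', 'port': 3306, 'db': 'fndb'}
```

So the target the api container is stuck on is the `rdeifn-mysql` ClusterIP service on 3306, the same endpoint the other pods can reach.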

UPDATE

So I restarted the fnserver pod with hostNetwork=true, which crashed the pod; then I removed hostNetwork, restarted the pod again, and now it works. It looks like running fnserver against the host network reconfigured something on the host that fixed the problem; I'm not sure what.
Anyway, in my case this problem, which I don't understand, is gone.

NAME                                        READY   STATUS    RESTARTS   AGE
ingress-nginx-controller-857c6b8d6c-vbtbj   1/1     Running   0          25h
r-sts-tools-0                               1/1     Running   1          36h
rdeifn-fn-557c5bd749-9z2vf                  2/2     Running   0          16h
rdeifn-fn-flow-depl-67db6765bc-cql2j        1/1     Running   0          35h
rdeifn-fn-runner-5cc448f875-fgbgv           1/1     Running   0          37h
rdeifn-fn-runner-5cc448f875-lb7b5           1/1     Running   0          37h
rdeifn-fn-runner-5cc448f875-q5dvs           1/1     Running   0          37h
rdeifn-fn-ui-7777796869-n76wz               1/1     Running   0          37h
rdeifn-mysql-765fb6dc7-8vlh9                1/1     Running   0          37h
rdeifn-redis-57fd48cf5b-zv5wz               1/1     Running   0          37h
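
For anyone trying the same workaround: the hostNetwork toggle described above can be expressed as a strategic-merge patch instead of editing the chart. A sketch (the deployment name and namespace are taken from the spec earlier in this issue):

```python
import json

# Strategic-merge patch toggling hostNetwork on the pod template.
# It would be applied, then reverted with hostNetwork back to False, via:
#   kubectl -n namespace-test patch deployment rdeifn-fn -p "$PATCH"
patch = {"spec": {"template": {"spec": {"hostNetwork": True}}}}
PATCH = json.dumps(patch)
print(PATCH)
```

Each patch triggers a rolling restart of the pod, which matches the restart-twice sequence described above.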