Debugging Ambient Mesh Locally


The community's LOCAL.md documents a way to debug Ambient locally; this post walks through it.

Installing Ambient in Kind

1. First, install kind:

```bash
$ go install sigs.k8s.io/kind@v0.22.0
```
2. Then create a kind cluster:

```bash
$ kind create cluster --config=- <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: ambient
nodes:
- role: control-plane
- role: worker
  extraMounts:
  - hostPath: /tmp/worker1-ztunnel/
    containerPath: /var/run/ztunnel/
- role: worker
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."localhost:5000"]
    endpoint = ["http://${KIND_REGISTRY_NAME}:5000"]
EOF
```

Compared with the community's standard install, this config additionally mounts the worker node's /var/run/ztunnel directory to /tmp/worker1-ztunnel on the host, which makes debugging later on much easier.

3. Install the Gateway API CRDs:

```bash
$ kubectl get crd gateways.gateway.networking.k8s.io &> /dev/null || \
  { kubectl kustomize "github.com/kubernetes-sigs/gateway-api/config/crd/experimental?ref=444631bfe06f3bcca5d0eadf1857eac1d369421d" | kubectl apply -f -; }
```
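The guard in the command above relies on the shell's `A || { B; }` pattern: the CRDs are applied only when the `kubectl get crd` probe fails. A minimal stand-in (with a hypothetical `check` function in place of kubectl) shows the control flow:

```shell
# `check` stands in for `kubectl get crd ...`; exit code 1 means "CRD missing".
check() { return 1; }

# The braced block after || runs only because check failed.
result=$(check 2>/dev/null || { echo "CRDs missing, installing"; })
echo "$result"
```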
4. Install ambient.

The new architecture is supported starting from Istio 1.21, so remember to update your istioctl.

```bash
$ istioctl install --set profile=ambient --set "components.ingressGateways[0].enabled=true" --set "components.ingressGateways[0].name=istio-ingressgateway" --skip-confirmation
```

When the installation finishes, the pods look like this:

```bash
$ k get pod -n istio-system -owide
NAME                                    READY   STATUS    RESTARTS   AGE     IP           NODE                    NOMINATED NODE   READINESS GATES
istio-cni-node-c5q87                    1/1     Running   0          4m30s   172.19.0.2   ambient-control-plane   <none>           <none>
istio-cni-node-j5wm7                    1/1     Running   0          4m30s   172.19.0.3   ambient-worker2         <none>           <none>
istio-cni-node-jgnp8                    1/1     Running   0          4m30s   172.19.0.4   ambient-worker          <none>           <none>
istio-ingressgateway-7b887dd67f-dvxvp   1/1     Running   0          4m30s   10.244.2.4   ambient-worker          <none>           <none>
istiod-fcb4d989b-kj4mr                  1/1     Running   0          4m46s   10.244.1.4   ambient-worker2         <none>           <none>
ztunnel-58vjn                           1/1     Running   0          4m8s    10.244.2.5   ambient-worker          <none>           <none>
ztunnel-7wmp9                           1/1     Running   0          4m8s    10.244.1.5   ambient-worker2         <none>           <none>
ztunnel-mm5bb                           1/1     Running   0          4m8s    10.244.0.6   ambient-control-plane   <none>           <none>
```
5. Deploy the demo applications:

```bash
$ kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
$ kubectl apply -f samples/sleep/sleep.yaml
$ kubectl apply -f samples/sleep/notsleep.yaml
# Install the ingress gateway
$ kubectl apply -f samples/bookinfo/networking/bookinfo-gateway.yaml
```

The deployment result:

```bash
$ k get pod -owide
NAME                             READY   STATUS    RESTARTS   AGE     IP            NODE              NOMINATED NODE   READINESS GATES
details-v1-698d88b-q6w6q         1/1     Running   0          6m56s   10.244.2.6    ambient-worker    <none>           <none>
notsleep-5c785bc478-84lx2        1/1     Running   0          6m53s   10.244.1.6    ambient-worker2   <none>           <none>
productpage-v1-675fc69cf-jg9k5   1/1     Running   0          6m56s   10.244.2.11   ambient-worker    <none>           <none>
ratings-v1-6484c4d9bb-wwmsv      1/1     Running   0          6m56s   10.244.2.7    ambient-worker    <none>           <none>
reviews-v1-5b5d6494f4-qkg4d      1/1     Running   0          6m56s   10.244.2.8    ambient-worker    <none>           <none>
reviews-v2-5b667bcbf8-tptg9      1/1     Running   0          6m56s   10.244.2.9    ambient-worker    <none>           <none>
reviews-v3-5b9bd44f4-24pcl       1/1     Running   0          6m56s   10.244.2.10   ambient-worker    <none>           <none>
sleep-7656cf8794-75mjn           1/1     Running   0          6m53s   10.244.2.12   ambient-worker    <none>           <none>
```

Running Ztunnel Locally

1. Evict the Ztunnel pod from the worker:

```bash
$ kubectl label node ambient-worker ztunnel=no
$ kubectl patch daemonset -n istio-system ztunnel --type=merge -p='{"spec":{"template":{"spec":{"affinity":{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"ztunnel","operator":"NotIn","values":["no"]}]}]}}}}}}}'
```

Once this completes, Ztunnel is effectively forbidden from running on the ambient-worker node.
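The JSON patch above is equivalent to the following `nodeAffinity` block in the DaemonSet's pod template (rendered as YAML for readability):

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: ztunnel
          operator: NotIn
          values: ["no"]
```

Any node labeled `ztunnel=no` no longer satisfies the selector, so the DaemonSet controller deletes the Ztunnel pod there.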

```bash
$ kubectl get pod -n istio-system -owide
NAME                                    READY   STATUS    RESTARTS   AGE     IP           NODE                    NOMINATED NODE   READINESS GATES
istio-cni-node-drldl                    1/1     Running   0          2m17s   172.19.0.2   ambient-worker2         <none>           <none>
istio-cni-node-sl5wt                    1/1     Running   0          2m17s   172.19.0.4   ambient-worker          <none>           <none>
istio-cni-node-zckhs                    1/1     Running   0          2m17s   172.19.0.3   ambient-control-plane   <none>           <none>
istio-ingressgateway-7b887dd67f-hxq4n   1/1     Running   0          2m17s   10.244.2.2   ambient-worker          <none>           <none>
istiod-fcb4d989b-p8x9p                  1/1     Running   0          2m32s   10.244.1.2   ambient-worker2         <none>           <none>
ztunnel-42nn5                           1/1     Running   0          13s     10.244.0.6   ambient-control-plane   <none>           <none>
ztunnel-gkgbd                           1/1     Running   0          12s     10.244.1.5   ambient-worker2         <none>           <none>
```
2. Obtain the root CA:

```bash
# This must be re-fetched after every restart
$ kubectl get cm -n istio-system istio-ca-root-cert -o jsonpath='{.data.root-cert\.pem}' > /tmp/istio-root.pem
```
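It's worth sanity-checking that the extracted file is a valid PEM certificate, e.g. with `openssl x509 -in /tmp/istio-root.pem -noout -subject`. The sketch below runs that check against a throwaway self-signed cert generated in place of the real root cert (which depends on a live cluster):

```shell
# Generate a stand-in self-signed cert (the real file is /tmp/istio-root.pem).
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo-ca.key \
  -out /tmp/demo-ca.pem -days 1 -subj "/O=cluster.local" 2>/dev/null

# The same inspection applies verbatim to the extracted Istio root cert.
subject=$(openssl x509 -in /tmp/demo-ca.pem -noout -subject)
echo "$subject"
```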
3. Create a fake pod to obtain an istio-token:

```bash
$ kubectl create -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: fake-tunnel-worker1
  namespace: istio-system
spec:
  nodeName: ambient-worker
  terminationGracePeriodSeconds: 1
  serviceAccountName: ztunnel
  containers:
  - name: cat-token
    image: ubuntu:22.04
    command:
    - bash
    - -c
    args:
    - "sleep 10000"
    ports:
    - containerPort: 80
    volumeMounts:
    - mountPath: /var/run/secrets/tokens
      name: istio-token
  volumes:
  - name: istio-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          audience: istio-ca
          expirationSeconds: 43200
          path: istio-token
EOF
```

4. Expose istiod locally:

```bash
$ kubectl port-forward -n istio-system svc/istiod 15012:15012
```
5. Prepare the startup parameters.

This part is executed inside the ztunnel project directory:

```bash
$ mkdir -p ./var/run/secrets/tokens/
$ kubectl exec -n istio-system fake-tunnel-worker1 -- cat /var/run/secrets/tokens/istio-token > ./var/run/secrets/tokens/istio-token
```
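The file copied out is a projected service-account JWT; its second dot-separated segment is base64url-encoded JSON whose `aud` claim should be `istio-ca`, matching the `audience` in the fake pod spec. A sketch with a hypothetical token built in place of the real one:

```shell
# Build a fake JWT payload (the real token comes from the fake pod above).
# Padding is kept here so `base64 -d` works as-is; real JWTs omit it.
payload='{"aud":["istio-ca"],"sub":"system:serviceaccount:istio-system:ztunnel"}'
enc=$(printf '%s' "$payload" | base64 | tr -d '\n' | tr '+/' '-_')
token="header.$enc.signature"

# Decode the payload segment the way you would inspect the real istio-token.
decoded=$(printf '%s' "$token" | cut -d. -f2 | tr '_-' '/+' | base64 -d)
echo "$decoded"
```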
6. Take off:

```bash
$ xargs env <<EOF
INPOD_UDS=/tmp/worker1-ztunnel/ztunnel.sock
CLUSTER_ID=Kubernetes
RUST_LOG=debug
INPOD_ENABLED="true"
ISTIO_META_DNS_CAPTURE="true"
ISTIO_META_DNS_PROXY_ADDR="127.0.0.1:15053"
SERVICE_ACCOUNT=ztunnel
NODE_NAME=ambient-worker
POD_NAMESPACE=istio-system
POD_NAME=ztunnel-worker1
CA_ROOT_CA=/tmp/istio-root.pem
XDS_ROOT_CA=/tmp/istio-root.pem
CARGO_TARGET_$(rustc -vV | sed -n 's|host: ||p' | tr '[:lower:]' '[:upper:]'| tr - _)_RUNNER="sudo -E"
cargo run proxy ztunnel
EOF
```

The environment variables here are:

  • INPOD_UDS: the UDS socket used to communicate with the CNI
  • ISTIO_META_DNS_CAPTURE: enables the DNS proxy
  • ISTIO_META_DNS_PROXY_ADDR: the DNS proxy listen address

The rest are common enough that they need no introduction.
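The `xargs env <<EOF` idiom in the launch command is worth a note: xargs tokenizes the heredoc (the variable assignments plus the trailing `cargo run proxy ztunnel` line) into a single argument list for `env`, which sets the variables and then runs the command. A trivial stand-in (echo instead of cargo) shows the mechanics:

```shell
# Quoted heredoc so $GREETING reaches env/sh literally instead of
# being expanded by the current shell first.
out=$(xargs env <<'EOF'
GREETING=hello
sh -c 'echo $GREETING'
EOF
)
echo "$out"
```

This also explains the `CARGO_TARGET_..._RUNNER="sudo -E"` line: it tells cargo to run the built binary under `sudo -E`, which ztunnel needs for its in-pod networking setup.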

References