Istio locality load balancing not working

We know that Istio supports locality-based routing, preferring endpoints in the same locality; see the istio locality-load-balancing documentation. That document, however, lacks verification steps, so this post looks at how the feature actually works from the inside.

Easily confused terms

  • Node: a sidecar (proxy) node in Istio
  • KubeNode: a node in Kubernetes, i.e. what kubectl get nodes lists

Main content

Locality routing obviously requires locality information, both for the current node and for the destination address.

The current node

Open the Envoy dashboard:

istioctl dashboard envoy <YOUR-POD>

Then look at the locality info under Node in the Config_Dump (it is also normal for it to be missing; see below):

"locality": {
    "region": "cn-beijing",
    "zone": "cn-beijing-b"
}

How Istio obtains this value

The code is in getNodeMeta:

var l *core.Locality
if meta.Labels[model.LocalityLabel] == "" && options.Platform != nil {
    // The locality string was not set, try to get locality from platform
    l = options.Platform.Locality()
} else {
    localityString := model.GetLocalityLabelOrDefault(meta.Labels[model.LocalityLabel], "")
    l = util.ConvertLocality(localityString)
}

Istio can obtain this information automatically on GCP, AWS and Azure; on other platforms you must set the LocalityLabel label yourself, i.e. add something like this to the Pod:

istio-locality: cn-beijing.cn-beijing-a
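
As a rough sketch (not Istio's actual implementation), the conversion of that label can be pictured like this: the label uses "." as a separator, since Kubernetes label values cannot contain "/", and is expanded into region/zone/subZone. The helper name below is made up for illustration; in Istio this is roughly what model.GetLocalityLabelOrDefault plus util.ConvertLocality do.

```go
package main

import (
	"fmt"
	"strings"
)

// localityFromLabel is an illustrative sketch: split the istio-locality label
// value on "." and map the pieces to region, zone and subZone in that order.
func localityFromLabel(label string) (region, zone, subZone string) {
	parts := strings.SplitN(label, ".", 3)
	if len(parts) > 0 {
		region = parts[0]
	}
	if len(parts) > 1 {
		zone = parts[1]
	}
	if len(parts) > 2 {
		subZone = parts[2]
	}
	return
}

func main() {
	r, z, s := localityFromLabel("cn-beijing.cn-beijing-a")
	fmt.Printf("region=%q zone=%q subZone=%q\n", r, z, s)
}
```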

When the value does not exist

When this value is missing, it does not necessarily mean locality routing cannot work at all. There is some special handling, the code is at
https://github.com/istio/istio/blob/d3676a8318c5f1380b21ceb583d2d9016ec7c26b/pilot/pkg/xds/ads.go#L554-L594

When a node registers with istiod, there is an initialization step during which istiod tries to derive a locality from the node's Endpoints.

// initializeProxy completes the initialization of a proxy. It is expected to be called only after
// initProxyMetadata.
func (s *DiscoveryServer) initializeProxy(node *core.Node, con *Connection) error {
    proxy := con.proxy
    // this should be done before we look for service instances, but after we load metadata
    // TODO fix check in kubecontroller treat echo VMs like there isn't a pod
    if err := s.WorkloadEntryController.RegisterWorkload(proxy, con.Connect); err != nil {
        return err
    }
    s.computeProxyState(proxy, nil)

    // Get the locality from the proxy's service instances.
    // We expect all instances to have the same IP and therefore the same locality.
    // So its enough to look at the first instance.
    if len(proxy.ServiceInstances) > 0 {
        proxy.Locality = util.ConvertLocality(proxy.ServiceInstances[0].Endpoint.Locality.Label)
    }

    // If there is no locality in the registry then use the one sent as part of the discovery request.
    // This is not preferable as only the connected Pilot is aware of this proxies location, but it
    // can still help provide some client-side Envoy context when load balancing based on location.
    if util.IsLocalityEmpty(proxy.Locality) {
        proxy.Locality = &core.Locality{
            Region:  node.Locality.GetRegion(),
            Zone:    node.Locality.GetZone(),
            SubZone: node.Locality.GetSubZone(),
        }
    }

    locality := util.LocalityToString(proxy.Locality)
    // add topology labels to proxy metadata labels
    proxy.Metadata.Labels = labelutil.AugmentLabels(proxy.Metadata.Labels, proxy.Metadata.ClusterID, locality, proxy.Metadata.Network)
    // Discover supported IP Versions of proxy so that appropriate config can be delivered.
    proxy.DiscoverIPVersions()

    proxy.WatchedResources = map[string]*model.WatchedResource{}
    // Based on node metadata and version, we can associate a different generator.
    if proxy.Metadata.Generator != "" {
        proxy.XdsResourceGenerator = s.Generators[proxy.Metadata.Generator]
    }

    return nil
}

If the instance has ServiceInstances, istiod tries to find the corresponding locality from the matching Endpoint; the Endpoint's locality in turn is obtained from the KubeNode, as described below.

This is rather abstract, so let's look at an example.

Example: a setup that does not take effect

In the example below there is only a Deployment and no Service, so the instance cannot obtain a ServiceInstances list and therefore cannot piece together a locality via the logic above.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: netshoot
  labels:
    app: netshoot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: netshoot
  template:
    metadata:
      labels:
        app: netshoot
    spec:
      containers:
      - name: netshoot
        image: nicolaka/netshoot
        imagePullPolicy: IfNotPresent #Always
        command: [ "/bin/bash", "-c", "--" ]
        args: [ "while true; do sleep 30; done;" ]
        resources:
          limits:
            cpu: "1"
            memory: 1Gi
          requests:
            cpu: 100m
            memory: 128Mi

Example: a setup that takes effect

Adding a Service makes it take effect, because the instance can now obtain ServiceInstances.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: netshoot
  labels:
    app: netshoot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: netshoot
  template:
    metadata:
      labels:
        app: netshoot
    spec:
      containers:
      - name: netshoot
        image: cr-cn-beijing.volces.com/ams-tools/netshoot:latest
        imagePullPolicy: IfNotPresent
        command: [ "/bin/bash", "-c", "--" ]
        args: [ "while true; do sleep 30; done;" ]
        resources:
          limits:
            cpu: "1"
            memory: 1Gi
          requests:
            cpu: 100m
            memory: 128Mi
---
apiVersion: v1
kind: Service
metadata:
  name: netshoot
  labels:
    app: netshoot
spec:
  ports:
  - name: http-echo
    port: 80
  selector:
    app: netshoot

In conclusion:
  • If you want locality awareness for an instance that has no Service, you must set the istio-locality label on the instance.
  • An instance that has a Service does not need the label.

Locality information of the destination address

Likewise, open the Envoy dashboard and look at the Clusters information:

outbound|80||echo-web.default.svc.cluster.local::172.16.128.16:80::weight::1
outbound|80||echo-web.default.svc.cluster.local::172.16.128.16:80::region::cn-beijing
outbound|80||echo-web.default.svc.cluster.local::172.16.128.16:80::zone::cn-beijing-a
outbound|80||echo-web.default.svc.cluster.local::172.16.128.16:80::sub_zone::
outbound|80||echo-web.default.svc.cluster.local::172.16.128.16:80::canary::false

If a region value appears here, as above, the locality information is in place.

How Istio obtains this value

This value is obtained automatically from the Node the Pod runs on; for a WorkloadEntry it must be configured manually, since there is no Node information.

Pod

https://github.com/istio/istio/blob/9e0d31bd287d28465b6cdfe2b9cc1e2711b3cd78/pilot/pkg/serviceregistry/kube/controller/controller.go#L892-L918

// getPodLocality retrieves the locality for a pod.
func (c *Controller) getPodLocality(pod *v1.Pod) string {
    // if pod has `istio-locality` label, skip below ops
    if len(pod.Labels[model.LocalityLabel]) > 0 {
        return model.GetLocalityLabelOrDefault(pod.Labels[model.LocalityLabel], "")
    }

    // NodeName is set by the scheduler after the pod is created
    // https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#late-initialization
    node, err := c.nodeLister.Get(pod.Spec.NodeName)
    if err != nil {
        if pod.Spec.NodeName != "" {
            log.Warnf("unable to get node %q for pod %q/%q: %v", pod.Spec.NodeName, pod.Namespace, pod.Name, err)
        }
        return ""
    }

    region := getLabelValue(node.ObjectMeta, NodeRegionLabelGA, NodeRegionLabel)
    zone := getLabelValue(node.ObjectMeta, NodeZoneLabelGA, NodeZoneLabel)
    subzone := getLabelValue(node.ObjectMeta, label.TopologySubzone.Name, "")

    if region == "" && zone == "" && subzone == "" {
        return ""
    }

    return region + "/" + zone + "/" + subzone // Format: "%s/%s/%s"
}
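
The getLabelValue calls above prefer the GA topology labels and fall back to the older, deprecated beta labels. Here is a minimal standalone sketch of that fallback, using a plain string map instead of ObjectMeta and spelling out the well-known Kubernetes label keys; the helper itself is illustrative, not Istio's exact code.

```go
package main

import "fmt"

// getLabelValue mirrors the fallback used in getPodLocality: prefer the GA
// topology label and fall back to the beta label if the GA one is absent.
func getLabelValue(labels map[string]string, gaKey, betaKey string) string {
	if v, ok := labels[gaKey]; ok {
		return v
	}
	return labels[betaKey]
}

func main() {
	// A node that still carries only the deprecated beta region label,
	// but already has the GA zone label.
	nodeLabels := map[string]string{
		"failure-domain.beta.kubernetes.io/region": "cn-beijing",
		"topology.kubernetes.io/zone":              "cn-beijing-b",
	}
	region := getLabelValue(nodeLabels, "topology.kubernetes.io/region", "failure-domain.beta.kubernetes.io/region")
	zone := getLabelValue(nodeLabels, "topology.kubernetes.io/zone", "failure-domain.beta.kubernetes.io/zone")
	subzone := getLabelValue(nodeLabels, "topology.istio.io/subzone", "")
	fmt.Println(region + "/" + zone + "/" + subzone)
}
```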

WorkloadEntry

The Locality field from the WorkloadEntry configuration is used directly:
https://github.com/istio/istio/blob/9e0d31bd287d28465b6cdfe2b9cc1e2711b3cd78/pilot/pkg/serviceregistry/serviceentry/conversion.go#L277-L299

labels := labelutil.AugmentLabels(wle.Labels, clusterID, wle.Locality, networkID)
return &model.ServiceInstance{
    Endpoint: &model.IstioEndpoint{
        Address:         addr,
        EndpointPort:    instancePort,
        ServicePortName: servicePort.Name,
        Network:         network.ID(wle.Network),
        Locality: model.Locality{
            Label:     wle.Locality,
            ClusterID: clusterID,
        },
        LbWeight:       wle.Weight,
        Labels:         labels,
        TLSMode:        tlsMode,
        ServiceAccount: sa,
        // Workload entry config name is used as workload name, which will appear in metric label.
        // After VM auto registry is introduced, workload group annotation should be used for workload name.
        WorkloadName: configKey.name,
        Namespace:    configKey.namespace,
    },
    Service:     service,
    ServicePort: convertPort(servicePort),
}

Configuring locality-based routing

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: echo-web
spec:
  host: echo-web.default.svc.cluster.local
  trafficPolicy:
    connectionPool:
      http:
        maxRequestsPerConnection: 1
    loadBalancer:
      localityLbSetting:
        enabled: true
    outlierDetection:
      consecutive5xxErrors: 1
      interval: 1s
      baseEjectionTime: 1m

Note that to enable same-locality priority, enabled: true is all you need to configure, but it must always be paired with outlierDetection.
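
The reason outlierDetection is mandatory: Envoy only shifts traffic away from priority 0 once it observes those hosts as unhealthy, and outlier detection is what marks them unhealthy. The sketch below illustrates Envoy's documented priority-level load calculation (each priority absorbs healthy% times the default 1.4 overprovisioning factor, and the remainder spills to the next priority); it is an illustration of that behavior, not Istio or Envoy code.

```go
package main

import (
	"fmt"
	"math"
)

// loadShare sketches Envoy's priority-level load distribution: priority 0
// receives min(100%, healthy% * 1.4) of the traffic, and whatever is left
// spills over to priority 1, and so on. Without outlierDetection, priority 0
// always appears 100% healthy, so failover never happens.
func loadShare(healthyFraction []float64) []float64 {
	const overprovisioning = 1.4
	shares := make([]float64, len(healthyFraction))
	remaining := 1.0
	for i, healthy := range healthyFraction {
		share := math.Min(remaining, healthy*overprovisioning)
		shares[i] = share
		remaining -= share
	}
	return shares
}

func main() {
	fmt.Println(loadShare([]float64{1.0, 1.0})) // everything healthy: all traffic stays at priority 0
	fmt.Println(loadShare([]float64{0.5, 1.0})) // half of priority 0 ejected: ~70% stays local, ~30% spills over
}
```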

There is another gotcha in Istio 1.13.4: if a workload initially has no locality configured but a DestinationRule is already in place, setting the locality afterwards has no effect; you have to re-apply the DestinationRule.

Data plane

Envoy actually has two locality-awareness modes, zone_aware_lb_config and locality_weighted_lb_config, corresponding to client-side and server-side awareness respectively. Istio does not use the former; it always uses the latter (in other words, the control policy lives on the server side). So in the config Istio finally pushes down, what we need to pay attention to is:

outbound|80||echo-web.default.svc.cluster.local::172.16.128.66:80::region::cn-beijing
outbound|80||echo-web.default.svc.cluster.local::172.16.128.66:80::zone::cn-beijing-b
outbound|80||echo-web.default.svc.cluster.local::172.16.128.66:80::sub_zone::
outbound|80||echo-web.default.svc.cluster.local::172.16.128.66:80::canary::false
outbound|80||echo-web.default.svc.cluster.local::172.16.128.66:80::priority::1

outbound|80||echo-web.default.svc.cluster.local::172.16.128.17:80::region::cn-beijing
outbound|80||echo-web.default.svc.cluster.local::172.16.128.17:80::zone::cn-beijing-a
outbound|80||echo-web.default.svc.cluster.local::172.16.128.17:80::sub_zone::
outbound|80||echo-web.default.svc.cluster.local::172.16.128.17:80::canary::false
outbound|80||echo-web.default.svc.cluster.local::172.16.128.17:80::priority::0

Of the two clusters above, the bottom one (priority 0) is preferred and the top one (priority 1) is the fallback; in Envoy, a smaller priority number takes precedence.
There is a test case worth a look: https://github.com/istio/istio/blob/d3676a8318c5f1380b21ceb583d2d9016ec7c26b/pilot/pkg/networking/core/v1alpha3/loadbalancer/loadbalancer_test.go#L83-L107

t.Run("Failover: all priorities", func(t *testing.T) {
    g := NewWithT(t)
    env := buildEnvForClustersWithFailover()
    cluster := buildFakeCluster()
    ApplyLocalityLBSetting(cluster.LoadAssignment, nil, locality, nil, env.Mesh().LocalityLbSetting, true)
    for _, localityEndpoint := range cluster.LoadAssignment.Endpoints {
        if localityEndpoint.Locality.Region == locality.Region {
            if localityEndpoint.Locality.Zone == locality.Zone {
                if localityEndpoint.Locality.SubZone == locality.SubZone {
                    g.Expect(localityEndpoint.Priority).To(Equal(uint32(0)))
                    continue
                }
                g.Expect(localityEndpoint.Priority).To(Equal(uint32(1)))
                continue
            }
            g.Expect(localityEndpoint.Priority).To(Equal(uint32(2)))
            continue
        }
        if localityEndpoint.Locality.Region == "region2" {
            g.Expect(localityEndpoint.Priority).To(Equal(uint32(3)))
        } else {
            g.Expect(localityEndpoint.Priority).To(Equal(uint32(4)))
        }
    }
})
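
The priority ladder this test asserts can be captured in a standalone sketch. The priorityFor helper below is hypothetical, not Istio code; it simply mirrors the expectations above: a full locality match gets priority 0, then same region+zone, then same region, then the configured failover region, and finally everything else.

```go
package main

import "fmt"

type locality struct{ region, zone, subZone string }

// priorityFor sketches the priority assignment asserted by the test: the
// closer an endpoint's locality is to the proxy's, the lower (better) its
// priority; a configured failover region ranks above unrelated regions.
func priorityFor(proxy, ep locality, failoverRegion string) uint32 {
	switch {
	case ep.region != proxy.region:
		if ep.region == failoverRegion {
			return 3
		}
		return 4
	case ep.zone != proxy.zone:
		return 2
	case ep.subZone != proxy.subZone:
		return 1
	default:
		return 0
	}
}

func main() {
	proxy := locality{"region1", "zone1", "subzone1"}
	fmt.Println(priorityFor(proxy, locality{"region1", "zone1", "subzone1"}, "region2")) // 0: full match
	fmt.Println(priorityFor(proxy, locality{"region1", "zone1", "subzone2"}, "region2")) // 1: same zone
	fmt.Println(priorityFor(proxy, locality{"region1", "zone2", "subzone3"}, "region2")) // 2: same region
	fmt.Println(priorityFor(proxy, locality{"region2", "zone4", "subzone4"}, "region2")) // 3: failover region
	fmt.Println(priorityFor(proxy, locality{"region3", "zone5", "subzone5"}, "region2")) // 4: anywhere else
}
```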

The logic that generates these priorities is here:
https://github.com/istio/istio/blob/d3676a8318c5f1380b21ceb583d2d9016ec7c26b/pilot/pkg/xds/eds.go#L333-L371

func (s *DiscoveryServer) generateEndpoints(b EndpointBuilder) *endpoint.ClusterLoadAssignment {
    llbOpts, err := s.llbEndpointAndOptionsForCluster(b)
    if err != nil {
        return buildEmptyClusterLoadAssignment(b.clusterName)
    }

    // Apply the Split Horizon EDS filter, if applicable.
    llbOpts = b.EndpointsByNetworkFilter(llbOpts)

    if model.IsDNSSrvSubsetKey(b.clusterName) {
        // For the SNI-DNAT clusters, we are using AUTO_PASSTHROUGH gateway. AUTO_PASSTHROUGH is intended
        // to passthrough mTLS requests. However, at the gateway we do not actually have any way to tell if the
        // request is a valid mTLS request or not, since its passthrough TLS.
        // To ensure we allow traffic only to mTLS endpoints, we filter out non-mTLS endpoints for these cluster types.
        llbOpts = b.EndpointsWithMTLSFilter(llbOpts)
    }
    llbOpts = b.ApplyTunnelSetting(llbOpts, b.tunnelType)

    l := b.createClusterLoadAssignment(llbOpts)

    // If locality aware routing is enabled, prioritize endpoints or set their lb weight.
    // Failover should only be enabled when there is an outlier detection, otherwise Envoy
    // will never detect the hosts are unhealthy and redirect traffic.
    enableFailover, lb := getOutlierDetectionAndLoadBalancerSettings(b.DestinationRule(), b.port, b.subsetName)
    lbSetting := loadbalancer.GetLocalityLbSetting(b.push.Mesh.GetLocalityLbSetting(), lb.GetLocalityLbSetting())
    if lbSetting != nil {
        // Make a shallow copy of the cla as we are mutating the endpoints with priorities/weights relative to the calling proxy
        l = util.CloneClusterLoadAssignment(l)
        wrappedLocalityLbEndpoints := make([]*loadbalancer.WrappedLocalityLbEndpoints, len(llbOpts))
        for i := range llbOpts {
            wrappedLocalityLbEndpoints[i] = &loadbalancer.WrappedLocalityLbEndpoints{
                IstioEndpoints:      llbOpts[i].istioEndpoints,
                LocalityLbEndpoints: l.Endpoints[i],
            }
        }
        loadbalancer.ApplyLocalityLBSetting(l, wrappedLocalityLbEndpoints, b.locality, b.proxy.Metadata.Labels, lbSetting, enableFailover)
    }
    return l
}

So in the end, to verify that locality load balancing has taken effect, it is enough to check whether the endpoints in the Cluster data have been assigned priorities as expected.
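
That last check can even be scripted. Below is a small illustrative parser for the ::-separated text format shown earlier; the helper name and the assumption that each line is cluster::endpoint::key::value are mine, not part of istioctl.

```go
package main

import (
	"fmt"
	"strings"
)

// endpointPriorities parses lines in the "cluster::endpoint::key::value"
// format shown above and returns the priority recorded for each endpoint.
func endpointPriorities(dump string) map[string]string {
	priorities := map[string]string{}
	for _, line := range strings.Split(dump, "\n") {
		parts := strings.Split(line, "::")
		// Expect exactly: clusterName :: endpoint :: key :: value
		if len(parts) == 4 && parts[2] == "priority" {
			priorities[parts[1]] = parts[3]
		}
	}
	return priorities
}

func main() {
	dump := `outbound|80||echo-web.default.svc.cluster.local::172.16.128.66:80::priority::1
outbound|80||echo-web.default.svc.cluster.local::172.16.128.17:80::priority::0`
	for ep, prio := range endpointPriorities(dump) {
		fmt.Println(ep, "priority", prio)
	}
}
```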