Uneven gRPC Load Balancing on k8s

Preface

Recently, our ops colleagues noticed that after some services in the k8s environment were scaled out to multiple Pods, business traffic was still being handled by a single Pod; the load was not being balanced at all. After digging into it, we found that the two services talk to each other over gRPC, and that a k8s Service load-balances at L4. It cannot see the L7 HTTP/2 protocol, which leads to the uneven load balancing.

gRPC uses the performance-boosted HTTP/2 protocol. One of the many ways HTTP/2 achieves lower latency than its predecessor is by leveraging a single long-lived TCP connection and multiplexing requests/responses across it. This causes a problem for layer 4 (L4) load balancers, as they operate at too low a level to make routing decisions based on the type of traffic received. As such, an L4 load balancer attempting to load balance HTTP/2 traffic will open a single TCP connection and route all successive traffic to that same long-lived connection, in effect cancelling out the load balancing.
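To make the failure mode concrete, below is a minimal sketch of the client side that triggers it. The Service name my-service and port 50051 are hypothetical; only the standard grpc-go dial API is assumed.

    package main

    import (
        "log"

        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials/insecure"
    )

    func main() {
        // Dialing the ClusterIP Service name gives gRPC a single virtual IP,
        // so it opens exactly one long-lived HTTP/2 connection.
        conn, err := grpc.Dial("my-service:50051",
            grpc.WithTransportCredentials(insecure.NewCredentials()))
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        // Every RPC made through conn is multiplexed onto that one TCP
        // connection, so all requests land on whichever Pod accepted it;
        // kube-proxy's L4 balancing never sees a second connection.
    }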

Solution

Options for deploying gRPC services in a k8s environment

There are many ways to solve this problem. Given that this service's gRPC stack already uses ETCD as its service registry, and weighing the feasibility of deployment and rollout, we ultimately decided to change the registration: instead of registering the service name with ETCD, each instance now registers its POD_IP:PORT.
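The registration side of that change might look roughly like the following sketch. It assumes the official go.etcd.io/etcd/client/v3 client; the key prefix /services/, the service name my-grpc-service, the etcd endpoint etcd:2379, and the port 50051 are all illustrative.

    package main

    import (
        "context"
        "fmt"
        "log"
        "os"
        "time"

        clientv3 "go.etcd.io/etcd/client/v3"
    )

    // register puts this Pod's address under a key with a lease attached,
    // so the entry disappears automatically if the Pod dies.
    func register(ctx context.Context, cli *clientv3.Client, service, addr string) error {
        lease, err := cli.Grant(ctx, 10) // 10s TTL, renewed by KeepAlive below
        if err != nil {
            return err
        }
        key := fmt.Sprintf("/services/%s/%s", service, addr)
        if _, err := cli.Put(ctx, key, addr, clientv3.WithLease(lease.ID)); err != nil {
            return err
        }
        ch, err := cli.KeepAlive(ctx, lease.ID)
        if err != nil {
            return err
        }
        go func() { // drain keep-alive responses
            for range ch {
            }
        }()
        return nil
    }

    func main() {
        // POD_IP is injected via the Downward API (step 1 below); the port is
        // whatever the gRPC server listens on.
        addr := fmt.Sprintf("%s:%d", os.Getenv("POD_IP"), 50051)

        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{"etcd:2379"},
            DialTimeout: 5 * time.Second,
        })
        if err != nil {
            log.Fatal(err)
        }
        defer cli.Close()

        if err := register(context.Background(), cli, "my-grpc-service", addr); err != nil {
            log.Fatal(err)
        }
        // ... start the gRPC server on addr ...
    }

The lease-plus-keep-alive pattern means a Pod's entry is removed from ETCD shortly after the Pod dies, so clients are not handed stale addresses. Two changes on the deployment side make POD_IP available to the process: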

  1. Set the Pod's name and IP as environment variables on the container:
    env:
      - name: POD_NAME   # Pod name
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
      - name: POD_IP     # Pod IP
        valueFrom:
          fieldRef:
            fieldPath: status.podIP
    
  2. Change how the service reads its configuration file so that it can resolve the {POD_IP} environment variable (see the sketch after the reference below).

For the config-parsing details, see the earlier post on reading YAML configuration files with Golang.
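A minimal sketch of that change is shown below, assuming the config file contains a placeholder such as advertise_addr: "${POD_IP}:50051" and the gopkg.in/yaml.v3 package; the field name, struct, and file path are illustrative.

    package main

    import (
        "fmt"
        "log"
        "os"

        "gopkg.in/yaml.v3"
    )

    // Config mirrors a hypothetical config file containing:
    //   advertise_addr: "${POD_IP}:50051"
    type Config struct {
        AdvertiseAddr string `yaml:"advertise_addr"`
    }

    func loadConfig(path string) (*Config, error) {
        raw, err := os.ReadFile(path)
        if err != nil {
            return nil, err
        }
        // Expand ${POD_IP} (and any other ${VAR}) from the container
        // environment before unmarshalling.
        expanded := os.ExpandEnv(string(raw))

        var cfg Config
        if err := yaml.Unmarshal([]byte(expanded), &cfg); err != nil {
            return nil, err
        }
        return &cfg, nil
    }

    func main() {
        cfg, err := loadConfig("config.yaml")
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println("will register as:", cfg.AdvertiseAddr)
    }

os.ExpandEnv substitutes ${POD_IP} with the value injected by the Downward API in step 1, so the address registered with ETCD is the Pod's routable IP rather than the Service name.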