RBAC and the Security Model: Authentication, Authorization, and Admission Control

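Secrets are only meaningful under effective access control. Kubernetes applies security controls in three layers: authentication (who is making the request), authorization (what they may do), and admission control (whether this particular operation is allowed). Each layer has its own responsibility, and none can be left out.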

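In the era of monolithic applications, security was usually concentrated at the network boundary and in database access control. With containers and microservices, the number of workloads has exploded and identity has become more complex: a Pod is an independent execution principal that may need to read Secrets, call other services, and operate on cluster resources. This article covers the three layers of the Kubernetes security model and how the four RBAC resources (Role, RoleBinding, ClusterRole, ClusterRoleBinding) combine.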

The Three-Layer Security Model

Request reaches the API Server
     │
     ▼
┌──────────────────────────────────────┐
│  1. Authentication (who?)            │
│  ────────────────────────────────    │
│  X.509 cert    → User / Group        │
│  Bearer Token  → ServiceAccount      │
│  OIDC Token    → external identity   │
│  Webhook Token → custom authn        │
└──────────────────┬───────────────────┘
                   │ authenticated, Subject known
                   ▼
┌──────────────────────────────────────┐
│  2. Authorization (what may it do?)  │
│  ────────────────────────────────    │
│  RBAC: Subject + Verb + Resource     │
│  → Allow / Deny                      │
│  Node Authorizer (kubelet only)      │
│  ABAC (legacy, rarely used)          │
└──────────────────┬───────────────────┘
                   │ authorized
                   ▼
┌──────────────────────────────────────┐
│  3. Admission Control (allowed?)     │
│  ────────────────────────────────    │
│  MutatingWebhook   → modify request  │
│  ValidatingWebhook → reject request  │
│  Pod Security Standards              │
│  ResourceQuota / LimitRange          │
└──────────────────┬───────────────────┘
                   │ all checks passed
                   ▼
              written to etcd

Authentication

X.509 Client Certificates

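A common way for kubectl to authenticate. The kubeconfig contains a client certificate and private key, and the API Server checks that the certificate was issued by the cluster CA: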

The certificate's Subject CN (Common Name) maps to the Kubernetes username
The certificate's Subject O (Organization) maps to Kubernetes groups

kubectl config view --raw shows the current kubeconfig, and openssl x509 -in cert.crt -noout -subject prints the certificate subject.
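
The CN and O mapping described above can be sketched in a few lines of Python. This is an illustration only: the function name identity_from_subject is hypothetical, it parses the comma-separated text form that recent OpenSSL versions print, and the real API Server reads these fields from the parsed certificate rather than from text.

```python
def identity_from_subject(subject: str) -> tuple[str, list[str]]:
    """Map an X.509 subject string to (username, groups).

    Assumes the comma-separated form, e.g. "O = dev-team, CN = alice".
    Values that themselves contain commas are not handled in this sketch.
    """
    username = ""
    groups: list[str] = []
    # Drop the optional "subject=" prefix printed by `openssl x509 -subject`.
    if subject.startswith("subject="):
        subject = subject[len("subject="):]
    for part in subject.split(","):
        key, sep, value = part.partition("=")
        if not sep:
            continue
        key, value = key.strip().upper(), value.strip()
        if key == "CN":
            username = value      # CN becomes the Kubernetes username
        elif key == "O":
            groups.append(value)  # every O becomes a Kubernetes group
    return username, groups


print(identity_from_subject("subject=O = dev-team, O = ops, CN = alice"))
# ('alice', ['dev-team', 'ops'])
```

A certificate may carry several O fields, which is why the groups come back as a list.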

ServiceAccount Token

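The default way for code running in a Pod to authenticate. Since Kubernetes 1.22, Pods get a projected token by default (a JWT with an expiry time), replacing the earlier non-expiring Secret-based token: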

/var/run/secrets/kubernetes.io/serviceaccount/
├── token      ← JWT carrying the namespace and ServiceAccount name; expires after about 1 hour by default
├── ca.crt     ← cluster CA certificate
└── namespace  ← the Pod's namespace

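The kubelet refreshes the token before it expires, so an application that re-reads the file on each use always gets a current token. The token's sub claim has the form system:serviceaccount:{namespace}:{name}.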
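
To see what such a token carries, the JWT payload can be decoded with the Python standard library. The sketch below does not verify the signature, so it is suitable for inspection only; the helper names and the demo token are made up for illustration.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload (second segment) of a JWT WITHOUT verifying it."""
    payload_b64 = token.split(".")[1]
    # base64url segments omit padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def parse_sa_subject(sub: str) -> tuple[str, str]:
    """Split "system:serviceaccount:{namespace}:{name}" into (namespace, name)."""
    prefix = "system:serviceaccount:"
    if not sub.startswith(prefix):
        raise ValueError(f"not a ServiceAccount subject: {sub!r}")
    namespace, _, name = sub[len(prefix):].partition(":")
    return namespace, name


def _b64url(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (demo helper)."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# A made-up, unsigned token used only to exercise the functions above.
demo_token = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"sub": "system:serviceaccount:production:pod-reader", "exp": 1700003600}),
    "",
])

claims = decode_jwt_payload(demo_token)
print(parse_sa_subject(claims["sub"]))  # ('production', 'pod-reader')
```

Inside a Pod, the same functions could be pointed at the contents of /var/run/secrets/kubernetes.io/serviceaccount/token.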

OpenID Connect (OIDC)

Integrates with an enterprise IdP (Okta, Keycloak, Google, Azure AD) for unified identity management:

API Server flags:

--oidc-issuer-url=https://accounts.google.com
--oidc-client-id=kubernetes
--oidc-username-claim=email
--oidc-groups-claim=groups

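The user logs in through the IdP and obtains an ID Token (a JWT). kubectl sends that token with each request; the API Server verifies the JWT signature (fetching the public keys via OIDC Discovery) and extracts the username and groups from the claims.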
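
The final step, turning claims into a username and groups, might look roughly like this. The function name is hypothetical, the claims are assumed to be already verified, and signature, issuer, and audience checks as well as the optional --oidc-username-prefix and --oidc-groups-prefix handling are deliberately left out.

```python
def identity_from_claims(claims: dict, username_claim: str = "email",
                         groups_claim: str = "groups") -> tuple[str, list[str]]:
    """Pick the username and groups out of already-verified ID token claims."""
    if username_claim not in claims:
        raise ValueError(f"missing claim {username_claim!r}")
    username = str(claims[username_claim])
    raw_groups = claims.get(groups_claim, [])
    # Tolerate a single string as well as a list of strings.
    groups = [raw_groups] if isinstance(raw_groups, str) else list(raw_groups)
    return username, groups


print(identity_from_claims({"email": "alice@company.com", "groups": ["dev-team"]}))
# ('alice@company.com', ['dev-team'])
```

The two keyword arguments mirror the --oidc-username-claim and --oidc-groups-claim flags shown above.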

The Four RBAC Resources

Role and ClusterRole

A Role defines a set of permissions inside one namespace. A ClusterRole defines permissions cluster-wide, including on non-namespaced resources:

# Namespace-scoped: read-only access to pods and their logs
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: production
rules:
- apiGroups: [""]           # "" is the core API group
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]

# Cluster-scoped: read pods and deployments in all namespaces, plus the cluster-level resource nodes
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["nodes"]      # nodes is a cluster-scoped resource
  verbs: ["get", "list"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch"]

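The standard request verbs are get, list, watch, create, update, patch, delete, and deletecollection. A few special verbs such as impersonate, bind, and escalate also exist for specific resources. The wildcard verb * matches every verb and should be avoided in production.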
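
How a single rule is matched against a request can be sketched as follows. This is a simplified model with a hypothetical function name: it handles the * wildcard and treats a subresource such as pods/log as its own resource name, but it ignores resourceNames, nonResourceURLs, and wildcard subresource patterns.

```python
def rule_allows(rule: dict, api_group: str, resource: str, verb: str) -> bool:
    """Return True if one RBAC rule covers the (apiGroup, resource, verb) triple."""
    def covers(allowed: list, value: str) -> bool:
        return "*" in allowed or value in allowed

    return (covers(rule.get("apiGroups", []), api_group)
            and covers(rule.get("resources", []), resource)
            and covers(rule.get("verbs", []), verb))


pod_reader_rule = {
    "apiGroups": [""],
    "resources": ["pods", "pods/log"],
    "verbs": ["get", "list", "watch"],
}

print(rule_allows(pod_reader_rule, "", "pods", "get"))        # True
print(rule_allows(pod_reader_rule, "", "pods/exec", "get"))   # False
print(rule_allows(pod_reader_rule, "", "pods", "delete"))     # False
```

Note that granting pods does not implicitly grant pods/exec: each subresource has to be listed explicitly.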

RoleBinding and ClusterRoleBinding

A RoleBinding binds a Role or ClusterRole to Subjects, and takes effect in one namespace only:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: production
subjects:
- kind: User
  name: alice
  apiGroup: rbac.authorization.k8s.io
- kind: ServiceAccount
  name: monitoring-agent
  namespace: monitoring    # the ServiceAccount's namespace may differ from the RoleBinding's
- kind: Group
  name: dev-team           # hypothetical group; system:masters already bypasses authorization, so binding it adds nothing
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole        # a RoleBinding may reference a ClusterRole; the permissions apply only in this namespace
  name: cluster-pod-reader
  apiGroup: rbac.authorization.k8s.io

A ClusterRoleBinding binds a ClusterRole to Subjects, and takes effect across the whole cluster:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-admin-binding
subjects:
- kind: User
  name: admin@company.com
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io

Combination Rules

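Binding type	roleRef kind	Scope
RoleBinding	Role	Same namespace
RoleBinding	ClusterRole	Same namespace (permissions narrowed to that namespace)
ClusterRoleBinding	ClusterRole	All namespaces plus cluster-scoped resources
ClusterRoleBinding	Role	Not allowed (a ClusterRoleBinding can only reference a ClusterRole)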

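Referencing a ClusterRole from a RoleBinding is a common pattern: define the ClusterRole once (for example cluster-pod-reader) and create a RoleBinding in each namespace that needs it, instead of duplicating a Role per namespace.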
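
As a rough illustration of that pattern, the script below generates one RoleBinding per namespace, all pointing at the same ClusterRole. The helper name and the staging namespace are assumptions made for the example; the output is a JSON List that could be reviewed before being applied.

```python
import json


def role_binding_for(namespace: str, cluster_role: str,
                     service_account: str, sa_namespace: str) -> dict:
    """Build a RoleBinding manifest that grants a ClusterRole inside one namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{cluster_role}-binding", "namespace": namespace},
        "subjects": [{
            "kind": "ServiceAccount",
            "name": service_account,
            "namespace": sa_namespace,
        }],
        "roleRef": {
            "kind": "ClusterRole",
            "name": cluster_role,
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }


# One ClusterRole definition, reused in two namespaces.
manifests = [
    role_binding_for(ns, "cluster-pod-reader", "monitoring-agent", "monitoring")
    for ns in ["production", "staging"]
]
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

Because each binding is namespaced, the monitoring-agent ServiceAccount can read pods in production and staging but nowhere else.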

Admission Control

Pod Security Standards (PSS)

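Pod Security Admission, the built-in admission controller that enforces PSS, reached GA in 1.25, the same release that removed the deprecated PodSecurityPolicy. The three levels (Privileged, Baseline, Restricted) are configured through namespace labels: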

# Baseline: enforce protection against known privilege escalations, and warn about anything that would fail restricted
kubectl label namespace production \
  pod-security.kubernetes.io/enforce=baseline \
  pod-security.kubernetes.io/warn=restricted

# Restricted: the strictest level
kubectl label namespace sandboxed \
  pod-security.kubernetes.io/enforce=restricted

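The Restricted level builds on Baseline (no privileged containers, no hostNetwork/hostPID, no hostPath volumes) and additionally requires running as non-root, setting allowPrivilegeEscalation to false, dropping all capabilities, and setting a seccompProfile of RuntimeDefault or Localhost.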
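
To make these requirements concrete, here is a simplified checker for a Pod spec expressed as a Python dict. It covers only a subset of the Restricted profile and ignores initContainers and ephemeralContainers; the function name and the messages are illustrative, not what Pod Security Admission itself emits.

```python
def restricted_violations(pod_spec: dict) -> list[str]:
    """Return human-readable violations for a simplified subset of Restricted."""
    problems = []
    if pod_spec.get("hostNetwork") or pod_spec.get("hostPID"):
        problems.append("host namespaces are not allowed")
    for vol in pod_spec.get("volumes", []):
        if "hostPath" in vol:
            problems.append(f"volume {vol.get('name')} uses hostPath")
    pod_sc = pod_spec.get("securityContext", {})
    for c in pod_spec.get("containers", []):
        sc = c.get("securityContext", {})
        name = c.get("name")
        if sc.get("privileged"):
            problems.append(f"{name}: privileged is not allowed")
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        # runAsNonRoot may be set on the container or inherited from the Pod.
        if not sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False)):
            problems.append(f"{name}: runAsNonRoot must be true")
        seccomp = sc.get("seccompProfile") or pod_sc.get("seccompProfile") or {}
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            problems.append(f"{name}: seccompProfile must be RuntimeDefault or Localhost")
        if "ALL" not in sc.get("capabilities", {}).get("drop", []):
            problems.append(f"{name}: must drop ALL capabilities")
    return problems


compliant = {
    "securityContext": {"runAsNonRoot": True, "seccompProfile": {"type": "RuntimeDefault"}},
    "containers": [{
        "name": "app",
        "securityContext": {"allowPrivilegeEscalation": False, "capabilities": {"drop": ["ALL"]}},
    }],
}
privileged = {"containers": [{"name": "app", "securityContext": {"privileged": True}}]}

print(restricted_violations(compliant))  # []
for problem in restricted_violations(privileged):
    print(problem)
```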

Admission Webhook

A ValidatingWebhook can enforce custom security policies, for example with OPA Gatekeeper:

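# Gatekeeper ConstraintTemplate: require every container in a Pod to set a memory limit
# (the rule reads spec.containers, so it matches Pods; a Deployment would need spec.template.spec.containers)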
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlimits
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLimits
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8srequiredlimits
      violation[{"msg": msg}] {
        container := input.review.object.spec.containers[_]
        not container.resources.limits.memory
        msg := sprintf("Container %v must have memory limits", [container.name])
      }

Lab: Configuring and Verifying RBAC Permissions

# Check the current user's permissions
kubectl auth can-i get pods
kubectl auth can-i create deployments --namespace=production
kubectl auth can-i "*" "*"    # do we have superuser permissions?

# Check what the default ServiceAccount is allowed to do
kubectl auth can-i get pods \
  --as=system:serviceaccount:default:default
# no (the default ServiceAccount has very few permissions)

# Create a least-privilege ServiceAccount
kubectl create serviceaccount pod-reader -n production
kubectl create role pod-reader-role \
  --verb=get,list,watch \
  --resource=pods \
  -n production
kubectl create rolebinding pod-reader-binding \
  --role=pod-reader-role \
  --serviceaccount=production:pod-reader \
  -n production

# Verify the permissions
kubectl auth can-i get pods \
  --as=system:serviceaccount:production:pod-reader \
  -n production
# yes

kubectl auth can-i delete pods \
  --as=system:serviceaccount:production:pod-reader \
  -n production
# no

kubectl auth can-i get secrets \
  --as=system:serviceaccount:production:pod-reader \
  -n production
# no

# List the ClusterRoleBindings that grant cluster-admin (keep these to a minimum in production)
kubectl get clusterrolebindings \
  -o custom-columns='NAME:.metadata.name,ROLE:.roleRef.name,SUBJECTS:.subjects[*].name' \
  | awk 'NR==1 || $2=="cluster-admin"'

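# Audit: which ServiceAccounts are bound to a role that can read secrets
kubectl get roles,clusterroles,rolebindings,clusterrolebindings \
  -A -o json | \
  python3 -c "
import json, sys
items = json.load(sys.stdin).get('items', [])
def reads_secrets(role):
    return any({'secrets', '*'} & set(r.get('resources', []))
               and {'get', 'list', 'watch', '*'} & set(r.get('verbs', []))
               for r in role.get('rules') or [])
# Key roles by (kind, namespace, name); ClusterRoles have an empty namespace
risky = {(i['kind'], i['metadata'].get('namespace', ''), i['metadata']['name'])
         for i in items if i['kind'] in ('Role', 'ClusterRole') and reads_secrets(i)}
for b in items:
    if not b['kind'].endswith('Binding'):
        continue
    ref = b['roleRef']
    ns = b['metadata'].get('namespace', '') if ref['kind'] == 'Role' else ''
    if (ref['kind'], ns, ref['name']) in risky:
        for s in b.get('subjects') or []:
            if s['kind'] == 'ServiceAccount':
                print(b['metadata']['name'], ref['name'], s.get('namespace', ''), s['name'])
"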

ServiceAccounts and Pod Identity

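Every Pod is associated with a ServiceAccount at creation time (default unless another is specified). The ServiceAccount admission controller adds a projected volume to the Pod, and the kubelet populates it with three files: the ServiceAccount token, the CA certificate, and the namespace: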

spec:
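  serviceAccountName: my-app    # use a dedicated SA rather than default
  automountServiceAccountToken: false  # disable the automatic mount when the Pod never calls the API
  volumes:                      # shape of the volume that is injected when automounting is enabled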
  - name: kube-api-access
    projected:
      sources:
      - serviceAccountToken:
          expirationSeconds: 3600
          path: token
      - configMap:
          name: kube-root-ca.crt
          items:
          - key: ca.crt
            path: ca.crt
      - downwardAPI:
          items:
          - path: namespace
            fieldRef:
              fieldPath: metadata.namespace

The application reads the token from /var/run/secrets/kubernetes.io/serviceaccount/token and sends it to the API Server as a Bearer token. Client libraries such as client-go and controller-runtime handle reading and reloading the token automatically.
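
For code that does not use such a library, the flow can be sketched with the Python standard library alone. The function names are hypothetical; the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables are the ones Kubernetes sets in every container, and the fallback values are assumptions. IPv6 hosts are not handled.

```python
import os
import ssl
import urllib.request

SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def build_api_request(path: str, sa_dir: str = SA_DIR) -> urllib.request.Request:
    """Build an authenticated request for the in-cluster API Server."""
    # Re-read the token on every call so that a rotated token is picked up.
    with open(os.path.join(sa_dir, "token")) as f:
        token = f.read().strip()
    host = os.environ.get("KUBERNETES_SERVICE_HOST", "kubernetes.default.svc")
    port = os.environ.get("KUBERNETES_SERVICE_PORT", "443")
    return urllib.request.Request(
        f"https://{host}:{port}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )


def call_api(path: str, sa_dir: str = SA_DIR) -> bytes:
    """Send the request, trusting only the cluster CA. Needs a real cluster."""
    context = ssl.create_default_context(cafile=os.path.join(sa_dir, "ca.crt"))
    with urllib.request.urlopen(build_api_request(path, sa_dir), context=context) as resp:
        return resp.read()
```

Inside a Pod whose ServiceAccount may list pods, call_api("/api/v1/namespaces/production/pods") would return the JSON pod list. Reading the token file on every request, rather than caching it at startup, is what keeps the client working across kubelet token rotations.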

Audit Logs

The Kubernetes audit log records requests handled by the API Server, as selected by the audit policy, and is essential for tracing security incidents:

# audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
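# Log writes to Secrets at Metadata level only: Request or RequestResponse would copy Secret data into the audit log
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets"]
  verbs: ["create", "update", "patch", "delete"]
# Log Pod exec and attach operations, including the request body
- level: Request
  resources:
  - group: ""
    resources: ["pods/exec", "pods/attach"]
# Log every other request at Metadata level (user, time, resource path)
- level: Metadata
  omitStages: ["RequestReceived"]

The audit log has four levels: None (not logged), Metadata (request metadata only), Request (metadata plus the request body), and RequestResponse (metadata plus request and response bodies). Rules are evaluated in order and the first matching rule decides the level. In production, log Secret access at Metadata level, because request and response bodies contain the secret values themselves, and reserve Request or RequestResponse for privileged operations whose payloads are safe to store.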
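
The first-match behaviour of audit rules can be sketched as below. The flattened rule format (plain lists of resources and verbs, no API groups, users, or namespaces) is a simplification invented for this example and is not the real Policy schema.

```python
def audit_level(policy_rules: list[dict], resource: str, verb: str) -> str:
    """Return the level of the first rule that matches (resource, verb)."""
    for rule in policy_rules:
        resources = rule.get("resources")  # None means "any resource"
        verbs = rule.get("verbs")          # None means "any verb"
        if resources is not None and resource not in resources:
            continue
        if verbs is not None and verb not in verbs:
            continue
        return rule["level"]
    return "None"  # no rule matched, so the request is not logged


policy = [
    {"level": "Metadata", "resources": ["secrets"],
     "verbs": ["create", "update", "patch", "delete"]},
    {"level": "Request", "resources": ["pods/exec", "pods/attach"]},
    {"level": "Metadata"},  # catch-all
]

print(audit_level(policy, "pods/exec", "create"))                   # Request
print(audit_level(policy, "configmaps", "get"))                     # Metadata
print(audit_level(list(reversed(policy)), "pods/exec", "create"))   # Metadata
```

The last call shows why order matters: once the catch-all rule comes first, the more specific pods/exec rule is never reached.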

How Network Policy and RBAC Relate

RBAC and Network Policy each guard a different line of defense:

                   ┌─────────────────────────────────────┐
External traffic ─▶│  Ingress / LoadBalancer             │
                   └──────────────────┬──────────────────┘
                                      │
                   ┌──────────────────▼──────────────────┐
kubectl requests ─▶│  API Server (RBAC-protected)        │
                   └──────────────────┬──────────────────┘
                                      │ control plane
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │─ ─ ─ ─ ─ ─ ─ ─ ─
                                      │ data plane
                   ┌──────────────────▼──────────────────┐
                   │  Pod A ◀── Network Policy ──▶ Pod B │
                   └─────────────────────────────────────┘

Network Policy example: allow only Pods labeled tier: backend in the same namespace to reach the database on TCP port 5432:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-network-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: backend    # only Pods carrying the backend label may connect
    ports:
    - protocol: TCP
      port: 5432

Network Policy takes effect only if the CNI plugin implements it (Calico, Cilium, and Weave do; Flannel does not by default).
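
The ingress decision made for such a policy can be modelled roughly as follows. This sketch evaluates a single policy, supports only matchLabels selectors and same-namespace podSelector peers, and uses hypothetical function names; a real CNI plugin combines all policies that select a Pod and enforces them in the data plane.

```python
def labels_match(selector: dict, labels: dict) -> bool:
    """matchLabels semantics: every selector pair must be present on the Pod."""
    return all(labels.get(k) == v for k, v in selector.get("matchLabels", {}).items())


def ingress_allowed(policy: dict, src_labels: dict, dst_labels: dict,
                    port: int, protocol: str = "TCP") -> bool:
    """Decide whether src may reach dst; both Pods live in the policy's namespace."""
    spec = policy["spec"]
    if not labels_match(spec.get("podSelector", {}), dst_labels):
        return True  # this policy does not select the destination Pod
    for rule in spec.get("ingress", []):
        peers, ports = rule.get("from"), rule.get("ports")
        # Only plain podSelector peers are modelled (same namespace).
        peer_ok = peers is None or any(
            set(p) == {"podSelector"} and labels_match(p["podSelector"], src_labels)
            for p in peers)
        port_ok = ports is None or any(
            p.get("port") == port and p.get("protocol", "TCP") == protocol
            for p in ports)
        if peer_ok and port_ok:
            return True
    return False  # selected by the policy, but no ingress rule matched


db_policy = {"spec": {
    "podSelector": {"matchLabels": {"app": "postgres"}},
    "policyTypes": ["Ingress"],
    "ingress": [{
        "from": [{"podSelector": {"matchLabels": {"tier": "backend"}}}],
        "ports": [{"protocol": "TCP", "port": 5432}],
    }],
}}

db = {"app": "postgres"}
print(ingress_allowed(db_policy, {"tier": "backend"}, db, 5432))   # True
print(ingress_allowed(db_policy, {"tier": "frontend"}, db, 5432))  # False
print(ingress_allowed(db_policy, {"tier": "backend"}, db, 80))     # False
```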

Mapping to Kubernetes Internals

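The API Server's authentication chain is a list of plugins. Each authenticator is tried in turn and the first one that succeeds short-circuits the rest; the order in which they run is not guaranteed. A successful authentication yields a UserInfo (Username, UID, Groups, Extra).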
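
A minimal model of that chain, assuming each authenticator is a function that returns a UserInfo or None. The request dict, the two demo authenticators, and the static token table are invented for illustration; the real API Server treats a request that no authenticator accepts as either anonymous or unauthorized.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class UserInfo:
    username: str
    uid: str = ""
    groups: list[str] = field(default_factory=list)
    extra: dict[str, list[str]] = field(default_factory=dict)


Authenticator = Callable[[dict], Optional[UserInfo]]


def authenticate(request: dict, authenticators: list[Authenticator]) -> Optional[UserInfo]:
    """Try each authenticator; the first success short-circuits the rest."""
    for auth in authenticators:
        user = auth(request)
        if user is not None:
            return user
    return None  # nobody recognised the credentials


def cert_authenticator(request: dict) -> Optional[UserInfo]:
    subject = request.get("client_cert_subject")
    if subject is None:
        return None
    return UserInfo(subject["CN"], groups=list(subject.get("O", [])))


def token_authenticator(request: dict) -> Optional[UserInfo]:
    # Hypothetical static token table, for illustration only.
    tokens = {"s3cr3t": UserInfo("alice", groups=["dev-team"])}
    return tokens.get(request.get("bearer_token"))


chain = [cert_authenticator, token_authenticator]
user = authenticate({"bearer_token": "s3cr3t"}, chain)
print(user.username, user.groups)  # alice ['dev-team']
```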

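When the RBAC authorizer receives a request, it builds a (Subject, Verb, Resource, Namespace) tuple, walks the RoleBindings and ClusterRoleBindings to find those whose subjects match, and checks whether the rules of the referenced Role or ClusterRole cover the operation. RBAC is purely additive (there are no deny rules): a single matching rule is enough to allow the request.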
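
The evaluation loop can be sketched as follows, with the emphasis on how binding scope limits where a rule applies. The flattened data model (roles keyed by kind, namespace, and name; bindings as plain dicts) and the function names are assumptions for the example, not the authorizer's actual data structures.

```python
def subject_matches(subject: dict, username: str, groups: list[str]) -> bool:
    """Check whether a binding subject refers to the authenticated user."""
    if subject["kind"] == "User":
        return subject["name"] == username
    if subject["kind"] == "Group":
        return subject["name"] in groups
    if subject["kind"] == "ServiceAccount":
        return username == f"system:serviceaccount:{subject['namespace']}:{subject['name']}"
    return False


def rule_covers(rule: dict, api_group: str, resource: str, verb: str) -> bool:
    def covers(allowed, value):
        return "*" in allowed or value in allowed
    return (covers(rule["apiGroups"], api_group)
            and covers(rule["resources"], resource)
            and covers(rule["verbs"], verb))


def is_allowed(username, groups, verb, api_group, resource, namespace,
               roles: dict, bindings: list[dict]) -> bool:
    for b in bindings:
        # A RoleBinding applies only inside its own namespace;
        # a ClusterRoleBinding applies everywhere.
        if b["kind"] == "RoleBinding" and b.get("namespace") != namespace:
            continue
        if not any(subject_matches(s, username, groups) for s in b["subjects"]):
            continue
        ref = b["roleRef"]
        role_ns = b.get("namespace", "") if ref["kind"] == "Role" else ""
        role = roles.get((ref["kind"], role_ns, ref["name"]))
        if role and any(rule_covers(r, api_group, resource, verb) for r in role["rules"]):
            return True  # additive model: one matching rule is enough
    return False  # nothing matched; there are no deny rules to consult


roles = {("ClusterRole", "", "cluster-pod-reader"): {"rules": [
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]},
]}}
bindings = [{
    "kind": "RoleBinding", "namespace": "production",
    "subjects": [{"kind": "User", "name": "alice"}],
    "roleRef": {"kind": "ClusterRole", "name": "cluster-pod-reader"},
}]

print(is_allowed("alice", [], "get", "", "pods", "production", roles, bindings))  # True
print(is_allowed("alice", [], "get", "", "pods", "staging", roles, bindings))     # False
```

Although the ClusterRole itself is cluster-wide, alice is denied in staging because the RoleBinding that grants it lives only in production.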

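Admission webhooks run in two phases: mutating webhooks run first (they may modify the request object) and validating webhooks run afterwards (they can only allow or reject). Mutating webhooks are called one after another, while validating webhooks are called in parallel; if any webhook rejects, the whole request is rejected.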
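
The two phases can be modelled with plain functions. This is a sketch under simplifying assumptions: webhooks are local callables rather than HTTPS endpoints, validators run in a loop rather than in parallel (the outcome is the same, since every one of them must accept), and the two demo webhooks are hypothetical.

```python
import copy
from typing import Callable, Optional

Mutator = Callable[[dict], dict]
Validator = Callable[[dict], Optional[str]]  # returns a rejection message or None


def admit(obj: dict, mutators: list[Mutator],
          validators: list[Validator]) -> tuple[Optional[dict], list[str]]:
    """Run the mutating phase, then the validating phase, on a copy of obj."""
    current = copy.deepcopy(obj)
    for mutate in mutators:      # mutating phase: one webhook after another
        current = mutate(current)
    # Validating phase: every validator sees the final, mutated object.
    errors = [msg for v in validators if (msg := v(current)) is not None]
    if errors:
        return None, errors      # any rejection rejects the whole request
    return current, []


def add_default_label(pod: dict) -> dict:
    pod.setdefault("metadata", {}).setdefault("labels", {}).setdefault("team", "unknown")
    return pod


def require_memory_limits(pod: dict) -> Optional[str]:
    for c in pod.get("spec", {}).get("containers", []):
        if "memory" not in c.get("resources", {}).get("limits", {}):
            return f"Container {c['name']} must have memory limits"
    return None


bad_pod = {"spec": {"containers": [{"name": "app"}]}}
print(admit(bad_pod, [add_default_label], [require_memory_limits]))
# (None, ['Container app must have memory limits'])
```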

Key Patterns

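The Kubernetes security model is a layered defense of "identity → permission → policy". RBAC governs only API operations, admission webhooks enforce workload-level security policy, and Network Policy governs runtime network access. All of them are needed; leaving any one out leaves a gap.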

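Least privilege in practice: create a dedicated ServiceAccount for each workload and grant only the permissions it needs; avoid the default ServiceAccount; audit ClusterRoleBindings regularly, especially those that reference cluster-admin.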

Engineering Analogies

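Security mechanism	Engineering analogy
ServiceAccount + RBAC	IAM Role + Policy (AWS), Spring Security GrantedAuthority
ClusterRole + ClusterRoleBinding	Global role assignment; with cluster-admin, comparable to AWS AdministratorAccess
RoleBinding referencing a ClusterRole	Reusable role template: one permission definition applied in several environments
Admission Webhook	Spring AOP interceptor, gRPC server interceptor
Pod Security Standards	Docker Bench for Security, CIS Kubernetes Benchmark
OIDC integration	OAuth2 SSO, enterprise LDAP/AD integration
OPA Gatekeeper	Policy as code, Rego rule engine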

Common Misconceptions

Misconception 1: ServiceAccount tokens are valid forever, so a leaked token is a permanent risk.

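Since Kubernetes 1.22, Pods use projected ServiceAccount tokens by default. The kubelet requests them with a lifetime of about 1 hour and refreshes them before they expire, and a token is also invalidated when its Pod is deleted. Legacy Secret-based tokens do not expire; the remedy there is to delete them and migrate to projected tokens. The --service-account-max-token-expiration flag caps the lifetime of tokens issued through the TokenRequest API and does not affect legacy tokens. A leaked projected token is therefore usable only for a limited window, nominally about 1 hour, although the API Server can extend the validity of injected tokens for compatibility through --service-account-extend-token-expiration. None of this is a reason to be careless about protecting credentials.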

Misconception 2: RBAC can block network traffic between Pods.

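RBAC controls only operations against the Kubernetes API (creating Pods, reading Secrets, and so on), not network traffic between Pods. Isolating Pod-to-Pod traffic requires Network Policy, which the CNI plugin enforces in the data plane. The two work at different layers: RBAC is access control for the control plane, Network Policy is access control for runtime networking, and a cluster missing either one is incompletely protected.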

Misconception 3: Only platform administrators need the cluster-admin ClusterRole.

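In practice, many third-party tools (CI/CD systems, monitoring components, service meshes) ask for cluster-admin in their documentation even though they may need only a small part of it. In production, review what each tool actually needs according to the principle of least privilege and create a precisely scoped ClusterRole for it instead of granting cluster-admin outright. kubectl auth reconcile can help apply precisely scoped RBAC objects from a YAML file.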

Exercises

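Exercise 1: Least-privilege audit. Pick a Kubernetes tool you know (Prometheus, ArgoCD, or cert-manager), look at the permissions its installation docs ask for, compare them with the actual ClusterRole/Role definitions, and assess whether it follows least privilege. If it has excess permissions, try removing them and verify that the tool still works.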

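Exercise 2: OIDC integration lab. Configure Dex as the OIDC provider for a kind cluster, adjust the API Server configuration, and access the cluster with an OIDC token. Work out how kubelogin (the kubectl oidc-login plugin) operates and walk through the full enterprise SSO flow.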

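Exercise 3: Pod Security Standards test. Create a namespace with the pod-security.kubernetes.io/enforce=restricted label and try deploying Pods with different security settings: a privileged container (rejected), a container running as root (rejected), and a non-root container that meets the restricted standard (admitted). Understand the security rationale behind each restriction.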