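Security Admission Runbook: RBAC / PSA / Kyverno / ResourceQuota
This runbook answers the question "my Pod's YAML is valid, so why does the cluster reject it, or why does it never start?" The current cluster runs security and admission policies, and a typical event looks like this: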
PolicyViolation: Container must set resources.requests.cpu / memory
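Kubernetes is not broken here; an admission policy is rejecting a non-compliant resource.
1. What each component does
| Component / mechanism | Role |
|---|---|
| RBAC | Controls who can access which Kubernetes APIs |
| ServiceAccount | The identity a Pod uses when it talks to the apiserver |
| Pod Security Admission | Built-in Pod security levels that restrict privileged, hostPath, and similar settings |
| Kyverno | Admission policy engine; policies are written in YAML and block non-compliant resources |
| ResourceQuota | Caps the total resources of a namespace |
| LimitRange | Sets default requests/limits for containers, or bounds what a single Pod may ask for |
| NetworkPolicy | Restricts network traffic between Pods; currently enforced by Cilium |
Why these components are needed: without admission policies, anyone can submit Pods with no requests, privileged Pods, or images from arbitrary sources. A learning cluster can be lenient, but production must have boundaries.
2. Installing Kyverno
Installing Kyverno from the release manifest is enough. Server-side apply is required: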
kubectl apply --server-side --force-conflicts \
-f https://github.com/kyverno/kyverno/releases/download/v1.13.2/install.yaml
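Why --server-side is required: the Kyverno CRDs are very large. A plain kubectl apply writes each object's full configuration into the last-applied-configuration annotation, which can exceed the 256KB (262144-byte) annotation size limit. Server-side apply does not write that oversized annotation.
Verify: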
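To see ahead of time whether a manifest is likely to hit the annotation limit, a rough check is to measure its largest YAML document. The sketch below is only an approximation: the annotation stores JSON rather than YAML, so the real size differs somewhat. The function names `largest_doc_bytes` and `needs_server_side` are hypothetical helpers, not part of kubectl or Kyverno.

```shell
# Rough pre-check: byte size of the largest YAML document in a manifest,
# compared with the 262144-byte annotation limit.
ANNOTATION_LIMIT=262144

largest_doc_bytes() {
  # Reads a multi-document YAML file ($1) or stdin and prints the size
  # in bytes of the largest document (documents are separated by '---').
  LC_ALL=C awk '
    /^---[ \t]*$/ { if (n > max) max = n; n = 0; next }
    { n += length($0) + 1 }   # +1 accounts for the newline
    END { if (n > max) max = n; print max + 0 }
  ' "$@"
}

needs_server_side() {
  # Succeeds when the largest document is bigger than the annotation limit.
  [ "$(largest_doc_bytes "$@")" -gt "$ANNOTATION_LIMIT" ]
}
```

Typical use, assuming the manifest has been downloaded to a local file named install.yaml: `needs_server_side install.yaml && echo "use --server-side"`.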
kubectl get pods -n kyverno
kubectl get crd | grep kyverno
kubectl get clusterpolicy
Actual output (v1.13.2: 4 controllers + 9 CRDs):
$ kubectl get pods -n kyverno
kyverno-admission-controller-… 1/1 Running ← the webhook; policy enforcement depends on it
kyverno-background-controller-… 1/1 Running ← background scans of existing resources
kyverno-cleanup-controller-… 1/1 Running
kyverno-reports-controller-… 1/1 Running ← generates policyreport
$ kubectl get crd | grep -c kyverno
9
⚠️
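kyverno-admission-controller sits at 0/1 Running for a few tens of seconds first (observed in practice; this is normal). It is the admission webhook and has to initialize its certificates and caches before it becomes ready. Apply policies only after it reports 1/1; otherwise the webhook is not ready and kubectl apply of a policy may time out. Wait for it to become ready before continuing:
kubectl wait --for=condition=Ready pod -n kyverno \
  -l app.kubernetes.io/component=admission-controller --timeout=120s
3. Three baseline ClusterPolicies
On a learning cluster, start in Audit mode and switch to Enforce only after confirming that nothing is hit by mistake.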
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-app-label
spec:
  validationFailureAction: Audit
  rules:
  - name: check-label
    match:
      any:
      - resources:
          kinds: [Deployment, StatefulSet]
    validate:
      message: "Resource must set label 'app.kubernetes.io/name'"
      pattern:
        metadata:
          labels:
            app.kubernetes.io/name: '?*'
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Audit
  rules:
  - name: require-image-tag
    match:
      any:
      - resources:
          kinds: [Pod]
    validate:
      message: "Image tag must not be 'latest' or empty"
      pattern:
        spec:
          containers:
          - image: "!*:latest & *:*"
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resources
spec:
  validationFailureAction: Audit
  rules:
  - name: require-cpu-mem
    match:
      any:
      - resources:
          kinds: [Pod]
    validate:
      message: "Container must set resources.requests.cpu / memory"
      pattern:
        spec:
          containers:
          - resources:
              requests:
                cpu: '?*'
                memory: '?*'
Save this as kyverno-baseline.yaml, then run:
kubectl apply -f kyverno-baseline.yaml
kubectl get clusterpolicy
Switch to enforcing mode:
for p in require-resources disallow-latest-tag require-app-label; do
  kubectl patch clusterpolicy "$p" --type=merge \
    -p '{"spec":{"validationFailureAction":"Enforce"}}'
done
Do not Enforce across the whole platform on day one. In production, run in Audit for 1-2 weeks, fix the charts, then switch to Enforce, and write exceptions for system namespaces.
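One way to write such an exception is an `exclude` block on the rule. The sketch below builds a JSON patch that adds one to the first rule of a policy. The namespace list (kube-system, kyverno) and the rule index 0 are assumptions for illustration, and `exclude_patch` is a hypothetical helper name.

```shell
exclude_patch() {
  # Build a JSON-patch document that adds an `exclude` block (by namespace)
  # to rule 0 of a ClusterPolicy. Arguments: namespaces to exempt.
  local list="" ns
  for ns in "$@"; do
    list="${list:+$list,}\"$ns\""
  done
  printf '[{"op":"add","path":"/spec/rules/0/exclude","value":{"any":[{"resources":{"namespaces":[%s]}}]}}]' "$list"
}

# Usage (needs cluster access; namespaces are examples only):
#   kubectl patch clusterpolicy require-resources --type=json \
#     -p "$(exclude_patch kube-system kyverno)"
```

Note that a JSON-patch `add` on an existing `exclude` key replaces it, so check the rule with kubectl get clusterpolicy require-resources -o yaml first.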
Once applied, each policy should show READY=True:
$ kubectl get clusterpolicy
NAME ADMISSION BACKGROUND READY MESSAGE
disallow-latest-tag true true True Ready
require-app-label true true True Ready
require-resources true true True Ready
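3.1 Verifying that the policies really take effect (Audit vs Enforce, tested)
A policy showing Ready does not prove that it actually blocks anything, so test it. Create a test namespace and deliberately create a violating Pod (no resources set):
kubectl create ns sec-test
kubectl run bad -n sec-test --image=nginx:1.27 # no resources.requests set
In Audit mode the Pod is admitted (pod/bad created), but an audit report is generated in the background: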
$ kubectl get policyreport -n sec-test
NAME KIND NAME PASS FAIL WARN
de2d7a29… Pod bad 1 1 0 ← FAIL=1: require-resources was violated, but the Pod was not blocked
Switch to Enforce and create the same Pod again; this time it is rejected outright:
kubectl patch clusterpolicy require-resources --type=merge \
-p '{"spec":{"validationFailureAction":"Enforce"}}'
kubectl run bad2 -n sec-test --image=nginx:1.27
Actual output (this is what "the YAML is fine but it still gets rejected" looks like):
Error from server: ... resource Pod/sec-test/bad2 was blocked due to the following policies
require-resources:
require-cpu-mem: 'validation error: Container must set resources.requests.cpu /
memory. rule require-cpu-mem failed at path /spec/containers/0/resources/requests/'
A compliant Pod (resources set, image tag not latest) is still admitted under Enforce:
kubectl run good -n sec-test --image=nginx:1.27 \
--overrides='{"spec":{"containers":[{"name":"good","image":"nginx:1.27","resources":{"requests":{"cpu":"100m","memory":"128Mi"}}}]}}'
# pod/good created ✓
After verifying, clean up and switch back to Audit (so that a long-lived Enforce does not break later experiments):
kubectl delete ns sec-test
kubectl patch clusterpolicy require-resources --type=merge \
-p '{"spec":{"validationFailureAction":"Audit"}}'
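This trio of "violations blocked / compliant resources admitted / audit trail available" is the whole point of admission control. When a newcomer sees "was blocked due to the following policies", Kubernetes is not broken; the policy is doing its job. Fix the YAML according to the message in the error.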
4. Checking the current security state
kubectl get ns --show-labels
kubectl get serviceaccount -A
kubectl get role,rolebinding,clusterrole,clusterrolebinding -A | head -80
kubectl get resourcequota,limitrange -A
kubectl get networkpolicy -A
kubectl get pods -A | grep kyverno || true
kubectl get clusterpolicy,policy -A 2>/dev/null || true
kubectl get policyreport,clusterpolicyreport -A 2>/dev/null || true
If kubectl get clusterpolicy returns resources, the Kyverno CRDs are installed.
5. RBAC: checking whether an account has a permission
Show the identity in the current kubeconfig:
kubectl auth whoami
Test permissions:
kubectl auth can-i get pods -A
kubectl auth can-i create pods -n default
kubectl auth can-i get secrets -n jenkins --as=system:serviceaccount:jenkins:jenkins
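When several permissions need checking for the same ServiceAccount, the can-i calls above can be wrapped in a small loop. This is a sketch; `sa_subject` and `can_i_matrix` are hypothetical helper names, and the verb/resource pairs in the usage line are examples only.

```shell
sa_subject() {
  # ServiceAccount usernames have the form
  # system:serviceaccount:<namespace>:<name>.
  printf 'system:serviceaccount:%s:%s' "$1" "$2"
}

can_i_matrix() {
  # $1 = namespace, $2 = ServiceAccount name,
  # remaining arguments = "verb resource" pairs such as "get secrets".
  local ns=$1 sa=$2 pair
  shift 2
  for pair in "$@"; do
    printf '%-20s ' "$pair"
    # $pair is left unquoted on purpose so it splits into verb and resource.
    kubectl auth can-i $pair -n "$ns" --as="$(sa_subject "$ns" "$sa")"
  done
}

# Usage (needs cluster access and impersonation rights):
#   can_i_matrix jenkins jenkins "get secrets" "create pods" "list configmaps"
```

Each line of output is the pair followed by yes or no, which makes a missing permission easy to spot.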
A common Jenkins agent error:
MountVolume.SetUp failed ... secrets "harbor-auth" is forbidden
Meaning: the Jenkins agent's ServiceAccount is not allowed to read the Secret, or the Secret is not in the same namespace.
Troubleshooting:
kubectl get pod -n jenkins
kubectl describe pod -n jenkins <agent-pod>
kubectl get sa,role,rolebinding -n jenkins
kubectl auth can-i get secrets -n jenkins --as=system:serviceaccount:jenkins:jenkins
6. PSA: Pod security levels
Show the Pod Security labels on each namespace:
kubectl get ns --show-labels | grep pod-security
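Common levels:
| Level | Meaning |
|---|---|
| privileged | Essentially unrestricted; suitable for system components |
| baseline | Blocks clearly dangerous privileges; suitable for most workloads |
| restricted | Stricter; commonly used in production multi-tenant clusters |
Label a namespace: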
kubectl label ns app \
pod-security.kubernetes.io/enforce=baseline \
pod-security.kubernetes.io/audit=restricted \
pod-security.kubernetes.io/warn=restricted \
--overwrite
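If a workload Pod needs hostNetwork, privileged, or hostPath, do not simply open up the whole namespace. First confirm that it is really needed, then design a dedicated security boundary for it.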
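Before tightening a namespace, a server-side dry run of the label change shows which existing Pods would violate the new enforce level without persisting anything. The sketch below wraps that; `psa_labels` and `psa_preview` are hypothetical helper names, and using one level for both audit and warn is an assumption that mirrors the example above.

```shell
psa_labels() {
  # $1 = enforce level, $2 = level used for both audit and warn.
  printf 'pod-security.kubernetes.io/enforce=%s pod-security.kubernetes.io/audit=%s pod-security.kubernetes.io/warn=%s' "$1" "$2" "$2"
}

psa_preview() {
  # $1 = namespace, $2 = enforce level, $3 = audit/warn level.
  # --dry-run=server lets the apiserver evaluate the change and print
  # warnings for violating Pods, but does not save the labels.
  local ns=$1
  shift
  # Unquoted on purpose: the helper prints three space-separated arguments.
  kubectl label --dry-run=server --overwrite ns "$ns" $(psa_labels "$@")
}

# Usage (needs cluster access):
#   psa_preview app baseline restricted
```

If the dry run prints no warnings about existing Pods, the real kubectl label command is unlikely to surprise running workloads.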
7. Kyverno: finding out why a policy blocked something
List the policies:
kubectl get clusterpolicy
kubectl describe clusterpolicy require-resources 2>/dev/null || true
List the policy reports:
kubectl get policyreport -A
kubectl get clusterpolicyreport -A
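On a busy cluster the report list is long, and only rows with a non-zero FAIL count matter for troubleshooting. The filter below is a sketch that works on the table output and locates the FAIL column by its header; it assumes every column in a row is non-empty. `failing_reports` is a hypothetical helper name.

```shell
failing_reports() {
  # Reads `kubectl get policyreport` table output on stdin and prints
  # only the rows whose FAIL column is greater than zero.
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "FAIL") col = i; next }
    col && $col + 0 > 0 { print }
  '
}

# Usage (needs cluster access):
#   kubectl get policyreport -A | failing_reports
```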
Check recent events:
kubectl get events -A --sort-by=.lastTimestamp | grep -i policy | tail -50
If you see:
Container must set resources.requests.cpu / memory
the Pod or Deployment needs the following on each container:
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi
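Why requests are required: the scheduler places Pods based on their requests. Pods without requests distort node capacity planning and make it easy to overload a node.
8. ResourceQuota and LimitRange
Set resource boundaries for a workload namespace: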
kubectl create ns app
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: app-quota
  namespace: app
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi
    pods: "20"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: app-defaults
  namespace: app
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    default:
      cpu: 500m
      memory: 512Mi
EOF
Inspect:
kubectl describe quota -n app
kubectl describe limitrange -n app
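The quota and the LimitRange defaults interact: each default-sized container consumes 100m/128Mi of the request quota and 500m/512Mi of the limit quota, so the tightest ratio decides how many Pods fit. The sketch below works through that arithmetic for the values above, assuming single-container Pods that rely entirely on the LimitRange defaults. `max_default_pods` is a hypothetical helper name.

```shell
max_default_pods() {
  # $1 = pods quota; remaining arguments are "quota per-container" pairs
  # in matching units (CPU in millicores, memory in Mi).
  # Prints the smallest of the pods quota and every quota/per-container ratio.
  local min=$1 n
  shift
  while [ $# -ge 2 ]; do
    n=$(( $1 / $2 ))
    [ "$n" -lt "$min" ] && min=$n
    shift 2
  done
  echo "$min"
}

# pods=20, requests.cpu 2000m/100m, requests.memory 4096Mi/128Mi,
# limits.cpu 4000m/500m, limits.memory 8192Mi/512Mi
max_default_pods 20  2000 100  4096 128  4000 500  8192 512   # prints 8
```

With these numbers limits.cpu is the binding constraint (4000m / 500m = 8), so the ninth default-sized Pod is rejected with exceeded quota even though the pods quota allows 20.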
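Common events:
| Event | Meaning |
|---|---|
| exceeded quota | The namespace quota is exhausted |
| must specify requests.cpu | A ResourceQuota tracks requests.cpu (or a policy requires it), so every container must declare a request; a LimitRange defaultRequest can fill it in |
| maximum cpu usage per Container is ... | A single container's limit exceeds the LimitRange maximum |
9. NetworkPolicy
NetworkPolicy plays the admission role at the network layer: it decides which Pod traffic is allowed. It is currently enforced by Cilium.
Inspect: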
kubectl get networkpolicy -A
kubectl get cnp,ccnp -A 2>/dev/null || true
Find traffic dropped by a network policy:
hubble observe --verdict DROPPED --last 100
kubectl describe networkpolicy -n <ns> <policy>
Minimal default-deny example:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: app
spec:
  podSelector: {}
  policyTypes:
  - Ingress
Write the allow rules before adding default-deny; otherwise all inbound traffic to the workloads is cut off.
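As an illustration of an allow rule to put in place first, the sketch below emits a NetworkPolicy that permits ingress from any Pod in the same namespace. The policy name allow-same-namespace and the choice to allow all in-namespace traffic are assumptions; real workloads usually need narrower selectors plus rules for ingress controllers and monitoring.

```shell
allow_same_namespace_manifest() {
  # Prints a NetworkPolicy allowing ingress to every Pod in namespace `app`
  # from any Pod in that same namespace.
  cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: app
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}
EOF
}

# Usage (needs cluster access): apply this before default-deny-ingress.
#   allow_same_namespace_manifest | kubectl apply -f -
```

NetworkPolicies are additive, so once this is applied the default-deny policy only removes traffic that no allow rule covers.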
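10. Troubleshooting order for admission failures
kubectl apply fails
-> read the error message
-> kubectl get events -A | tail
-> check the Kyverno policyreport
-> check the namespace PSA labels
-> check RBAC with can-i
Pod is created but does not start
-> kubectl describe pod and read Events
-> check ResourceQuota / LimitRange
-> check PVC / ImagePull / NetworkPolicy
Common commands: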
kubectl describe pod -n <ns> <pod>
kubectl get events -n <ns> --sort-by=.lastTimestamp | tail -50
kubectl get policyreport -n <ns> -o yaml
kubectl auth can-i --list -n <ns>
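The first step of the order above, reading the error message, can be partly automated by matching well-known substrings. The sketch below maps a rejection message to the mechanism that most likely produced it. The Kyverno, quota, and LimitRange substrings come from messages quoted in this runbook; the PodSecurity and RBAC substrings are assumptions based on commonly seen apiserver wording. `classify_rejection` is a hypothetical helper name.

```shell
classify_rejection() {
  # $1 = error message text. Prints the most likely source of the rejection.
  # Order matters: quota and PodSecurity errors also contain "is forbidden",
  # so they are checked before the generic RBAC pattern.
  case "$1" in
    *"blocked due to the following policies"*) echo kyverno ;;
    *"violates PodSecurity"*)                  echo psa ;;
    *"exceeded quota"*|*"failed quota"*)       echo resourcequota ;;
    *"usage per Container"*)                   echo limitrange ;;
    *'is forbidden: User '*)                   echo rbac ;;
    *)                                         echo unknown ;;
  esac
}

# Usage (needs cluster access; capture stderr of a failing apply):
#   classify_rejection "$(kubectl apply -f pod.yaml 2>&1)"
```

An unknown result means the message should be read by hand and the events checked with the commands above.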