Day 8: Closing the In-Cluster CI Loop with Gitea + Jenkins + Kaniko

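Day 7 installed the Harbor image registry and the ArgoCD delivery channel, which leaves one missing segment: source code → image. Today builds a CI pipeline that runs entirely inside the cluster, goes from git push to an image in Harbor in about 120 seconds, and depends on no SaaS anywhere along the chain.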

Three parts:

A. Gitea: in-cluster git server, persisted on a PVC, exposed through a NodePort, repos created via the API.
B. Jenkins + Kaniko: Jenkins controller plus dynamic K8s agents, with Kaniko doing containerized builds (no docker daemon, no privileged mode).
C. End-to-end pipeline: a Jenkinsfile that runs git clone → kaniko build → push to Harbor.

Overall architecture

Developer git push
   ↓
Gitea (Longhorn PVC, NodePort 30022)
   ├─ notes-app     ← application source + Dockerfile + Jenkinsfile
   └─ notes-deploy  ← K8s manifests
   ↓ webhook
Jenkins (K8s plugin)
   └─ each build starts one Pod as its agent
       └─ Kaniko container builds the image
   ↓ push
Harbor (Day 7), image: bootcamp/notes-api:<git-sha>
   ↓ ArgoCD watches notes-deploy
K8s cluster: rolling update of Pods

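Splitting the two repos is the classic GitOps rule that code ≠ deployment: notes-app changes often, while notes-deploy holds only K8s manifests. After a build, Jenkins patches the image tag in notes-deploy, and ArgoCD watches that repo to deploy, so CI and CD are decoupled at the repo boundary.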
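The hand-off between the two repos, where the CI job rewrites the image tag in notes-deploy, could look roughly like the sketch below. The file name deployment.yaml, the sample manifest, and the tag abc1234 are assumptions made for illustration; the real manifests are not shown in this post.

```shell
set -eu

# Hypothetical values: the real manifest layout in notes-deploy is not shown here.
IMAGE="10.0.24.28:30002/bootcamp/notes-api"
NEW_TAG="abc1234"            # would be the short git SHA produced by the build
WORKDIR=$(mktemp -d)
MANIFEST="$WORKDIR/deployment.yaml"

# Stand-in for a file checked out from notes-deploy.
cat > "$MANIFEST" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: notes-api
spec:
  template:
    spec:
      containers:
      - name: notes-api
        image: 10.0.24.28:30002/bootcamp/notes-api:old1234
EOF

# Rewrite only the tag of the matching image line; going through a temp file
# keeps the edit portable across GNU and BSD sed.
sed "s|\(image: $IMAGE\):.*|\1:$NEW_TAG|" "$MANIFEST" > "$MANIFEST.tmp"
mv "$MANIFEST.tmp" "$MANIFEST"

PATCHED_LINE=$(grep 'image:' "$MANIFEST")
echo "$PATCHED_LINE"
# In a real job, a git commit + push to notes-deploy would follow,
# and ArgoCD would roll the Deployment to the new tag.
```

Because only notes-deploy changes at this step, a rollback is a plain git revert in that repo and needs no rebuild.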

A. Gitea: in-cluster git server

A.1 Installing Gitea with helm

helm repo add gitea-charts https://dl.gitea.com/charts
helm install gitea gitea-charts/gitea \
  --namespace gitea --create-namespace \
  --set service.http.type=NodePort --set service.http.nodePort=30022 \
  --set persistence.storageClass=longhorn --set persistence.size=5Gi \
  --set gitea.admin.username=bootcamp --set gitea.admin.password=bootcamp \
  --set postgresql.enabled=true --set valkey.enabled=true

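This produces 3 Pods: gitea-0 (Go main process + git daemon + Web UI), gitea-postgresql-0 (metadata database), and gitea-valkey-* (session cache; valkey is a redis fork). The Longhorn PVC keeps the git repos on disk and lets the volume reattach on another node. CI traffic is Jenkins → Gitea inside the cluster, so a NodePort that leaves a door open for external git push is enough and no Ingress is needed.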

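A.2 Real pitfall: Kyverno in enforce mode blocks the chart's default Deployment

helm install fails immediately with require-cpu-mem fail: Container must set resources.requests.cpu/memory. The Kyverno installed on Day 5 enforces resources.requests, but the Gitea chart sets no resources by default (the chart authors leave that decision to the user).

Fixes:

# Option 1: switch enforce to audit (learning setup)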
for p in require-resources disallow-latest-tag require-app-label; do
  kubectl patch clusterpolicy $p --type=merge \
    -p '{"spec":{"validationFailureAction":"Audit"}}'
done

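# Option 2: write a PolicyException for system-level namespaces (production practice)

Lesson: turning on enforce on the first day of running Kyverno is a typical beginner mistake. The production approach is to run in audit for 1-2 weeks to collect violations → add resources to the application charts → then switch to enforce, with exceptions for system namespaces. The same goes for every admission controller (Kyverno / OPA Gatekeeper / VAP).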
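Option 2 is only named above, so here is a minimal sketch of what such an exception might look like for the gitea namespace. The policy name require-resources and rule name require-cpu-mem are assumptions taken from the patch loop and the error message; the apiVersion depends on the installed Kyverno release (older releases use v2beta1), and policy exceptions may first have to be enabled in the Kyverno configuration.

```shell
set -eu

# Assumed names: policy "require-resources", rule "require-cpu-mem".
# Adjust both to the policy that is actually installed.
OUT=$(mktemp -d)/gitea-policy-exception.yaml

cat > "$OUT" <<'EOF'
apiVersion: kyverno.io/v2
kind: PolicyException
metadata:
  name: gitea-resources-exception
  namespace: gitea
spec:
  exceptions:
  - policyName: require-resources
    ruleNames:
    - require-cpu-mem
    - autogen-require-cpu-mem   # rule Kyverno auto-generates for Pod controllers
  match:
    any:
    - resources:
        namespaces:
        - gitea
        kinds:
        - Pod
        - Deployment
        - StatefulSet
EOF

echo "wrote $OUT"
# Apply once reviewed:
#   kubectl apply -f "$OUT"
```

Scoping the exception to one namespace and one rule keeps the policy enforced everywhere else, which is the point of preferring this over a cluster-wide switch to audit.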

A.3 Creating repos via the API

Clicking through the web UI is too slow, so call the API directly:

for repo in notes-app notes-deploy; do
  curl -u bootcamp:bootcamp "http://10.0.24.28:30022/api/v1/user/repos" \
    -X POST -H 'Content-Type: application/json' \
    -d "{\"name\":\"$repo\",\"private\":false,\"auto_init\":true,\"default_branch\":\"main\"}"
done

auto_init: true creates the initial commit with a README, which avoids the awkward "empty repo can't be cloned" situation.

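A.4 SSH port conflict

By default the chart wants to expose SSH on port 22 for git push. The node's sshd already holds 22, so a LoadBalancer or hostNetwork would collide. Mapping to a high NodePort avoids the conflict, but git clients default to 22, so you need either ssh -p or an entry in ~/.ssh/config. For learning, HTTP basic auth is enough (Jenkins has to store a credential anyway, and an HTTP token is simpler than an SSH key), so this post uses HTTP throughout.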
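For readers who do want SSH, the ~/.ssh/config route mentioned above might look like the sketch below. The port 30222 and the host alias gitea-bootcamp are hypothetical; this post never assigns an SSH NodePort.

```shell
set -eu

# Hypothetical: 30222 stands in for whatever NodePort the Gitea SSH service gets.
SSH_CONFIG=$(mktemp)

cat > "$SSH_CONFIG" <<'EOF'
Host gitea-bootcamp
  HostName 10.0.24.28
  Port 30222
  User git
EOF

cat "$SSH_CONFIG"
# After merging this entry into ~/.ssh/config, a clone needs no explicit port:
#   git clone gitea-bootcamp:bootcamp/notes-app.git
```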

B. Jenkins + Kaniko: controller + dynamic agents

B.1 Installing Jenkins LTS with helm

helm repo add jenkins https://charts.jenkins.io
helm install jenkins jenkins/jenkins \
  --namespace jenkins --create-namespace \
  --set controller.serviceType=NodePort --set controller.nodePort=30808 \
  --set controller.admin.username=admin --set controller.admin.password=bootcamp \
  --set persistence.storageClass=longhorn --set persistence.size=10Gi \
  --set 'controller.tolerations[0].operator=Exists'

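This produces jenkins-0 with 2 containers: the controller plus a config-reload sidecar that watches the JCasC ConfigMap. There is no static agent; every build gets a dynamically created Pod through the K8s plugin. The 10Gi PVC stores build history, artifacts, and the credential encryption keys (use 50Gi+ in production). A toleration with operator: Exists tolerates any taint, which is fine for learning; in production, name the exact keys.

B.2 Real pitfall: don't pin Jenkins plugin versions

On the first install I couldn't resist adding --set controller.installPlugins[0]=kubernetes:4329.v...: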

Plugin kubernetes:4329 has unresolvable dependencies:
  Plugin git:5.7.0 depends on configuration-as-code:2036.v...,
  but there is an older version defined - configuration-as-code:1932...

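Jenkins plugins have intricate dependency chains. The chart defaults are a tested combination, and pinning one plugin by hand breaks the whole graph.

Fix: don't pin, and let the chart use its defaults. If production really needs pinning (reproducible builds), pin the whole set (copy the full controller.installPlugins list from the chart values and freeze it) rather than a single plugin.

B.3 Harbor docker-registry Secret

Kaniko needs a credential to push to Harbor. The standard K8s docker-registry Secret type generates a .dockerconfigjson key, which is exactly the config format that docker, containerd, and kaniko all understand.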

kubectl create secret docker-registry harbor-auth \
  --namespace jenkins \
  --docker-server=10.0.24.28:30002 \
  --docker-username=admin \
  --docker-password=bootcamp

Mount it into the Kaniko container as /kaniko/.docker/config.json (kaniko looks for exactly this path):

volumes:
- name: docker-config
  secret:
    secretName: harbor-auth
    items:
    - key: .dockerconfigjson
      path: config.json     # path must be config.json

items.path cannot be omitted: the Secret key is .dockerconfigjson (with a leading dot) while Kaniko looks for a file named config.json. Leave this line out and the push fails with UNAUTHORIZED.
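To make the file format concrete, the sketch below rebuilds by hand the JSON that ends up in config.json, using the same registry and credentials as the kubectl create secret command. It is an illustration of the layout, not a replacement for that command.

```shell
set -eu

# Same values as the kubectl create secret command above.
REGISTRY="10.0.24.28:30002"
USERNAME="admin"
PASSWORD="bootcamp"

# The "auth" field is base64("user:password"); tr strips any line wrapping.
AUTH=$(printf '%s:%s' "$USERNAME" "$PASSWORD" | base64 | tr -d '\n')

CONFIG_JSON=$(printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}' \
  "$REGISTRY" "$USERNAME" "$PASSWORD" "$AUTH")

echo "$CONFIG_JSON"

# To compare against the live Secret (needs cluster access):
#   kubectl -n jenkins get secret harbor-auth \
#     -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
```

If the registry key in auths does not match the host:port used in --destination, the push is treated as anonymous, which is another way to end up with UNAUTHORIZED.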

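B.4 Kaniko vs DinD: why no docker daemon

In classic CI, docker build needs a docker daemon, and on K8s that usually means a DinD (Docker-in-Docker) sidecar. DinD has four serious problems: it needs privileged: true (escape risk); it shares the host kernel, so cgroup/namespace isolation is weak; the layer cache is lost whenever the Pod is destroyed; and overlay-on-overlay I/O costs 30%+.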

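The core of Kaniko: image builds done entirely in user space. It reads the Dockerfile → pulls the base image → runs each RUN on the container's own rootfs → snapshots the filesystem diff as a layer → pushes directly. No docker daemon, no /var/run/docker.sock, no privileged mode. The cost: by default every RUN triggers a full filesystem snapshot, which is slow, so cache configuration is key (see C.4).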
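The snapshot-diff idea can be imitated with ordinary tools. The toy below is not Kaniko's implementation: it snapshots a scratch directory before and after a simulated RUN, then tars only the changed files as the "layer". Deleted files (whiteouts) and file metadata are ignored to keep it short.

```shell
set -eu

ROOTFS=$(mktemp -d)      # stands in for the container root filesystem
WORK=$(mktemp -d)        # scratch space for snapshots and the layer tarball

# "Base image": one pre-existing file.
echo "base" > "$ROOTFS/os-release"

# Record checksum, size and path for every file, sorted so snapshots can be compared.
snapshot() {
  (cd "$ROOTFS" && find . -type f -exec cksum {} + | sort)
}

snapshot > "$WORK/before"

# "RUN echo ... > /hello.txt": the build step mutates the filesystem in place.
echo "Hello from Kaniko" > "$ROOTFS/hello.txt"

snapshot > "$WORK/after"

# Entries present only in the second snapshot are new or modified files.
comm -13 "$WORK/before" "$WORK/after" | awk '{print $3}' > "$WORK/changed"

# Pack just those files into a tarball: this is the new layer.
tar -C "$ROOTFS" -cf "$WORK/layer.tar" -T "$WORK/changed"

LAYER_FILES=$(tar -tf "$WORK/layer.tar")
echo "$LAYER_FILES"
```

The full scan in snapshot() is the part that gets expensive on a real rootfs, which is why snapshot cost and caching matter so much for Kaniko build times.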

B.5 RBAC: the Jenkins SA must be able to create Pods and read Secrets

K8s plugin workflow: the Jenkins controller uses its SA to ask the apiserver to create an agent Pod and deletes it when the build finishes. At minimum the SA needs:

- apiGroups: [""]
  resources: ["pods", "pods/exec", "pods/log"]
  verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
  resources: ["secrets"]
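  verbs: ["get"]   # read access to the harbor-auth Secret mounted by the agent Pod

The chart creates this Role by default, but if you change the SA by hand or run agents in another namespace, a missing permission shows up as MountVolume.SetUp failed ... secrets "harbor-auth" is forbidden. Also confirm that harbor-auth exists in the namespace where the agent Pod runs, since a Pod can only mount Secrets from its own namespace. To debug, run kubectl describe pod <agent-pod> and read the events; a failed Mount is the most typical symptom of insufficient SA permissions.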

C. End-to-end pipeline: git → kaniko → Harbor

C.1 Jenkinsfile (Kaniko Pod template)

Commit this to the root of notes-app:

pipeline {
  agent {
    kubernetes {
      yaml '''
apiVersion: v1
kind: Pod
spec:
  tolerations: [{operator: Exists}]
  containers:
  - name: kaniko
    image: gcr.io/kaniko-project/executor:debug
    command: [/busybox/cat]
    tty: true
    volumeMounts:
    - {name: docker-config, mountPath: /kaniko/.docker}
    resources:
      requests: {cpu: 100m, memory: 256Mi}
      limits:   {cpu: 1,    memory: 1Gi}
  volumes:
  - name: docker-config
    secret:
      secretName: harbor-auth
      items: [{key: .dockerconfigjson, path: config.json}]
'''
    }
  }
  stages {
    stage('Build and Push') {
      steps {
        container('kaniko') {
          sh '''
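            # The kaniko image may not ship git: fall back to the GIT_COMMIT set by Jenkins, then to "dev"
            GIT_SHA=$(git rev-parse --short HEAD 2>/dev/null || echo "${GIT_COMMIT:-dev}" | cut -c1-7)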
            /kaniko/executor \\
              --dockerfile=Dockerfile --context=$(pwd) \\
              --destination=10.0.24.28:30002/bootcamp/hello-kaniko:${GIT_SHA} \\
              --destination=10.0.24.28:30002/bootcamp/hello-kaniko:latest \\
              --insecure --skip-tls-verify
          '''
        }
      }
    }
  }
}

The matching Dockerfile:

FROM alpine:3.20
RUN echo "Hello from Kaniko" > /hello.txt
CMD ["cat", "/hello.txt"]

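Details:

command: [/busybox/cat] + tty: true: the K8s plugin requires agent containers to keep running; the :debug image ships a busybox shell, so cat acts as the keep-alive. It must be :debug, not :latest (latest is distroless-style and has no shell).
resources.requests / limits are mandatory: the Kyverno policy already blocked us once, and an agent Pod is still a Pod.
Two --destination flags tag :sha and :latest in one go: :sha lets ArgoCD pin an exact version; production should use :sha only.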

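C.2 Real pitfall: Jenkins API and the CSRF crumb

Creating jobs in bulk from a script with a plain curl -u admin:bootcamp -X POST .../createItem returns 403 No valid crumb. Jenkins enables CSRF protection by default: every POST needs a Jenkins-Crumb header, and the crumb is bound to the session. curl starts a new session on every call by default, so the crumb is invalid as soon as it is issued.

Fix: keep the session alive with a cookie jar.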

COOKIE=/tmp/jenkins-cookie

# 1. Fetch the crumb; the session cookie is written to the jar
CRUMB=$(curl -c $COOKIE -b $COOKIE -u admin:bootcamp \
  "http://.../crumbIssuer/api/xml?xpath=concat(//crumbRequestField,%22:%22,//crumb)")

# 2. Create the job with the same cookie jar
curl -c $COOKIE -b $COOKIE -u admin:bootcamp \
  -X POST -H "$CRUMB" -H "Content-Type:application/xml" \
  --data-binary @/tmp/job.xml "http://.../createItem?name=hello-kaniko"

For production automation, skip curl and use jenkins-cli.jar or the python-jenkins library, which wrap the crumb + session handling.

C.3 SCM pipeline pointing at Gitea

Key fragment of the job XML:

<scm class="hudson.plugins.git.GitSCM">
  <userRemoteConfigs>
    <hudson.plugins.git.UserRemoteConfig>
      <url>http://bootcamp:bootcamp@10.0.24.28:30022/bootcamp/notes-app.git</url>
    </hudson.plugins.git.UserRemoteConfig>
  </userRemoteConfigs>
  <branches><hudson.plugins.git.BranchSpec><name>*/main</name></hudson.plugins.git.BranchSpec></branches>
</scm>
<scriptPath>Jenkinsfile</scriptPath>

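Embedding the password in the URL (bootcamp:bootcamp@...) is a learning-setup shortcut. Production must use Jenkins Credentials: store a Gitea API token in the credentials store ahead of time and reference it with <credentialsId>gitea-token</credentialsId>; never write it in plaintext.

C.4 Kaniko cache strategy

By default every build pulls the base image and runs every RUN again, which is slow. On K8s, choose --cache-repo over --cache-dir: agent Pods are ephemeral and a local directory disappears with the Pod, so cached layers must be pushed to a separate repo to be shared.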

/kaniko/executor \
  --cache=true \
  --cache-repo=10.0.24.28:30002/bootcamp/kaniko-cache \
  --cache-ttl=168h \
  --dockerfile=Dockerfile --context=$(pwd) \
  --destination=...

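--cache-ttl=168h (7 days) expires cached layers so that stale ones are not reused; reclaiming the storage in Harbor still takes a retention or GC policy on the cache repo. Multi-stage builds (FROM golang AS builder + FROM alpine) benefit most: the builder stage's hundreds of MB of Go module downloads and compilation can be served entirely from cache. Measured: 180s for the first build, 40s for a rebuild after a code change, 8s for a rebuild with no change.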

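C.5 Full build trace + Harbor verification

Time breakdown of the first build (about 120s; offsets are approximate):

Stage	Time window	Notes
Pod scheduling + pulling kaniko:debug	0-60s	slowest on the first run; < 5s once the node has the image cached
Pod ready + jnlp agent connect	60-90s	K8s plugin protocol handshake
git clone	90-95s	in-cluster Gitea is fast
Kaniko pulls alpine + build	95-99s	small base image
push `:sha` + `:latest`	99-129s	about 16s × 2; layers are reused but the manifest is rewritten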

Harbor verification:

curl -u admin:bootcamp http://10.0.24.28:30002/api/v2.0/projects/bootcamp/repositories | \
  jq '.[] | {name, artifact_count}'
# {"name":"bootcamp/hello-kaniko","artifact_count":1}

The full chain works: Gitea push → Jenkins SCM trigger → K8s Pod starts Kaniko → build → push to Harbor.

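Common pitfalls at a glance

Symptom	Root cause	Fix
helm install gitea rejected by admission	Kyverno in enforce mode blocks the chart's default Deployment	switch to audit or write a PolicyException
Jenkins plugin dependency conflict	pinning a single plugin by hand breaks the dependency graph	don't pin; use the chart defaults
Kaniko push returns UNAUTHORIZED	Secret is mounted but the file is not named `config.json`	rename via volume items.path
Agent Pod FailedMount on Secret	Jenkins SA lacks `secrets get`	add secrets:get to RBAC
Jenkins API POST returns 403 No valid crumb	CSRF + curl opens a new session per call	keep the session with a cookie jar
Kaniko build runs every RUN each time	cache not enabled	`--cache --cache-repo=<harbor>/kaniko-cache`
Gitea SSH on 22 collides with the node's sshd	chart wants to expose 22 by default	high NodePort + `ssh -p`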

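Common interview questions

Q1: Choosing a CI engine: Jenkins vs Tekton vs Argo Workflows vs Gitea Actions?

	Model	Strengths	Weaknesses
Jenkins	controller + agent, Groovy DSL	richest plugin ecosystem, widely accepted in enterprises	heavy JVM, Groovy learning curve, controller is a single point of failure
Tekton	K8s CRD	cloud-native, one Pod per Task (one container per step), CRDs fit GitOps	no native UI, debugging is not intuitive
Argo Workflows	K8s CRD, DAG	expressive DAGs, same ecosystem as Argo CD, friendly to ML/batch	not CI-focused, weak SCM integration
Gitea Actions	reuses GHA YAML	lives next to the Gitea repo with zero integration work, reuses the GHA marketplace	runner is not K8s-native

Selection logic: existing Jenkins experience → Jenkins; cloud-native first → Tekton; tight pairing with Argo CD → Argo Workflows; a self-hosted GHA experience → Gitea Actions. This post picks Jenkins because it pays off most in interviews; for a new project I would pick Tekton or Argo Workflows.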

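Q2: How does Kaniko build an image without a docker daemon?

Core idea: the image build happens entirely in user space and never touches /var/run/docker.sock. Flow: parse the Dockerfile → pull the base image with go-containerregistry → unpack the rootfs onto the container's own / → exec each RUN inside the container, then call snapshot() to compute the file diff and tar it as a layer → push the manifest + layers to the registry.

Cost: the full filesystem scan after every RUN is slow; --snapshot-mode=redo reduces this by detecting changed files more cheaply. A rm -rf / in the Dockerfile really does delete Kaniko itself (the Pod is wrecked, but nothing escapes).

Comparison: rootless BuildKit uses user namespaces + fuse-overlayfs, is faster, and can build stages in parallel, but is more complex to configure. For in-cluster K8s builds, Kaniko is the default choice here: zero privilege plus simple configuration.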

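Q3: CI pipeline security: how to handle secret management and Kaniko RBAC?

Three layers of protection.

Secret management: never write plaintext tokens in a Jenkinsfile or Dockerfile; use Jenkins Credentials or K8s Secrets. For Harbor / Gitea, use robot accounts / API tokens instead of admin. As a next step, add Vault + Vault Secrets Operator for rotation. Never bake secrets into the image (neither COPY nor ARG → ENV is acceptable); if a value must be injected at build time, pass it with --build-arg and keep it out of every layer, bearing in mind that build args can still surface in image metadata such as the build history.

Kaniko Pod permissions: no privileged mode (granting it defeats the point of choosing Kaniko), no docker.sock mount, a least-privilege SA (reading the docker-config Secret is enough), and ResourceQuota + LimitRange to stop a malicious Jenkinsfile from mining crypto.

Supply chain: sign with cosign / notation after the build, verify signatures with Kyverno / Connaisseur before ArgoCD pulls, and turn on Trivy scanning in Harbor to block CVEs.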
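The ResourceQuota + LimitRange guardrail from the second layer could be sketched as follows. All numbers are hypothetical and would need tuning to the cluster; the quota also counts the Jenkins controller Pod, because it shares the jenkins namespace with the agents.

```shell
set -eu

# Hypothetical limits for the namespace where agent Pods run; tune to the cluster.
QUOTA_FILE=$(mktemp -d)/jenkins-agent-guardrails.yaml

cat > "$QUOTA_FILE" <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ci-agents
  namespace: jenkins
spec:
  hard:
    pods: "10"
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: ci-agent-defaults
  namespace: jenkins
spec:
  limits:
  - type: Container
    defaultRequest: {cpu: 100m, memory: 256Mi}
    default:        {cpu: "1",  memory: 1Gi}
    max:            {cpu: "2",  memory: 4Gi}
EOF

echo "wrote $QUOTA_FILE"
# Apply once reviewed:
#   kubectl apply -f "$QUOTA_FILE"
```

The LimitRange caps what any single agent container can claim, while the ResourceQuota caps the namespace total, so a runaway Jenkinsfile is bounded both ways.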

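Q4: Cost comparison: self-hosted Gitea vs GitHub / GitLab?

Dimension	Self-hosted Gitea	GitHub Cloud	Self-hosted GitLab
Resources	3 Pods / 5Gi / 500MB RAM	0	8 vCPU / 16G / 50G+
Monthly cost (50 people)	a few tens of yuan of amortized node cost	$44/u × 50 ≈ $2200	a few hundred yuan
Data sovereignty	fully self-owned	subject to US law	fully self-owned
CI/CD	must integrate yourself	GitHub Actions (strong)	GitLab CI (strong)
HA / backup	self-operated	guaranteed by the platform	self-operated

Gitea's sweet spot: under 30 people, wanting full control, and not wanting to operate GitLab; its operational complexity is an order of magnitude lower than GitLab's. Good fits: finance / government / sensitive AI training data, restricted networks, learning and experiments. Teams of 30+ that need sophisticated review or code search are still better served by GitHub, and anyone who wants a ready-made runner pool should stay on SaaS.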

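Q5: Cache strategy for Kaniko multi-stage builds?

Three points:

Use --cache-repo, not --cache-dir: agent Pods are ephemeral and a local directory disappears with the Pod, so the cache must be pushed to a separate repo.
Layer order, unchanging layers first: put COPY go.mod go.sum + RUN go mod download before COPY . . so that a code change does not invalidate the go module layer (the most expensive one); put apt-get install at the very top.
No cache sharing across stages: the builder stage's hundreds of MB of go modules / compilation is the bulk, while the alpine stage only COPYs a few MB, so an independent cache per stage is fine.

Measured on a Go app: 180s with no cache / 40s for a rebuild after a code change / 8s for a rebuild with no change; a go mod download cache hit saves 80%+. As a next step, BuildKit can build stages in parallel and offers fine-grained cache mounts, but it needs a daemon or a rootless setup; Kaniko is the "good enough and simple" middle ground.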
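The layer-ordering point can be sketched as a multi-stage Dockerfile. The project layout (go.mod and go.sum at the repo root, main package in ".") and the golang:1.22 base tag are assumptions; the notes-api source is not shown in this post.

```shell
set -eu

# Hypothetical Go service layout (go.mod/go.sum at the repo root, main package in ".").
DOCKERFILE=$(mktemp -d)/Dockerfile

cat > "$DOCKERFILE" <<'EOF'
FROM golang:1.22 AS builder
WORKDIR /src
# Dependency manifests first: this layer only changes when go.mod/go.sum change.
COPY go.mod go.sum ./
RUN go mod download
# Source last: a code change invalidates the cache from here on only.
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

FROM alpine:3.20
COPY --from=builder /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
EOF

# Sanity check: the dependency COPY must come before the full-source COPY.
MOD_LINE=$(grep -n '^COPY go.mod' "$DOCKERFILE" | cut -d: -f1)
SRC_LINE=$(grep -n '^COPY \. \.' "$DOCKERFILE" | cut -d: -f1)
echo "go.mod copied at line $MOD_LINE, source copied at line $SRC_LINE"
```

With this ordering, a code-only change should reuse the cached go mod download layer and rebuild from COPY . . onward, which matches the rebuild-after-code-change case measured above.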

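Next steps

The CI infrastructure is in place: Gitea hosts the repos, Jenkins schedules builds, Kaniko produces images, Harbor receives them, and ArgoCD takes over deployment. The whole chain is self-contained in the cluster and completes one commit-to-image cycle in about 120 seconds. Days 9-11 run a real workload on this pipeline: notes-api (Go gin) + notes-ui (Vue3 nginx) + a MySQL StatefulSet, with each component going through the full Gitea push → Jenkins build → Harbor → ArgoCD → Pod ready chain, plus observability checks against the Prometheus / Loki / Hubble stack from Day 4/6.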