Kubernetes Production Best Practices


Cluster Architecture Design

High-Availability Deployment


# Recommended architecture: 3 master (control plane) nodes + N worker nodes
# Master node spec: 4 CPU cores + 16 GB memory + 100 GB SSD
# Worker node spec: sized according to the workload
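
With three control plane nodes, the API servers are normally reached through one load-balanced address so that losing a single node does not break clients. The following is a minimal sketch that assumes a kubeadm-based cluster (the static pod path /etc/kubernetes/manifests used in this article suggests one). The file name, the endpoint k8s-api.example.internal, and the version are hypothetical placeholders.

```yaml
# kubeadm-config.yaml (hypothetical file name)
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.29.0            # placeholder version
# Stable address of the load balancer in front of all control plane nodes
controlPlaneEndpoint: "k8s-api.example.internal:6443"
etcd:
  local:
    dataDir: /var/lib/etcd            # place this directory on SSD-backed storage
```

The first node would be initialized with kubeadm init --config kubeadm-config.yaml --upload-certs, and the remaining control plane nodes join with the --control-plane flag.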

Recommended etcd Configuration


# /etc/kubernetes/manifests/etcd.yaml
# Use SSD storage
# Configure an appropriate snapshot interval
# Enable regular backups
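
As a rough illustration of the snapshot-related settings, the fragment below shows where such flags would go in the etcd static pod manifest. The flags themselves are standard etcd options, but every value here is an illustrative assumption and should be tuned to the cluster.

```yaml
# Fragment of /etc/kubernetes/manifests/etcd.yaml (values are illustrative)
spec:
  containers:
  - name: etcd
    command:
    - etcd
    - --data-dir=/var/lib/etcd              # should live on an SSD
    - --snapshot-count=10000                # committed transactions between snapshots
    - --auto-compaction-mode=periodic       # compact old revisions on a timer
    - --auto-compaction-retention=1h        # keep one hour of revision history
    - --quota-backend-bytes=8589934592      # 8 GiB backend size limit
```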

Resource Management

Set Sensible Resource Limits


apiVersion: v1
kind: Pod
metadata:
  name: well-configured-pod
spec:
  containers:
  - name: app
    image: myapp:v1
    resources:
      requests:
        cpu: 200m
        memory: 256Mi
      limits:
        cpu: 300m
        memory: 384Mi
    # Recommended: keep the requests-to-limits ratio within 1:1.5
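
The 1:1.5 ratio can also be enforced per namespace instead of relying on every manifest getting it right. This is a minimal sketch using a LimitRange; the object name and the default values are assumptions chosen to match the ratio.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults        # hypothetical name
  namespace: production
spec:
  limits:
  - type: Container
    defaultRequest:               # applied when a container sets no requests
      cpu: 200m
      memory: 256Mi
    default:                      # applied when a container sets no limits
      cpu: 300m
      memory: 384Mi
    maxLimitRequestRatio:         # reject containers whose limit/request exceeds 1.5
      cpu: "1.5"
      memory: "1.5"
```

Containers that violate the ratio are rejected at admission time, so the rule is checked before a Pod is ever scheduled.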

Namespace Planning


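# Namespaces are flat in Kubernetes; the entries under production/ are
# workload groupings (for example by label or name prefix), not nested namespaces.
production/     # production environment
  ├── frontend/
  ├── backend/
  ├── database/
  └── middleware/

staging/        # pre-release (staging) environment
development/    # development environment
monitoring/     # monitoring components
logging/        # logging components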
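
A namespace from this plan might be declared together with a quota so that one environment cannot starve the others. This is a sketch only; the label key and all quota figures are assumptions.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    environment: production       # hypothetical label used for selection
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota          # hypothetical name
  namespace: production
spec:
  hard:
    requests.cpu: "20"            # illustrative totals for the whole namespace
    requests.memory: 40Gi
    limits.cpu: "30"
    limits.memory: 60Gi
    pods: "100"
```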

Image Management

Use Appropriate Image Tags


# Recommended: use a specific version number
image: myapp:v1.2.3

# Avoid: the latest tag
# Avoid: unstable versions

Image Pull Policy


spec:
  containers:
  - name: app
    image: myapp:v1.2.3
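    # Always makes the kubelet check the registry on every container start;
    # cached layers are reused when the image digest is unchanged.
    # With immutable version tags such as v1.2.3, IfNotPresent is also sufficient.
    imagePullPolicy: Always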

Resilience Design

Health Checks


spec:
  containers:
  - name: app
    image: myapp:v1
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 3
      failureThreshold: 3
    startupProbe:
      httpGet:
        path: /startup
        port: 8080
      failureThreshold: 30
      periodSeconds: 10

Pod Disruption Budget


apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: myapp
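
A budget of minAvailable: 2 only leaves room for voluntary disruptions (such as node drains) when more than two replicas exist. The sketch below shows a matching Deployment; the replica count and image tag are assumptions.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3                     # 3 replicas with minAvailable: 2 allows 1 eviction at a time
  selector:
    matchLabels:
      app: myapp                  # must match the PodDisruptionBudget selector
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: myapp:v1.2.3
```

With only two replicas the same budget would block every eviction, and node drains would hang until someone intervenes.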

Configuration Management

Use ConfigMap and Secret


# Keep configuration files outside the image
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  config.yaml: |
    database:
      host: mysql-service
      port: 3306
    cache:
      enabled: true
      ttl: 3600
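
To show how the application would actually read this configuration, here is a minimal sketch that mounts the ConfigMap as a file. The Pod name and the mount path /etc/myapp are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-config           # hypothetical name
spec:
  containers:
  - name: app
    image: myapp:v1.2.3
    volumeMounts:
    - name: config
      mountPath: /etc/myapp       # config.yaml appears as /etc/myapp/config.yaml
      readOnly: true
  volumes:
  - name: config
    configMap:
      name: app-config            # the ConfigMap defined above
```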

Encrypting Sensitive Data


apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
stringData:
  username: admin
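  # Placeholder: Kubernetes does not expand ${DB_PASSWORD} itself. Substitute it
  # at deploy time (for example with envsubst) and never commit the real value.
  password: "${DB_PASSWORD}"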
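
A Secret is only base64-encoded by default, so actual encryption has to be enabled on the API server. One way to do this is an EncryptionConfiguration file passed to kube-apiserver via --encryption-provider-config. This is a sketch: the key name is arbitrary and the key value is a placeholder that must be replaced with a real base64-encoded 32-byte key.

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:                       # first provider is used for new writes
      keys:
      - name: key1
        secret: <BASE64_ENCODED_32_BYTE_KEY>   # placeholder, not a real key
  - identity: {}                  # allows reading Secrets stored before encryption was enabled
```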

Network Security

Network Policies


# Deny all inbound traffic by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
spec:
  podSelector: {}
  policyTypes:
  - Ingress
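
Once inbound traffic is denied by default, each legitimate path has to be opened explicitly. The sketch below allows frontend Pods to reach backend Pods; the tier labels and port 8080 are assumptions, not taken from an actual deployment.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend   # hypothetical name
  namespace: production
spec:
  podSelector:
    matchLabels:
      tier: backend                 # policy applies to backend Pods
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: frontend            # only frontend Pods in the same namespace
    ports:
    - protocol: TCP
      port: 8080
```

Network policies are additive, so this rule widens access only for the selected Pods while the default-deny policy keeps covering everything else.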

TLS Configuration


apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-protocols: "TLSv1.2 TLSv1.3"
spec:
  tls:
  - hosts:
    - myapp.example.com
    secretName: myapp-tls
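
The manifest above shows only the TLS part. A complete Ingress also needs routing rules, roughly as sketched here; the ingress class nginx, the Service name myapp, and port 80 are assumptions.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
spec:
  ingressClassName: nginx           # assumes an NGINX ingress controller is installed
  tls:
  - hosts:
    - myapp.example.com
    secretName: myapp-tls           # Secret of type kubernetes.io/tls
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp             # hypothetical Service
            port:
              number: 80
```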

Monitoring and Alerting

Key Metrics to Monitor


# Metrics to monitor
- Pod restart count
- Memory/CPU utilization
- Disk I/O
- Network traffic
- Request latency (P99)
- Error rate (5xx)

Alerting Rules


apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: critical-alerts
spec:
  groups:
  - name: critical
    rules:
    - alert: PodCrashLooping
      # rate() is per second, so scale it to restarts per 5 minutes before comparing
      expr: rate(kube_pod_container_status_restarts_total[15m]) * 60 * 5 > 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is restarting repeatedly"
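
Memory usage from the metrics list can be covered in the same way. The rule below is a sketch that assumes cAdvisor and kube-state-metrics are being scraped; the rule name, the 90% threshold, and the 10-minute window are arbitrary choices.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: resource-alerts             # hypothetical name
spec:
  groups:
  - name: resources
    rules:
    - alert: ContainerMemoryNearLimit
      # Working-set memory divided by the configured memory limit, per container
      expr: |
        sum by (namespace, pod, container) (container_memory_working_set_bytes{container!=""})
          /
        sum by (namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"})
          > 0.9
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Container {{ $labels.namespace }}/{{ $labels.pod }}/{{ $labels.container }} is above 90% of its memory limit"
```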

Backup and Recovery

etcd Backup


#!/bin/bash
# Scheduled etcd backup script
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /backup/etcd-snapshot-$(date +%Y%m%d).db

Application Data Backup


# Back up with Velero
velero backup create daily-backup \
  --include-namespaces production \
  --storage-location default \
  --ttl 720h
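
The command above creates a single backup. For a recurring daily backup, Velero provides a Schedule resource. This sketch assumes Velero is installed in the velero namespace; the 02:00 cron time is an arbitrary choice.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero                 # assumes the default Velero install namespace
spec:
  schedule: "0 2 * * *"             # every day at 02:00
  template:
    includedNamespaces:
    - production
    storageLocation: default
    ttl: 720h0m0s                   # keep each backup for 30 days
```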

CI/CD Integration

GitOps Workflow


# Argo CD Application
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
spec:
  project: production
  source:
    repoURL: https://github.com/myorg/myapp-manifests
    targetRevision: main
    path: production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
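
The Application references a project named production, which has to exist before the sync can run. A minimal sketch of such an AppProject follows, assuming Argo CD runs in the argocd namespace; the description text is made up.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: production
  namespace: argocd                 # assumes the default Argo CD namespace
spec:
  description: Production workloads
  sourceRepos:                      # only this repository may be deployed
  - https://github.com/myorg/myapp-manifests
  destinations:                     # only this cluster and namespace may be targeted
  - server: https://kubernetes.default.svc
    namespace: production
```

Restricting sourceRepos and destinations limits what an automated sync with prune and selfHeal is able to change.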

Operations Guidelines

Release Process


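1. Merge code into the main branch
2. Build the image automatically and push it to the registry
3. Update the image tag (using ArgoCD/Helm)
4. Watch the monitoring metrics
5. Roll back quickly if problems appear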
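
Step 3 usually amounts to a one-line change in the manifests repository. The sketch below assumes the repository uses kustomize (listed among the tools in this article, although the step itself names ArgoCD/Helm). The file layout and the tag v1.2.4 are hypothetical.

```yaml
# production/kustomization.yaml (hypothetical layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml
images:
- name: myapp                       # image name as written in deployment.yaml
  newTag: v1.2.4                    # the only line CI changes for a release
```

Because the tag lives in Git, rolling back in step 5 is a revert of that commit, which the automated sync then applies.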

Change Management


# Manage every change through Git
# Validate with kubectl apply --dry-run=client
# Review changes with kubectl diff
# Test important changes in staging first

Common Tools


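# k9s - terminal UI for Kubernetes
# kubectl plugin list - list installed kubectl plugins
# kubectx - quickly switch contexts
# kubens - quickly switch namespaces
# helm - package management
# kustomize - configuration management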
