A Complete Guide to Deploying OpenStack in Production: From Planning to Go-Live

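1. Project Planning

1.1 Requirements Analysis

Before starting the deployment, pin down the following requirements:

Business requirements:

  • Number of users and expected concurrency
  • Number of virtual machine instances
  • Storage capacity
  • Network bandwidth
  • SLA targets (availability, response time)

Technical requirements:

  • OpenStack release
  • Deployment tool
  • Integration needs (containers, bare metal)
  • Multi-region needs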

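1.2 Sizing

Scale      | Compute nodes | VMs        | Storage | Network bandwidth
-----------|---------------|------------|---------|------------------
Small      | 3-5           | <100       | 10TB    | 1Gbps
Medium     | 5-20          | 100-1000   | 50TB    | 10Gbps
Large      | 20-100        | 1000-10000 | 500TB   | 40Gbps
Very large | 100+          | 10000+     | 1PB+    | 100Gbps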
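
To make the tiers easier to apply, the lookup can be scripted. The following is a minimal sketch and not part of the original guide: the function name `tier_for_vms` is hypothetical, and because the VM ranges in the table overlap at 1000 and 10000, it assumes those boundary values belong to the larger tier.

```shell
# Map an expected VM count to the sizing tier from the table above.
# Boundary handling (1000 -> large, 10000 -> very large) is an assumption,
# because the ranges in the table overlap at those values.
tier_for_vms() {
    vms=$1
    if [ "$vms" -lt 100 ]; then
        echo "small"
    elif [ "$vms" -lt 1000 ]; then
        echo "medium"
    elif [ "$vms" -lt 10000 ]; then
        echo "large"
    else
        echo "very large"
    fi
}

tier_for_vms 500   # prints "medium"
```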

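1.3 Hardware Selection

Controller node specifications:

Component   | Minimum   | Recommended
------------|-----------|----------------------
CPU         | 8 cores   | 16 cores
Memory      | 32GB      | 64GB
System disk | 200GB SSD | 500GB NVMe
Data disk   | -         | 1TB SSD
Network     | 2x 1Gbps  | 4x 1Gbps + 2x 10Gbps

Compute node specifications:

Component     | Minimum  | Recommended
--------------|----------|------------
CPU           | 8 cores  | 32 cores
Memory        | 32GB     | 128GB
Local storage | 500GB    | 4TB NVMe
Network       | 2x 1Gbps | 2x 10Gbps

Storage node specifications:

Component | Minimum  | Recommended
----------|----------|------------
CPU       | 8 cores  | 16 cores
Memory    | 16GB     | 32GB
Data disk | 10TB HDD | 40TB HDD
Network   | 1Gbps    | 10Gbps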
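
The compute node figures can be turned into a rough capacity estimate. The sketch below is only an illustration: the helper names `vms_per_node` and `nodes_needed` are hypothetical, and the 8GB host reservation, the 4GB VM size, and the absence of memory overcommit are assumptions rather than values from this guide.

```shell
# Estimate how many VMs fit on one compute node by RAM, without overcommit.
# Arguments: node RAM in GB, RAM reserved for the host in GB, RAM per VM in MB.
vms_per_node() {
    echo $(( ($1 - $2) * 1024 / $3 ))
}

# Round up the number of compute nodes needed for a target VM count.
# Arguments: target VM count, VMs per node.
nodes_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Recommended 128GB node, 8GB reserved (assumed), 4GB per VM (assumed)
per_node=$(vms_per_node 128 8 4096)
echo "VMs per node: $per_node"
echo "Nodes for 1000 VMs: $(nodes_needed 1000 "$per_node")"
```

With these assumed inputs the estimate is 30 VMs per node and 34 nodes for 1000 VMs; CPU and storage limits would need the same treatment before settling on a node count.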

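2. Environment Preparation

2.1 System Preparation


# 1. Install Ubuntu Server 22.04
# Download the ISO and create a bootable USB drive

# 2. System configuration
# Set the hostname (use each node's own name)
hostnamectl set-hostname controller01

# Configure the hosts file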
cat >> /etc/hosts << 'EOF'
10.0.0.11 controller01
10.0.0.12 controller02
10.0.0.13 controller03
10.0.0.21 compute01
10.0.0.22 compute02
10.0.0.31 storage01
EOF

# 3. Configure NTP time synchronization
timedatectl set-timezone Asia/Shanghai
apt install -y chrony

cat > /etc/chrony/chrony.conf << 'EOF'
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
server 2.pool.ntp.org iburst
allow 10.0.0.0/24
EOF

systemctl restart chrony

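# 4. Configure networking
# (gateway4 is deprecated in the netplan shipped with 22.04, so a default route is used)
cat > /etc/netplan/01-netcfg.yaml << 'EOF'
network:
  version: 2
  renderer: networkd
  ethernets:
    ens3:
      addresses:
        - 10.0.0.11/24
      routes:
        - to: default
          via: 10.0.0.1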
      nameservers:
        addresses:
          - 8.8.8.8
          - 8.8.4.4
    ens4:
      addresses:
        - 10.0.1.11/24
      optional: true
EOF

netplan apply

# 5. Update the system
apt update && apt upgrade -y

# 6. Install basic tools
apt install -y curl wget git vim htop iftop iotop sysstat
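
Every host named in the Ansible inventory has to resolve on each node, so it is worth checking the hosts file before moving on. The sketch below is one way to do that; the function name `missing_hosts` is hypothetical and the check only looks at the file it is given, not at DNS.

```shell
# Print each given host name that has no entry in the given hosts file.
# Arguments: path to a hosts file, then one or more host names.
missing_hosts() (
    hosts_file=$1
    shift
    for name in "$@"; do
        # Match the name as a whole field after the IP, ignoring comment lines.
        if ! awk -v n="$name" '
            /^[ \t]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file"; then
            echo "$name"
        fi
    done
)

# Example (not run here):
# missing_hosts /etc/hosts controller01 controller02 controller03 compute01 compute02 storage01
```

An empty result means every name is present; any name that is printed still needs a line in /etc/hosts.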

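2.2 Configure APT Sources


# Ubuntu 22.04 ships OpenStack Yoga in its default archive, so no extra
# repository is needed there. The cloud-archive:yoga pocket targets Ubuntu 20.04:
apt install -y software-properties-common
add-apt-repository -y cloud-archive:yoga   # Ubuntu 20.04 only
apt update

# Alternatively, use the Aliyun mirror (check that this path exists on the mirror first)
cat > /etc/apt/sources.list.d/openstack-yoga.list << 'EOF'
deb http://mirrors.aliyun.com/openstack/yoga/ubuntu jammy main
deb http://mirrors.aliyun.com/openstack/yoga/ubuntu jammy-updates main
EOF

apt update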

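2.3 Configure the Database


# Install the MariaDB Galera cluster packages
# (MariaDB 10.6 on Ubuntu 22.04 uses the galera-4 provider)
apt install -y mariadb-server mariadb-client galera-4

# Configure the Galera cluster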
cat > /etc/mysql/mariadb.conf.d/99-galera.cnf << 'EOF'
[mysqld]
binlog_format = ROW
default-storage-engine = InnoDB
innodb_autoinc_lock_mode = 2
bind-address = 0.0.0.0

[galera]
wsrep_on = ON
wsrep_provider = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_name = "openstack-galera"
wsrep_cluster_address = "gcomm://controller01,controller02,controller03"
wsrep_node_name = "controller01"
wsrep_node_address = "10.0.0.11"
wsrep_slave_threads = 4
EOF

# Bootstrap the cluster (on the first node only)
galera_new_cluster

# Start MariaDB on the remaining nodes
systemctl start mariadb
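
Three controllers are used because Galera keeps accepting writes only while a majority of nodes can see each other. The arithmetic is sketched below, assuming every node has the default equal weight; the function names are hypothetical.

```shell
# Smallest number of nodes that still forms a majority in a cluster of $1 nodes.
galera_quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of node failures a cluster of $1 nodes can absorb while keeping quorum.
galera_tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

echo "3 nodes: quorum $(galera_quorum 3), tolerates $(galera_tolerated_failures 3) failure(s)"
```

A two-node cluster tolerates no failures under this rule, which is why an odd node count of at least three is the usual choice.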

2.4 Configure the Message Queue


# Install RabbitMQ
apt install -y rabbitmq-server

# Create the OpenStack user and grant it full permissions on the default vhost
rabbitmqctl add_user openstack RABBITMQ_PASS
rabbitmqctl set_permissions -p / openstack ".*" ".*" ".*"

# Enable the management plugin
rabbitmq-plugins enable rabbitmq_management

# Configure clustering
# All nodes must share the same Erlang cookie (/var/lib/rabbitmq/.erlang.cookie).
cat >> /etc/rabbitmq/rabbitmq.conf << 'EOF'
cluster_formation.peer_discovery_backend = classic_config
cluster_formation.classic_config.nodes.1 = rabbit@controller01
cluster_formation.classic_config.nodes.2 = rabbit@controller02
cluster_formation.classic_config.nodes.3 = rabbit@controller03
EOF

systemctl restart rabbitmq-server
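
When the controller list changes, the peer list has to be renumbered by hand. As a small convenience, the lines can be generated from the host names instead. This is a sketch that uses the `classic_config` peer discovery keys from the RabbitMQ documentation; the function name `rabbitmq_cluster_conf` is hypothetical.

```shell
# Print rabbitmq.conf peer-discovery lines for the given host names.
rabbitmq_cluster_conf() (
    echo "cluster_formation.peer_discovery_backend = classic_config"
    i=1
    for host in "$@"; do
        echo "cluster_formation.classic_config.nodes.$i = rabbit@$host"
        i=$((i + 1))
    done
)

rabbitmq_cluster_conf controller01 controller02 controller03
```

The output can be reviewed and then appended to /etc/rabbitmq/rabbitmq.conf on each node.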

2.5 Configure Memcached


# Install Memcached
apt install -y memcached libmemcached-tools

# Configure Memcached
# -p is the TCP port and -P the PID file; run as the unprivileged memcache user
cat > /etc/memcached.conf << 'EOF'
-m 2048
-p 11211
-u memcache
-l 0.0.0.0
-c 10240
-P /var/run/memcached/memcached.pid
EOF

systemctl restart memcached

# Configure the firewall (on every controller node)
ufw allow from 10.0.0.0/24 to any port 11211

3. Deploying with Kolla-Ansible

3.1 Install Kolla-Ansible


# Install dependencies
apt install -y python3-pip python3-dev libffi-dev libssl-dev

# Install Ansible (the Yoga release of Kolla-Ansible expects Ansible 4 or 5)
pip3 install 'ansible>=4,<6'

# Install Kolla-Ansible (the 14.x series corresponds to Yoga)
pip3 install 'kolla-ansible>=14,<15'

# Copy the example configuration files
sudo mkdir -p /etc/kolla
sudo chown $USER:$USER /etc/kolla

cp -r /usr/local/share/kolla-ansible/etc_examples/kolla/* /etc/kolla/
cp /usr/local/share/kolla-ansible/ansible/inventory/multinode /home/$USER/

3.2 Configure globals.yml


# /etc/kolla/globals.yml

---
kolla_base_distro: "ubuntu"
kolla_install_type: "source"
openstack_release: "yoga"

kolla_internal_vip_address: "10.0.0.100"
kolla_external_vip_address: "10.0.0.100"

network_interface: "ens3"
neutron_external_interface: "ens4"

enable_haproxy: "yes"

# Enabled services
enable_horizon: "yes"
enable_glance: "yes"
enable_nova: "yes"
enable_neutron: "yes"
enable_cinder: "yes"
enable_keystone: "yes"
enable_heat: "yes"
enable_tacker: "no"
enable_magnum: "no"
enable_octavia: "no"

# Storage backends
# (the LVM backend expects a volume group named cinder-volumes on the storage nodes)
enable_cinder_backend_lvm: "yes"
cinder_backend_ceph: "no"

# Docker settings
docker_registry: "docker.io"
docker_namespace: "kolla"
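
The VIP in globals.yml has to sit inside the management subnet and must not already belong to a node. A quick pre-flight check along those lines is sketched below; the helper names are hypothetical, and the /24 management subnet is taken from the addressing used earlier in this guide.

```shell
# Convert a dotted IPv4 address to an integer.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed if address $1 lies inside network $2 with prefix length $3.
in_subnet() (
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    prefix=$3
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
)

# Succeed if VIP $1 is not one of the node addresses that follow it.
vip_is_free() {
    vip=$1
    shift
    for node_ip in "$@"; do
        [ "$vip" = "$node_ip" ] && return 1
    done
    return 0
}

in_subnet 10.0.0.100 10.0.0.0 24 \
    && vip_is_free 10.0.0.100 10.0.0.11 10.0.0.12 10.0.0.13 \
    && echo "VIP OK"
```

This only compares configured values. Confirming that nothing else on the network answers on the VIP still needs a live check such as a ping before deployment.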

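3.3 Configure Passwords


# Generate random passwords for every entry in /etc/kolla/passwords.yml
kolla-genpwd

# To set specific values, edit the matching keys in place afterwards.
# Do not overwrite the file: the deployment needs the other generated entries too.
sed -i \
  -e 's/^keystone_admin_password:.*/keystone_admin_password: "YOUR_ADMIN_PASSWORD"/' \
  -e 's/^database_password:.*/database_password: "YOUR_DB_PASSWORD"/' \
  -e 's/^rabbitmq_password:.*/rabbitmq_password: "YOUR_RABBITMQ_PASSWORD"/' \
  -e 's/^memcache_secret_key:.*/memcache_secret_key: "YOUR_MEMCACHE_KEY"/' \
  -e 's/^haproxy_password:.*/haproxy_password: "YOUR_HAPROXY_PASSWORD"/' \
  /etc/kolla/passwords.yml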

3.4 Configure the Inventory


# /home/$USER/multinode
# Edit only the top-level host groups below and keep the rest of the sample file.

[control]
controller01 ansible_host=10.0.0.11 ansible_user=deploy
controller02 ansible_host=10.0.0.12 ansible_user=deploy
controller03 ansible_host=10.0.0.13 ansible_user=deploy

[network]
network01 ansible_host=10.0.0.14 ansible_user=deploy

[compute]
compute01 ansible_host=10.0.0.21 ansible_user=deploy
compute02 ansible_host=10.0.0.22 ansible_user=deploy

[storage]
storage01 ansible_host=10.0.0.31 ansible_user=deploy

[monitoring]
monitoring01 ansible_host=10.0.0.41 ansible_user=deploy

[deployment]
localhost ansible_connection=local
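
Before running anything against the inventory, it helps to confirm that each top-level group holds the number of hosts you expect, in particular three controllers. The following sketch counts hosts per group with awk; the function name `count_hosts_per_group` is hypothetical, and it deliberately skips sections such as `[nova:children]` whose entries are group names rather than hosts.

```shell
# Count hosts per group in an INI-style inventory read from standard input.
count_hosts_per_group() {
    awk '
        /^[ \t]*#/ { next }                        # skip comment lines
        /^[ \t]*$/ { next }                        # skip blank lines
        /^\[/ {
            group = $0
            gsub(/\[|\]/, "", group)
            if (group ~ /:/) { group = ""; next }  # skip [x:children] and [x:vars]
            if (!(group in count)) { order[++n] = group; count[group] = 0 }
            next
        }
        group != "" { count[group]++ }
        END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
    '
}

count_hosts_per_group << 'EOF'
[control]
controller01 ansible_host=10.0.0.11 ansible_user=deploy
controller02 ansible_host=10.0.0.12 ansible_user=deploy
controller03 ansible_host=10.0.0.13 ansible_user=deploy

[compute]
compute01 ansible_host=10.0.0.21 ansible_user=deploy
compute02 ansible_host=10.0.0.22 ansible_user=deploy
EOF
```

For the sample input this prints `control 3` and `compute 2`. Against the real file, run it as `count_hosts_per_group < multinode`.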

3.5 Run the Deployment


# 1. Verify connectivity
ansible -i multinode all -m ping

# 2. Prepare the nodes
kolla-ansible -i multinode bootstrap-servers

# 3. Run the pre-deployment checks
kolla-ansible -i multinode prechecks

# 4. Deploy
kolla-ansible -i multinode deploy

# 5. Generate the admin credentials file
kolla-ansible -i multinode post-deploy

# 6. Verify the deployment
pip3 install python-openstackclient
source /etc/kolla/admin-openrc.sh
openstack compute service list
openstack network agent list
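
Each step above only makes sense if the previous one succeeded, so when scripting the sequence it should stop at the first failure. A minimal wrapper is sketched below; the function name `run_steps` is hypothetical, and each step is passed as a single quoted command line.

```shell
# Run each command line in order and stop at the first one that fails.
run_steps() {
    for step in "$@"; do
        echo ">>> $step"
        if ! sh -c "$step"; then
            echo "FAILED: $step" >&2
            return 1
        fi
    done
    echo "All steps completed"
}

# Example (not run here):
# run_steps \
#   "ansible -i multinode all -m ping" \
#   "kolla-ansible -i multinode bootstrap-servers" \
#   "kolla-ansible -i multinode prechecks" \
#   "kolla-ansible -i multinode deploy" \
#   "kolla-ansible -i multinode post-deploy"
```

Echoing each step before it runs leaves a simple trace in the terminal or a log file, which makes it obvious where a long deployment stopped.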

4. Verifying the Deployment

4.1 Service Verification


# Check service status
openstack compute service list
openstack network agent list
openstack volume service list

# Check endpoints
openstack endpoint list

# Check the Nova hypervisors
openstack hypervisor list

# Check networks
openstack network list
openstack router list

# Check storage
openstack volume list

4.2 Create Test Resources


# Create a test network
openstack network create test-network
openstack subnet create --network test-network \
  --subnet-range 192.168.100.0/24 \
  test-subnet

# Create a test router
# (assumes an external network named "public" already exists)
openstack router create test-router
openstack router add subnet test-router test-subnet
openstack router set --external-gateway public test-router

# Create a test virtual machine
# (assumes an image named "cirros" has been uploaded to Glance)
openstack flavor create --public test-flavor --id auto \
  --ram 512 --disk 5 --vcpus 1

openstack image list
openstack keypair create test-key > test-key.pem

openstack server create --flavor test-flavor \
  --image cirros \
  --network test-network \
  --key-name test-key \
  test-vm
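
`openstack server create` returns before the instance has finished booting, so any follow-up test should wait until the server reports ACTIVE. A generic polling helper is sketched below; the function name `wait_for_value` and the attempt and delay values in the example are assumptions.

```shell
# Re-run a command until it prints the wanted value or the attempts run out.
# Arguments: wanted value, maximum attempts, delay in seconds, then the command.
wait_for_value() {
    wanted=$1
    attempts=$2
    delay=$3
    shift 3
    n=0
    while [ "$n" -lt "$attempts" ]; do
        current=$("$@" 2>/dev/null)
        [ "$current" = "$wanted" ] && return 0
        n=$((n + 1))
        sleep "$delay"
    done
    return 1
}

# Example (not run here): poll every 10 seconds, up to 30 times
# wait_for_value ACTIVE 30 10 openstack server show test-vm -f value -c status
```

The helper returns non-zero on timeout, so it can gate later steps with `&&` or an `if`.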

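4.3 Functional Tests


# Check the boot log
openstack console log show test-vm --lines 20

# Get the VNC console URL
openstack console url show test-vm

# Test a floating IP
# (the default security group must allow ICMP and TCP port 22 for the next steps)
FLOATING_IP=$(openstack floating ip create public -f value -c floating_ip_address)
openstack server add floating ip test-vm "$FLOATING_IP"
ping -c 4 "$FLOATING_IP"

# Test SSH
chmod 600 test-key.pem
ssh -i test-key.pem cirros@"$FLOATING_IP"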

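5. Production Configuration

5.1 High Availability


# HAProxy high availability
# Kolla-Ansible configures this automatically when enable_haproxy is "yes"

# Verify that the Keystone API answers on the VIP
curl http://10.0.0.100:5000

# Check which controller currently holds the VIP
ip addr show | grep 10.0.0.100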

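5.2 Monitoring


# Enable Prometheus and Grafana
# /etc/kolla/globals.yml
enable_prometheus: "yes"
enable_grafana: "yes"

# Apply the change: kolla-ansible -i multinode deploy

# Access Grafana
# http://10.0.0.100:3000
# Username: admin
# Password: the grafana_admin_password value in /etc/kolla/passwords.yml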

5.3 Logging


# Enable centralized logging with Elasticsearch, Fluentd, and Kibana
# /etc/kolla/globals.yml
enable_elasticsearch: "yes"
enable_kibana: "yes"
enable_fluentd: "yes"

# Access Kibana
# http://10.0.0.100:5601

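5.4 Alerting


# Configure an Aodh alarm
# Requires enable_aodh, enable_ceilometer, and enable_gnocchi in globals.yml.
# The legacy "threshold" alarm type and the cpu_util meter are no longer available
# in recent releases, so this uses a Gnocchi alarm on the cumulative cpu metric.
# Threshold: 80% of one vCPU over 300 s = 0.8 * 300 * 10^9 ns
aodh alarm create \
  --name high-cpu \
  --type gnocchi_resources_threshold \
  --metric cpu \
  --resource-type instance \
  --resource-id "$(openstack server show test-vm -f value -c id)" \
  --aggregation-method rate:mean \
  --granularity 300 \
  --threshold 240000000000 \
  --comparison-operator gt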
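
When an alarm is evaluated against Gnocchi's cumulative `cpu` metric (CPU time in nanoseconds) with the `rate:mean` aggregation, the threshold is a nanosecond count per granularity window rather than a percentage. The conversion is sketched below, assuming that metric and aggregation are in use; the function name `cpu_threshold_ns` is hypothetical.

```shell
# Convert a CPU utilization percentage into a nanosecond threshold.
# Arguments: percent, granularity in seconds, number of vCPUs.
# percent/100 * granularity * vcpus * 10^9 ns, kept in integer arithmetic.
cpu_threshold_ns() {
    echo $(( $1 * $2 * $3 * 10000000 ))
}

cpu_threshold_ns 80 300 1   # prints 240000000000
```

For an instance with several vCPUs, pass the vCPU count as the third argument so that the percentage refers to the whole instance.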

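6. Go-Live Checklist

6.1 Infrastructure Checks


# [ ] All nodes are reachable
ping -c 2 controller01
ping -c 2 compute01

# [ ] NTP time is synchronized
chronyc tracking

# [ ] Storage capacity is sufficient
df -h

# [ ] Network throughput (start "iperf3 -s" on 10.0.0.21 first)
iperf3 -c 10.0.0.21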

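6.2 Service Checks


# [ ] API services respond
curl -s http://10.0.0.100:5000 | head -20

# [ ] Database cluster is healthy (expect wsrep_cluster_size = 3)
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size';"

# [ ] Message queue is healthy
rabbitmqctl cluster_status

# [ ] Storage services are up
openstack volume service list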
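
Reading service tables by eye gets error-prone as the node count grows. The sketch below filters a service listing down to the entries that are not `up`; the function name `down_services` is hypothetical, and it assumes the input has the binary, host, and state as whitespace-separated columns, with the state last.

```shell
# Read "binary host state" lines on standard input and print binary@host
# for every service whose state is not "up".
down_services() {
    awk '$NF != "up" { print $1 "@" $2 }'
}

# Example (not run here):
# openstack compute service list -f value -c Binary -c Host -c State | down_services
```

An empty result means every listed service is up, which makes the check easy to use in a script: fail the checklist item if the output is non-empty.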

6.3 Security Checks


# [ ] TLS certificate is configured
curl -I https://10.0.0.100:443

# [ ] Firewall rules are in place
ufw status numbered

# [ ] Audit logging is enabled
tail -f /var/log/keystone/audit.log

6.4 Performance Checks


# [ ] API response time
time openstack server list

# [ ] VM creation time (--wait blocks until the server is active)
time openstack server create --wait --flavor test-flavor --image cirros --network test-network test-perf

# [ ] Storage IOPS
fio --name=test --ioengine=libaio --direct=1 --rw=randread --bs=4k --size=1G --numjobs=4

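7. Operations Readiness

7.1 Configuration Management


# [ ] Ansible inventory
# /home/deploy/multinode

# [ ] Configuration backup
# Back up /etc/kolla/

# [ ] Documentation
# Architecture document
# Operations runbook
# Emergency response plan

7.2 Monitoring Setup


# [ ] Configure Grafana dashboards
# Import the OpenStack dashboards

# [ ] Configure alert rules
# CPU usage
# Memory usage
# Disk usage
# Service status

7.3 Backup Strategy


# [ ] Database backups
# mysqldump

# [ ] Configuration file backups
# /etc/kolla/

# [ ] Image backups
# Export important images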
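
For the configuration backup item, a timestamped archive with simple retention is usually enough. The following is a sketch under stated assumptions: the function name `backup_dir`, the `kolla-config-` file prefix, and the destination path in the example are all hypothetical, and the pruning step assumes archive names without spaces.

```shell
# Archive a directory into a destination directory and keep only the newest archives.
# Arguments: source directory, destination directory, number of archives to keep.
backup_dir() (
    src=$1
    dest=$2
    keep=$3
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$dest"
    tar -czf "$dest/kolla-config-$stamp.tar.gz" \
        -C "$(dirname "$src")" "$(basename "$src")" || exit 1
    # Timestamped names sort chronologically, so the first entries are the oldest.
    total=$(ls "$dest"/kolla-config-*.tar.gz | wc -l)
    excess=$((total - keep))
    if [ "$excess" -gt 0 ]; then
        ls "$dest"/kolla-config-*.tar.gz | sort | head -n "$excess" |
            while read -r old; do rm -f "$old"; done
    fi
)

# Example (not run here): keep the last 7 archives of /etc/kolla
# backup_dir /etc/kolla /var/backups/kolla 7
```

Because passwords.yml is part of /etc/kolla, the archive directory should be readable only by the deployment user.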

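8. Troubleshooting

8.1 Deployment Failures


# Follow the error log
tail -f /var/log/kolla/ansible.log

# Re-run the deployment (the playbooks are idempotent, so a plain re-run is safe)
kolla-ansible -i multinode deploy

# Check dependencies
pip3 list | grep kolla
ansible --version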

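8.2 Service Startup Failures


# Check container status (Kolla image names contain "kolla")
docker ps -a | grep kolla

# View container logs (containers are named after the service, for example keystone)
docker logs keystone

# Restart a service container
docker restart keystone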

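8.3 Network Problems


# Check network namespaces
ip netns list

# Check OVS (in a Kolla deployment it runs inside the openvswitch_vswitchd container)
docker exec openvswitch_vswitchd ovs-vsctl show

# Check the Neutron agents
openstack network agent list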

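9. Summary

This article walked through the complete process of deploying OpenStack in a production environment.

Key points: plan capacity and hardware first, prepare the base environment, deploy with Kolla-Ansible, verify the services, apply the production settings, and work through the go-live checklist before handing the platform to operations.

Congratulations on completing your OpenStack production deployment!

Keep an eye on updates to the official documentation, and schedule regular version upgrades and security hardening.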
