OpenStack Monitoring and Log Management: From Configuration to Best Practices


Categories: OpenStack Operations
Tags: OpenStack, Monitoring, Logging, Prometheus, Grafana, ELK



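1. Monitoring Architecture Overview

1.1 OpenStack Monitoring Components

Component | Function | What it handles
Ceilometer | Metric collection | CPU, memory, network, storage
Gnocchi | Time-series storage | Metric storage and queries
Aodh | Alarming service | Threshold alarms
Panko | Event storage | Events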

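2. Ceilometer Monitoring Configuration

2.1 Installing Ceilometer


# Install Ceilometer (Ubuntu/Debian package names)
sudo apt-get install -y python3-ceilometer
# Compute agent: install on every compute node
sudo apt-get install -y ceilometer-agent-compute
# Central agent: install on the controller node
sudo apt-get install -y ceilometer-agent-central

# Enable and start the agents. The Ubuntu units are named after the packages;
# openstack-ceilometer-api is an RDO-style name, and the Ceilometer API service
# itself was removed in the Queens release.
sudo systemctl enable --now ceilometer-agent-central
sudo systemctl enable --now ceilometer-agent-compute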

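2.2 Collecting Key Metrics


# Note: the `ceilometer` CLI talks to the legacy Ceilometer API (removed in Queens).
# On Gnocchi-backed deployments, use `openstack metric list` and
# `openstack metric measures show --resource-id <instance-uuid> <metric>` instead.

# List available meters
ceilometer meter-list

# CPU utilization
ceilometer meter-list | grep cpu
ceilometer sample-list -m cpu_util

# Memory usage
ceilometer sample-list -m memory.usage

# Network traffic
ceilometer sample-list -m network.incoming.bytes

# Disk I/O
ceilometer sample-list -m disk.read.bytes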
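
Some newer Ceilometer releases no longer publish cpu_util directly. In that case utilization can be derived from two readings of the cumulative cpu meter, which reports CPU time in nanoseconds. The following is a minimal sketch of that arithmetic, not Ceilometer's own code; the helper name cpu_util_percent and the sample figures are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Derive average CPU utilization (%) from two readings of a cumulative
# CPU-time counter. Hypothetical helper, not part of any OpenStack CLI.
#
# Arguments: <cpu_ns_first> <cpu_ns_second> <elapsed_seconds> <vcpus>
cpu_util_percent() {
  awk -v first="$1" -v second="$2" -v secs="$3" -v vcpus="$4" 'BEGIN {
    if (secs <= 0 || vcpus <= 0) { print "invalid input"; exit 1 }
    # CPU time consumed divided by CPU time available across all vCPUs
    printf "%.1f\n", 100 * (second - first) / (secs * 1e9 * vcpus)
  }'
}

# Example: 150 s of CPU time consumed over a 300 s window on 2 vCPUs
cpu_util_percent 0 150000000000 300 2   # prints 25.0
```

Dividing by the vCPU count keeps the result in the 0 to 100 range for multi-core instances, which is the scale the 80% threshold in the alarm example assumes.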

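2.3 Aodh Alarm Configuration


# Create a CPU alarm: fires when the 5-minute average of cpu_util
# stays above 80% for 2 consecutive periods (10 minutes).
# Note: --type threshold reads from the legacy Ceilometer API; on Gnocchi-backed
# deployments use one of the gnocchi_* alarm types instead.
aodh alarm create \
  --name high-cpu \
  --type threshold \
  --meter-name cpu_util \
  --threshold 80.0 \
  --comparison-operator gt \
  --statistic avg \
  --period 300 \
  --evaluation-periods 2

# List alarms
aodh alarm list

# Show alarm details
aodh alarm show high-cpu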
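
To make the interaction between --threshold, --comparison-operator gt and --evaluation-periods concrete, here is a simplified model of the evaluation. It is a sketch under stated assumptions, not Aodh's actual evaluator: the function name evaluate_alarm is hypothetical, and the input is assumed to be one per-period average per line, oldest first.

```bash
#!/usr/bin/env bash
# Simplified model of a "greater than" threshold alarm.
# Hypothetical helper; Aodh's real evaluator has more states and options.
#
# Arguments: <threshold> <evaluation_periods>
# Stdin:     one per-period average per line, oldest first
evaluate_alarm() {
  local threshold="$1" periods="$2"
  # Only the most recent <periods> values take part in the decision
  tail -n "$periods" | awk -v t="$threshold" -v n="$periods" '
    { count++; if ($1 + 0 > t + 0) above++ }
    END {
      if (count < n)       print "insufficient data"
      else if (above == n) print "alarm"      # every period breached
      else if (above == 0) print "ok"         # no period breached
      else                 print "unchanged"  # mixed result: keep state
    }'
}

# Two consecutive 5-minute averages above 80% trigger the alarm
printf '85\n91\n' | evaluate_alarm 80 2   # prints alarm
```

Requiring every period in the window to breach the threshold is what keeps a single short spike from firing the alarm.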

3. Log Management

3.1 Log Locations

Service | Log path
Keystone | /var/log/keystone/
Nova | /var/log/nova/
Neutron | /var/log/neutron/
Cinder | /var/log/cinder/
Glance | /var/log/glance/

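3.2 Log Configuration


# /etc/nova/nova.conf
[DEFAULT]
log_dir = /var/log/nova
log_file = nova.log
use_syslog = False
logging_context_format_string = %(asctime)s.%(msecs)03d %(levelname)s %(name)s [%(request_id)s] %(instance)s%(message)s

# When log_config_append is set, oslo.log takes all logging settings from that
# file and ignores the options above, so enable it only as a replacement.
# log_config_append = /etc/nova/logging.conf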
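
With the format string above and the default date format, each log line starts with the date, then the time with milliseconds, then the level, the logger name, and the request ID in square brackets. The sketch below relies on that layout to summarize a log and to trace a single request. Both function names are hypothetical helpers, not OpenStack tools.

```bash
#!/usr/bin/env bash
# Helpers for logs written as:
#   <date> <time.msecs> <LEVEL> <logger> [<request_id>] <message>
# Hypothetical names; the field positions assume the format string shown above.

# Count log lines per level. Continuation lines (for example traceback
# lines) do not start with a date and are skipped.
count_by_level() {
  awk '$1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ { levels[$3]++ }
       END { for (l in levels) print l, levels[l] }' "$@" | sort
}

# Print every line that belongs to one request ID.
lines_for_request() {
  local request_id="$1"; shift
  grep -F -- "[$request_id]" "$@"
}

# Usage (paths as listed in the log location table):
#   count_by_level /var/log/nova/nova-api.log
#   lines_for_request req-1234 /var/log/nova/nova-api.log
```

Both functions also read from standard input when no file is given, so they can sit at the end of a pipeline.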

3.3 Log Rotation


# logrotate configuration
# /etc/logrotate.d/openstack

/var/log/nova/*.log {
    daily
    missingok
    rotate 30
    compress
    delaycompress
    notifempty
    create 0640 nova nova
    # No postrotate reload: there is no systemd unit named "nova", and oslo.log's
    # default file handler reopens the log file after it has been rotated.
}

4. Centralized Logging (ELK Stack)

4.1 Filebeat Configuration


# /etc/filebeat/filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/nova/*.log
    - /var/log/neutron/*.log
    - /var/log/cinder/*.log
    - /var/log/keystone/*.log
  multiline:
    pattern: '^\d{4}-\d{2}-\d{2}'
    negate: true
    match: after

output.logstash:
  hosts: ["logstash:5044"]
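
The multiline settings above mean: a line that does not start with a date (negate: true) is appended to the preceding line that does (match: after), so a Python traceback travels as one event. The sketch below imitates that grouping so the pattern can be checked against a real log before deploying. It is an approximation, not Filebeat's implementation: the function name is hypothetical, and continuation lines are joined with " | " instead of a newline so that each event prints on one line.

```bash
#!/usr/bin/env bash
# Imitate Filebeat's multiline grouping for the pattern ^\d{4}-\d{2}-\d{2}
# with negate: true and match: after. Hypothetical helper for testing only.
join_multiline() {
  awk '
    # A line starting with YYYY-MM-DD opens a new event
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/ {
      if (have) print event
      event = $0; have = 1; next
    }
    # Any other line is a continuation of the current event
    { event = (have ? event " | " $0 : $0); have = 1 }
    END { if (have) print event }
  ' "$@"
}

# Usage: count the events Filebeat would ship from a log file
#   join_multiline /var/log/nova/nova-api.log | wc -l
```

If the event count equals the raw line count on a log that is known to contain tracebacks, the pattern does not match the log's timestamp format.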

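4.2 Elasticsearch Configuration


# /etc/elasticsearch/elasticsearch.yml

cluster.name: openstack-logging
node.name: es-node-1
network.host: 0.0.0.0
http.port: 9200

discovery.type: single-node

# /etc/elasticsearch/jvm.options
# JVM heap size; these flags are not valid inside elasticsearch.yml

-Xms2g
-Xmx2g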

4.3 Kibana Configuration


# Create an index pattern
# 1. Open Kibana
# 2. Go to Management > Index Patterns
# 3. Create the pattern: logstash-*

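5. Grafana Dashboards

5.1 Data Source Configuration


# Install Grafana (assumes the Grafana APT repository is already configured)
sudo apt-get install -y grafana
sudo systemctl enable grafana-server
sudo systemctl start grafana-server

# Add a data source
# 1. Open Grafana
# 2. Go to Configuration > Data Sources
# 3. Add data source > Prometheus
# 4. URL: http://prometheus:9090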

5.2 Example Dashboard


{
  "dashboard": {
    "title": "OpenStack Overview",
    "panels": [
      {
        "title": "CPU Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "avg(cpu_util) by (hostname)",
            "legendFormat": "{{hostname}}"
          }
        ]
      },
      {
        "title": "Memory Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "avg(memory_usage) by (hostname)",
            "legendFormat": "{{hostname}}"
          }
        ]
      }
    ]
  }
}

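6. Best Practices

6.1 Monitoring Strategy

1. Layered monitoring

2. Tiered alerting

6.2 Logging Strategy

1. Log levels

2. Log retention

6.3 Performance Tuning


# Optimize log collection
# 1. Collect asynchronously
# 2. Configure log buffering
# 3. Compress data in transit

# Optimize storage
# 1. Use a time-series database
# 2. Enable data compression
# 3. Set a data retention policy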

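7. Troubleshooting

7.1 Common Issues


# 1. Check service status (Ubuntu unit names; RDO uses openstack-nova-api)
systemctl status nova-api
systemctl status neutron-server

# 2. Follow the logs
tail -f /var/log/nova/nova-api.log
tail -f /var/log/neutron/neutron-server.log

# 3. Check registered service endpoints
openstack endpoint list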

7.2 Diagnosing Performance Problems


# 1. API response time
time openstack server list

# 2. Slow database queries
mysql -e "SHOW PROCESSLIST"

# 3. Message queue backlog
rabbitmqctl list_queues name messages
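
On a busy cloud the queue listing can run to hundreds of lines, most of them empty queues. The sketch below filters the output of the command above down to queues whose backlog exceeds a limit. The function name and the default limit of 100 messages are assumptions for illustration; pick a limit that matches your own baseline.

```bash
#!/usr/bin/env bash
# Filter `rabbitmqctl list_queues name messages` output down to queues
# holding more than <limit> messages. Hypothetical helper.
#
# Arguments: [limit]  (default 100, an arbitrary illustrative value)
backlogged_queues() {
  local limit="${1:-100}"
  # Data rows have exactly two fields with a numeric second field;
  # banner and header lines fail that check and are dropped.
  awk -v limit="$limit" \
    'NF == 2 && $2 ~ /^[0-9]+$/ && $2 + 0 > limit + 0 { print $1, $2 }'
}

# Usage:
#   rabbitmqctl list_queues name messages | backlogged_queues 100
```

A queue that stays in this list across several runs usually points to a consumer service that is down or overloaded, which ties back to the service status checks in 7.1.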

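8. Summary

This article walked through a complete approach to OpenStack monitoring and log management: metric collection with Ceilometer, alarms with Aodh, per-service log configuration and rotation, centralized logging with the ELK Stack, Grafana dashboards, and basic troubleshooting.

Coming next: "OpenStack High-Availability Architecture Explained"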
