Prometheus 是一套开源的系统监控报警框架,非常适合大规模集群的监控。它也是第二个加入CNCF的项目,受欢迎度仅次于 Kubernetes 的项目。本文讲解完整prometheus 监控和告警服务的搭建。
prometheus 监控是当下主流监控系统,它是多个服务组合使用的体系。整体架构预览如下:

本篇教程监控系统搭建,包括的服务有:

  1. prometheus
    监控的主体,负责数据汇总,保存,监控数据,产生告警信息
  2. exporter
    监控的采集端,负责数据采集
  3. grafana
    数据可视化,负责以丰富的页面展示采集到的数据
  4. alertmanager
    告警管理,负责告警信息处理,包括告警周期,消息优先级等
  5. prometheusAlert
    告警的具体发送端,负责配置告警模板,发出告警信息

除了监控采集节点,其他服务均通过docker-compose部署。部署系统信息:

  1. 系统
    :ubuntu20.04
  2. 服务器IP
    :172.16.9.124
  3. docker版本
    :20.10.21
  4. docker-compose版本
    :1.29.2
  5. 配置文件路径
    :/root/prometheus

部署prometheus

prometheus主要负责数据采集和存储,提供PromQL查询语言的支持。部署prometheus分为两个步骤:

  1. 准备配置文件
  2. 启动prometheus

准备配置文件

整个体系的配置文件在
/root/prometheus
,首先新建prometheus服务的配置文件路径
/root/prometheus/prometheus
,并在这个目录下新建:

  • config 用于放置服务主要配置文件 prometheus.yml
  • data 用于放置服务的数据库文件
root@ubuntu-System-Product-Name:~/prometheus# tree . -L 3
.
├── docker-compose.yaml
└── prometheus
    ├── config
    │   └── prometheus.yml
    └── data

新建prometheus.yml,prometheus服务的主配置文件

global:
  scrape_interval:     30s # 每30s采集一次数据
  evaluation_interval: 30s # 每30s做一次告警检测


scrape_configs:
  # 配置prometheus服务本身
  - job_name: prometheus
    static_configs:
      - targets: ['172.16.9.124:9090']
        labels:
          instance: prometheus

修改 data 目录的文件权限,让容器有权限在data目录里生成数据相关数据

chmod 777 data

创建 docker-compse.yml

version: '3'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    restart: always
    ports:
      - "9090:9090"
    volumes:
      - /root/prometheus/prometheus/config:/etc/prometheus
      - /root/prometheus/prometheus/data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.enable-lifecycle'

参数说明:
command:

  • --config.file=/etc/prometheus/prometheus.yml 指定使用的配置文件
  • --storage.tsdb.path=/prometheus 指定时序数据库的路径
  • --web.enable-lifecycle 支持配置热加载

volumes:

  • /root/prometheus/prometheus/config:/etc/prometheus 映射配置文件所在目录
  • /root/prometheus/prometheus/data:/prometheus 映射数据库路径参数

启动prometheus

启动 docker-compse
docker-compose up -d
查看日志:

root@ubuntu-System-Product-Name:~/prometheus# docker ps
CONTAINER ID   IMAGE                                           COMMAND                  CREATED         STATUS                 PORTS                                                        NAMES
776772d69b20   prom/prometheus                                 "/bin/prometheus --c…"   5 minutes ago   Up 5 minutes           0.0.0.0:9090->9090/tcp, :::9090->9090/tcp                    prometheus

查看容器的日志:
docker logs -f 776

ts=2023-12-25T10:21:17.560Z caller=main.go:478 level=info msg="No time or size retention was set so using the default time retention" duration=15d
ts=2023-12-25T10:21:17.560Z caller=main.go:515 level=info msg="Starting prometheus" version="(version=2.32.1, branch=HEAD, revision=41f1a8125e664985dd30674e5bdf6b683eff5d32)"
ts=2023-12-25T10:21:17.561Z caller=main.go:520 level=info build_context="(go=go1.17.5, user=root@54b6dbd48b97, date=20211217-22:08:06)"
ts=2023-12-25T10:21:17.561Z caller=main.go:521 level=info host_details="(Linux 5.15.0-56-generic #62~20.04.1-Ubuntu SMP Tue Nov 22 21:24:20 UTC 2022 x86_64 776772d69b20 (none))"
ts=2023-12-25T10:21:17.561Z caller=main.go:522 level=info fd_limits="(soft=1048576, hard=1048576)"
ts=2023-12-25T10:21:17.561Z caller=main.go:523 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2023-12-25T10:21:17.562Z caller=web.go:570 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090
ts=2023-12-25T10:21:17.562Z caller=main.go:924 level=info msg="Starting TSDB ..."
ts=2023-12-25T10:21:17.562Z caller=tls_config.go:195 level=info component=web msg="TLS is disabled." http2=false
ts=2023-12-25T10:21:17.564Z caller=head.go:488 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
ts=2023-12-25T10:21:17.564Z caller=head.go:522 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=1.305µs
ts=2023-12-25T10:21:17.564Z caller=head.go:528 level=info component=tsdb msg="Replaying WAL, this may take a while"
ts=2023-12-25T10:21:17.564Z caller=head.go:599 level=info component=tsdb msg="WAL segment loaded" segment=0 maxSegment=1
ts=2023-12-25T10:21:17.564Z caller=head.go:599 level=info component=tsdb msg="WAL segment loaded" segment=1 maxSegment=1
ts=2023-12-25T10:21:17.564Z caller=head.go:605 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=14.305µs wal_replay_duration=301.534µs total_replay_duration=327.342µs
ts=2023-12-25T10:21:17.565Z caller=main.go:945 level=info fs_type=EXT4_SUPER_MAGIC
ts=2023-12-25T10:21:17.565Z caller=main.go:948 level=info msg="TSDB started"
ts=2023-12-25T10:21:17.565Z caller=main.go:1129 level=info msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
ts=2023-12-25T10:21:17.565Z caller=main.go:1166 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml totalDuration=217.62µs db_storage=666666ns remote_storage=860ns web_handler=182ns query_engine=371ns scrape=90.382µs scrape_sd=10.238µs notify=450ns notify_sd=788ns rules=737ns
ts=2023-12-25T10:21:17.565Z caller=main.go:897 level=info msg="Server is ready to receive web requests."


日志很重要!日志很重要!日志很重要!

标签: none

添加新评论