Advanced Kubernetes: Logging and Monitoring
1. Logging
Both Kubernetes and Docker let you read container logs, but doing it purely from the command line quickly becomes tedious: you keep typing commands and hunting for container names, which is not convenient at all.
Even in projects that do not use containers, the ELK stack is the usual combination for handling logs, so Kubernetes can of course also use it to collect and view logs.
1.1 Viewing logs
1.1.1 Command line
1.1.1.1 Docker
docker ps — look up the container ID
docker logs container_id — view that container's logs
1.1.1.2 Kubernetes commands
Viewing logs with kubectl
kubectl logs -f <pod-name> [-c <container-name>]
Pod logs
kubectl describe pod <pod-name>
Besides Pods, kubectl describe can also show detailed information about other resources such as Node, RC, Service and Namespace; for a Pod it prints the configuration, status and recent events rather than the application log itself.
Note: to work with resources in a specific namespace, add -n <namespace> to the command.
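For example, with a hypothetical Pod name (substitute one from kubectl get pods):

# Tail the logs of one container of a Pod in the kube-system namespace
kubectl logs -f kube-proxy-abcde -c kube-proxy -n kube-system

# Show the detailed status and recent events of the same Pod
kubectl describe pod kube-proxy-abcde -n kube-system

# describe works for other resource kinds as well
kubectl describe node master-kubeadm-k8s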
Component (service) level
Components such as kube-apiserver, kube-scheduler, kubelet, kube-proxy and kube-controller-manager can be inspected with journalctl when they run as systemd services.
If something goes wrong in the cluster, these commands are usually the first place to look to see whether one of the components is misbehaving.
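As a small sketch: on a kubeadm-installed node only the kubelet (and the container runtime) normally run as systemd services, while the other control-plane components run as static Pods, so their logs are read with kubectl instead:

# Follow the kubelet logs on the current node
journalctl -u kubelet -f

# Only kubelet errors since the last boot
journalctl -u kubelet -b -p err

# Control-plane components on a kubeadm cluster (static Pods named <component>-<node-name>)
kubectl logs -n kube-system kube-apiserver-master-kubeadm-k8s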
1.1.2 ELK
The Elasticsearch search engine is developed together with Logstash, a data-collection and log-parsing engine, and Kibana, an analytics and visualization platform. The three products are designed as an integrated solution known as the Elastic Stack (ELK Stack).
There are several options for log collection: the Logstash role in ELK can be filled by other components with the same function, such as Log-Pilot or Fluentd. In this section we use Log-Pilot for the demonstration.
1.1.2.1 Architecture
Logs inside each Pod are mounted (via a volume) to a directory on the host
Log-Pilot collects the logs from that directory and ships them to Elasticsearch, which indexes them for search
Finally, Kibana provides the visualization
(Diagram: Pod log directory on the host → Log-Pilot → Elasticsearch → Kibana)
1.1.2.2 Deploying ELK
1.1.2.2.1 Deploying Log-Pilot
Prepare the YAML file
Because I broke my previous environment, I reinstalled the Kubernetes cluster and used the latest version, 1.18.2.
The YAML below may therefore not be compatible with older versions; if it is not, search for an older variant online and adjust it.
apiVersion: apps/v1
kind: DaemonSet              # Log collection must run on every node, so a DaemonSet is used
metadata:
  name: log-pilot
  namespace: kube-system
  labels:
    k8s-app: log-pilot
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: log-es
      kubernetes.io/cluster-service: "true"
      version: v1.22
  template:
    metadata:
      labels:
        k8s-app: log-es
        kubernetes.io/cluster-service: "true"
        version: v1.22
    spec:
      tolerations:           # Allow the Pods to be scheduled onto the Master node as well
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: log-pilot
        image: registry.cn-hangzhou.aliyuncs.com/acs/log-pilot:0.9.5-filebeat
        resources:           # Resource limits
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        env:                 # Environment variables used to talk to Elasticsearch
        - name: "FILEBEAT_OUTPUT"
          value: "elasticsearch"
        - name: "ELASTICSEARCH_HOST"
          value: "elasticsearch-api"
        - name: "ELASTICSEARCH_PORT"
          value: "9200"
        - name: "ELASTICSEARCH_USER"
          value: "elastic"
        - name: "ELASTICSEARCH_PASSWORD"
          value: "elastic"
        volumeMounts:        # Mount the host's log-related directories
        - name: sock
          mountPath: /var/run/docker.sock
        - name: root
          mountPath: /host
          readOnly: true
        - name: varlib
          mountPath: /var/lib/filebeat
        - name: varlog
          mountPath: /var/log/filebeat
        securityContext:
          capabilities:
            add:
            - SYS_ADMIN
      terminationGracePeriodSeconds: 30
      volumes:
      - name: sock
        hostPath:
          path: /var/run/docker.sock
      - name: root
        hostPath:
          path: /
      - name: varlib
        hostPath:
          path: /var/lib/filebeat
          type: DirectoryOrCreate
      - name: varlog
        hostPath:
          path: /var/log/filebeat
          type: DirectoryOrCreate
log-pilot.yaml
Create the resources
[root@master-kubeadm-k8s log]# kubectl apply -f log-pilot.yaml
daemonset.extensions/log-pilot created
Check the resources
[root@master-kubeadm-k8s log]# kubectl get pods -n kube-system -o wide | grep log
log-pilot-8f4nv   1/1   Running   0   2m4s   192.168.221.88   worker02-kubeadm-k8s   <none>   <none>
log-pilot-h25fc   1/1   Running   0   2m4s   192.168.16.250   master-kubeadm-k8s     <none>   <none>

[root@master-kubeadm-k8s log]# kubectl get daemonset -n kube-system
NAME          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
calico-node   3         3         3       3            3           beta.kubernetes.io/os=linux   41d
kube-proxy    3         3         3       3            3           <none>                        41d
log-pilot     2         2         2       2            2           <none>                        26s
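Log-Pilot decides what to collect from each container through environment variables on that container. A minimal sketch, following the aliyun_logs_* convention from Log-Pilot's documentation (the Pod name, image and paths below are only examples):

apiVersion: v1
kind: Pod
metadata:
  name: tomcat-demo
spec:
  containers:
  - name: tomcat
    image: tomcat:8
    env:
    - name: aliyun_logs_catalina        # collect this container's stdout under the name "catalina"
      value: "stdout"
    - name: aliyun_logs_access          # collect log files matching this path under the name "access"
      value: "/usr/local/tomcat/logs/localhost_access_log.*.txt"
    volumeMounts:
    - name: tomcat-log
      mountPath: /usr/local/tomcat/logs
  volumes:
  - name: tomcat-log
    emptyDir: {}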
1.1.2.2.2 Deploying Elasticsearch
Prepare the YAML file
Note that the resource requirements here are fairly high, so check that your machines have enough capacity.
apiVersion: v1
kind: Service                # This Service exposes Elasticsearch to its clients
metadata:
  name: elasticsearch-api    # Must match the ELASTICSEARCH_HOST environment variable in Log-Pilot
  namespace: kube-system
  labels:
    name: elasticsearch
spec:
  selector:
    app: es
  ports:
  - name: transport
    port: 9200
    protocol: TCP
---
apiVersion: v1
kind: Service                # This Service is used for communication between Elasticsearch cluster members
metadata:
  name: elasticsearch-discovery
  namespace: kube-system
  labels:
    name: elasticsearch
spec:
  selector:
    app: es
  ports:
  - name: transport
    port: 9300
    protocol: TCP
---
apiVersion: apps/v1
kind: StatefulSet            # A StatefulSet is used so the Elasticsearch Pods start in order
metadata:
  name: elasticsearch
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 3
  serviceName: "elasticsearch-service"
  selector:
    matchLabels:
      app: es
  template:
    metadata:
      labels:
        app: es
    spec:
      tolerations:           # Elasticsearch may also be scheduled onto the Master node
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      initContainers:        # Initialization that must run before the main container starts
      - name: init-sysctl
        image: busybox:1.27
        command:
        - sysctl
        - -w
        - vm.max_map_count=262144
        securityContext:
          privileged: true
      containers:
      - name: elasticsearch
        image: registry.cn-hangzhou.aliyuncs.com/log-monitor/elasticsearch:v5.5.1
        ports:
        - containerPort: 9200
          protocol: TCP
        - containerPort: 9300
          name: transport    # named so the livenessProbe below can reference this port
          protocol: TCP
        securityContext:
          capabilities:
            add:
            - IPC_LOCK
            - SYS_RESOURCE
        resources:
          limits:
            memory: 4000Mi
          requests:
            cpu: 100m
            memory: 2000Mi
        env:
        - name: "http.host"
          value: "0.0.0.0"
        - name: "network.host"
          value: "_eth0_"
        - name: "cluster.name"
          value: "docker-cluster"
        - name: "bootstrap.memory_lock"
          value: "false"
        - name: "discovery.zen.ping.unicast.hosts"
          value: "elasticsearch-discovery"
        - name: "discovery.zen.ping.unicast.hosts.resolve_timeout"
          value: "10s"
        - name: "discovery.zen.ping_timeout"
          value: "6s"
        - name: "discovery.zen.minimum_master_nodes"
          value: "2"
        - name: "discovery.zen.fd.ping_interval"
          value: "2s"
        - name: "discovery.zen.no_master_block"
          value: "write"
        - name: "gateway.expected_nodes"
          value: "2"
        - name: "gateway.expected_master_nodes"
          value: "1"
        - name: "transport.tcp.connect_timeout"
          value: "60s"
        - name: "ES_JAVA_OPTS"
          value: "-Xms2g -Xmx2g"
        livenessProbe:       # Health check
          tcpSocket:
            port: transport
          initialDelaySeconds: 20
          periodSeconds: 10
        volumeMounts:
        - name: es-data
          mountPath: /data
      terminationGracePeriodSeconds: 30
      volumes:
      - name: es-data
        hostPath:
          path: /es-data
elasticsearch.yaml
Create the resources
[root@master-kubeadm-k8s log]# kubectl apply -f elasticsearch.yaml
service/elasticsearch-api created
service/elasticsearch-discovery created
statefulset.apps/elasticsearch created
Check the resources
# The Pods are created one after another, in order
[root@master-kubeadm-k8s log]# kubectl get pods -n kube-system -o wide | grep elastic
elasticsearch-0   1/1   Running           0   4m36s   10.244.221.69   worker02-kubeadm-k8s   <none>   <none>
elasticsearch-1   1/1   Running           0   4m33s   10.244.14.4     worker01-kubeadm-k8s   <none>   <none>
elasticsearch-2   0/1   PodInitializing   0   101s    10.244.16.194   master-kubeadm-k8s     <none>   <none>

[root@master-kubeadm-k8s log]# kubectl get svc -n kube-system -o wide | grep elastic
elasticsearch-api         ClusterIP   10.104.144.183   <none>   9200/TCP   5m2s   app=es
elasticsearch-discovery   ClusterIP   10.109.137.36    <none>   9300/TCP   5m2s   app=es

[root@master-kubeadm-k8s log]# kubectl get statefulset -n kube-system
NAME            READY   AGE
elasticsearch   3/3     6m14s
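To confirm that the cluster really formed, you can query the Elasticsearch health API through the elasticsearch-api Service. A quick sketch using a throwaway curl Pod (the Pod name and image are arbitrary; add -u elastic:elastic if your image enforces authentication):

kubectl run es-check --rm -it --restart=Never --image=curlimages/curl -- \
  curl -s "http://elasticsearch-api.kube-system:9200/_cluster/health?pretty"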
1.1.2.2.3 Deploying Kibana
Kibana is the part that users access from outside the cluster, so it needs a Service and an Ingress.
Prerequisite: an Ingress controller must be available, for example the NGINX Ingress Controller.
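If you are not sure whether an Ingress controller is running, a quick check could look like this (the namespace depends on how the controller was installed; ingress-nginx is only the common default):

kubectl get pods -n ingress-nginx
kubectl get pods --all-namespaces | grep -i ingress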
Prepare the YAML file
kibana.yaml
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: kube-system
  labels:
    component: kibana
spec:
  replicas: 1
  selector:
    matchLabels:
      component: kibana
  template:
    metadata:
      labels:
        component: kibana
    spec:
      containers:
      - name: kibana
        image: registry.cn-hangzhou.aliyuncs.com/log-monitor/kibana:v5.5.1
        env:
        - name: CLUSTER_NAME
          value: docker-cluster
        - name: ELASTICSEARCH_URL        # Elasticsearch address
          value: http://elasticsearch-api:9200/
        resources:
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        ports:
        - containerPort: 5601
          name: http
---
# Service
apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: kube-system
  labels:
    component: kibana
spec:
  selector:
    component: kibana
  ports:
  - name: http
    port: 80
    targetPort: http
---
# Ingress
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: kibana
  namespace: kube-system
spec:
  rules:
  - host: log.k8s.sunny.com              # Domain name mapped in the local hosts file
    http:
      paths:
      - path: /
        backend:
          serviceName: kibana
          servicePort: 80
Create the resources
[root@master-kubeadm-k8s log]# kubectl apply -f kibana.yaml
deployment.apps/kibana created
service/kibana created
ingress.extensions/kibana created
Check the resources
[root@master-kubeadm-k8s log]# kubectl get pods -n kube-system | grep kibana
kibana-8747dff7d-l627g   1/1   Running   0   2m2s

[root@master-kubeadm-k8s log]# kubectl get svc -n kube-system | grep kibana
kibana   ClusterIP   10.109.177.214   <none>   80/TCP   2m40s

[root@master-kubeadm-k8s log]# kubectl get ingress -n kube-system | grep kibana
kibana   <none>   log.k8s.sunny.com   80   2m43s
Test
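The hostname only resolves if you map it yourself. A sketch of the manual steps (replace <node-ip> with the IP of a node the Ingress controller listens on):

# On the machine you browse from
echo "<node-ip>  log.k8s.sunny.com" >> /etc/hosts

# Then open http://log.k8s.sunny.com in a browser, or verify from the command line
curl -H "Host: log.k8s.sunny.com" http://<node-ip>/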
2. Monitoring
2.1 Introduction to Prometheus
Monitoring here means watching the health of the Kubernetes cluster itself: the nodes, the Kubernetes components and the Pods are all monitored.
Prometheus is an open-source monitoring and alerting system. It scrapes metrics directly from agents running on the target hosts and stores the collected samples centrally on its own server.
In 2016 Prometheus became the second member project of the CNCF (Cloud Native Computing Foundation), after Kubernetes.
2.1.2 Main features
A multi-dimensional data model (a time series is identified by a metric name plus a set of key/value labels).
A flexible query language (PromQL).
Storage with no external dependencies, supporting both local and remote models.
Data is pulled over HTTP, which keeps the model simple and easy to reason about (see the example after this list).
Targets can be found through service discovery or configured statically.
Multiple statistical data models are supported, and the data is easy to graph.
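The pull model really is just HTTP: every target exposes a /metrics endpoint in the Prometheus text exposition format. An illustrative sketch against a node-exporter target (the host and the sample lines are examples):

curl http://<target-host>:9100/metrics

# Typical response lines look like:
# # HELP node_load1 1m load average.
# # TYPE node_load1 gauge
# node_load1 0.52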
2.1.3 Prometheus architecture
(Prometheus architecture diagram: Prometheus Server, Exporters, Pushgateway, Alertmanager, PromQL / Web UI)
As the architecture diagram shows, the main Prometheus modules are the Server, Exporters, the Pushgateway, PromQL, Alertmanager and the Web UI.
The overall flow is roughly as follows:
The Prometheus server periodically pulls data from statically configured targets or from targets found via service discovery.
When newly pulled data outgrows the configured in-memory buffer, Prometheus persists it to disk (or to remote storage in the cloud, if that is configured).
Rules can be configured; Prometheus evaluates them on a schedule, and when a condition fires it pushes an alert to the configured Alertmanager.
When Alertmanager receives alerts it can, according to its configuration, aggregate, deduplicate and silence them before finally sending out notifications.
Data can be queried and aggregated through the API, the Prometheus console, or Grafana.
2.1.4 Prometheus basics
Supports both pull and push ingestion (push goes through the Pushgateway)
Supports Kubernetes service discovery
Provides the PromQL query language
A time series is a data type defined by a metric name together with a set of key/value labels
2.2 Data collection
Host metrics
Deploy Node-Exporter on every node.
Kubernetes component metrics
These are obtained from the /metrics endpoint that each component exposes on its own port, for example:
IP:2379/metrics — etcd
IP:6443/metrics — kube-apiserver
IP:10252/metrics — kube-controller-manager
IP:10251/metrics — kube-scheduler
Container metrics
The kubelet ships with a cAdvisor component, which exposes container metrics out of the box (see the sketch below for a quick way to peek at these endpoints).
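An easy way to look at these endpoints without worrying about certificates is to let kubectl proxy the request with your kubeconfig credentials (a sketch; the node name is the one used elsewhere in this article):

# kube-apiserver metrics
kubectl get --raw /metrics | head

# cAdvisor metrics of one node's kubelet, proxied through the API server
kubectl get --raw /api/v1/nodes/master-kubeadm-k8s/proxy/metrics/cadvisor | head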
2.2.1 Collecting host metrics
Prometheus can collect data from the various components of a Kubernetes cluster, such as the kubelet's built-in cAdvisor or the API server; node-exporter is simply one more such source.
Exporter is the collective name for Prometheus data-collection components: an exporter gathers data from its target and converts it into a format Prometheus can scrape.
Host metrics such as CPU, memory, disk and I/O are collected by Node-Exporter.
2.2.1.1 Deploying Node-Exporter
Prepare the YAML files
namespace.yaml
All of the monitoring resources go into this namespace so they are easy to manage.
apiVersion: v1
kind: Namespace
metadata:
  name: ns-monitor
  labels:
    name: ns-monitor
node-exporter.yaml
kind: DaemonSet
apiVersion: apps/v1
metadata:
  labels:
    app: node-exporter
  name: node-exporter
  namespace: ns-monitor
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
      - name: node-exporter
        image: prom/node-exporter:v0.16.0
        ports:
        - containerPort: 9100
          protocol: TCP
          name: http
      hostNetwork: true
      hostPID: true
      tolerations:
      - effect: NoSchedule
        operator: Exists
---
# Node-Exporter does not really need to be exposed externally, but we expose it here so that
# each step can be verified before moving on.
kind: Service
apiVersion: v1
metadata:
  labels:
    app: node-exporter
  name: node-exporter-service
  namespace: ns-monitor
spec:
  ports:
  - name: http
    port: 9100
    nodePort: 31672
    protocol: TCP
  type: NodePort
  selector:
    app: node-exporter
Create the resources
[root@master-kubeadm-k8s prometheus]# kubectl apply -f node-exporter.yaml
daemonset.apps/node-exporter created
service/node-exporter-service created
Check the resources
[root@master-kubeadm-k8s prometheus]# kubectl get pods -n ns-monitor
NAME                  READY   STATUS    RESTARTS   AGE
node-exporter-dsjbq   1/1     Running   0          2m32s
node-exporter-mdnrj   1/1     Running   0          2m32s
node-exporter-sxwxx   1/1     Running   0          2m32s

[root@master-kubeadm-k8s prometheus]# kubectl get svc -n ns-monitor
NAME                    TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
node-exporter-service   NodePort   10.109.226.6   <none>        9100:31672/TCP   2m46s
Test
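With the NodePort from the Service above (31672), the metrics of any node can be pulled directly; a sketch, with <node-ip> standing for one of your node IPs:

curl -s http://<node-ip>:31672/metrics | grep ^node_cpu | head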
2.2.2 Deploying Prometheus
Prepare the YAML file
prometheus.yaml
The Prometheus YAML is well worth studying: it touches almost every commonly used resource type.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]          # "" indicates the core API group
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs:
      - get
      - watch
      - list
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs:
      - get
      - watch
      - list
  - nonResourceURLs: ["/metrics"]
    verbs:
      - get
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: ns-monitor
  labels:
    app: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: ns-monitor
roleRef:
  kind: ClusterRole
  name: prometheus
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-conf
  namespace: ns-monitor
  labels:
    app: prometheus
data:
  prometheus.yml: |-
    # my global config
    global:
      scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).

    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          # - alertmanager:9093

    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"

    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
        static_configs:
          - targets: ['localhost:9090']

      - job_name: 'grafana'
        static_configs:
          - targets:
            - 'grafana-service.ns-monitor:3000'

      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
        - role: endpoints
        # Default to scraping over https. If required, just disable this or change to `http`.
        scheme: https
        # This TLS & bearer token file config is used to connect to the actual scrape
        # endpoints for cluster components. This is separate to discovery auth
        # configuration because discovery & scraping are two separate concerns in
        # Prometheus. The discovery auth config is automatic if Prometheus runs inside
        # the cluster. Otherwise, more config options have to be provided within the
        # <kubernetes_sd_config>.
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          # If your node certificates are self-signed or use a different CA to the
          # master CA, then disable certificate verification below. Note that
          # certificate verification is an integral part of a secure infrastructure
          # so this should only be disabled in a controlled environment. You can
          # disable certificate verification by uncommenting the line below.
          #
          # insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        # Keep only the default/kubernetes service endpoints for the https port. This
        # will add targets for each API server which Kubernetes adds an endpoint to
        # the default/kubernetes service.
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https

      # Scrape config for nodes (kubelet).
      #
      # Rather than connecting directly to the node, the scrape is proxied though the
      # Kubernetes apiserver. This means it will work if Prometheus is running out of
      # cluster, or can't connect to nodes for some other reason (e.g. because of
      # firewalling).
      - job_name: 'kubernetes-nodes'
        # Default to scraping over https. If required, just disable this or change to `http`.
        scheme: https
        # This TLS & bearer token file config is used to connect to the actual scrape
        # endpoints for cluster components (see the note on the apiservers job above).
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics

      # Scrape config for Kubelet cAdvisor.
      #
      # This is required for Kubernetes 1.7.3 and later, where cAdvisor metrics
      # (those whose names begin with 'container_') have been removed from the
      # Kubelet metrics endpoint. This job scrapes the cAdvisor endpoint to
      # retrieve those metrics.
      #
      # In Kubernetes 1.7.0-1.7.2, these metrics are only exposed on the cAdvisor
      # HTTP endpoint; use "replacement: /api/v1/nodes/${1}:4194/proxy/metrics"
      # in that case (and ensure cAdvisor's HTTP server hasn't been disabled with
      # the --cadvisor-port=0 Kubelet flag).
      #
      # This job is not necessary and should be removed in Kubernetes 1.6 and
      # earlier versions, or it will cause the metrics to be scraped twice.
      - job_name: 'kubernetes-cadvisor'
        # Default to scraping over https. If required, just disable this or change to `http`.
        scheme: https
        # This TLS & bearer token file config is used to connect to the actual scrape
        # endpoints for cluster components (see the note on the apiservers job above).
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

      # Scrape config for service endpoints.
      #
      # The relabeling allows the actual service scrape endpoint to be configured
      # via the following annotations:
      #
      # * `prometheus.io/scrape`: Only scrape services that have a value of `true`
      # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
      #   to set this to `https` & most likely set the `tls_config` of the scrape config.
      # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
      # * `prometheus.io/port`: If the metrics are exposed on a different port to the
      #   service then set this appropriately.
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
        - role: endpoints
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_name

      # Example scrape config for probing services via the Blackbox Exporter.
      #
      # The relabeling allows the actual service scrape endpoint to be configured
      # via the following annotations:
      #
      # * `prometheus.io/probe`: Only probe services that have a value of `true`
      - job_name: 'kubernetes-services'
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
        - role: service
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
          action: keep
          regex: true
        - source_labels: [__address__]
          target_label: __param_target
        - target_label: __address__
          replacement: blackbox-exporter.example.com:9115
        - source_labels: [__param_target]
          target_label: instance
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          target_label: kubernetes_name

      # Example scrape config for probing ingresses via the Blackbox Exporter.
      #
      # The relabeling allows the actual ingress scrape endpoint to be configured
      # via the following annotations:
      #
      # * `prometheus.io/probe`: Only probe services that have a value of `true`
      - job_name: 'kubernetes-ingresses'
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
        - role: ingress
        relabel_configs:
        - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
          regex: (.+);(.+);(.+)
          replacement: ${1}://${2}${3}
          target_label: __param_target
        - target_label: __address__
          replacement: blackbox-exporter.example.com:9115
        - source_labels: [__param_target]
          target_label: instance
        - action: labelmap
          regex: __meta_kubernetes_ingress_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_ingress_name]
          target_label: kubernetes_name

      # Example scrape config for pods
      #
      # The relabeling allows the actual pod scrape endpoint to be configured via the
      # following annotations:
      #
      # * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
      # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
      # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the
      #   pod's declared ports (default is a port-free target if none are declared).
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: ns-monitor
  labels:
    app: prometheus
data:
  cpu-usage.rule: |
    groups:
      - name: NodeCPUUsage
        rules:
          - alert: NodeCPUUsage
            expr: (100 - (avg by (instance) (irate(node_cpu{name="node-exporter",mode="idle"}[5m])) * 100)) > 75
            for: 2m
            labels:
              severity: "page"
            annotations:
              summary: "{{$labels.instance}}: High CPU usage detected"
              description: "{{$labels.instance}}: CPU usage is above 75% (current value is: {{ $value }})"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: "prometheus-data-pv"
  labels:
    name: prometheus-data-pv
    release: stable
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    path: /nfs/data/prometheus     # Directory used for persistent storage
    server: 192.168.50.111         # NFS server
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data-pvc
  namespace: ns-monitor
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  selector:
    matchLabels:
      name: prometheus-data-pv
      release: stable
---
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: ns-monitor
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      securityContext:
        runAsUser: 0
      containers:
      - name: prometheus
        image: prom/prometheus:latest
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - mountPath: /prometheus
          name: prometheus-data-volume
        - mountPath: /etc/prometheus/prometheus.yml
          name: prometheus-conf-volume
          subPath: prometheus.yml
        - mountPath: /etc/prometheus/rules
          name: prometheus-rules-volume
        ports:
        - containerPort: 9090
          protocol: TCP
      volumes:
      - name: prometheus-data-volume
        persistentVolumeClaim:
          claimName: prometheus-data-pvc
      - name: prometheus-conf-volume
        configMap:
          name: prometheus-conf
      - name: prometheus-rules-volume
        configMap:
          name: prometheus-rules
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
---
kind: Service
apiVersion: v1
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  labels:
    app: prometheus
  name: prometheus-service
  namespace: ns-monitor
spec:
  ports:
  - port: 9090
    targetPort: 9090
  selector:
    app: prometheus
  type: NodePort
Create the resources
[root@master-kubeadm-k8s prometheus]# kubectl apply -f prometheus.yaml
clusterrole.rbac.authorization.k8s.io/prometheus created
serviceaccount/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
configmap/prometheus-conf created
configmap/prometheus-rules created
persistentvolume/prometheus-data-pv created
persistentvolumeclaim/prometheus-data-pvc created
deployment.apps/prometheus created
service/prometheus-service created
Check the resources
[root@master-kubeadm-k8s prometheus]# kubectl get pods -n ns-monitor
NAME                          READY   STATUS    RESTARTS   AGE
node-exporter-dsjbq           1/1     Running   1          26m
node-exporter-mdnrj           1/1     Running   1          26m
node-exporter-sxwxx           1/1     Running   2          26m
prometheus-5f7cb6d955-mm8d2   1/1     Running   1          28s

[root@master-kubeadm-k8s prometheus]# kubectl get pv -n ns-monitor
NAME                 CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                            STORAGECLASS   REASON   AGE
prometheus-data-pv   5Gi        RWO            Recycle          Bound    ns-monitor/prometheus-data-pvc                           53s

[root@master-kubeadm-k8s prometheus]# kubectl get pvc -n ns-monitor
NAME                  STATUS   VOLUME               CAPACITY   ACCESS MODES   STORAGECLASS   AGE
prometheus-data-pvc   Bound    prometheus-data-pv   5Gi        RWO                           60s

[root@master-kubeadm-k8s prometheus]# kubectl get svc -n ns-monitor
NAME                    TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
node-exporter-service   NodePort   10.109.226.6     <none>        9100:31672/TCP   27m
prometheus-service      NodePort   10.101.128.125   <none>        9090:31615/TCP   64s
Test
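Prometheus is reachable on the NodePort shown above (31615 in my case). Open http://<node-ip>:31615 in a browser and check Status → Targets, or query the HTTP API directly; a sketch using the built-in up metric, which is 1 for every target Prometheus can scrape:

curl -s 'http://<node-ip>:31615/api/v1/query?query=up'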
2.2.3 Deploying the Grafana monitoring UI
Prepare the YAML files
grafana.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: "grafana-data-pv"
  labels:
    name: grafana-data-pv
    release: stable
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    path: /nfs/data/grafana
    server: 192.168.50.111
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-data-pvc
  namespace: ns-monitor
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  selector:
    matchLabels:
      name: grafana-data-pv
      release: stable
---
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: ns-monitor
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      securityContext:
        runAsUser: 0
      containers:
      - name: grafana
        image: grafana/grafana:latest
        imagePullPolicy: IfNotPresent
        env:
        - name: GF_AUTH_BASIC_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "false"
        readinessProbe:
          httpGet:
            path: /login
            port: 3000
        volumeMounts:
        - mountPath: /var/lib/grafana
          name: grafana-data-volume
        ports:
        - containerPort: 3000
          protocol: TCP
      volumes:
      - name: grafana-data-volume
        persistentVolumeClaim:
          claimName: grafana-data-pvc
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: grafana
  name: grafana-service
  namespace: ns-monitor
spec:
  ports:
  - port: 3000
    targetPort: 3000
  selector:
    app: grafana
  type: NodePort
grafana-ingress.yaml
# Ingress
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: ns-monitor
spec:
  rules:
  - host: monitor.k8s.sunny.com
    http:
      paths:
      - path: /
        backend:
          serviceName: grafana-service
          servicePort: 3000
Create the resources
[root@master-kubeadm-k8s prometheus]# kubectl apply -f grafana.yaml
persistentvolume/grafana-data-pv created
persistentvolumeclaim/grafana-data-pvc created
deployment.apps/grafana created
service/grafana-service created

[root@master-kubeadm-k8s prometheus]# kubectl apply -f grafana-ingress.yaml
ingress.extensions/grafana-ingress created
Check the resources
[root@master-kubeadm-k8s prometheus]# kubectl get deploy -n ns-monitor
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
grafana      1/1     1            1           2m52s
prometheus   1/1     1            1           6m41s

[root@master-kubeadm-k8s prometheus]# kubectl get pv -n ns-monitor
NAME                 CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                            STORAGECLASS   REASON   AGE
grafana-data-pv      5Gi        RWO            Recycle          Bound    ns-monitor/grafana-data-pvc                              3m10s
prometheus-data-pv   5Gi        RWO            Recycle          Bound    ns-monitor/prometheus-data-pvc                           7m

[root@master-kubeadm-k8s prometheus]# kubectl get pvc -n ns-monitor
NAME                  STATUS   VOLUME               CAPACITY   ACCESS MODES   STORAGECLASS   AGE
grafana-data-pvc      Bound    grafana-data-pv      5Gi        RWO                           3m14s
prometheus-data-pvc   Bound    prometheus-data-pv   5Gi        RWO                           7m4s

[root@master-kubeadm-k8s prometheus]# kubectl get svc -n ns-monitor
NAME                    TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
grafana-service         NodePort   10.111.192.206   <none>        3000:31828/TCP   3m5s
node-exporter-service   NodePort   10.109.226.6     <none>        9100:31672/TCP   33m
prometheus-service      NodePort   10.101.128.125   <none>        9090:31615/TCP   7m4s

[root@master-kubeadm-k8s prometheus]# kubectl get ingress -n ns-monitor
NAME              CLASS    HOSTS                   ADDRESS   PORTS   AGE
grafana-ingress   <none>   monitor.k8s.sunny.com             80      2m15s
Test
The default username and password are both admin.
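After logging in, Grafana still needs Prometheus as a data source. Inside the cluster the URL is the prometheus-service created earlier; you can add it in the UI (Configuration → Data Sources) or, as a sketch, through Grafana's HTTP API (the NodePort 31828 is the one from my output above):

curl -s -u admin:admin -H 'Content-Type: application/json' -X POST \
  http://<node-ip>:31828/api/datasources \
  -d '{"name":"prometheus","type":"prometheus","url":"http://prometheus-service.ns-monitor:9090","access":"proxy"}'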
Author: Suny____
Link: https://www.jianshu.com/p/6d3c29f87bcc
Source: 简书 (Jianshu)
The copyright belongs to the author. For commercial reuse please contact the author for authorization; for non-commercial reuse please credit the source.