
Manually Deploying the Prometheus Monitoring System in a Kubernetes Environment: Installation Guide

mhr18 2025-05-08 19:54


Preface: the configurations under "☆ Hands-on example" can be split apart and installed/configured separately as needed.

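I. Environment Preparation

  1. Kubernetes cluster
    Make sure a Kubernetes cluster (version ≥1.20) is deployed and that kubectl is configured.
  2. Image registry
    Confirm that the image harbor.fq.com/prometheus/node-exporter:v1.8.2 and the Prometheus-related images are available in the private registry.
  3. Namespace
    The basic manifests use the default namespace (Node Exporter goes into kube-system). You can switch to monitoring if needed, but then the namespace field must be updated in every YAML file. The ☆ hands-on examples already use monitoring.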

II. Create RBAC Permissions

Goal: grant Prometheus permission to access the Kubernetes API.

1. Create the ServiceAccount

# prometheus-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
secrets:
- name: prometheus-token

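Explanation

  • The ServiceAccount prometheus is the identity Prometheus uses to authenticate to the API server.
  • The secrets field references a Secret (prometheus-token) that stores the access credentials.

2. Create the ClusterRole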

# prometheus-clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: ["nodes", "nodes/proxy", "nodes/metrics", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]

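Explanation

  • Grants Prometheus read-only access (get, list, watch) to nodes, services, endpoints, pods and related resources, plus get on configmaps.
  • Allows reading the /metrics endpoint (a non-resource URL).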

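To make the ClusterRole above concrete, here is a simplified Python model of how RBAC rules are matched. A request is allowed if any rule lists its API group, resource and verb (`*` acts as a wildcard). For non-resource URLs such as /metrics, the rule must list the URL and the verb. This is only an illustration under simplifying assumptions (resourceNames and URL prefix wildcards are ignored), not the API server's actual authorizer. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyRule:
    """Simplified RBAC PolicyRule (resourceNames and URL prefix wildcards omitted)."""
    verbs: list
    api_groups: list = field(default_factory=list)
    resources: list = field(default_factory=list)
    non_resource_urls: list = field(default_factory=list)


def _matches(allowed, value):
    # "*" in a rule field matches any value
    return "*" in allowed or value in allowed


def is_allowed(rules, verb, resource=None, api_group="", url=None):
    """Return True if any rule grants `verb` on the resource (or non-resource URL)."""
    for rule in rules:
        if not _matches(rule.verbs, verb):
            continue
        if url is not None:
            # Non-resource request: only nonResourceURLs are consulted
            if _matches(rule.non_resource_urls, url):
                return True
        elif _matches(rule.api_groups, api_group) and _matches(rule.resources, resource):
            return True
    return False


# Rules mirroring the prometheus ClusterRole above ("" is the core API group)
PROMETHEUS_RULES = [
    PolicyRule(verbs=["get", "list", "watch"], api_groups=[""],
               resources=["nodes", "nodes/proxy", "nodes/metrics",
                          "services", "endpoints", "pods"]),
    PolicyRule(verbs=["get"], api_groups=[""], resources=["configmaps"]),
    PolicyRule(verbs=["get"], non_resource_urls=["/metrics"]),
]

if __name__ == "__main__":
    print(is_allowed(PROMETHEUS_RULES, "list", "pods"))
    print(is_allowed(PROMETHEUS_RULES, "list", "configmaps"))
    print(is_allowed(PROMETHEUS_RULES, "get", url="/metrics"))
```

Under this model, listing configmaps is denied because the role only grants get on them. That is also why a "forbidden" error usually points to a missing verb or resource in the ClusterRole, or to a binding that targets the wrong ServiceAccount.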
3. Create the ClusterRoleBinding

# prometheus-clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
roleRef:
  kind: ClusterRole
  name: prometheus
  apiGroup: rbac.authorization.k8s.io

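Explanation

  • Binds the prometheus ClusterRole to the prometheus ServiceAccount so that the permissions take effect.

4. Generate the ServiceAccount token

Since Kubernetes 1.24, token Secrets are no longer created automatically for ServiceAccounts. For that reason, a Secret of type kubernetes.io/service-account-token is created by hand here. The control plane then fills in a token for the ServiceAccount named in the kubernetes.io/service-account.name annotation.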

# prometheus-token.yaml
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-token
  annotations:
    kubernetes.io/service-account.name: prometheus
type: kubernetes.io/service-account-token

Apply the RBAC configuration

kubectl apply -f prometheus-serviceaccount.yaml
kubectl apply -f prometheus-clusterrole.yaml
kubectl apply -f prometheus-clusterrolebinding.yaml
kubectl apply -f prometheus-token.yaml

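Inside the Prometheus Pod, Kubernetes mounts a token for the prometheus ServiceAccount at /var/run/secrets/kubernetes.io/serviceaccount/token. In recent versions this is a projected, auto-rotated token rather than the Secret above. The scrape jobs later reference this file through bearer_token_file. The Python sketch below shows roughly what that setting amounts to: the token file is read and sent as an HTTP `Authorization: Bearer` header. The helper name is hypothetical, and this is an illustration, not Prometheus's own code.

```python
import tempfile
from pathlib import Path

# Default in-Pod location of the ServiceAccount token
SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def bearer_auth_header(token_path=SA_TOKEN_PATH):
    """Read a ServiceAccount token file and build an HTTP Authorization header.

    Reading the file on every call picks up rotated tokens.
    """
    token = Path(token_path).read_text().strip()
    if not token:
        raise ValueError(f"empty token file: {token_path}")
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    # Demo with a temporary file standing in for the mounted token
    with tempfile.TemporaryDirectory() as d:
        fake = Path(d) / "token"
        fake.write_text("example-token\n")
        print(bearer_auth_header(fake))
```

When calling the API server over HTTPS from inside a Pod, pass this header together with the ca.crt file from the same directory. That pairing is what the tls_config/ca_file settings in the scrape jobs configure.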
☆ Hands-on example

cat prometheus-rbac0227.yaml

---
# 1. Create the monitoring namespace
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
# 2. Create the ServiceAccount used by Prometheus
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
# 3. Create the ClusterRole that defines Prometheus's permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- apiGroups: [""]
  resources:
  - nodes/proxy
  verbs: ["get", "list", "watch"]
- apiGroups: ["networking.k8s.io"]
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
# 4. Bind the ClusterRole to the ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
---

III. Deploy Node Exporter

Goal: deploy Node Exporter on every node to collect node resource metrics.

# node-exporter-daemonset.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true
      containers:
        - name: node-exporter
          image: harbor.fq.com/prometheus/node-exporter:v1.8.2
          args:
            - --path.rootfs=/host
          volumeMounts:
            - name: rootfs
              mountPath: /host
      volumes:
        - name: rootfs
          hostPath:
            path: /

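Explanation

  • The DaemonSet runs one Node Exporter Pod on every schedulable node. This basic version has no tolerations, so it skips tainted nodes such as control-plane nodes; the hands-on version below adds them.
  • hostNetwork: true puts the Pod on the node's network, so metrics are exposed directly on the node IP (port 9100 by default).
  • The hostPath volume mounts the node's root filesystem so that node-level data can be collected.

Deployment command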

kubectl apply -f node-exporter-daemonset.yml

☆ Hands-on example

cat node-exporter-daemonset.yml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring  # use the "monitoring" namespace
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
      annotations:
        prometheus.io/scrape: "true"  # allow Prometheus to scrape this Pod
        prometheus.io/port: "9100"    # Node Exporter port
    spec:
      hostNetwork: true  # let the Pod use the host network
      hostPID: true      # give the Pod access to the host's PID namespace
      tolerations:
      - effect: NoSchedule  # allow scheduling onto tainted nodes
        operator: Exists
      - effect: NoExecute
        operator: Exists
      securityContext:
        runAsNonRoot: true  # avoid running as root
        runAsUser: 65534     # run as the nobody user

      containers:
      - name: node-exporter
        image: harbor.fq.com/prometheus/node-exporter:v1.8.2  # replace with an image address you trust
        args:
        - --path.rootfs=/host/root   # rootfs path
        - --path.procfs=/host/proc   # procfs path
        - --path.sysfs=/host/sys     # sysfs path
        - --no-collector.wifi        # disable the WiFi collector
        - --no-collector.hwmon       # disable the hardware-monitoring (hwmon) collector
        ports:
        - containerPort: 9100
          protocol: TCP
        resources:  # resource requests and limits
          requests:
            memory: "30Mi"
            cpu: "100m"
          limits:
            memory: "50Mi"
            cpu: "200m"
        volumeMounts:  # mount host directories (read-only)
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
        - name: rootfs
          mountPath: /host/root
          readOnly: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
      - name: rootfs
        hostPath:
          path: /
---
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    k8s-app: node-exporter
  annotations:
    prometheus.io/scrape: 'true'  # allow Prometheus to scrape this Service
    prometheus.io/port: '9100'    # scrape port
spec:
  selector:
    k8s-app: node-exporter
  ports:
  - name: metrics
    port: 9100
    protocol: TCP
    targetPort: 9100
  type: ClusterIP  # reachable only from inside the cluster

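Because of hostNetwork, each node serves plain text at http://<NodeIP>:9100/metrics in the Prometheus text exposition format. The format consists of `# HELP` / `# TYPE` comment lines followed by `name{label="value",...} value` samples. Below is a minimal Python parser for the common case, which can help when checking a node by hand. It assumes label values contain no commas, escaped quotes or `}`, and it ignores optional timestamps. It is not a full implementation of the format; the official prometheus_client library ships a complete parser.

```python
import re

# metric name, optional {labels}, value (an optional trailing timestamp is ignored)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Parse exposition-format text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        m = SAMPLE_RE.match(line)
        if m is None:
            continue  # a line this simplified parser does not understand
        name, raw_labels, value = m.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))  # float() also accepts NaN/+Inf
    return samples


if __name__ == "__main__":
    example = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_filesystem_avail_bytes{device="/dev/sda1",mountpoint="/"} 1.2e+10
"""
    for sample in parse_metrics(example):
        print(sample)
```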
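IV. Deploy Prometheus

Goal: deploy the main Prometheus service and configure the scrape rules and persistent storage.

1. Create persistent storage (PV/PVC)

Depending on the cluster's storage type (e.g. NFS, Local PV or cloud storage), create a PVC and mount it into Prometheus. Without a storageClassName, the PVC uses the cluster's default StorageClass, if one is defined.
Example (adjust to your environment):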

# prometheus-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi

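The 50Gi request above is only an example. The Prometheus storage documentation gives a rough sizing rule: needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample, with about 1 to 2 bytes per sample after compression. The sketch below applies that rule. The rate of 10,000 samples/s is a hypothetical workload; the real rate can be read from rate(prometheus_tsdb_head_samples_appended_total[1h]). The 15-day retention matches Prometheus's default.

```python
def estimate_tsdb_bytes(retention_days, samples_per_second, bytes_per_sample=2.0):
    """Rough TSDB disk estimate: retention_seconds * samples/s * bytes/sample."""
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical load: 15-day retention, 10,000 samples/s,
    # and the pessimistic end (2 bytes) of the 1-2 bytes/sample range
    needed = estimate_tsdb_bytes(retention_days=15, samples_per_second=10_000)
    print(f"{needed / 2**30:.1f} GiB")
```

For this hypothetical load the estimate is about 24 GiB. Leaving headroom for the write-ahead log and compaction is prudent, so a 50Gi PVC would be a reasonable fit.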
2. Create the Prometheus Deployment

# prometheus-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
        - name: prometheus
          image: prom/prometheus:v2.42.0
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config-volume
              mountPath: /etc/prometheus
            - name: data-volume
              mountPath: /prometheus
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
        - name: data-volume
          persistentVolumeClaim:
            claimName: prometheus-data

☆ Hands-on example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring  # target namespace
  labels:
    app: prometheus
spec:
  replicas: 1  # one instance is typical in production; use remote storage to improve availability
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus  # use this ServiceAccount so that the RBAC permissions apply
      containers:
      - name: prometheus
        image: harbor.fq.com/prometheus/prometheus:v3.1.0  # image from the private registry
        args:
        - --config.file=/etc/prometheus/prometheus.yml  # Prometheus configuration file
        - --storage.tsdb.path=/prometheus  # where TSDB data is stored
        - --web.console.templates=/etc/prometheus/consoles
        - --web.console.libraries=/etc/prometheus/console_libraries
        ports:
        - containerPort: 9090  # Prometheus web UI port
        resources:  # cap CPU and memory to avoid resource exhaustion
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "1"
            memory: "2Gi"
        volumeMounts:
        - name: prometheus-config
          mountPath: /etc/prometheus  # configuration file mount point
        - name: prometheus-storage
          mountPath: /prometheus  # TSDB data path
        - name: file-sd
          mountPath: /apps/prometheus/file-sd.yaml  # file_sd target file; a File-type hostPath is mounted directly, so no subPath
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config  # mount the Prometheus configuration from the ConfigMap
      - name: prometheus-storage
        # persistentVolumeClaim:  # use a PVC for persistent storage in production
        #   claimName: prometheus-data
        emptyDir: {}  # an emptyDir is fine for testing (data is lost when the Pod is deleted)
      - name: file-sd
        hostPath:
          path: /root/file-sd.yaml  # file_sd target file; must already exist on the node that runs Prometheus
          type: File
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
spec:
  type: NodePort  # in production, a LoadBalancer or an Ingress is recommended
  ports:
  - port: 9090
    targetPort: 9090
    nodePort: 30090  # access the web UI through this NodePort
  selector:
    app: prometheus

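3. Create the Prometheus ConfigMap

The static targets below (alertmanager, node-exporter, cadvisor, pushgateway) are short Service names. They resolve only inside Prometheus's own namespace, so the corresponding Services must exist there; otherwise those targets are reported as DOWN.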

# prometheus-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['alertmanager:9093']
    rule_files:
      - '/etc/prometheus/alert_rules.yml'
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']

      - job_name: 'node-exporter'
        static_configs:
          - targets: ['node-exporter:9100']

      - job_name: 'cadvisor'
        static_configs:
          - targets: ['cadvisor:8080']

      - job_name: 'pushgateway'
        static_configs:
          - targets: ['pushgateway:9091']
      - job_name: 'node-linux'
        static_configs:
          - targets: ['10.255.209.40:9100']
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
          - role: endpoints
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        scheme: https
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https

      - job_name: 'kubernetes-nodes'
        kubernetes_sd_configs:
          - role: node
        # Node Exporter serves plain HTTP, so no scheme/TLS/token settings are needed here
        relabel_configs:
          - source_labels: [__address__]  # node discovery sets __address__ to <node IP>:<kubelet port>
            regex: '(.*):10250'  # default kubelet port on Kubernetes nodes
            replacement: '${1}:9100'  # Node Exporter listen port
            target_label: __address__
            action: replace
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
            namespaces:
              names:
                - kube-system
                - default
        # Pod metrics endpoints are usually plain HTTP; a Pod can switch to HTTPS with the prometheus.io/scheme annotation
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (.+)
          # Use the port from the prometheus.io/port annotation instead of a fixed port
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        # Defaults to HTTP; a Service can request HTTPS with the prometheus.io/scheme annotation (see the __scheme__ rule below)
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_service_name

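The relabel rules above do most of the work in these jobs. For the replace action, Prometheus joins the source_labels values with ";" (the default separator) and matches the result against regex, which is anchored at both ends. On a match, it writes the expanded replacement into target_label; a missing label counts as an empty string. The Python sketch below mimics only that action, to show what the kubernetes-nodes and kubernetes-service-endpoints rules do to __address__. It is an approximation: Python's re stands in for Go's RE2, only numeric $1 / ${1} references are handled, and edge cases such as an empty result are not modeled.

```python
import re


def _to_python_template(replacement):
    # Convert Prometheus-style $1 / ${1} group references to Python's \g<1>
    return re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)


def relabel_replace(labels, source_labels, regex, target_label,
                    replacement="$1", separator=";"):
    """Approximate Prometheus's `replace` relabel action on a label dict."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)  # Prometheus anchors the regex
    if match is None:
        return dict(labels)  # no match: labels are left unchanged
    result = dict(labels)
    result[target_label] = match.expand(_to_python_template(replacement))
    return result


if __name__ == "__main__":
    # kubernetes-nodes: swap the kubelet port for the Node Exporter port
    node = {"__address__": "10.0.0.5:10250"}
    print(relabel_replace(node, ["__address__"], r"(.*):10250",
                          "__address__", "${1}:9100"))

    # kubernetes-service-endpoints: use the port from the prometheus.io/port annotation
    ep = {"__address__": "10.244.1.7:8080",
          "__meta_kubernetes_service_annotation_prometheus_io_port": "9100"}
    print(relabel_replace(ep, ["__address__",
                               "__meta_kubernetes_service_annotation_prometheus_io_port"],
                          r"([^:]+)(?::\d+)?;(\d+)", "__address__", "$1:$2"))
```

Without the prometheus.io/port annotation, the joined value ends in ";" with no port after it. The regex therefore does not match, and the discovered address is kept as-is.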
Apply the configuration

kubectl apply -f prometheus-pvc.yaml
kubectl apply -f prometheus-configmap.yaml
kubectl apply -f prometheus-deployment.yaml

☆ Hands-on example

cat prometheus-configmap0227.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      scrape_timeout: 10s  # scrape timeout, so that a slow target cannot hang a scrape

    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['alertmanager:9093']

    rule_files:
      - '/etc/prometheus/alert_rules.yml'

    scrape_configs:
      # Scrape Prometheus's own metrics
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']

      # Scrape Node Exporter metrics
      - job_name: 'node-exporter'
        static_configs:
          - targets: ['node-exporter:9100']

      # Scrape cAdvisor metrics
      - job_name: 'cadvisor'
        static_configs:
          - targets: ['cadvisor:8080']

      # Scrape Pushgateway metrics
      - job_name: 'pushgateway'
        static_configs:
          - targets: ['pushgateway:9091']

      # Scrape Node Exporter metrics from a specific node
      - job_name: 'node-linux'
        static_configs:
          - targets: ['10.255.209.40:9100']

      # Scrape Kubernetes API server metrics
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true  # turn this off in production and configure the correct CA certificate
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https

      # Scrape Kubernetes node metrics (through Node Exporter)
      - job_name: 'kubernetes-nodes'
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'  # replace the kubelet port with the Node Exporter port
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)

      # Scrape Kubernetes Pod metrics
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]  # the regex below expects "<address>;<port>"
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name

      # Scrape Kubernetes Service Endpoints metrics
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        #scheme: https
        #tls_config:
        #  ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        #  insecure_skip_verify: true  # turn this off in production and configure the correct CA certificate
        #bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_service_name

      - job_name: 'kubernetes-nginx-endpoints'  # job name
        # Note: the keep filter below is the same as in kubernetes-service-endpoints, so every annotated Service is scraped by both jobs; add an extra filter (e.g. on __meta_kubernetes_service_name) if that is not intended
        kubernetes_sd_configs:
          - role: endpoints  # discover Kubernetes Endpoints automatically
        relabel_configs:
          # Only scrape Services annotated with `prometheus.io/scrape: "true"`
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true

          # Override the scrape scheme (http or https)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)

          # Override the metrics path (defaults to /metrics)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)

          # Override the scrape address and port
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2

          # Map Kubernetes Service labels to Prometheus labels
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)

          # Add a Kubernetes namespace label
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace

          # Add a Kubernetes Service name label
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_service_name

          # Add a Kubernetes Pod name label
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name

          # Add a Kubernetes node name label
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: kubernetes_node_name

        # To scrape HTTPS endpoints, uncomment the following settings
        # scheme: https
        # tls_config:
        #   ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        #   insecure_skip_verify: true  # turn this off in production and configure the correct CA certificate
        # bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      - job_name: 'kube-state-metrics'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names:
                - kube-system
                - monitoring
                - default
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
            action: keep
            regex: kube-state-metrics
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            action: keep
            regex: http-metrics
        metrics_path: /metrics
        scheme: http
      - job_name: "file_sd"
        file_sd_configs:
        - files:
          - /apps/prometheus/file-sd.yaml
          refresh_interval: 1m
      - job_name: 'redis'
        kubernetes_sd_configs:
          - role: endpoints  # discover services through Kubernetes Endpoints
        relabel_configs:
          # Only scrape Services annotated with `prometheus.io/scrape: "true"`
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true

          # Keep only endpoints backed by Pods whose name contains "redis"
          - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
            action: keep
            regex: Pod;(.*redis.*)
          # Rewrite the target address to <Pod IP>:9121 (the Redis Exporter port)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: __address__
            replacement: $1:9121

          # Add the Kubernetes Service's app label
          - source_labels: [__meta_kubernetes_service_label_app]
            action: replace
            target_label: app

          # Add a Kubernetes namespace label
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace

          # Add a Kubernetes Service name label
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: service

          # Add a Kubernetes Pod name label
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod

          # Add a Kubernetes node name label
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: node

          # Add an instance label (to tell Redis instances apart)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: instance
      - job_name: 'mysql'
        kubernetes_sd_configs:
          - role: endpoints  # discover services through Kubernetes Endpoints
        relabel_configs:
          # Only scrape Services annotated with `prometheus.io/scrape: "true"`
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true

          # Keep only endpoints backed by Pods whose name contains "mysql-exporter"
          - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
            action: keep
            regex: Pod;(.*mysql-exporter.*)
          # Rewrite the target address to <Pod IP>:9104 (the MySQL Exporter port)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: __address__
            replacement: $1:9104

          # Add the Kubernetes Service's app label
          - source_labels: [__meta_kubernetes_service_label_app]
            action: replace
            target_label: app

          # Add a Kubernetes namespace label
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace

          # Add a Kubernetes Service name label
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: service

          # Add a Kubernetes Pod name label
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod

          # Add a Kubernetes node name label
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: node

          # Add an instance label (to tell MySQL instances apart)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: instance

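The redis and mysql jobs first narrow the targets with keep rules and then rewrite the address from the Pod IP. A keep rule drops the target unless the joined source label values match the regex in full. As a result, the annotation value must be exactly the lowercase string "true", and Pod;(.*redis.*) only matches endpoints whose target is a Pod with "redis" in its name. Below is a small Python approximation of the redis job's three rules. As before, Python's re stands in for RE2, and the example label values are hypothetical.

```python
import re


def keep(labels, source_labels, regex, separator=";"):
    """Approximate the `keep` action: True if the target survives."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    return re.fullmatch(regex, value) is not None


def redis_target(labels):
    """Apply the redis job's keep rules and address rewrite; None means dropped."""
    if not keep(labels, ["__meta_kubernetes_service_annotation_prometheus_io_scrape"], "true"):
        return None
    if not keep(labels, ["__meta_kubernetes_endpoint_address_target_kind",
                         "__meta_kubernetes_endpoint_address_target_name"],
                r"Pod;(.*redis.*)"):
        return None
    # `replace` with the default regex (.*): $1 is the whole Pod IP
    return labels.get("__meta_kubernetes_pod_ip", "") + ":9121"


if __name__ == "__main__":
    endpoint = {
        "__meta_kubernetes_service_annotation_prometheus_io_scrape": "true",
        "__meta_kubernetes_endpoint_address_target_kind": "Pod",
        "__meta_kubernetes_endpoint_address_target_name": "redis-0",
        "__meta_kubernetes_pod_ip": "10.244.2.15",
    }
    print(redis_target(endpoint))
    # Capitalised "True" does not match the anchored regex, so the target is dropped
    print(redis_target({**endpoint,
                        "__meta_kubernetes_service_annotation_prometheus_io_scrape": "True"}))
```

The mysql job works the same way with Pod;(.*mysql-exporter.*) and port 9104.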
4. Expose the Prometheus Service

# prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: NodePort
  ports:
    - port: 9090
      targetPort: 9090
      nodePort: 30090
  selector:
    app: prometheus

Apply the Service

kubectl apply -f prometheus-service.yaml

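V. Verify the Deployment

  1. Check the Pod status:
     kubectl get pods -l app=prometheus -n default
     kubectl get pods -n kube-system -l app=node-exporter
     Expected output: all Pods are Running. (For the ☆ hands-on manifests, use -n monitoring, and -l k8s-app=node-exporter for Node Exporter.)
  2. Open the Prometheus UI
     Browse to http://<NodeIP>:30090 to open the Prometheus console.
  3. On the Status > Targets page, confirm that the kubernetes-nodes and kubernetes-pods jobs are UP.
  4. Query up{job="kubernetes-nodes"} to verify that metrics are being scraped (a value of 1 means the last scrape succeeded).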

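The same check can be scripted against the Prometheus HTTP API. A GET request to /api/v1/query?query=... returns JSON whose data.result holds one entry per series, with the value as a [timestamp, "string"] pair. Below is a Python sketch using only the standard library. The http://<NodeIP>:30090 base URL comes from the NodePort Service above and must be filled in. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def summarize_up(body):
    """Map each instance in an `up` query response to True (UP) or False (DOWN)."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    return {
        series["metric"].get("instance", "?"): series["value"][1] == "1"
        for series in payload["data"]["result"]
    }


def check_up(base_url, job="kubernetes-nodes", timeout=10):
    """Fetch up{job=...} from a running Prometheus (requires network access)."""
    url = query_url(base_url, f'up{{job="{job}"}}')
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return summarize_up(resp.read())


if __name__ == "__main__":
    # Offline demo with a hand-written response; against a live server,
    # call check_up("http://<your-node-ip>:30090") instead
    sample = json.dumps({"status": "success", "data": {"resultType": "vector", "result": [
        {"metric": {"__name__": "up", "instance": "10.0.0.5:9100", "job": "kubernetes-nodes"},
         "value": [1714000000.0, "1"]},
    ]}})
    print(query_url("http://<NodeIP>:30090", 'up{job="kubernetes-nodes"}'))
    print(summarize_up(sample))
```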
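VI. Troubleshooting

  1. Permission problems
     Example error: Failed to list *v1.Pod: forbidden
     Fix: check that the ClusterRoleBinding is bound to the correct ServiceAccount and namespace.
  2. Node Exporter not running
     Check that the DaemonSet has been scheduled on every node and that the image was pulled without errors.
  3. Prometheus cannot scrape metrics
     Check that scrape_configs in the Prometheus configuration point at the correct ports (e.g. Node Exporter listens on 9100 by default).
     Verify network connectivity: kubectl exec -it <prometheus-pod> -n <namespace> -- wget -qO- http://<NodeIP>:9100/metrics
     (The official Prometheus image is BusyBox-based and ships wget rather than curl.)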

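For the permission ("forbidden") problem, kubectl can evaluate RBAC directly by impersonating the ServiceAccount. For example, `kubectl auth can-i list pods --as=system:serviceaccount:<namespace>:prometheus` prints yes when the action is allowed. The Python sketch below wraps that command to check each permission the Prometheus jobs rely on. It assumes kubectl is on PATH and that your own user is allowed to impersonate; the permission list is taken from the ClusterRole above, and the function names are hypothetical.

```python
import shutil
import subprocess

# (verb, resource) pairs granted by the prometheus ClusterRole above
REQUIRED = [("get", "nodes"), ("list", "nodes"), ("watch", "nodes"),
            ("list", "services"), ("list", "endpoints"), ("list", "pods"),
            ("get", "configmaps")]


def can_i_cmd(verb, resource, namespace="monitoring", sa="prometheus"):
    """Build a `kubectl auth can-i` command that impersonates the ServiceAccount."""
    return ["kubectl", "auth", "can-i", verb, resource,
            f"--as=system:serviceaccount:{namespace}:{sa}"]


def check_permissions(namespace="monitoring", sa="prometheus", run=subprocess.run):
    """Return the (verb, resource) pairs the ServiceAccount is NOT allowed."""
    missing = []
    for verb, resource in REQUIRED:
        proc = run(can_i_cmd(verb, resource, namespace, sa),
                   capture_output=True, text=True)
        if proc.stdout.strip() != "yes":  # kubectl prints "yes" when allowed
            missing.append((verb, resource))
    return missing


if __name__ == "__main__":
    if shutil.which("kubectl") is None:
        print("kubectl not found; example command:", " ".join(can_i_cmd("list", "pods")))
    else:
        # Use namespace="default" for the basic manifests, "monitoring" for the ☆ ones
        print(check_permissions(namespace="monitoring"))
```

An empty list means every permission checked here is granted. Any pair that is returned points to the rule or binding to fix.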
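VII. Further Improvements

  1. Configure Alertmanager: add alerting rules and integrate Alertmanager to send alert notifications.
  2. Persistent storage: use a highly available storage solution (e.g. Ceph or Longhorn) to keep the data reliable.
  3. Dashboards: deploy Grafana, add Prometheus as a data source and build monitoring dashboards.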
