Manually Deploying the Prometheus Monitoring System on Kubernetes: Installation Guide

mhr18 2025-05-08 19:54

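Preface: the configurations under each "Hands-on example" heading in this article can be split up and applied piece by piece as needed.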

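I. Environment Preparation

  1. Kubernetes cluster
    Make sure a Kubernetes cluster (version ≥ 1.20) is already deployed and that kubectl is configured against it.
  2. Image registry
    Confirm that harbor.fq.com/prometheus/node-exporter:v1.8.2 and the Prometheus images are available in the private registry.
  3. Namespace
    The default namespace is used by default. To deploy into monitoring instead, change the namespace field in every YAML file to match.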

II. Create RBAC Permissions

Goal: grant Prometheus permission to access the Kubernetes API.

1. Create the ServiceAccount

# prometheus-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
secrets:
- name: prometheus-token

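Explanation

  • The prometheus ServiceAccount is the identity Prometheus authenticates with.
  • The secrets field references a Secret (prometheus-token) that stores the access credential; the Secret itself is created in step 4.

2. Create the ClusterRole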

# prometheus-clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: ["nodes", "nodes/proxy", "nodes/metrics", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]

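Explanation

  • Grants Prometheus read access to nodes, services, endpoints, Pods and related resources.
  • Allows reading the /metrics endpoint (a non-resource URL).

3. Create the ClusterRoleBinding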

# prometheus-clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
roleRef:
  kind: ClusterRole
  name: prometheus
  apiGroup: rbac.authorization.k8s.io

Explanation

  • Binds the prometheus ClusterRole to the prometheus ServiceAccount so that the permissions take effect.

4. Generate the ServiceAccount Token

# prometheus-token.yaml
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-token
  annotations:
    kubernetes.io/service-account.name: prometheus
type: kubernetes.io/service-account-token

Apply the RBAC configuration

kubectl apply -f prometheus-serviceaccount.yaml
kubectl apply -f prometheus-clusterrole.yaml
kubectl apply -f prometheus-clusterrolebinding.yaml
kubectl apply -f prometheus-token.yaml

☆ Hands-on example

cat prometheus-rabc0227.yaml

---
# 1. Create the monitoring namespace
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
# 2. Create the ServiceAccount used by Prometheus
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
# 3. Create the ClusterRole that defines Prometheus's permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- apiGroups: [""]
  resources:
  - nodes/proxy
  verbs: ["get", "list", "watch"]
- apiGroups: ["networking.k8s.io"]
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
# 4. Bind the ClusterRole to the ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
---

III. Deploy Node Exporter

Goal: run Node Exporter on every node to collect node-level resource metrics.

# node-exporter-daemonset.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true
      containers:
        - name: node-exporter
          image: harbor.fq.com/prometheus/node-exporter:v1.8.2
          args:
            - --path.rootfs=/host
          volumeMounts:
            - name: rootfs
              mountPath: /host
      volumes:
        - name: rootfs
          hostPath:
            path: /

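Explanation

  • The DaemonSet ensures that one Node Exporter Pod runs on every node.
  • hostNetwork: true puts the Pod on the node's network, so metrics are exposed directly on the node's IP.
  • The hostPath volume mounts the node's root filesystem so that node-level data can be collected.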
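
A possible hardening of the rootfs mount described above, offered as a sketch rather than a required change. The Node Exporter project's container instructions mount the host root read-only with rslave propagation, which in a Pod spec corresponds to readOnly: true plus mountPropagation: HostToContainer. The fragment also declares port 9100 (Node Exporter's default); the port name metrics is an illustrative choice. Only the containers entry changes; the rest of the DaemonSet stays as shown.

```yaml
# Fragment of spec.template.spec in node-exporter-daemonset.yml (sketch, not a full manifest)
containers:
  - name: node-exporter
    image: harbor.fq.com/prometheus/node-exporter:v1.8.2
    args:
      - --path.rootfs=/host
    ports:
      - name: metrics          # illustrative port name
        containerPort: 9100    # Node Exporter's default listen port
    volumeMounts:
      - name: rootfs
        mountPath: /host
        readOnly: true                       # the exporter only needs to read the host filesystem
        mountPropagation: HostToContainer    # pick up mounts made on the host after the Pod starts
```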

Deployment command

kubectl apply -f node-exporter-daemonset.yml

☆ Hands-on example

cat node-exporter-daemonset.yml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring  # use the "monitoring" namespace
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
      annotations:
        prometheus.io/scrape: "true"  # allow Prometheus to scrape this Pod
        prometheus.io/port: "9100"    # Node Exporter port
    spec:
      hostNetwork: true  # let the Pod use the host network
      hostPID: true      # let the Pod see the host's PID namespace
      tolerations:
      - effect: NoSchedule  # allow scheduling onto tainted nodes
        operator: Exists
      - effect: NoExecute
        operator: Exists
      securityContext:
        runAsNonRoot: true  # do not run as root
        runAsUser: 65534     # run as the nobody user

      containers:
      - name: node-exporter
        image: harbor.fq.com/prometheus/node-exporter:v1.8.2  # replace with a trusted image address
        args:
        - --path.rootfs=/host/root   # rootfs path
        - --path.procfs=/host/proc   # procfs path
        - --path.sysfs=/host/sys     # sysfs path
        - --no-collector.wifi        # disable the WiFi collector
        - --no-collector.hwmon       # disable the hardware-monitoring collector
        ports:
        - containerPort: 9100
          protocol: TCP
        resources:  # resource requests and limits
          requests:
            memory: "30Mi"
            cpu: "100m"
          limits:
            memory: "50Mi"
            cpu: "200m"
        volumeMounts:  # mount host directories
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
        - name: rootfs
          mountPath: /host/root
          readOnly: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
      - name: rootfs
        hostPath:
          path: /
---
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    k8s-app: node-exporter
  annotations:
    prometheus.io/scrape: 'true'  # allow Prometheus to scrape
    prometheus.io/port: '9100'    # scrape port
spec:
  selector:
    k8s-app: node-exporter
  ports:
  - name: metrics
    port: 9100
    protocol: TCP
    targetPort: 9100
  type: ClusterIP  # reachable only from inside the cluster

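IV. Deploy Prometheus

Goal: deploy the Prometheus server and configure its scrape rules and persistent storage.

1. Create the persistent volume (PV/PVC)

Create a PVC that matches the cluster's storage type (NFS, Local PV, cloud storage, and so on) and mount it into Prometheus.
Example (adjust to your environment):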

# prometheus-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
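
If the cluster has no default StorageClass, or Prometheus runs in the monitoring namespace, the claim above needs two more fields. A sketch, assuming a StorageClass named nfs-client exists; that name is hypothetical, and the real ones can be listed with kubectl get storageclass.

```yaml
# prometheus-pvc.yaml (variant; storageClassName is an assumed value)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: monitoring          # must match the namespace of the Prometheus Deployment
spec:
  storageClassName: nfs-client   # hypothetical; replace with a StorageClass from your cluster
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
```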

2. Create the Prometheus Deployment

# prometheus-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
        - name: prometheus
          image: prom/prometheus:v2.42.0
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config-volume
              mountPath: /etc/prometheus
            - name: data-volume
              mountPath: /prometheus
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
        - name: data-volume
          persistentVolumeClaim:
            claimName: prometheus-data
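
Two additions are often useful once the data directory lives on a PVC. This is a sketch under stated assumptions, not part of the original manifest. The official Prometheus image runs as the nobody user (UID 65534), so setting fsGroup is one way to make the mounted volume writable. The readiness probe relies on Prometheus's built-in /-/ready endpoint. The 15d retention and the probe timings are example values only.

```yaml
# Fragment of spec.template.spec in prometheus-deployment.yaml (sketch; volumeMounts and volumes unchanged)
securityContext:
  fsGroup: 65534                 # assumption: the image runs as nobody (65534), as the official image does
containers:
  - name: prometheus
    image: prom/prometheus:v2.42.0
    args:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=15d"   # example retention; size the PVC to match
    ports:
      - containerPort: 9090
    readinessProbe:
      httpGet:
        path: /-/ready           # Prometheus readiness endpoint
        port: 9090
      initialDelaySeconds: 10    # example timing
      periodSeconds: 10
```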

☆ Hands-on example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring  # target namespace
  labels:
    app: prometheus
spec:
  replicas: 1  # a single instance is the usual recommendation in production; use remote storage to improve availability
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus  # attach the ServiceAccount so that RBAC access works
      containers:
      - name: prometheus
        image: harbor.fq.com/prometheus/prometheus:v3.1.0  # image from the private registry
        args:
        - --config.file=/etc/prometheus/prometheus.yml  # Prometheus configuration file
        - --storage.tsdb.path=/prometheus  # where TSDB data is stored
        - --web.console.templates=/etc/prometheus/consoles
        - --web.console.libraries=/etc/prometheus/console_libraries
        ports:
        - containerPort: 9090  # Prometheus web UI port
        resources:  # cap CPU and memory to avoid resource exhaustion
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "1"
            memory: "2Gi"
        volumeMounts:
        - name: prometheus-config
          mountPath: /etc/prometheus  # configuration mount point
        - name: prometheus-storage
          mountPath: /prometheus  # TSDB data path
        - name: file-sd
          mountPath: /apps/prometheus/file-sd.yaml  # target file for file-based service discovery
          subPath: file-sd.yaml  # mount only the file, not the whole directory
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config  # mount the Prometheus configuration from the ConfigMap
      - name: prometheus-storage
        # persistentVolumeClaim:  # use a PVC for persistent storage in production
        #   claimName: prometheus-pvc
        emptyDir: {}  # an empty directory is acceptable for testing
      - name: file-sd
        hostPath:
          path: /root/file-sd.yaml  # discovery file on the host
          type: File
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
spec:
  type: NodePort  # prefer LoadBalancer or Ingress in production
  ports:
  - port: 9090
    targetPort: 9090
    nodePort: 30090  # reach the web UI through this NodePort
  selector:
    app: prometheus
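
The Deployment above mounts /root/file-sd.yaml from the node, but the file's contents are not shown. A minimal sketch of what such a file could look like, using the target-group format of Prometheus's file-based service discovery. The second target address and both labels are made up for illustration.

```yaml
# /root/file-sd.yaml on the node that runs the Prometheus Pod (illustrative contents)
- targets:
    - 10.255.209.40:9100     # address taken from the node-linux job in this article
    - 10.255.209.41:9100     # hypothetical second host
  labels:
    env: test                # hypothetical label attached to every target in this group
    role: node               # hypothetical label
```

Because this is a hostPath volume, the file has to exist on whichever node the Pod is scheduled to.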

3. Create the Prometheus ConfigMap

# prometheus-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['alertmanager:9093']
    rule_files:
      - '/etc/prometheus/alert_rules.yml'
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']

      - job_name: 'node-exporter'
        static_configs:
          - targets: ['node-exporter:9100']

      - job_name: 'cadvisor'
        static_configs:
          - targets: ['cadvisor:8080']

      - job_name: 'pushgateway'
        static_configs:
          - targets: ['pushgateway:9091']
      - job_name: 'node-linux'
        static_configs:
          - targets: ['10.255.209.40:9100']
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
          - role: endpoints
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        scheme: https
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https

      - job_name: 'kubernetes-nodes'
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'  # default kubelet port on each node
            replacement: '${1}:9100'  # Node Exporter listening port (plain HTTP)
            target_label: __address__
            action: replace
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
            namespaces:
              names:
                - kube-system
                - default
        tls_config:
           ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
           insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        scheme: https
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (.+)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: __address__
            regex: (.+)
            replacement: ${1}:9090
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        scheme: https
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_service_name

Apply the configuration

kubectl apply -f prometheus-pvc.yaml
kubectl apply -f prometheus-configmap.yaml
kubectl apply -f prometheus-deployment.yaml

☆ Hands-on example

cat prometheus-configmap0227.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      scrape_timeout: 10s  # scrape timeout, so a slow target cannot stall a scrape

    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['alertmanager:9093']

    rule_files:
      - '/etc/prometheus/alert_rules.yml'

    scrape_configs:
      # Scrape Prometheus itself
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']

      # Scrape Node Exporter
      - job_name: 'node-exporter'
        static_configs:
          - targets: ['node-exporter:9100']

      # Scrape cAdvisor
      - job_name: 'cadvisor'
        static_configs:
          - targets: ['cadvisor:8080']

      # Scrape Pushgateway
      - job_name: 'pushgateway'
        static_configs:
          - targets: ['pushgateway:9091']

      # Scrape Node Exporter on one specific host
      - job_name: 'node-linux'
        static_configs:
          - targets: ['10.255.209.40:9100']

      # Scrape the Kubernetes API server
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true  # turn this off in production and configure the correct CA certificate
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https

      # Scrape Kubernetes nodes (through Node Exporter)
      - job_name: 'kubernetes-nodes'
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'  # swap the kubelet port for the Node Exporter port
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)

      # Scrape Kubernetes Pods
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name

      # Scrape Kubernetes Service endpoints
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        #scheme: https
        #tls_config:
        #  ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        #  insecure_skip_verify: true  # turn this off in production and configure the correct CA certificate
        #bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_service_name

      - job_name: 'kubernetes-nginx-endpoints'  # job name
        kubernetes_sd_configs:
          - role: endpoints  # discover Kubernetes Endpoints automatically
        relabel_configs:
          # Keep only Services annotated with `prometheus.io/scrape: "true"`
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true

          # Set the scrape scheme (http or https)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)

          # Set the metrics path (defaults to /metrics)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)

          # Rewrite the scrape address and port
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2

          # Map Kubernetes Service labels onto Prometheus labels
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)

          # Add the Kubernetes namespace label
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace

          # Add the Kubernetes Service name label
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_service_name

          # Add the Kubernetes Pod name label
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name

          # Add the Kubernetes node name label
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: kubernetes_node_name

        # To scrape HTTPS endpoints, uncomment the following settings
        # scheme: https
        # tls_config:
        #   ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        #   insecure_skip_verify: true  # turn this off in production and configure the correct CA certificate
        # bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      - job_name: 'kube-state-metrics'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names:
                - kube-system
                - monitoring
                - default
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
            action: keep
            regex: kube-state-metrics
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            action: keep
            regex: http-metrics
        metrics_path: /metrics
        scheme: http
      - job_name: "file_sd"
        file_sd_configs:
        - files:
          - /apps/prometheus/file-sd.yaml
          refresh_interval: 1m
      - job_name: 'redis'
        kubernetes_sd_configs:
          - role: endpoints  # discover targets from Kubernetes Endpoints
        relabel_configs:
          # Keep only Services annotated with `prometheus.io/scrape: "true"`
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true

          # Keep only endpoints backed by a Pod whose name contains "redis"
          - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
            action: keep
            regex: Pod;(.*redis.*)
          # Rewrite the target address to the Pod IP plus the Redis Exporter port (9121)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: __address__
            replacement: $1:9121

          # Add the Service's app label
          - source_labels: [__meta_kubernetes_service_label_app]
            action: replace
            target_label: app

          # Add the Kubernetes namespace label
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace

          # Add the Kubernetes Service name label
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: service

          # Add the Kubernetes Pod name label
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod

          # Add the Kubernetes node name label
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: node

          # Add the instance label (distinguishes individual Redis instances)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: instance
      - job_name: 'mysql'
        kubernetes_sd_configs:
          - role: endpoints  # discover targets from Kubernetes Endpoints
        relabel_configs:
          # Keep only Services annotated with `prometheus.io/scrape: "true"`
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true

          # Keep only endpoints backed by a Pod whose name contains "mysql-exporter"
          - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
            action: keep
            regex: Pod;(.*mysql-exporter.*)
          # Rewrite the target address to the Pod IP plus the MySQL Exporter port (9104)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: __address__
            replacement: $1:9104

          # Add the Service's app label
          - source_labels: [__meta_kubernetes_service_label_app]
            action: replace
            target_label: app

          # Add the Kubernetes namespace label
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace

          # Add the Kubernetes Service name label
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: service

          # Add the Kubernetes Pod name label
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod

          # Add the Kubernetes node name label
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: node

          # Add the instance label (distinguishes individual MySQL instances)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: instance
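
The annotation-driven jobs above only pick up workloads that opt in. As a sketch, a Service for a hypothetical exporter would carry the annotations that the kubernetes-service-endpoints relabel rules read. The name my-exporter, its labels and port 9121 are assumptions for illustration. The trailing comments walk through how the address rule rewrites one target.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-exporter                  # hypothetical Service name
  namespace: monitoring
  annotations:
    prometheus.io/scrape: "true"     # matched by the keep rule
    prometheus.io/port: "9121"       # port substituted into __address__
    prometheus.io/path: "/metrics"   # copied into __metrics_path__
    prometheus.io/scheme: "http"     # copied into __scheme__
spec:
  selector:
    app: my-exporter                 # hypothetical Pod label
  ports:
    - name: metrics
      port: 9121
      targetPort: 9121

# How the address rule applies to one endpoint of this Service:
#   source labels joined with ";"  ->  10.244.1.7:9121;9121   (example Pod IP)
#   regex ([^:]+)(?::\d+)?;(\d+)   ->  $1 = 10.244.1.7, $2 = 9121
#   replacement $1:$2              ->  __address__ = 10.244.1.7:9121
```

Any other job that keeps targets on the same prometheus.io/scrape annotation will also discover this Service, so the same endpoint can be scraped under more than one job name.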

4. Expose the Prometheus service

# prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: NodePort
  ports:
    - port: 9090
      targetPort: 9090
      nodePort: 30090
  selector:
    app: prometheus

Apply the Service

kubectl apply -f prometheus-service.yaml

V. Verify the Deployment

  1. Check Pod status
    kubectl get pods -l app=prometheus -n default
    kubectl get pods -n kube-system -l app=node-exporter
    Expected output: every Pod is in the Running state.
  2. Open the Prometheus UI
    Browse to http://<NodeIP>:30090 to reach the Prometheus console.
  3. On the Status > Targets page, confirm that the kubernetes-nodes and kubernetes-pods jobs are UP.
  4. Run the query up{job="kubernetes-nodes"} to verify that metrics are being scraped.

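VI. Troubleshooting Common Problems

  1. Permission errors
    Example error: Failed to list *v1.Pod: forbidden
    Fix: check that the ClusterRoleBinding points at the correct ServiceAccount and namespace.
  2. Node Exporter does not start
    Check that the DaemonSet has been scheduled onto every node and that the image pulls without errors.
  3. Prometheus cannot scrape metrics
    Check that scrape_configs in the Prometheus configuration points at the correct port (Node Exporter listens on 9100 by default).
    Verify network connectivity: kubectl exec -it <prometheus-pod> -- wget -qO- http://<NodeIP>:9100/metrics (the official Prometheus image is BusyBox-based and typically provides wget rather than curl).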

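VII. Next Steps

  1. Configure Alertmanager: add alerting rules and integrate Alertmanager to deliver notifications.
  2. Improve persistent storage: use a highly available storage backend (such as Ceph or Longhorn) to protect the data.
  3. Monitoring dashboards: deploy Grafana, add Prometheus as a data source and build dashboards.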
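
As a starting point for the first item: the rule_files entry in prometheus.yml already points at /etc/prometheus/alert_rules.yml, but no such file is defined in the ConfigMap. One way to supply it is an extra key in the same ConfigMap, since the whole ConfigMap is mounted at /etc/prometheus. The rule below is a sketch. The group name, the 5m duration and the severity label are illustrative choices, and the namespace should match wherever prometheus-config actually lives.

```yaml
# Additional key for the prometheus-config ConfigMap (sketch; do not drop the existing prometheus.yml key)
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  # prometheus.yml: | ...         (keep the existing key unchanged)
  alert_rules.yml: |
    groups:
      - name: availability        # illustrative group name
        rules:
          - alert: TargetDown
            expr: up == 0         # the built-in up metric is 0 when a scrape fails
            for: 5m               # example duration before the alert fires
            labels:
              severity: critical  # illustrative label
            annotations:
              summary: "{{ $labels.job }} target {{ $labels.instance }} is down"
```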
