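Manually Deploying the Prometheus Monitoring System on Kubernetes: Installation Guide
mhr18 2025-05-08 19:54
Preface: the configurations under the "Hands-on example" headings can be split up and applied piece by piece as needed.
I. Environment Preparation
- Kubernetes cluster
  Make sure a Kubernetes cluster (version ≥ 1.20) is deployed and kubectl is configured to reach it.
- Image registry
  Confirm that harbor.fq.com/prometheus/node-exporter:v1.8.2 and the Prometheus images are available in the private registry.
- Namespace
  The basic manifests use the default namespace. They can be moved to monitoring if needed, in which case the namespace field must be updated in every YAML file.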
II. Create RBAC Permissions
Goal: give Prometheus permission to access the Kubernetes API.
1. Create the ServiceAccount
# prometheus-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
secrets:
  - name: prometheus-token
Explanation:
- The prometheus ServiceAccount is the identity Prometheus uses to authenticate.
- The secrets field references a Secret (prometheus-token) that stores the access credentials.
2. Create the ClusterRole
# prometheus-clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: ["nodes", "nodes/proxy", "nodes/metrics", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
Explanation:
- Grants Prometheus read access (get, list, watch) to nodes, services, endpoints, and pods.
- Allows reading the /metrics endpoint, which is a non-resource URL.
3. Create the ClusterRoleBinding
# prometheus-clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: default
roleRef:
  kind: ClusterRole
  name: prometheus
  apiGroup: rbac.authorization.k8s.io
Explanation:
- Binds the prometheus ClusterRole to the prometheus ServiceAccount so that the permissions take effect. The namespace under subjects must match the namespace the ServiceAccount lives in.
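4. Generate the ServiceAccount Token
On Kubernetes 1.24 and later, token Secrets are no longer created automatically for a ServiceAccount, so a long-lived token is requested with a Secret of type kubernetes.io/service-account-token:
# prometheus-token.yaml
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-token
  annotations:
    kubernetes.io/service-account.name: prometheus
type: kubernetes.io/service-account-token
Apply the RBAC configuration:
kubectl apply -f prometheus-serviceaccount.yaml
kubectl apply -f prometheus-clusterrole.yaml
kubectl apply -f prometheus-clusterrolebinding.yaml
kubectl apply -f prometheus-token.yaml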
☆ Hands-on example
cat prometheus-rabc0227.yaml
---
# 1. Create the monitoring namespace
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
# 2. Create the ServiceAccount used by Prometheus
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
# 3. Create the ClusterRole that defines the Prometheus permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/metrics
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources:
      - configmaps
    verbs: ["get"]
  - apiGroups: [""]
    resources:
      - nodes/proxy
    verbs: ["get", "list", "watch"]
  - apiGroups: ["networking.k8s.io"]
    resources:
      - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
# 4. Bind the ClusterRole to the ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitoring
---
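III. Deploy Node Exporter
Goal: run Node Exporter on every node to collect node resource metrics.
# node-exporter-daemonset.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true
      containers:
        - name: node-exporter
          image: harbor.fq.com/prometheus/node-exporter:v1.8.2
          args:
            - --path.rootfs=/host
          volumeMounts:
            - name: rootfs
              mountPath: /host
      volumes:
        - name: rootfs
          hostPath:
            path: /
Explanation:
- The DaemonSet runs one Node Exporter Pod on every schedulable node. Tainted nodes, such as control-plane nodes, additionally need tolerations, as in the hands-on example below.
- hostNetwork: true puts the Pod on the node's network, so the metrics are served directly on the node IP (port 9100 by default).
- The hostPath volume mounts the node's root filesystem at /host, which --path.rootfs points to, so that filesystem metrics describe the node rather than the container.
Deploy command:
kubectl apply -f node-exporter-daemonset.yml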
☆ Hands-on example
cat node-exporter-daemonset.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring              # use the "monitoring" namespace
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
      annotations:
        prometheus.io/scrape: "true"   # allow Prometheus to scrape this Pod
        prometheus.io/port: "9100"     # Node Exporter port
    spec:
      hostNetwork: true                # let the Pod use the host network
      hostPID: true                    # let the Pod see the host's process IDs
      tolerations:
        - effect: NoSchedule           # allow scheduling onto tainted nodes
          operator: Exists
        - effect: NoExecute
          operator: Exists
      securityContext:
        runAsNonRoot: true             # do not run as root
        runAsUser: 65534               # run as the nobody user
      containers:
        - name: node-exporter
          image: harbor.fq.com/prometheus/node-exporter:v1.8.2   # replace with a trusted image address
          args:
            - --path.rootfs=/host/root   # rootfs path
            - --path.procfs=/host/proc   # procfs path
            - --path.sysfs=/host/sys     # sysfs path
            - --no-collector.wifi        # disable the WiFi collector
            - --no-collector.hwmon       # disable the hardware-monitoring collector
          ports:
            - containerPort: 9100
              protocol: TCP
          resources:                     # resource requests and limits
            requests:
              memory: "30Mi"
              cpu: "100m"
            limits:
              memory: "50Mi"
              cpu: "200m"
          volumeMounts:                  # mount host directories
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
            - name: rootfs
              mountPath: /host/root
              readOnly: true
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /
---
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    k8s-app: node-exporter
  annotations:
    prometheus.io/scrape: 'true'   # allow Prometheus to scrape this Service
    prometheus.io/port: '9100'     # scrape port
spec:
  selector:
    k8s-app: node-exporter
  ports:
    - name: metrics
      port: 9100
      protocol: TCP
      targetPort: 9100
  type: ClusterIP                  # reachable only from inside the cluster
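IV. Deploy Prometheus
Goal: deploy the Prometheus server and configure its scrape rules and persistent storage.
1. Create the persistent volume (PV/PVC)
Create a PVC that suits the cluster's storage type (for example NFS, Local PV, or cloud storage) and mount it into Prometheus.
Example (adjust to the actual environment):
# prometheus-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi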
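The claim above only binds if a matching PersistentVolume or a dynamic provisioner exists. For a cluster without dynamic provisioning, a statically created Local PV might look like the sketch below. The volume name prometheus-pv, the class name local-storage, the directory /data/prometheus, and the node name worker-1 are placeholders assumed for illustration, not values from this guide.

```yaml
# prometheus-pv.yaml (illustrative sketch; names, path, and node are assumptions)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-pv
spec:
  capacity:
    storage: 50Gi                          # matches the 50Gi requested by prometheus-data
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain    # keep the TSDB data if the claim is deleted
  storageClassName: local-storage          # hypothetical class name
  local:
    path: /data/prometheus                 # hypothetical directory; must already exist on the node
  nodeAffinity:                            # required for local volumes: pins the PV to one node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-1                 # hypothetical node name
```

For the claim to bind to this volume, prometheus-pvc.yaml would also need storageClassName: local-storage under spec, and the directory has to be writable by the user the Prometheus container runs as.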
2. Create the Prometheus Deployment
# prometheus-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
        - name: prometheus
          image: prom/prometheus:v2.42.0
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config-volume
              mountPath: /etc/prometheus
            - name: data-volume
              mountPath: /prometheus
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
        - name: data-volume
          persistentVolumeClaim:
            claimName: prometheus-data
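Prometheus serves health endpoints at /-/healthy and /-/ready, which can back Kubernetes probes so that a Pod that is still starting up is not treated as ready. The fragment below is a hedged sketch of fields that could be added under the prometheus container in the Deployment above; the delay and period values are assumptions to tune per environment.

```yaml
# Fragment for the prometheus container in prometheus-deployment.yaml
# (timing values are assumptions, not taken from this guide)
livenessProbe:
  httpGet:
    path: /-/healthy        # returns 200 while the server process is alive
    port: 9090
  initialDelaySeconds: 30
  periodSeconds: 15
readinessProbe:
  httpGet:
    path: /-/ready          # returns 200 once Prometheus can serve traffic
    port: 9090
  initialDelaySeconds: 30
  periodSeconds: 5
```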
☆ Hands-on example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring                  # target namespace
  labels:
    app: prometheus
spec:
  replicas: 1                            # production setups usually run 1 instance and rely on remote storage for availability
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus     # ServiceAccount bound by the RBAC rules
      containers:
        - name: prometheus
          image: harbor.fq.com/prometheus/prometheus:v3.1.0   # image from the private registry
          args:
            - --config.file=/etc/prometheus/prometheus.yml    # Prometheus configuration file
            - --storage.tsdb.path=/prometheus                 # where TSDB data is stored
            - --web.console.templates=/etc/prometheus/consoles
            - --web.console.libraries=/etc/prometheus/console_libraries
          ports:
            - containerPort: 9090        # Prometheus web UI port
          resources:                     # cap CPU and memory to avoid exhausting the node
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "1"
              memory: "2Gi"
          volumeMounts:
            - name: prometheus-config
              mountPath: /etc/prometheus                  # configuration mount point
            - name: prometheus-storage
              mountPath: /prometheus                      # TSDB data path
            - name: file-sd
              mountPath: /apps/prometheus/file-sd.yaml    # file-based service discovery targets
              subPath: file-sd.yaml                       # mount only the file, not the whole directory
      volumes:
        - name: prometheus-config
          configMap:
            name: prometheus-config      # Prometheus configuration from the ConfigMap
        - name: prometheus-storage
          # persistentVolumeClaim:       # use a PVC for persistent storage in production
          #   claimName: prometheus-pvc
          emptyDir: {}                   # an empty directory is enough for a test environment
        - name: file-sd
          hostPath:
            path: /root/file-sd.yaml     # discovery file on the host
            type: File
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
spec:
  type: NodePort                         # LoadBalancer or Ingress is recommended in production
  ports:
    - port: 9090
      targetPort: 9090
      nodePort: 30090                    # web UI reachable through this NodePort
  selector:
    app: prometheus
3. Create the Prometheus ConfigMap
# prometheus-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['alertmanager:9093']
    rule_files:
      - '/etc/prometheus/alert_rules.yml'
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
      - job_name: 'node-exporter'
        static_configs:
          - targets: ['node-exporter:9100']
      - job_name: 'cadvisor'
        static_configs:
          - targets: ['cadvisor:8080']
      - job_name: 'pushgateway'
        static_configs:
          - targets: ['pushgateway:9091']
      - job_name: 'node-linux'
        static_configs:
          - targets: ['10.255.209.40:9100']
      - job_name: 'kubernetes-apiservers'
        # In-cluster discovery uses the mounted ServiceAccount credentials,
        # so no kubeconfig_file is set.
        kubernetes_sd_configs:
          - role: endpoints
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        scheme: https
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
      - job_name: 'kubernetes-nodes'
        # Node Exporter serves plain HTTP on 9100, so this job sets no
        # https scheme, TLS settings, or bearer token.
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'          # default kubelet port on each node
            replacement: '${1}:9100'     # Node Exporter listen port
            target_label: __address__
            action: replace
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
            namespaces:
              names:
                - kube-system
                - default
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        scheme: https
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (.+)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: __address__
            regex: (.+)
            replacement: ${1}:9090
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        scheme: https
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_service_name
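The kubernetes-service-endpoints job keeps only endpoints whose Service carries the annotation prometheus.io/scrape: "true", and it reads the port, path, and scheme from the sibling annotations. As an illustration, a Service for a hypothetical application called my-app that serves metrics over HTTP on port 8080 (the name, label, and port are all assumptions) could be annotated like this:

```yaml
# Illustrative Service; my-app and port 8080 are hypothetical
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: default
  annotations:
    prometheus.io/scrape: "true"      # matched by the keep rule
    prometheus.io/port: "8080"        # rewritten into __address__
    prometheus.io/path: "/metrics"    # rewritten into __metrics_path__
    prometheus.io/scheme: "http"      # rewritten into __scheme__
spec:
  selector:
    app: my-app
  ports:
    - name: http
      port: 8080
      targetPort: 8080
```

Annotation values have to be quoted strings. The scheme annotation matters whenever the job's default scheme is https and the application only serves plain HTTP.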
Apply the configuration:
kubectl apply -f prometheus-pvc.yaml
kubectl apply -f prometheus-configmap.yaml
kubectl apply -f prometheus-deployment.yaml
☆ Hands-on example
cat prometheus-configmap0227.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      scrape_timeout: 10s    # scrape timeout, so that a hung target cannot stall the job
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['alertmanager:9093']
    rule_files:
      - '/etc/prometheus/alert_rules.yml'
    scrape_configs:
      # Prometheus's own metrics
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
      # Node Exporter metrics
      - job_name: 'node-exporter'
        static_configs:
          - targets: ['node-exporter:9100']
      # cAdvisor metrics
      - job_name: 'cadvisor'
        static_configs:
          - targets: ['cadvisor:8080']
      # Pushgateway metrics
      - job_name: 'pushgateway'
        static_configs:
          - targets: ['pushgateway:9091']
      # Node Exporter metrics from one specific host
      - job_name: 'node-linux'
        static_configs:
          - targets: ['10.255.209.40:9100']
      # Kubernetes API server metrics
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true    # disable in production and configure the correct CA certificate
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
      # Kubernetes node metrics (through Node Exporter)
      - job_name: 'kubernetes-nodes'
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'    # swap the kubelet port for the Node Exporter port
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
      # Kubernetes Pod metrics
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          # The regex expects "<address>;<port>", so __address__ must be the first source label
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
      # Kubernetes Service endpoint metrics
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        #scheme: https
        #tls_config:
        #  ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        #  insecure_skip_verify: true    # disable in production and configure the correct CA certificate
        #bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_service_name
      - job_name: 'kubernetes-nginx-endpoints'    # job name
        kubernetes_sd_configs:
          - role: endpoints                       # discover Kubernetes Endpoints automatically
        relabel_configs:
          # Keep only Services annotated with `prometheus.io/scrape: "true"`
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          # Set the scrape scheme (http or https)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          # Set the metrics path (defaults to /metrics)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          # Set the scrape address and port
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          # Map Kubernetes Service labels to Prometheus labels
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          # Add the Kubernetes namespace label
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          # Add the Kubernetes Service name label
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_service_name
          # Add the Kubernetes Pod name label
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
          # Add the Kubernetes node name label
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: kubernetes_node_name
        # To scrape HTTPS endpoints, uncomment the following settings
        # scheme: https
        # tls_config:
        #   ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        #   insecure_skip_verify: true    # disable in production and configure the correct CA certificate
        # bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      - job_name: 'kube-state-metrics'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names:
                - kube-system
                - monitoring
                - default
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
            action: keep
            regex: kube-state-metrics
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            action: keep
            regex: http-metrics
        metrics_path: /metrics
        scheme: http
      - job_name: "file_sd"
        file_sd_configs:
          - files:
              - /apps/prometheus/file-sd.yaml
            refresh_interval: 1m
      - job_name: 'redis'
        kubernetes_sd_configs:
          - role: endpoints    # discover services from Kubernetes Endpoints
        relabel_configs:
          # Keep only Services annotated with `prometheus.io/scrape: "true"`
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          # Keep only endpoints backed by Pods whose name contains "redis"
          - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
            action: keep
            regex: Pod;(.*redis.*)
          # Rewrite the target address to the Pod IP and the Redis Exporter port (9121)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: __address__
            replacement: $1:9121
          # Add the Service's app label
          - source_labels: [__meta_kubernetes_service_label_app]
            action: replace
            target_label: app
          # Add the Kubernetes namespace label
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
          # Add the Kubernetes Service name label
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: service
          # Add the Kubernetes Pod name label
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
          # Add the Kubernetes node name label
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: node
          # Add the instance label (distinguishes individual Redis instances)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: instance
      - job_name: 'mysql'
        kubernetes_sd_configs:
          - role: endpoints    # discover services from Kubernetes Endpoints
        relabel_configs:
          # Keep only Services annotated with `prometheus.io/scrape: "true"`
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          # Keep only endpoints backed by Pods whose name contains "mysql-exporter"
          - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
            action: keep
            regex: Pod;(.*mysql-exporter.*)
          # Rewrite the target address to the Pod IP and the MySQL Exporter port (9104)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: __address__
            replacement: $1:9104
          # Add the Service's app label
          - source_labels: [__meta_kubernetes_service_label_app]
            action: replace
            target_label: app
          # Add the Kubernetes namespace label
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
          # Add the Kubernetes Service name label
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: service
          # Add the Kubernetes Pod name label
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
          # Add the Kubernetes node name label
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: node
          # Add the instance label (distinguishes individual MySQL instances)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: instance
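The file_sd job reads its targets from /apps/prometheus/file-sd.yaml, which the hands-on Deployment mounts from /root/file-sd.yaml on the host, but the guide does not show what that file contains. A minimal sketch of the format Prometheus expects is a list of target groups, each with optional labels. The second address and the label value here are made up for illustration.

```yaml
# /root/file-sd.yaml on the node that runs the Prometheus Pod (illustrative)
- targets:
    - '10.255.209.40:9100'    # the node-linux host used earlier in this guide
    - '10.255.209.41:9100'    # hypothetical additional host
  labels:
    env: test                 # hypothetical label attached to every target in this group
```

Because the file comes from a hostPath volume, it has to exist on whichever node the Prometheus Pod is scheduled to. Prometheus re-reads it at every refresh_interval (1m in the configuration above), so targets can be added without restarting the server.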
4. Expose the Prometheus Service
# prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: NodePort
  ports:
    - port: 9090
      targetPort: 9090
      nodePort: 30090
  selector:
    app: prometheus
Apply the Service:
kubectl apply -f prometheus-service.yaml
V. Verify the Deployment
- Check Pod status:
  - kubectl get pods -l app=prometheus -n default
  - kubectl get pods -n kube-system -l app=node-exporter
  - Expected output: all Pods are in the Running state.
- Access the Prometheus UI:
  - Open http://<NodeIP>:30090 in a browser to reach the Prometheus console.
  - On the Status > Targets page, confirm that the kubernetes-nodes and kubernetes-pods jobs are UP.
  - Run the query up{job="kubernetes-nodes"} to verify that metrics are being scraped.
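VI. Troubleshooting
- Permission problems
  - Example error: Failed to list *v1.Pod: forbidden
  - Fix: check that the ClusterRoleBinding points at the correct ServiceAccount and namespace.
- Node Exporter does not start
  - Check that the DaemonSet has been scheduled onto every node and that the image was pulled without errors.
- Prometheus cannot scrape metrics
  - Check that scrape_configs in the Prometheus configuration points at the correct port (Node Exporter listens on 9100 by default).
  - Verify network connectivity from inside the Pod: kubectl exec -it <prometheus-pod> -- wget -qO- http://<NodeIP>:9100/metrics. The BusyBox-based prom/prometheus image ships wget rather than curl; if the image in use includes curl, curl http://<NodeIP>:9100/metrics works as well.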
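VII. Next Steps
- Configure Alertmanager: add alerting rules and integrate Alertmanager to deliver notifications.
- Improve persistent storage: use a highly available storage solution (for example Ceph or Longhorn) to protect the data.
- Monitoring dashboards: deploy Grafana, add Prometheus as a data source, and build dashboards.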
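The ConfigMap in this guide already lists /etc/prometheus/alert_rules.yml under rule_files, but no rule file is defined anywhere. As a starting point, a rule file with a single alert on the built-in up metric could look like the sketch below. The group name, alert name, duration, and severity label are assumptions chosen for illustration.

```yaml
# alert_rules.yml (illustrative sketch; names and thresholds are assumptions)
groups:
  - name: availability
    rules:
      - alert: TargetDown
        expr: up == 0                 # up is 0 when the last scrape of a target failed
        for: 5m                       # fire only after the condition has held for 5 minutes
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.instance }} of job {{ $labels.job }} is down"
          description: "Prometheus has been unable to scrape this target for 5 minutes."
```

One way to ship it is as an additional alert_rules.yml key in the prometheus-config ConfigMap, which places the file next to prometheus.yml under /etc/prometheus.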