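Manually Deploying the Prometheus Monitoring System on Kubernetes: Installation Guide
mhr18 2025-05-08 19:54
Preface: the configurations under each "Hands-on example" heading can be split up and applied piece by piece as needed.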
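I. Environment Preparation
- Kubernetes cluster
  Make sure a Kubernetes cluster (version ≥ 1.20) is deployed and kubectl is configured to reach it.
- Image registry
  Confirm that harbor.fq.com/prometheus/node-exporter:v1.8.2 and the Prometheus images are available in the private registry.
- Namespace
  The default namespace is used unless stated otherwise. It can be changed to monitoring as needed, in which case the namespace field in every YAML file must be updated to match.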
II. Create RBAC Permissions
Goal: grant Prometheus permission to access the Kubernetes API.
1. Create the ServiceAccount
# prometheus-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
secrets:
  - name: prometheus-token
Explanation:
- The prometheus ServiceAccount is the identity Prometheus authenticates with.
- The secrets field references a Secret (prometheus-token) that stores the access credential.
2. Create the ClusterRole
# prometheus-clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: ["nodes", "nodes/proxy", "nodes/metrics", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
Explanation:
- Grants Prometheus read access to nodes, services, endpoints, Pods, and related resources.
- Allows reading the /metrics endpoint (a non-resource URL).
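The rules above cover the endpoints discovery role used by the scrape jobs later in this guide. If a job is switched to `role: endpointslice` instead, Prometheus also needs read access to EndpointSlice objects. A minimal sketch of the extra rule, assuming that role is in use (it is not part of the original manifests):

```yaml
# Extra ClusterRole rule; only needed when a scrape job uses `role: endpointslice`.
# Append it under `rules:` in prometheus-clusterrole.yaml.
- apiGroups: ["discovery.k8s.io"]
  resources: ["endpointslices"]
  verbs: ["get", "list", "watch"]
```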
3. Create the ClusterRoleBinding
# prometheus-clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: default
roleRef:
  kind: ClusterRole
  name: prometheus
  apiGroup: rbac.authorization.k8s.io
Explanation:
- Binds the prometheus ClusterRole to the prometheus ServiceAccount so that the permissions take effect. The namespace under subjects must be the namespace the ServiceAccount actually lives in.
4. Generate the ServiceAccount Token
# prometheus-token.yaml
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-token
  annotations:
    kubernetes.io/service-account.name: prometheus
type: kubernetes.io/service-account-token
Apply the RBAC configuration:
kubectl apply -f prometheus-serviceaccount.yaml
kubectl apply -f prometheus-clusterrole.yaml
kubectl apply -f prometheus-clusterrolebinding.yaml
kubectl apply -f prometheus-token.yaml
☆ Hands-on example
cat prometheus-rabc0227.yaml
---
# 1. Create the monitoring namespace
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
# 2. Create the ServiceAccount used by Prometheus
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
# 3. Create the ClusterRole that defines Prometheus's permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/metrics
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources:
      - configmaps
    verbs: ["get"]
  - apiGroups: [""]
    resources:
      - nodes/proxy
    verbs: ["get", "list", "watch"]
  - apiGroups: ["networking.k8s.io"]
    resources:
      - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
# 4. Bind the ClusterRole to the ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitoring
---
III. Deploy Node Exporter
Goal: run Node Exporter on every node to collect node resource metrics.
# node-exporter-daemonset.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true
      containers:
        - name: node-exporter
          image: harbor.fq.com/prometheus/node-exporter:v1.8.2
          args:
            - --path.rootfs=/host
          volumeMounts:
            - name: rootfs
              mountPath: /host
      volumes:
        - name: rootfs
          hostPath:
            path: /
Explanation:
- The DaemonSet ensures that one Node Exporter Pod runs on each node.
- hostNetwork: true puts the Pod on the node's network, so metrics are exposed directly on the node IP.
- The hostPath volume mounts the node's root filesystem so that node-level data can be collected.
Deploy command:
kubectl apply -f node-exporter-daemonset.yml
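The DaemonSet above mounts the host root filesystem read-write and without mount propagation. A hedged variant of the volumeMounts entry is sketched below: it makes the mount read-only and lets the container see mounts created on the host after the Pod has started. Whether this is needed depends on the environment; treat it as an optional hardening step rather than part of the original manifest.

```yaml
# Fragment of the node-exporter container spec (not a complete manifest).
volumeMounts:
  - name: rootfs
    mountPath: /host
    readOnly: true                      # the exporter only reads host files
    mountPropagation: HostToContainer   # mounts made on the host later become visible in the container
```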
☆ Hands-on example
cat node-exporter-daemonset.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring            # use the "monitoring" namespace
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
      annotations:
        prometheus.io/scrape: "true"   # allow Prometheus to scrape this Pod
        prometheus.io/port: "9100"     # Node Exporter port
    spec:
      hostNetwork: true            # let the Pod use the host network
      hostPID: true                # let the Pod see the host's PID namespace
      tolerations:
        - effect: NoSchedule       # allow scheduling onto tainted nodes
          operator: Exists
        - effect: NoExecute
          operator: Exists
      securityContext:
        runAsNonRoot: true         # avoid running as root
        runAsUser: 65534           # run as the nobody user
      containers:
        - name: node-exporter
          image: harbor.fq.com/prometheus/node-exporter:v1.8.2   # replace with a trusted image address
          args:
            - --path.rootfs=/host/root   # rootfs path
            - --path.procfs=/host/proc   # procfs path
            - --path.sysfs=/host/sys     # sysfs path
            - --no-collector.wifi        # disable the WiFi collector
            - --no-collector.hwmon       # disable the hardware monitoring collector
          ports:
            - containerPort: 9100
              protocol: TCP
          resources:               # resource requests and limits
            requests:
              memory: "30Mi"
              cpu: "100m"
            limits:
              memory: "50Mi"
              cpu: "200m"
          volumeMounts:            # mount host directories
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
            - name: rootfs
              mountPath: /host/root
              readOnly: true
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /
---
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    k8s-app: node-exporter
  annotations:
    prometheus.io/scrape: 'true'   # allow Prometheus to scrape this Service
    prometheus.io/port: '9100'     # scrape port
spec:
  selector:
    k8s-app: node-exporter
  ports:
    - name: metrics
      port: 9100
      protocol: TCP
      targetPort: 9100
  type: ClusterIP                  # reachable only inside the cluster
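IV. Deploy Prometheus
Goal: deploy the Prometheus server and configure scrape rules and persistent storage.
1. Create the persistent volume (PV/PVC)
Create a PVC that matches the cluster's storage type (for example NFS, Local PV, or cloud storage) and mount it into Prometheus.
Example (adjust to the actual environment):
# prometheus-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi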
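The PVC above relies on the cluster's default StorageClass and on the current namespace. A minimal sketch of a more explicit variant follows; the StorageClass name `nfs-client` is hypothetical and the monitoring namespace is an assumption taken from the hands-on examples, so both need to be replaced with values that exist in the target cluster.

```yaml
# prometheus-pvc.yaml (explicit variant)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: monitoring          # assumption: same namespace as the Prometheus Deployment
spec:
  storageClassName: nfs-client   # hypothetical name; list real ones with `kubectl get storageclass`
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
```

Whatever name the PVC ends up with, the claimName in the Deployment's persistentVolumeClaim volume has to match it exactly.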
2. Create the Prometheus Deployment
# prometheus-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
        - name: prometheus
          image: prom/prometheus:v2.42.0
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config-volume
              mountPath: /etc/prometheus
            - name: data-volume
              mountPath: /prometheus
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
        - name: data-volume
          persistentVolumeClaim:
            claimName: prometheus-data
☆ Hands-on example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring            # target namespace
  labels:
    app: prometheus
spec:
  replicas: 1                      # a single instance is typical; use remote storage to improve availability
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus   # ServiceAccount that carries the RBAC permissions
      containers:
        - name: prometheus
          image: harbor.fq.com/prometheus/prometheus:v3.1.0   # image from the private registry
          args:
            - --config.file=/etc/prometheus/prometheus.yml    # Prometheus configuration file
            - --storage.tsdb.path=/prometheus                 # where TSDB data is stored
            - --web.console.templates=/etc/prometheus/consoles
            - --web.console.libraries=/etc/prometheus/console_libraries
          ports:
            - containerPort: 9090      # Prometheus web UI port
          resources:                   # cap CPU and memory to avoid exhausting the node
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "1"
              memory: "2Gi"
          volumeMounts:
            - name: prometheus-config
              mountPath: /etc/prometheus                  # configuration mount point
            - name: prometheus-storage
              mountPath: /prometheus                      # TSDB data path
            - name: file-sd
              mountPath: /apps/prometheus/file-sd.yaml    # file-based service discovery targets
              subPath: file-sd.yaml                       # mount only the file, not a whole directory
      volumes:
        - name: prometheus-config
          configMap:
            name: prometheus-config    # Prometheus configuration comes from the ConfigMap
        - name: prometheus-storage
          # persistentVolumeClaim:     # use a PVC for persistent storage in production
          #   claimName: prometheus-pvc
          emptyDir: {}                 # an empty directory is acceptable for testing
        - name: file-sd
          hostPath:
            path: /root/file-sd.yaml   # discovery file on the host; it must exist on the node the Pod runs on
            type: File
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
spec:
  type: NodePort                   # in production, LoadBalancer or Ingress is recommended
  ports:
    - port: 9090
      targetPort: 9090
      nodePort: 30090              # web UI is reachable through this NodePort
  selector:
    app: prometheus
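Neither Deployment defines health checks, so Kubernetes treats the Pod as ready as soon as the container starts. Prometheus exposes `/-/ready` and `/-/healthy` management endpoints on its web port, which map naturally onto probes. A minimal sketch, with illustrative timing values that are assumptions rather than tuned recommendations:

```yaml
# Fragment of the prometheus container spec (not a complete manifest).
readinessProbe:
  httpGet:
    path: /-/ready          # returns 200 once Prometheus is ready to serve traffic
    port: 9090
  initialDelaySeconds: 10   # illustrative value
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /-/healthy        # returns 200 while the process is healthy
    port: 9090
  initialDelaySeconds: 30   # illustrative value
  periodSeconds: 15
```

With the readiness probe in place, the Service only routes traffic to the Pod after startup has finished, which can take a while when a large TSDB is being replayed.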
3. Create the Prometheus ConfigMap
# prometheus-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['alertmanager:9093']
    rule_files:
      - '/etc/prometheus/alert_rules.yml'
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
      - job_name: 'node-exporter'
        static_configs:
          - targets: ['node-exporter:9100']
      - job_name: 'cadvisor'
        static_configs:
          - targets: ['cadvisor:8080']
      - job_name: 'pushgateway'
        static_configs:
          - targets: ['pushgateway:9091']
      - job_name: 'node-linux'
        static_configs:
          - targets: ['10.255.209.40:9100']
      # No kubeconfig_file is set in the jobs below: when running inside the cluster,
      # Prometheus uses the Pod's ServiceAccount CA certificate and token for discovery.
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
          - role: endpoints
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        scheme: https
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
      # Node Exporter serves plain HTTP on 9100, so this job sets no scheme or tls_config.
      - job_name: 'kubernetes-nodes'
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'          # default kubelet port on Kubernetes nodes
            replacement: '${1}:9100'     # Node Exporter listening port
            target_label: __address__
            action: replace
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
            namespaces:
              names:
                - kube-system
                - default
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        scheme: https
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (.+)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: __address__
            regex: (.+)
            replacement: ${1}:9090
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        scheme: https
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_service_name
Apply the configuration:
kubectl apply -f prometheus-pvc.yaml
kubectl apply -f prometheus-configmap.yaml
kubectl apply -f prometheus-deployment.yaml
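The ConfigMap lists /etc/prometheus/alert_rules.yml under rule_files but never defines that file. Because the ConfigMap is mounted at /etc/prometheus, adding a second key named alert_rules.yml is one way to supply it. The sketch below is an assumption about what such a file could contain; the group name, alert name, duration, and labels are all illustrative.

```yaml
# Additional key under `data:` in prometheus-configmap.yaml, next to prometheus.yml.
data:
  alert_rules.yml: |
    groups:
      - name: basic-availability        # hypothetical group name
        rules:
          - alert: TargetDown           # hypothetical alert name
            expr: up == 0               # `up` is 0 when the last scrape of a target failed
            for: 5m                     # illustrative: fire only after 5 minutes of failures
            labels:
              severity: critical        # illustrative label
            annotations:
              summary: "Target {{ $labels.instance }} of job {{ $labels.job }} is down"
```

Alerts from this rule are only delivered if something is actually listening at the alertmanager:9093 address configured under alerting.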
☆ Hands-on example
cat prometheus-configmap0227.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      scrape_timeout: 10s   # scrape timeout, so a hung target cannot stall a scrape
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['alertmanager:9093']
    rule_files:
      - '/etc/prometheus/alert_rules.yml'
    scrape_configs:
      # Scrape Prometheus's own metrics
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
      # Scrape Node Exporter metrics
      - job_name: 'node-exporter'
        static_configs:
          - targets: ['node-exporter:9100']
      # Scrape cAdvisor metrics
      - job_name: 'cadvisor'
        static_configs:
          - targets: ['cadvisor:8080']
      # Scrape Pushgateway metrics
      - job_name: 'pushgateway'
        static_configs:
          - targets: ['pushgateway:9091']
      # Scrape Node Exporter metrics from one specific node
      - job_name: 'node-linux'
        static_configs:
          - targets: ['10.255.209.40:9100']
      # Scrape Kubernetes API Server metrics
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true   # should be turned off in production, with a correct CA certificate configured
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
      # Scrape Kubernetes node metrics (through Node Exporter)
      - job_name: 'kubernetes-nodes'
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'   # swap the kubelet port for the Node Exporter port
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
      # Scrape Kubernetes Pod metrics
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          # The regex expects "<address>;<port>", so __address__ must be the first source label.
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
      # Scrape Kubernetes Service Endpoints metrics
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        #scheme: https
        #tls_config:
        #  ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        #  insecure_skip_verify: true   # should be turned off in production, with a correct CA certificate configured
        #bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_service_name
      - job_name: 'kubernetes-nginx-endpoints'   # job name
        kubernetes_sd_configs:
          - role: endpoints                      # discover Kubernetes Endpoints automatically
        relabel_configs:
          # Keep only Services annotated with `prometheus.io/scrape: "true"`
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          # Override the scrape scheme (http or https)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          # Override the metrics path (defaults to /metrics)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          # Override the scrape address and port
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          # Map Kubernetes Service labels to Prometheus labels
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          # Add the Kubernetes namespace label
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          # Add the Kubernetes Service name label
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_service_name
          # Add the Kubernetes Pod name label
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
          # Add the Kubernetes node name label
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: kubernetes_node_name
        # To scrape HTTPS endpoints, uncomment the settings below
        # scheme: https
        # tls_config:
        #   ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        #   insecure_skip_verify: true   # should be turned off in production, with a correct CA certificate configured
        # bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      - job_name: 'kube-state-metrics'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names:
                - kube-system
                - monitoring
                - default
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
            action: keep
            regex: kube-state-metrics
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            action: keep
            regex: http-metrics
        metrics_path: /metrics
        scheme: http
      - job_name: "file_sd"
        file_sd_configs:
          - files:
              - /apps/prometheus/file-sd.yaml
            refresh_interval: 1m
      - job_name: 'redis'
        kubernetes_sd_configs:
          - role: endpoints   # discover services from Kubernetes Endpoints
        relabel_configs:
          # Keep only Services annotated with `prometheus.io/scrape: "true"`
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          # Keep only endpoints backed by a Pod whose name contains "redis"
          - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
            action: keep
            regex: Pod;(.*redis.*)
          # Rewrite the target address to the Pod IP plus the Redis Exporter port (9121)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: __address__
            replacement: $1:9121
          # Add the Service's app label
          - source_labels: [__meta_kubernetes_service_label_app]
            action: replace
            target_label: app
          # Add the Kubernetes namespace label
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
          # Add the Kubernetes Service name label
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: service
          # Add the Kubernetes Pod name label
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
          # Add the Kubernetes node name label
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: node
          # Add the instance label (distinguishes individual Redis instances)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: instance
      - job_name: 'mysql'
        kubernetes_sd_configs:
          - role: endpoints   # discover services from Kubernetes Endpoints
        relabel_configs:
          # Keep only Services annotated with `prometheus.io/scrape: "true"`
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          # Keep only endpoints backed by a Pod whose name contains "mysql-exporter"
          - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
            action: keep
            regex: Pod;(.*mysql-exporter.*)
          # Rewrite the target address to the Pod IP plus the MySQL Exporter port (9104)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: __address__
            replacement: $1:9104
          # Add the Service's app label
          - source_labels: [__meta_kubernetes_service_label_app]
            action: replace
            target_label: app
          # Add the Kubernetes namespace label
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
          # Add the Kubernetes Service name label
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: service
          # Add the Kubernetes Pod name label
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
          # Add the Kubernetes node name label
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: node
          # Add the instance label (distinguishes individual MySQL instances)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: instance
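The file_sd job in the hands-on ConfigMap reads /apps/prometheus/file-sd.yaml, which the hands-on Deployment mounts from /root/file-sd.yaml on the host, but the guide does not show the file itself. A minimal sketch of what it could contain is given below. The target address is reused from the node-linux job purely as a placeholder, and the env label is hypothetical.

```yaml
# /root/file-sd.yaml on the node (mounted into the Pod at /apps/prometheus/file-sd.yaml)
- targets:
    - '10.255.209.40:9100'   # placeholder reused from the node-linux job; replace with real targets
  labels:
    env: test                # hypothetical label attached to every target in this group
```

Each list entry is one target group, so more groups with different labels can be added as further `- targets:` items. With refresh_interval: 1m, the file is re-read at least once a minute.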
4. Expose the Prometheus Service
# prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: NodePort
  ports:
    - port: 9090
      targetPort: 9090
      nodePort: 30090
  selector:
    app: prometheus
Apply the Service:
kubectl apply -f prometheus-service.yaml
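V. Verify the Deployment
- Check Pod status:
  kubectl get pods -n default -l app=prometheus
  kubectl get pods -n kube-system -l app=node-exporter
  (With the hands-on manifests, use -n monitoring for both commands and the label k8s-app=node-exporter for Node Exporter.)
  Expected result: all Pods are in the Running state.
- Open the Prometheus UI:
  Browse to http://<NodeIP>:30090 to reach the Prometheus console.
  - On the Status > Targets page, confirm that the kubernetes-nodes and kubernetes-pods jobs are UP.
  - Run the query up{job="kubernetes-nodes"} to verify that metrics are being scraped.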
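VI. Troubleshooting
- Permission problems
  - Example error: Failed to list *v1.Pod: forbidden
  - Fix: check that the ClusterRoleBinding references the correct ServiceAccount and namespace.
- Node Exporter does not start
  - Check that the DaemonSet has Pods on every node and that the image was pulled without errors.
- Prometheus cannot scrape metrics
  - Check that scrape_configs in the Prometheus configuration points at the right port (Node Exporter listens on 9100 by default).
  - Verify network connectivity: kubectl exec -it <prometheus-pod> -- curl http://<NodeIP>:9100/metrics. If the image does not ship curl, busybox wget -qO- is an alternative.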
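VII. Next Steps
- Configure Alertmanager: add alerting rules and integrate Alertmanager to deliver notifications.
- Improve persistent storage: use a highly available storage backend (such as Ceph or Longhorn) to protect the data.
- Monitoring dashboards: deploy Grafana, add Prometheus as a data source, and build dashboards.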
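For the Grafana step, the data source can be declared in a provisioning file instead of being added by hand in the UI. The following is a minimal sketch under stated assumptions: Grafana is deployed separately (this guide does not cover it), the file path is the conventional provisioning location, and the URL assumes the prometheus Service from the hands-on example in the monitoring namespace.

```yaml
# Grafana data source provisioning file,
# e.g. /etc/grafana/provisioning/datasources/prometheus.yaml (assumed path)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                                # Grafana's backend proxies the queries
    url: http://prometheus.monitoring.svc:9090   # assumes the prometheus Service in the monitoring namespace
    isDefault: true
```

If Prometheus runs in the default namespace instead, the host part of the URL becomes prometheus.default.svc.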