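The previous post covered monitoring applications in a Kubernetes cluster with Exporters. This post shows how to monitor your own services.
For a service to be monitored, it has to expose its runtime metrics so that Prometheus can scrape them. The client libraries that Prometheus provides make it possible to expose this runtime information.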
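Client libraries
Prometheus officially provides client libraries for Go, Java or Scala, Python, and Ruby. Most other languages are supported by third-party libraries; see the client library documentation for details.
Before looking at how to expose metrics from a service with a client library, let's first go through the metric types that the Prometheus libraries offer.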
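Metric types
The Prometheus client libraries offer four core metric types.
Counter
A counter is a cumulative metric that represents a single monotonically increasing counter: its value can only increase, or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors.
Note: do not use a counter to expose a value that can decrease. For example, do not use a counter for the number of currently running processes; use a gauge instead.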
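Gauge
A gauge is a metric that represents a single numerical value that can go up and down. Gauges are typically used for measured values such as temperature or current memory usage, but also for "counts" that can rise and fall, such as the number of running goroutines.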
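The difference between the two types can be sketched with two toy types. This is a minimal illustration using only the standard library; the `Counter` and `Gauge` types below are hypothetical stand-ins, not the client_golang implementations, and they are not safe for concurrent use.

```go
package main

import (
	"errors"
	"fmt"
)

// Counter is a toy monotonic counter (hypothetical, for illustration only).
type Counter struct{ value float64 }

// Add increases the counter. Negative deltas are rejected so the value
// can never decrease.
func (c *Counter) Add(delta float64) error {
	if delta < 0 {
		return errors.New("counter cannot decrease")
	}
	c.value += delta
	return nil
}

// Gauge is a toy gauge: its value may move in either direction.
type Gauge struct{ value float64 }

// Set overwrites the current value.
func (g *Gauge) Set(v float64) { g.value = v }

// Add changes the value by delta, which may be negative.
func (g *Gauge) Add(delta float64) { g.value += delta }

func main() {
	var requestsTotal Counter
	var inFlight Gauge

	// Handling one request: the in-flight gauge goes up and back down,
	// while the total only ever goes up.
	inFlight.Add(1)
	_ = requestsTotal.Add(1)
	inFlight.Add(-1)

	// Trying to decrease the counter is an error.
	err := requestsTotal.Add(-1)
	fmt.Println(requestsTotal.value, inFlight.value, err)
}
```

A value such as "requests currently being handled" belongs in the gauge, while "requests handled so far" belongs in the counter.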
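Histogram
A histogram samples observations (such as request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values.
A histogram with a base metric name of <basename> exposes multiple time series during a scrape:
- cumulative counters for the observation buckets, exposed as <basename>_bucket{le="<upper inclusive bound>"}
- the total sum of all observed values, exposed as <basename>_sum
- the count of events that have been observed, exposed as <basename>_count (identical to <basename>_bucket{le="+Inf"} above)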
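To make the cumulative bucket layout concrete, here is a small standard-library sketch. The `Histogram` type and the metric name `demo_request_duration_seconds` are hypothetical and only illustrate the three series listed above; a real client library implements this differently and more efficiently.

```go
package main

import (
	"fmt"
	"sort"
)

// Histogram is a toy cumulative histogram (hypothetical, for illustration).
type Histogram struct {
	bounds []float64 // sorted upper bounds, excluding +Inf
	counts []uint64  // counts[i] = observations <= bounds[i]; last entry is +Inf
	sum    float64
}

// NewHistogram creates a histogram with the given upper bounds.
func NewHistogram(bounds ...float64) *Histogram {
	sorted := append([]float64(nil), bounds...)
	sort.Float64s(sorted)
	return &Histogram{bounds: sorted, counts: make([]uint64, len(sorted)+1)}
}

// Observe counts v in every bucket whose upper bound is >= v,
// because the le bound is inclusive and buckets are cumulative.
func (h *Histogram) Observe(v float64) {
	for i, b := range h.bounds {
		if v <= b {
			h.counts[i]++
		}
	}
	h.counts[len(h.bounds)]++ // the +Inf bucket counts every observation
	h.sum += v
}

// Count is the number of observations, identical to the +Inf bucket.
func (h *Histogram) Count() uint64 { return h.counts[len(h.bounds)] }

// Print writes the _bucket, _sum and _count series for basename.
func (h *Histogram) Print(basename string) {
	for i, b := range h.bounds {
		fmt.Printf("%s_bucket{le=\"%g\"} %d\n", basename, b, h.counts[i])
	}
	fmt.Printf("%s_bucket{le=\"+Inf\"} %d\n", basename, h.Count())
	fmt.Printf("%s_sum %g\n", basename, h.sum)
	fmt.Printf("%s_count %d\n", basename, h.Count())
}

func main() {
	h := NewHistogram(0.5, 1)
	for _, v := range []float64{0.25, 0.5, 0.75, 2} {
		h.Observe(v)
	}
	h.Print("demo_request_duration_seconds")
}
```

With these four observations the le="0.5" bucket holds 2, the le="1" bucket holds 3, and the +Inf bucket holds all 4, which is also the value of `_count`.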
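The histogram_quantile() function calculates quantiles from histograms, or even from aggregations of histograms. A histogram is also suitable for calculating an Apdex score. When operating on buckets, remember that the histogram is cumulative. For details on histogram usage and the differences from summaries, see Histograms and summaries.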
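Summary
Similar to a histogram, a summary samples observations (such as request durations or response sizes). It also provides a total count of observations and a sum of all observed values, and it calculates configurable quantiles over a sliding time window.
A summary with a base metric name of <basename> exposes multiple time series during a scrape:
- streaming φ-quantiles (0 ≤ φ ≤ 1) of observed events, exposed as <basename>{quantile="<φ>"}
- the total sum of all observed values, exposed as <basename>_sum
- the count of events that have been observed, exposed as <basename>_count
For a detailed explanation of φ-quantiles, summary usage, and the differences from histograms, see Histograms and summaries.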
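Demo
The following uses Go as an example to show how to monitor your own service with a Prometheus client.
Exposing a metrics endpoint
The first step in integrating Prometheus into a service is to provide a /metrics endpoint. The service should listen on an internal port that is reachable only from within your infrastructure, usually in the 9xxx range. The Prometheus team maintains a list of default port allocations that you can consult when choosing a port.
The following code creates a new HTTP service (demo1) that exposes the default metrics of a Go application instrumented with the Prometheus client at http://localhost:9001/metrics.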
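// demo1.go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// promhttp.Handler() serves the metrics of the default registry.
	http.Handle("/metrics", promhttp.Handler())
	// ListenAndServe only returns on error, so log it and exit.
	log.Fatal(http.ListenAndServe(":9001", nil))
}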
Start the service:
go run demo1.go
View the metrics:
❯ curl http://localhost:9001/metrics
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 9
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.13.1"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 1.499288e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 1.499288e+06
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.443808e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 151
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 0
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 2.240512e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 1.499288e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 6.4118784e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 2.531328e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 2806
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 6.4118784e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 6.6650112e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 0
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 2957
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 13888
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 23936
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 32768
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 4.473924e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.050904e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 458752
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 458752
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 7.189324e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 9
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 1
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
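Adding your own metrics
demo1 exposes only the default metrics. Next, we add a counter metric named myapp_processed_ops_total that counts the operations processed so far. Every 2 seconds, the counter is incremented by 1.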
// demo2.go
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// opsProcessed is registered with the default registry by promauto.
var opsProcessed = promauto.NewCounter(prometheus.CounterOpts{
	Name: "myapp_processed_ops_total",
	Help: "The total number of processed events",
})

// recordMetrics increments the counter every 2 seconds in the background.
func recordMetrics() {
	go func() {
		for {
			opsProcessed.Inc()
			time.Sleep(2 * time.Second)
		}
	}()
}

func main() {
	recordMetrics()

	http.Handle("/metrics", promhttp.Handler())
	// ListenAndServe only returns on error, so log it and exit.
	log.Fatal(http.ListenAndServe(":9001", nil))
}
Start the service:
go run demo2.go
View the metrics:
❯ curl http://localhost:9001/metrics
...
# HELP myapp_processed_ops_total The total number of processed events
# TYPE myapp_processed_ops_total counter
myapp_processed_ops_total 5
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
...
Query the endpoint several times and you can see the value of myapp_processed_ops_total keep increasing.
❯ curl http://localhost:9001/metrics
...
# HELP myapp_processed_ops_total The total number of processed events
# TYPE myapp_processed_ops_total counter
myapp_processed_ops_total 26
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
...
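A raw counter value is rarely interesting by itself; what usually matters is how fast it grows. The sketch below derives a per-second rate from two scraped values, treating a drop as a restart from zero. The values 5 and 26 come from the two outputs above; the 42-second interval between them is an assumption, since the outputs carry no timestamps. The `increase` and `ratePerSecond` helpers are hypothetical and much simpler than what a PromQL function such as rate() does.

```go
package main

import "fmt"

// increase returns how much a counter grew between two scrapes.
// A drop means the process restarted and the counter began again at
// zero, so the whole current value counts as growth.
func increase(prev, cur float64) float64 {
	if cur < prev {
		return cur
	}
	return cur - prev
}

// ratePerSecond averages the increase over the elapsed time.
func ratePerSecond(prev, cur, elapsedSeconds float64) float64 {
	return increase(prev, cur) / elapsedSeconds
}

func main() {
	// Two scrapes of myapp_processed_ops_total, assumed to be 42 s apart.
	fmt.Println(ratePerSecond(5, 26, 42)) // 0.5 operations per second

	// After a restart the counter is lower than before.
	fmt.Println(increase(26, 3))
}
```

A rate of 0.5 per second matches the demo, which increments the counter once every 2 seconds.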
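Recap
This post used a counter as an example to show how to add metrics to your own service. You can expose the other metric types as well; see the client_golang documentation for usage.
The next post will be a tutorial on using Grafana.
Note: the yaml files referenced in this chapter are available at https://github.com/MakeOptim/service-mesh/prometheus.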