Microservice Monitoring - Monitoring Your Own Services

The previous article covered monitoring applications in a Kubernetes cluster with Exporters. This article shows how to monitor your own services.

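For your own service to be monitored, it must expose its runtime metrics so that Prometheus can scrape them. The Prometheus client libraries let a service expose this runtime information itself.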

Client libraries

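Prometheus officially provides client libraries for Go, Java or Scala, Python, and Ruby. Third parties provide libraries for most other languages; see the client libraries documentation for details.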

Before showing how to use a client library to expose metrics from a service, let's look at the metric types the Prometheus libraries provide.

Metric types

The Prometheus client libraries offer four core metric types.

Counter

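A counter is a cumulative metric representing a single monotonically increasing counter: its value can only increase, or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors.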

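Note: do not use a counter to expose a value that can decrease. For example, do not use a counter for the number of currently running processes; use a gauge instead.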
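A minimal, stdlib-only sketch makes the "only goes up" rule concrete. It is a toy model, not the client_golang API: the toyCounter type and its methods are hypothetical names. The real client_golang Counter.Add panics on a negative value; this sketch returns an error instead so the rejection is easy to observe.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"sync/atomic"
)

// toyCounter is a hypothetical, minimal model of a Prometheus counter.
// The float64 value is stored as raw bits so it can be updated atomically.
type toyCounter struct {
	bits uint64
}

var errNegative = errors.New("counter cannot decrease")

// Add increases the counter by v. Negative values are rejected,
// because a counter must be monotonically increasing.
func (c *toyCounter) Add(v float64) error {
	if v < 0 {
		return errNegative
	}
	for {
		old := atomic.LoadUint64(&c.bits)
		next := math.Float64bits(math.Float64frombits(old) + v)
		if atomic.CompareAndSwapUint64(&c.bits, old, next) {
			return nil
		}
	}
}

// Inc is shorthand for Add(1).
func (c *toyCounter) Inc() { _ = c.Add(1) }

// Value returns the current counter value.
func (c *toyCounter) Value() float64 {
	return math.Float64frombits(atomic.LoadUint64(&c.bits))
}

func main() {
	var requests toyCounter
	requests.Inc()
	requests.Inc()
	_ = requests.Add(3)
	fmt.Println(requests.Value()) // 5
	fmt.Println(requests.Add(-1)) // counter cannot decrease
	fmt.Println(requests.Value()) // 5: the rejected Add left the value unchanged
}
```

The only way the value goes back to zero is a fresh process, which is exactly the "reset on restart" case described above.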

Gauge

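A gauge is a metric representing a single numerical value that can go up and down. Gauges are typically used for measured values such as temperature or current memory usage, but also for "counts" that can go up and down, such as the number of running goroutines.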
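For contrast, here is a toy gauge, again a hypothetical stdlib-only model rather than the client library's type. A common pattern for "currently running" counts is to increment on entry and decrement on exit with defer, so the value drops back to zero once all work has finished.

```go
package main

import (
	"fmt"
	"sync"
)

// toyGauge is a hypothetical, minimal model of a Prometheus gauge:
// unlike a counter, its value may move in either direction.
type toyGauge struct {
	mu sync.Mutex
	v  float64
}

// Set replaces the current value (e.g. a temperature reading).
func (g *toyGauge) Set(v float64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.v = v
}

// Add moves the value by d, which may be negative.
func (g *toyGauge) Add(d float64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.v += d
}

func (g *toyGauge) Inc() { g.Add(1) }
func (g *toyGauge) Dec() { g.Add(-1) }

// Value returns the current value.
func (g *toyGauge) Value() float64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.v
}

// inFlight tracks how many hypothetical jobs are running right now.
var inFlight toyGauge

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			inFlight.Inc()       // job started
			defer inFlight.Dec() // job finished
		}()
	}
	wg.Wait()
	fmt.Println(inFlight.Value()) // 0: every Inc was matched by a Dec

	var temperature toyGauge
	temperature.Set(21.5)
	temperature.Add(-1.5)
	fmt.Println(temperature.Value()) // 20
}
```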

Histogram

A histogram samples observations (such as request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values.

A histogram with a base metric name of <basename> exposes multiple time series during a scrape:

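  • cumulative counters for the observation buckets, exposed as <basename>_bucket{le="<upper inclusive bound>"}
  • the total sum of all observed values, exposed as <basename>_sum
  • the count of events that have been observed, exposed as <basename>_count (identical to <basename>_bucket{le="+Inf"} above)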

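Use the histogram_quantile() function to calculate quantiles from histograms, or even from aggregations of histograms. Histograms are also suitable for calculating an Apdex score. When operating on buckets, remember that the histogram is cumulative. See Histograms and summaries for details of histogram usage and how it differs from summaries.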
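The interpolation that histogram_quantile() performs can be sketched in a few lines. This is a simplified reimplementation for illustration only, assuming positive bucket bounds and 0 ≤ q ≤ 1 and skipping PromQL's edge-case handling; quantileFromBuckets is a hypothetical name. It finds the bucket containing the target rank and interpolates linearly inside it, which is why the result can only be as precise as the bucket layout. In PromQL the function is normally applied to per-second rates of the _bucket series, for example histogram_quantile(0.9, sum by (le) (rate(request_duration_seconds_bucket[5m]))), where the metric name is again hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// quantileFromBuckets estimates the q-quantile (0 <= q <= 1) from cumulative
// histogram buckets by linear interpolation inside the bucket that contains
// the target rank, in the spirit of PromQL's histogram_quantile().
// bounds holds the sorted finite upper bounds (assumed positive); cum has
// len(bounds)+1 entries, the last being the +Inf bucket (equal to _count).
func quantileFromBuckets(q float64, bounds []float64, cum []float64) float64 {
	rank := q * cum[len(cum)-1]
	// First finite bucket whose cumulative count reaches the rank.
	b := sort.Search(len(bounds), func(i int) bool { return cum[i] >= rank })
	if b == len(bounds) {
		// The rank falls in the +Inf bucket: the highest finite bound is
		// the best available estimate.
		return bounds[len(bounds)-1]
	}
	start, count := 0.0, cum[b] // the first bucket is assumed to start at 0
	if b > 0 {
		start = bounds[b-1]
		count -= cum[b-1] // observations inside this bucket only
		rank -= cum[b-1]  // rank relative to the bucket's start
	}
	return start + (bounds[b]-start)*(rank/count)
}

func main() {
	// Hypothetical buckets le="0.25", le="0.5", le="1", le="+Inf"
	// with cumulative counts 2, 3, 4, 5.
	bounds := []float64{0.25, 0.5, 1}
	cum := []float64{2, 3, 4, 5}
	fmt.Println(quantileFromBuckets(0.5, bounds, cum)) // 0.375
	fmt.Println(quantileFromBuckets(0.9, bounds, cum)) // 1: rank lands in +Inf
}
```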

Summary

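Like a histogram, a summary samples observations (such as request durations or response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window.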

A summary with a base metric name of <basename> exposes multiple time series during a scrape:

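  • streaming φ-quantiles (0 ≤ φ ≤ 1) of observed events, exposed as <basename>{quantile="<φ>"}
  • the total sum of all observed values, exposed as <basename>_sum
  • the count of events that have been observed, exposed as <basename>_count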

See Histograms and summaries for a detailed explanation of φ-quantiles, summary usage, and differences from histograms.

Demo

Using Go as an example, the following shows how to monitor your own service with the Prometheus client library.

Exposing a metrics endpoint

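The first step in integrating Prometheus into a service is to provide a /metrics endpoint. The service should listen on an internal port that is reachable only from within your infrastructure, typically in the 9xxx range. The Prometheus team maintains a list of default port allocations that you can consult when choosing a port.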

The following code creates a new HTTP server (demo1) that exposes the default metrics of the Prometheus Go client library at http://localhost:9001/metrics:

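// demo1.go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// promhttp.Handler() serves the default registry (Go runtime metrics such
	// as go_goroutines) plus its own promhttp_metric_handler_* metrics.
	http.Handle("/metrics", promhttp.Handler())
	// ListenAndServe only returns on error, e.g. when the port is already in use.
	log.Fatal(http.ListenAndServe(":9001", nil))
}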

Start the service

go run demo1.go

View the metrics

❯ curl http://localhost:9001/metrics
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 9
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.13.1"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 1.499288e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 1.499288e+06
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.443808e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 151
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 0
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 2.240512e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 1.499288e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 6.4118784e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 2.531328e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 2806
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 6.4118784e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 6.6650112e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 0
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 2957
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 13888
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 23936
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 32768
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 4.473924e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.050904e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 458752
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 458752
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 7.189324e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 9
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 1
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0

Adding your own metrics

demo1 exposes only the default metrics. Next, we add a counter named myapp_processed_ops_total that counts the operations processed so far. The counter is incremented by 1 every 2 seconds.

// demo2.go
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// recordMetrics simulates background work: every 2 seconds it
// "processes" one operation and increments the counter.
func recordMetrics() {
	go func() {
		for {
			opsProcessed.Inc()
			time.Sleep(2 * time.Second)
		}
	}()
}

var (
	// promauto registers the counter with the default registry, so
	// promhttp.Handler() exposes it without a separate MustRegister call.
	opsProcessed = promauto.NewCounter(prometheus.CounterOpts{
		Name: "myapp_processed_ops_total",
		Help: "The total number of processed events",
	})
)

func main() {
	recordMetrics()

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9001", nil))
}

Start the service

go run demo2.go

View the metrics

❯ curl http://localhost:9001/metrics
...
# HELP myapp_processed_ops_total The total number of processed events
# TYPE myapp_processed_ops_total counter
myapp_processed_ops_total 5
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
...

Query the endpoint a few more times and you will see the value of myapp_processed_ops_total keep increasing:
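Because a counter only goes up, its absolute value is rarely useful on its own; dashboards and alerts usually look at its rate of change, for example rate(myapp_processed_ops_total[1m]), which for this demo should be roughly 0.5 per second (one increment every 2 seconds). The sketch below models the key idea behind PromQL's increase() and rate(): a drop in value is treated as a counter reset, such as a restart, rather than as negative growth. It is a simplification (the real functions also extrapolate to the edges of the time window), and increaseWithResets and the sample values are hypothetical.

```go
package main

import "fmt"

// increaseWithResets sums the growth of a counter across consecutive
// scraped samples. A drop in value is treated as a counter reset (for
// example a process restart): the counter restarted from zero, so the
// whole new value counts as growth.
func increaseWithResets(samples []float64) float64 {
	var total float64
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1], samples[i]
		if cur >= prev {
			total += cur - prev
		} else {
			total += cur // reset: counted from zero
		}
	}
	return total
}

func main() {
	// Hypothetical scrapes; the service restarts between 26 and 3.
	fmt.Println(increaseWithResets([]float64{5, 12, 26, 3, 8})) // 29
}
```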

❯ curl http://localhost:9001/metrics
...
# HELP myapp_processed_ops_total The total number of processed events
# TYPE myapp_processed_ops_total counter
myapp_processed_ops_total 26
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
...

Conclusion

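Using a counter as an example, this article showed how to add metrics to your own service. You can expose the other metric types as well; see client_golang for usage.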

The next article is a Grafana tutorial.

Note: the yaml files referenced in this chapter are available at https://github.com/MakeOptim/service-mesh/prometheus.


Written by CatchZeng
AI (Machine Learning) and DevOps enthusiast.