Microservice Monitoring - Monitoring Your Own Services

Client Libraries
Metric Types
Demo
- Exposing a metrics endpoint
- Adding your own metrics
Recap

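The previous article explained how to use Exporters to monitor applications in a Kubernetes cluster. This article shows how to monitor your own services.

For a service to be monitored, it has to expose its runtime metrics so that Prometheus can scrape them. We can use the client libraries provided by Prometheus to expose the service's own runtime information.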

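Client Libraries

Prometheus officially provides client libraries for Go, Java or Scala, Python, and Ruby. Third parties provide support for most other languages; see the client libraries documentation for details.

Before describing how to use a client library to expose metrics from a service, let's first look at the metric types the Prometheus libraries offer.

Metric Types

The Prometheus client libraries offer four core metric types.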

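Counter

A counter is a cumulative metric that represents a single monotonically increasing value: it can only increase, or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors.

Note: do not use a counter to expose a value that can decrease. For example, do not use a counter for the number of currently running processes; use a gauge instead.

Gauge

A gauge is a metric that represents a single numerical value that can go up and down. Gauges are typically used for measured values such as temperature or current memory usage, but also for "counts" that can rise and fall, such as the number of running goroutines.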
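
To make the difference between the two types concrete, here is a minimal sketch using only the Go standard library. The toyCounter and toyGauge types are hypothetical stand-ins written for this illustration, not the client_golang implementation, and they are not safe for concurrent use. The sketch returns an error when a counter is asked to decrease; the real client_golang Counter panics in that case.

```go
package main

import (
	"errors"
	"fmt"
)

// toyCounter is a hypothetical, simplified counter: its value can only go up.
type toyCounter struct{ value float64 }

// Add increases the counter. Negative deltas are rejected and leave the value unchanged.
func (c *toyCounter) Add(delta float64) error {
	if delta < 0 {
		return errors.New("counter cannot decrease")
	}
	c.value += delta
	return nil
}

// toyGauge is a hypothetical, simplified gauge: its value can move in either direction.
type toyGauge struct{ value float64 }

// Add changes the gauge by delta, which may be negative.
func (g *toyGauge) Add(delta float64) { g.value += delta }

// Set overwrites the gauge with an absolute value.
func (g *toyGauge) Set(v float64) { g.value = v }

func main() {
	requests := &toyCounter{}
	_ = requests.Add(1)
	_ = requests.Add(2)
	err := requests.Add(-1) // rejected: a counter never goes down
	fmt.Println("requests:", requests.value, "error:", err)

	inFlight := &toyGauge{}
	inFlight.Add(3)  // three requests start
	inFlight.Add(-1) // one request finishes
	fmt.Println("in flight:", inFlight.value)
}
```

A value like "requests served" fits the counter, while "requests currently in flight" fits the gauge.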

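Histogram

A histogram samples observations (such as request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values.

A histogram with a base metric name of <basename> exposes multiple time series during a scrape:

cumulative counters for the observation buckets, exposed as <basename>_bucket{le="<upper inclusive bound>"}
the total sum of all observed values, exposed as <basename>_sum
the count of events that have been observed, exposed as <basename>_count (identical to <basename>_bucket{le="+Inf"} above)

Use the histogram_quantile() function to calculate quantiles from histograms, or even from aggregations of histograms. A histogram is also suitable for calculating an Apdex score. When operating on buckets, remember that the histogram is cumulative. For details on histogram usage and how histograms differ from summaries, see "Histograms and summaries".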
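
The cumulative buckets and the quantile estimate can be sketched with the Go standard library alone. The toyHistogram type, the demo_ metric name, and the bucket bounds below are all assumptions made for illustration. The Quantile method interpolates linearly inside the bucket that contains the target rank and treats the lower bound of the first bucket as 0; it is a simplified approximation of the idea behind histogram_quantile(), not its actual implementation.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// toyHistogram is a hypothetical, simplified histogram with cumulative buckets.
type toyHistogram struct {
	upperBounds []float64 // sorted, inclusive upper bounds; +Inf is implicit
	counts      []uint64  // counts[i] = observations <= upperBounds[i]; last entry is +Inf
	sum         float64
}

func newToyHistogram(bounds []float64) *toyHistogram {
	b := append([]float64(nil), bounds...)
	sort.Float64s(b)
	return &toyHistogram{upperBounds: b, counts: make([]uint64, len(b)+1)}
}

// Observe increments every bucket whose upper bound is >= v,
// which is what makes the buckets cumulative.
func (h *toyHistogram) Observe(v float64) {
	for i, ub := range h.upperBounds {
		if v <= ub {
			h.counts[i]++
		}
	}
	h.counts[len(h.upperBounds)]++ // le="+Inf" counts everything
	h.sum += v
}

// Count returns the total number of observations (the +Inf bucket).
func (h *toyHistogram) Count() uint64 { return h.counts[len(h.upperBounds)] }

// Quantile estimates the q-quantile by linear interpolation inside
// the first bucket whose cumulative count reaches the target rank.
func (h *toyHistogram) Quantile(q float64) float64 {
	total := h.Count()
	if total == 0 || len(h.upperBounds) == 0 || q < 0 || q > 1 {
		return math.NaN()
	}
	rank := q * float64(total)
	for i, ub := range h.upperBounds {
		if float64(h.counts[i]) >= rank {
			lower, below := 0.0, 0.0 // assumption: first bucket starts at 0
			if i > 0 {
				lower = h.upperBounds[i-1]
				below = float64(h.counts[i-1])
			}
			inBucket := float64(h.counts[i]) - below
			if inBucket == 0 {
				return lower
			}
			return lower + (ub-lower)*(rank-below)/inBucket
		}
	}
	// The rank falls into the +Inf bucket: report the highest finite bound.
	return h.upperBounds[len(h.upperBounds)-1]
}

func main() {
	h := newToyHistogram([]float64{0.1, 0.5, 1})
	for _, v := range []float64{0.05, 0.2, 0.3, 0.7} {
		h.Observe(v)
	}
	// Print the series in the same shape as the exposition format.
	for i, ub := range h.upperBounds {
		fmt.Printf("demo_bucket{le=\"%g\"} %d\n", ub, h.counts[i])
	}
	fmt.Printf("demo_bucket{le=\"+Inf\"} %d\n", h.Count())
	fmt.Printf("demo_sum %g\n", h.sum)
	fmt.Printf("demo_count %d\n", h.Count())
	fmt.Printf("estimated median: %g\n", h.Quantile(0.5))
}
```

Because the estimate depends only on bucket counts, its accuracy is limited by how well the bucket bounds match the real distribution of the observations.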

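Summary

Similar to a histogram, a summary samples observations (such as request durations or response sizes). It also provides a total count of observations and a sum of all observed values, and it calculates configurable quantiles over a sliding time window.

A summary with a base metric name of <basename> exposes multiple time series during a scrape:

streaming φ-quantiles (0 ≤ φ ≤ 1) of observed events, exposed as <basename>{quantile="<φ>"}
the total sum of all observed values, exposed as <basename>_sum
the count of events that have been observed, exposed as <basename>_count

For a detailed explanation of φ-quantiles, summary usage, and how summaries differ from histograms, see "Histograms and summaries".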
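
The following sketch illustrates the three kinds of series a summary exposes, using only the Go standard library. The toySummary type is hypothetical. It makes two simplifying assumptions: the sliding window keeps the last N observations instead of a time range, and quantiles are computed exactly by sorting (nearest-rank) rather than with a streaming algorithm. The sum and count stay cumulative across all observations.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// toySummary is a hypothetical, simplified summary.
type toySummary struct {
	window []float64 // most recent observations, oldest first
	size   int       // maximum number of observations kept in the window
	sum    float64   // cumulative sum of all observations
	count  uint64    // cumulative count of all observations
}

func newToySummary(size int) *toySummary { return &toySummary{size: size} }

// Observe records v and drops the oldest observation once the window is full.
func (s *toySummary) Observe(v float64) {
	s.sum += v
	s.count++
	s.window = append(s.window, v)
	if len(s.window) > s.size {
		s.window = s.window[1:]
	}
}

// Quantile returns the phi-quantile of the current window (nearest-rank method).
func (s *toySummary) Quantile(phi float64) float64 {
	if len(s.window) == 0 || phi < 0 || phi > 1 {
		return math.NaN()
	}
	sorted := append([]float64(nil), s.window...)
	sort.Float64s(sorted)
	idx := int(math.Ceil(phi*float64(len(sorted)))) - 1
	if idx < 0 {
		idx = 0
	}
	return sorted[idx]
}

func main() {
	s := newToySummary(3)
	for _, v := range []float64{10, 20, 30, 40} {
		s.Observe(v)
	}
	// The window now holds 20, 30, 40; sum and count still cover all four values.
	for _, phi := range []float64{0.5, 1} {
		fmt.Printf("demo{quantile=\"%g\"} %g\n", phi, s.Quantile(phi))
	}
	fmt.Printf("demo_sum %g\n", s.sum)
	fmt.Printf("demo_count %d\n", s.count)
}
```

Unlike a histogram, the quantiles here are computed inside the service, so they cannot be meaningfully aggregated across instances afterwards.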

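Demo

Using Go as an example, let's walk through how to monitor your own service with the Prometheus client library.

Exposing a metrics endpoint

The first step in integrating Prometheus into a service is to provide a /metrics endpoint. The service should listen on an internal port that is reachable only inside your infrastructure, usually in the 9xxx range. The Prometheus team maintains a list of default port allocations, which you can consult when choosing a port.

The following code creates a new HTTP service (demo1) that exposes the default metrics of a Go application via http://localhost:9001/metrics.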

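// demo1.go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// promhttp.Handler() serves the metrics of the default registry.
	http.Handle("/metrics", promhttp.Handler())
	// ListenAndServe only returns on error, so log it and exit.
	log.Fatal(http.ListenAndServe(":9001", nil))
}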

Start the service

go run demo1.go

View the metrics

❯ curl http://localhost:9001/metrics
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 9
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.13.1"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 1.499288e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 1.499288e+06
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.443808e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 151
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 0
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 2.240512e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 1.499288e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 6.4118784e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 2.531328e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 2806
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 6.4118784e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 6.6650112e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 0
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 2957
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 13888
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 23936
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 32768
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 4.473924e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.050904e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 458752
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 458752
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 7.189324e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 9
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 1
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
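
The output above follows a simple line-based text format: lines starting with # carry HELP and TYPE metadata, and every other line is a sample made of a metric name, optional labels in braces, and a value. As a rough illustration of that structure, here is a sketch of a reader written with the Go standard library only. The sample type and parseSamples function are hypothetical names, and the sketch assumes there is no trailing timestamp and no space inside label values; for real use, prefer an existing parser.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sample is a hypothetical representation of one line of /metrics output.
type sample struct {
	name   string
	labels string // raw text between the braces, empty if there are no labels
	value  float64
}

// parseSamples skips comment lines and splits each remaining line
// into metric name, raw labels, and value.
func parseSamples(text string) ([]sample, error) {
	var out []sample
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // blank line, HELP, or TYPE
		}
		i := strings.LastIndex(line, " ")
		if i < 0 {
			return nil, fmt.Errorf("malformed sample line: %q", line)
		}
		v, err := strconv.ParseFloat(line[i+1:], 64)
		if err != nil {
			return nil, fmt.Errorf("bad value in %q: %w", line, err)
		}
		s := sample{name: strings.TrimSpace(line[:i]), value: v}
		if open := strings.Index(s.name, "{"); open >= 0 && strings.HasSuffix(s.name, "}") {
			s.labels = s.name[open+1 : len(s.name)-1]
			s.name = s.name[:open]
		}
		out = append(out, s)
	}
	return out, sc.Err()
}

func main() {
	text := `# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 9
go_memstats_alloc_bytes 1.499288e+06
promhttp_metric_handler_requests_total{code="200"} 1
`
	samples, err := parseSamples(text)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, s := range samples {
		fmt.Printf("name=%s labels=[%s] value=%g\n", s.name, s.labels, s.value)
	}
}
```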

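Adding your own metrics

demo1 only exposes the default metrics. Next, we add a counter metric named myapp_processed_ops_total, which counts the number of operations processed so far. The counter is incremented by 1 every 2 seconds.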

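// demo2.go
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// recordMetrics starts a background goroutine that increments
// the counter once every 2 seconds.
func recordMetrics() {
	go func() {
		for {
			opsProcessed.Inc()
			time.Sleep(2 * time.Second)
		}
	}()
}

var (
	// promauto registers the counter with the default registry,
	// so it is served by promhttp.Handler() automatically.
	opsProcessed = promauto.NewCounter(prometheus.CounterOpts{
		Name: "myapp_processed_ops_total",
		Help: "The total number of processed events",
	})
)

func main() {
	recordMetrics()

	http.Handle("/metrics", promhttp.Handler())
	// ListenAndServe only returns on error, so log it and exit.
	log.Fatal(http.ListenAndServe(":9001", nil))
}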