Go 性能分析工具 PProf（上）

1、PProf 是什么

在 Go 语言中，PProf 是用于可视化和分析性能分析数据的工具，PProf 以 profile.proto 读取分析样本的集合，并生成报告以可视化并帮助分析数据（支持文本和图形报告）。

主要有以下几点功能：

CPU Profiling：CPU 分析，按照一定的频率采集所监听的应用程序。
Memory Profiling：内存分析，在应用程序进行堆分配时记录堆栈跟踪，用于监视当前和历史内存使用情况，以及检查内存泄漏。
Block Profiling：阻塞分析，记录 Goroutine 阻塞等待同步（包括定时器通道）的位置，默认不开启，需要调用 runtime.SetBlockProfileRate 进行设置。
Mutex Profiling：互斥锁分析，报告互斥锁的竞争情况，默认不开启，需要调用 runtime.SetMutexProfileFraction 进行设置。
Goroutine Profiling： Goroutine 分析，可以对当前应用程序正在运行的 Goroutine 进行堆栈跟踪和分析。这项功能在实际排查中会经常用到，因为很多问题出现时的表象就是 Goroutine 暴增，而这时候我们要做的事情之一就是查看应用程序中的 Goroutine 正在做什么事情，因为什么阻塞了，然后再进行下一步。

2、从一个例子开始

先新建一个 demohttp.go 文件，用于后续的应用程序分析和示例展示，代码如下：

package main

import (
    "bytes"
    "io/ioutil"
    "log"
    "math/rand"
    "net/http"
    _ "net/http/pprof"
    "sync"
)

func main() {
    http.HandleFunc("/pprof-test", handler)
    // pprof 服务器，将暴露在 8090 端口
    if err := http.ListenAndServe(":8090", nil); err != nil {
        log.Fatal(err)
    }
}

func handler(resp http.ResponseWriter, req *http.Request) {
    var wg sync.WaitGroup
    wg.Add(200)
    for i := 0; i < 200; i++ {
        go cyclenum(30000, &wg)
    }
    wg.Wait()
    wb := writeBytes()
    b, err := ioutil.ReadAll(wb)
    if err != nil {
        resp.Write([]byte(err.Error()))
        return
    }
    resp.Write(b)
}

func cyclenum(num int, wg *sync.WaitGroup) {
    slice := make([]int, 0)
    for i := 0; i < num; i++ {
        for j := 0; j < num; j++ {
            j = i + j
            slice = append(slice, j)
        }
    }
    wg.Done()
}

func writeBytes() *bytes.Buffer {
    var buff bytes.Buffer
    for i := 0; i < 30000; i++ {
        buff.Write([]byte{'a' + byte(rand.Intn(10))})
    }
    return &buff
}

这段代码的主要逻辑就是起点一个 http 接口进行一些耗时操作。其中重要的一步，就是在 import 中添加 _ "net/http/pprof" 的引用。

需要注意在实际项目中，pprof 服务需要使用独立的协程运行，否则会阻塞代码运行。例如：

func main() {
    go func() {
        // pprof 服务器，将暴露在 6060 端口
        if err := http.ListenAndServe(":6060", nil); err != nil {
            panic(err)
        }
    }()
}

3、开始分析

先运行上面的 demohttp.go 程序，执行命令：

go run demohttp.go

在 web 界面上分析，浏览器输入：http://localhost:8090/debug/pprof/，在这里可以查看服务的整体概览运行情况，如下图：

名称	说明
allocs	过去所有内存抽样情况
block	同步阻塞时程序栈跟踪的一些情况
heap	活动对象的内存分配情况
mutex	互斥锁持有者的栈帧情况
profile	默认进行 30s 的 CPU Profiling，得到一个分析用的 profile 文件，然后可以用 `go tool pprof` 命令进行分析
threadcreate	创建新 OS 线程的堆栈跟踪情况
trace	当前程序执行的追踪情况，点击时会得到一个文件，可以用 `go tool trace` 命令来分析这个文件

不断刷新网页，可以看到数据在不断变化。

3.1、CPU Profiling

再开启另外一个命令行终端，执行如下命令：

go tool pprof "http://localhost:8090/debug/pprof/profile?seconds=30"

执行该命令后，需等待 30 秒（可调整参数 seconds 的值）pprof 会进行 CPU Profiling，结束后将默认进入 pprof 的命令行交互式模式，可以对分析的结果进行查看或导出。

等待 30s，进入命令交互界面，接下来我们输入查询命令 top，以此查看对应资源开销（例如：CPU 就是执行耗时/开销、Memory 就是内存占用大小）排名前十的函数，如下

$ go tool pprof "http://localhost:8090/debug/pprof/profile?seconds=30"
Fetching profile over HTTP from http://localhost:8090/debug/pprof/profile?seconds=30
Saved profile in /Users/pprof/pprof.demohttp.samples.cpu.001.pb.gz
File: demohttp
Type: cpu
Time: Dec 7, 2024 at 4:32pm (CST)
Duration: 30.09s, Total samples = 30ms (  0.1%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 30ms, 100% of 30ms total
Showing top 10 nodes out of 12
      flat  flat%   sum%        cum   cum%
      20ms 66.67% 66.67%       20ms 66.67%  runtime.kevent
      10ms 33.33%   100%       10ms 33.33%  runtime.pthread_cond_signal
         0     0%   100%       20ms 66.67%  runtime.findRunnable
         0     0%   100%       30ms   100%  runtime.mcall
         0     0%   100%       20ms 66.67%  runtime.netpoll
         0     0%   100%       10ms 33.33%  runtime.notewakeup
         0     0%   100%       30ms   100%  runtime.park_m
         0     0%   100%       10ms 33.33%  runtime.resetspinning
         0     0%   100%       30ms   100%  runtime.schedule
         0     0%   100%       10ms 33.33%  runtime.semawakeup

名称	说明
flat	函数自身的运行耗时
flat%	函数自身在 CPU 运行耗时总比例
sum%	函数自身累积使用 CPU 总比例
cum	函数自身及其调用函数的运行总耗时
cum%	函数自身及其调用函数的运行耗时总比例
name	函数名

在大多数的情况下，我们可以通过这五列得出一个应用程序的运行情况，知道当前是什么函数，正在做什么事情，占用了多少资源，谁又是占用资源的大头，以此来得到一个初步的分析方向。

3.2、Heap Profiling

go tool pprof "http://localhost:8090/debug/pprof/heap"

执行该命令后，能够很快的拉取到堆的结果，因为它不需要像 CPU Profiling 做采样等待

$ go tool pprof "http://localhost:8090/debug/pprof/heap"
Fetching profile over HTTP from http://localhost:8090/debug/pprof/heap
Saved profile in /Users/pprof/pprof.demohttp.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz
File: demohttp
Type: inuse_space
Time: Dec 7, 2024 at 5:02pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 512.09kB, 100% of 512.09kB total
      flat  flat%   sum%        cum   cum%
  512.09kB   100%   100%   512.09kB   100%  text/tabwriter.NewWriter (inline)
         0     0%   100%   512.09kB   100%  net/http.(*ServeMux).ServeHTTP
         0     0%   100%   512.09kB   100%  net/http.(*conn).serve
         0     0%   100%   512.09kB   100%  net/http.HandlerFunc.ServeHTTP
         0     0%   100%   512.09kB   100%  net/http.serverHandler.ServeHTTP
         0     0%   100%   512.09kB   100%  net/http/pprof.Index
         0     0%   100%   512.09kB   100%  net/http/pprof.handler.ServeHTTP
         0     0%   100%   512.09kB   100%  runtime/pprof.(*Profile).WriteTo
         0     0%   100%   512.09kB   100%  runtime/pprof.writeAlloc
         0     0%   100%   512.09kB   100%  runtime/pprof.writeHeapInternal

3.3、Goroutine Profiling

$ go tool pprof http://localhost:8090/debug/pprof/goroutine
Fetching profile over HTTP from http://localhost:6060/debug/pprof/goroutine
Saved profile in /Users/eddycjy/pprof/pprof.goroutine.003.pb.gz
Type: goroutine
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)

在查看 goroutine 时，我们可以使用 traces 命令，这个命令会打印出对应的所有调用栈，以及指标信息，可以让我们很便捷的查看到整个调用链路有什么，分别在哪里使用了多少个 goroutine，并且能够通过分析查看到谁才是真正的调用方，输出结果如下：

(pprof) traces
File: demohttp
Type: goroutine
Time: Dec 8, 2024 at 10:03pm (CST)
-----------+-------------------------------------------------------
         1   runtime.goroutineProfileWithLabels
             ...
-----------+-------------------------------------------------------
         1   runtime.gopark
             runtime.netpollblock
             ...
-----------+-------------------------------------------------------
         1   runtime.gopark
             runtime.netpollblock
             internal/poll.runtime_pollWait
             internal/poll.(*pollDesc).wait
             internal/poll.(*pollDesc).waitRead (inline)
             internal/poll.(*FD).Accept
             net.(*netFD).accept
             net.(*TCPListener).accept
             net.(*TCPListener).Accept
             net/http.(*Server).Serve
             net/http.(*Server).ListenAndServe
             net/http.ListenAndServe (inline)
             main.main
             runtime.main
-----------+-------------------------------------------------------

在调用栈上来讲，其展示顺序是自下而上的，也就是 runtime.main 方法调用了 main.main 方法， main.main 方法又调用了 net/http.ListenAndServe 方法，这里对应的也就是我们所使用的示例代码了，排查起来会非常方便。

每个调用堆栈信息用 ----------- 分割，函数方法前的就是指标数据，像 Goroutine Profiling 展示是就是该方法占用的 goroutine 的数量。而 Heap Profiling 展示的就是占用的内存大小，如下：

$ go tool pprof http://localhost:8090/debug/pprof/heap
...
Type: inuse_space
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) traces
Type: inuse_space
-----------+-------------------------------------------------------
     bytes:  13.55MB
   13.55MB   main.Add
             main.main.func1
-----------+-------------------------------------------------------

pprof 中的所有功能都会根据不同的 Profile 类型展示不同的对应结果。

4、查看可视化界面

上文提到的都是在命令行中查看分析程序性能，现需要进行可视化分析。

主要分 2 步：

发送请求，采集程序
使用 Graphviz 进行图像化分析

具体操作如下。

执行命令 wget http://127.0.0.1:8090/debug/pprof/profile，发送 HTTP GET 请求，下载文件。

$ wget http://127.0.0.1:8090/debug/pprof/profile
正在连接 127.0.0.1:8090... 已连接。
已发出 HTTP 请求，正在等待回应... 200 OK
长度：957 [application/octet-stream]
正在保存至: “profile”

默认需要等待 30 秒，执行完毕后可在当前目录下生成采集的文件 profile，然后进行下一步分析。

但在图形化分析前，先要安装 Graphviz 软件。

Graphviz 是 Graph Visualization 的缩写，是一种将结构信息表示为抽象图形和网络图的方法。自动图形绘制在软件工程、数据库和网页设计、网络以及许多其他领域的可视化界面中有许多重要应用。

安装完成后，执行命令：

go tool pprof -http=:6001 profile

这里的 profile 就是上一步生成的文件。该命令将在所指定的端口号运行一个 PProf 的分析用的站点。

在 View 菜单栏中，PProf 还支持多种分析方式的切换，例如 top 查看：

但如果直接采集，会发现没什么有价值的信息，原因是没有用户访问 http server ，真正需要的程序没有执行，一直阻塞在那里等待客户端的访问连接，所以 go tool pprof 只能采集部分代码运行的信息，而这部分代码又没有消耗多少 cpu。

一个方法就是用 http 测试工具模拟用户访问。这里用 hey 这个工具。安装 hey：

go get -u github.com/rakyll/hey

安装完成后，进行 http 访问：

hey -n 1000 http://localhost:8090/pprof-test

同时开启另一终端再次执行命令：

wget http://127.0.0.1:8090/debug/pprof/profile

在访问 http 接口的同时进行采集，这样才能采集到执行的代码所消耗的 CPU。

4.1、Graph

采集完成后，再次执行

go tool pprof -http=:6001 profile

对采集的文件进行图形化分析，通过 PProf 所提供的可视化界面，我们能够更方便、更直观的看到 Go 应用程序的调用链、使用情况等。

节点颜色与 cum 值有关：正值大的为红色；负值大的为绿色（负值通常在 profile 对比时出现）；值接近 0 的为灰色。
节点的字号与 flat 的绝对值有关：值越大则字号越大
边的粗细与该路径下的资源使用有关：资源使用越多则线条越粗
边的颜色与节点颜色类似
边的形状：虚线：两个节点之间的部分节点被移除了（间接调用）；实现：两个节点之间存在直接调用关系

简单来说：该视图展示的为整体的函数调用流程，框越大、线越粗、框颜色越鲜艳（红色）就代表它占用的时间越久，开销越大。相反若框颜色越淡，越小则代表在整体的函数调用流程中，它的开销是相对较小的。

因此我们可以用此视图去分析谁才是开销大头，它又是因为什么调用流程而被调用的。

在这个例中，可以看到是 main 包中的 cyclenum 方法又大又红，对应到上面的示例代码，确实是该方法最耗时，因此分析的重点就是里了。

4.2、heap

另外除了对 CPU 进行剖析以外，我们还可以调整选项，对内存情况进行分析，如下采集命令：

wget http://127.0.0.1:8090/debug/pprof/heap

采集成功后，会在当前目录下生成 heap 文件，再使用图形界面查看

go tool pprof -http=:6001 heap

4.2、火焰图

在性能分析过程中，当发现调用关系图因调用栈过深而显得杂乱时，可利用火焰图的以下特性快速定位问题，举个例子：

当查看火焰图：

可以快速定位是日志打印，以及数据转换耗费了大量 CPU，因此可以直接定位到代码解决问题。

Flame Graph（火焰图）它是可动态的，调用顺序由上到下，每一块代表一个函数、颜色越鲜艳（红）、区块越大代表占用 CPU 的时间更长。同时它也支持点击块深入进行分析。

节点的颜色是由它的包名决定的，相同包名的节点将拥有相同的颜色
节点的字号可能会有区别，但是与上面的图形不同的是，此处的字号仅为适应其节点大小，并无其它含义
上方的节点为调用者，下方的节点为被调用者
节点的宽度表示资源使用情况，越宽则资源使用越多，其总宽度代表 cum，去除其子节点后，剩余的宽度代表 flat
如果上下节点之间没有边框，则表示这两个函数被“内联”了。

5、总结

To use pprof, link this package into your program:

import _ "net/http/pprof"

If your application is not already running an http server, you need to start one. Add "net/http" and "log" to your imports and the following code to your main function:

go func() {
    log.Println(http.ListenAndServe("localhost:6060", nil))
}()

Parameters can be passed via GET query params:

debug=N (all profiles): response format: N = 0: binary (default), N > 0: plaintext
gc=N (heap profile): N > 0: run a garbage collection cycle before profiling
seconds=N (allocs, block, goroutine, heap, mutex, threadcreate profiles): return a delta profile
seconds=N (cpu (profile), trace profiles): profile for the given duration

Use the pprof tool to look at the heap profile:

go tool pprof http:localhost:6060/debug/pprof/heap

Or to look at a 30-second CPU profile:

go tool pprof http:localhost:6060/debug/pprof/profile?seconds=30

Or to look at the goroutine blocking profile, after calling [runtime.SetBlockProfileRate] in your program:

go tool pprof http:localhost:6060/debug/pprof/block

Or to look at the holders of contended mutexes, after calling [runtime.SetMutexProfileFraction] in your program:

go tool pprof http:localhost:6060/debug/pprof/mutex

The package also exports a handler that serves execution trace data for the "go tool trace" command. To collect a 5-second execution trace:

curl -o trace.out http:localhost:6060/debug/pprof/trace?seconds=5
go tool trace trace.out

To view all available profiles, open http:localhost:6060/debug/pprof/ in your browser.

本文介绍了 CPU、内存、火焰图的使用，关于另一个重点内容 trace，将放在下一集 Go 性能分析工具 PProf（下）进行介绍。