The sb-sprof
module, loadable by
(require :sb-sprof)
provides an alternate profiler which works by taking samples of the
program execution at regular intervals, instead of instrumenting
functions like sb-profile:profile
does. You might find
sb-sprof
more useful than the deterministic profiler when profiling
functions in the common-lisp
-package, SBCL internals, or code
where the instrumenting overhead is excessive.
Additionally sb-sprof
includes a limited deterministic profiler
which can be used for reporting the amounts of calls to some functions
during
(in-package :cl-user) (require :sb-sprof) (declaim (optimize speed)) (defun cpu-test-inner (a i) (logxor a (* i 5) (+ a i))) (defun cpu-test (n) (let ((a 0)) (dotimes (i (expt 2 n) a) (setf a (cpu-test-inner a i))))) ;;;; CPU profiling ;;; Take up to 1000 samples of running (CPU-TEST 26), and give a flat ;;; table report at the end. Profiling will end one the body has been ;;; evaluated once, whether or not 1000 samples have been taken. (sb-sprof:with-profiling (:max-samples 1000 :report :flat :loop nil) (cpu-test 26)) ;;; Record call counts for functions defined on symbols in the CL-USER ;;; package. (sb-sprof:profile-call-counts "CL-USER") ;;; Take 1000 samples of running (CPU-TEST 24), and give a flat ;;; table report at the end. The body will be re-evaluated in a loop ;;; until 1000 samples have been taken. A sample count will be printed ;;; after each iteration. (sb-sprof:with-profiling (:max-samples 1000 :report :flat :loop t :show-progress t) (cpu-test 24)) ;;;; Allocation profiling (defun foo (&rest args) (mapcar (lambda (x) (float x 1d0)) args)) (defun bar (n) (declare (fixnum n)) (apply #'foo (loop repeat n collect n))) (sb-sprof:with-profiling (:max-samples 10000 :mode :alloc :report :flat) (bar 1000))
The flat report format will show a table of all functions that the profiler encountered on the call stack during sampling, ordered by the number of samples taken while executing that function.
Self Total Cumul Nr Count % Count % Count % Calls Function ------------------------------------------------------------------------ 1 69 24.4 97 34.3 69 24.4 67108864 CPU-TEST-INNER 2 64 22.6 64 22.6 133 47.0 - SB-VM::GENERIC-+ 3 39 13.8 256 90.5 172 60.8 1 CPU-TEST 4 31 11.0 31 11.0 203 71.7 - SB-KERNEL:TWO-ARG-XOR
For each function, the table will show three absolute and relative sample counts. The Self column shows samples taken while directly executing that function. The Total column shows samples taken while executing that function or functions called from it (sampled to a platform-specific depth). The Cumul column shows the sum of all Self columns up to and including that line in the table.
Additionally the Calls column will record the amount of calls that were
made to the function during the profiling run. This value will only
be reported for functions that have been explicitly marked for call counting
with profile-call-counts
.
The profiler also hooks into the disassembler such that instructions which have been sampled are annotated with their relative frequency of sampling. This information is not stored across different sampling runs.
; 6CF: 702E JO L4 ; 6/242 samples ; 6D1: D1E3 SHL EBX, 1 ; 6D3: 702A JO L4 ; 6D5: L2: F6C303 TEST BL, 3 ; 2/242 samples ; 6D8: 756D JNE L8 ; 6DA: 8BC3 MOV EAX, EBX ; 5/242 samples ; 6DC: L3: 83F900 CMP ECX, 0 ; 4/242 samples
This module is known not to work consistently on the Alpha platform, for technical reasons related to the implementation of a machine language idiom for marking sections of code to be treated as atomic by the garbage collector; However, it should work on other platforms, and the deficiency on the Alpha will eventually be rectified.
Allocation profiling is only supported on SBCL builds that use the generational garbage collector. Tracking of call stacks at a depth of more than two levels is only supported on x86 and x86-64.
Repeatedly evaluate
body
with statistical profiling turned on. In multi-threaded operation, only the thread in whichwith-profiling
was evaluated will be profiled by default. If you want to profile multiple threads, invoke the profiler withstart-profiling
.The following keyword args are recognized:
:sample-interval
<n>- Take a sample every <n> seconds. Default is
*sample-interval*
.:alloc-interval
<n>- Take a sample every time <n> allocation regions (approximately 8kB) have been allocated since the last sample. Default is
*alloc-interval*
.:mode
<mode>- If
:cpu
, run the profiler incpu
profiling mode. If:alloc
, run the profiler in allocation profiling mode. If:time
, run the profiler in wallclock profiling mode.:max-samples
<max>- Repeat evaluating body until <max> samples are taken. Default is
*max-samples*
.:max-depth
<max>- Maximum call stack depth that the profiler should consider. Only has an effect on x86 and x86-64.
:report
<type>- If specified, call
report
with:type
<type> at the end.:reset
<bool>- It true, call
reset
at the beginning.:threads
<list-form>- Form that evaluates to the list threads to profile, or
:all
to indicate that all threads should be profiled. Defaults to the current thread. (Note:start-profiling
defaults to all threads.)
:threads
has no effect on call-counting at the moment.On some platforms (eg. Darwin) the signals used by the profiler are not properly delivered to threads in proportion to their
cpu
usage when doing:cpu
profiling. If you see empty call graphs, or are obviously missing several samples from certain threads, you may be falling afoul of this.:loop
<bool>- If true (the default) repeatedly evaluate
body
. If false, evaluate if only once.
Report statistical profiling results. The following keyword args are recognized:
:type
<type>- Specifies the type of report to generate. If
:flat
, show flat report, if:graph
show a call graph and a flat report. If nil, don't print out a report.:stream
<stream>- Specify a stream to print the report on. Default is
*standard-output*
.:max
<max>- Don't show more than <max> entries in the flat report.
:min-percent
<min-percent>- Don't show functions taking less than <min-percent> of the total time in the flat report.
:sort-by
<column>- If
:samples
, sort flat report by number of samples taken. If:cumulative-samples
, sort flat report by cumulative number of samples taken (shows how much time each function spent on stack.) Default is*report-sort-by*
.:sort-order
<order>- If
:descending
, sort flat report in descending order. If:ascending
, sort flat report in ascending order. Default is*report-sort-order*
.:show-progress
<bool>- If true, print progress messages while generating the call graph.
:call-graph
<graph>- Print a report from <graph> instead of the latest profiling results.
Value of this function is a
call-graph
object representing the resulting call-graph, ornil
if there are no samples (eg. right after callingreset
.)Profiling is stopped before the call graph is generated.
Start profiling statistically in the current thread if not already profiling. The following keyword args are recognized:
:sample-interval
<n>- Take a sample every <n> seconds. Default is
*sample-interval*
.:alloc-interval
<n>- Take a sample every time <n> allocation regions (approximately 8kB) have been allocated since the last sample. Default is
*alloc-interval*
.:mode
<mode>- If
:cpu
, run the profiler incpu
profiling mode. If:alloc
, run the profiler in allocation profiling mode. If:time
, run the profiler in wallclock profiling mode.:max-samples
<max>- Maximum number of samples. Default is
*max-samples*
.:max-depth
<max>- Maximum call stack depth that the profiler should consider. Only has an effect on x86 and x86-64.
:threads
<list>- List threads to profile, or
:all
to indicate that all threads should be profiled. Defaults to:all
. (Note:with-profiling
defaults to the current thread.)
:threads
has no effect on call-counting at the moment.On some platforms (eg. Darwin) the signals used by the profiler are not properly delivered to threads in proportion to their
cpu
usage when doing:cpu
profiling. If you see empty call graphs, or are obviously missing several samples from certain threads, you may be falling afoul of this.:sampling
<bool>- If true, the default, start sampling right away. If false,
start-sampling
can be used to turn sampling on.
Mark the functions named by
names
as being subject to call counting during statistical profiling. If a string is used as a name, it will be interpreted as a package name. In this case call counting will be done for all functions with names likex
or (SETF X), wherex
is a symbol with the package as its home package.
Clear all call counting information. Call counting will be done for no functions during statistical profiling.
Default number of traces taken. This variable is somewhat misnamed: each trace may actually consist of an arbitrary number of samples, depending on the depth of the call stack.
sb-sprof
is an SBCL port, with enhancements, of Gerd
Moellmann's statistical profiler for CMUCL.