--skin=calltree
on the Valgrind command line, or use the supplied script calltree.
Calltree is an extension of the Cachegrind skin. Read the documentation of
the Cachegrind skin first; this page only describes the features that Calltree supports in addition
to Cachegrind's features.
Detailed technical documentation on how Calltree works is available here. If you want to know how to use it, you only need to read this page.
This will collect information
If you are only interested in the first item, it is enough to use the Cachegrind skin from Valgrind. If you are only interested in the second item, use Calltree with the option "--simulate-cache=no". This records instruction read accesses only, but it significantly speeds up profiling, typically by a factor of 2 or 3.
The generated dump files are named
There are different ways to generate multiple profile dumps while a program is running under supervision of Calltree. Still, all methods trigger the same action "dump all profile information since last dump or program start, and zero cost counters afterwards". To allow for zeroing cost counters without dumping, there exists a second action "zero all cost counters now". The different methods are:
Thus, execute "touch cachegrind.cmd" to force creation of a new dump file. When the dump is finished, the command file is removed. Note that the program must be actively executing for the file to be detected: for a GUI application, resize the window; for a server, send a request.
If you are using KCachegrind for browsing of profile information, you can use the toolbar button "Force dump". This will create the file "cachegrind.cmd" and will trigger a reload after the dump is written.
You can specify these options multiple times for different function prefixes.
In Valgrind terminology, this way is called "Client requests". The given macros generate a special instruction pattern with no effect at all (i.e. a NOP). Only when run under Valgrind (or Calltree), the CPU simulation engine detects the special instruction pattern and triggers special actions like the ones described above.
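The command-file method described above can be scripted. The following is a minimal Python sketch, assuming the profiled program's working directory is known and that Calltree removes "cachegrind.cmd" once the dump is written, as stated above. The helper name `request_dump`, the timeout, and the polling interval are illustrative choices, not part of Calltree.

```python
import os
import time


def request_dump(workdir=".", timeout=10.0, poll=0.1):
    """Create cachegrind.cmd in workdir (like `touch`) and wait for its removal.

    Returns True if the file disappeared within `timeout` seconds, which the
    text above describes as the sign that the dump finished; False otherwise.
    """
    cmd_file = os.path.join(workdir, "cachegrind.cmd")
    # Equivalent of "touch cachegrind.cmd": create an empty file if missing.
    with open(cmd_file, "a"):
        pass
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not os.path.exists(cmd_file):
            return True
        time.sleep(poll)
    return False
```

A return value of False most likely means the profiled program is idle and has not yet noticed the file.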
If a call chain goes multiple times around inside of a cycle, you can't distinguish costs coming from the first round or the second. Thus, it makes no sense to attach any cost to a call among functions in one cycle: if "A > B" appears multiple times in a call chain, you have no way to partition the one big sum of all appearances of "A > B". Thus, for profile data presentation, all functions of a cycle are seen as one big virtual function.
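To make the notion of a cycle concrete, here is a small Python sketch that finds the groups of functions that would be merged into one virtual function. It uses Tarjan's strongly connected components algorithm on a caller-to-callees mapping; this illustrates the concept only and is not taken from Calltree's implementation. The function name `find_cycles` and the input format are assumptions made for the example.

```python
def find_cycles(calls):
    """Return the sets of functions that form call cycles.

    `calls` maps a caller name to an iterable of callee names. Each returned
    set is one strongly connected component with a real cycle in it.
    """
    index, low = {}, {}
    stack, on_stack = [], set()
    cycles = []
    counter = [0]

    def visit(f):
        index[f] = low[f] = counter[0]
        counter[0] += 1
        stack.append(f)
        on_stack.add(f)
        for g in calls.get(f, ()):
            if g not in index:
                visit(g)
                low[f] = min(low[f], low[g])
            elif g in on_stack:
                low[f] = min(low[f], index[g])
        if low[f] == index[f]:
            # f is the root of a component: pop its members off the stack.
            comp = set()
            while True:
                g = stack.pop()
                on_stack.discard(g)
                comp.add(g)
                if g == f:
                    break
            # Keep it only if it has several members or f calls itself.
            if len(comp) > 1 or f in calls.get(f, ()):
                cycles.append(comp)

    for f in list(calls):
        if f not in index:
            visit(f)
    return cycles


# A calls B, B calls C, C calls A again: one cycle of three functions.
print(find_cycles({"main": ["A"], "A": ["B"], "B": ["C"], "C": ["A"]}))
```

The recursive form keeps the sketch short; a very deep call graph would need an iterative variant.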
Unfortunately, if you have an application using some callback mechanism (like any GUI program), or even with normal polymorphism (as in OO languages like C++), it's quite possible to get large cycles. As it is often impossible to say anything about performance behaviour inside of cycles, it is useful to introduce some mechanisms to avoid cycles in call graphs altogether. This is done either by treating the same function as different functions depending on the current execution context, giving them different names, or by ignoring calls to certain functions entirely.
There is an option to ignore calls to a function with "--fn-skip=funcprefix". E.g., you usually don't want to see the trampoline functions in the PLT sections for calls to functions in shared libraries. You can see the difference if you profile with "--skip-plt=no". If a call is ignored, the cost events that occur in the ignored function are attached to the enclosing function.
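The effect of ignoring a function can be sketched in a few lines of Python. The sketch assumes profile samples are given as (call chain, cost) pairs, which is a format invented for this example; `collapse_skipped` is likewise a hypothetical helper, not a Calltree function. It only illustrates that a skipped function vanishes from the chain and its cost lands on the enclosing function.

```python
from collections import defaultdict


def collapse_skipped(samples, skip_prefixes):
    """Drop skipped functions from call chains and sum costs per visible chain.

    `samples` is an iterable of (chain, cost) pairs, where chain is a tuple of
    function names from outermost to innermost. A function is skipped if its
    name starts with one of `skip_prefixes`.
    """
    totals = defaultdict(int)
    for chain, cost in samples:
        visible = tuple(
            f for f in chain
            if not any(f.startswith(p) for p in skip_prefixes)
        )
        # The cost now belongs to the innermost function that is still visible.
        totals[visible] += cost
    return dict(totals)


samples = [(("A", "B", "C"), 5), (("A", "B"), 3), (("A",), 1)]
print(collapse_skipped(samples, ["B"]))
```

With "B" skipped, the chain "A > B > C" appears as "A > C", and the cost spent directly in "B" is added to "A".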
If you have a recursive function, you can distinguish the first 10 recursion levels by specifying "--fn-recursion10=funcprefix", or do so for all functions with "--fn-recursion=10", but the latter will give you much bigger profile dumps. In the profile data, you will see the recursion levels of "func" as different functions with the names "func", "func'2", "func'3" and so on.
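The naming scheme for recursion levels can be illustrated with a short Python sketch. It assumes that calls deeper than the configured number of levels are folded into the last separated level; the text above does not say what happens there, so treat that as an assumption. The helper `recursion_names` is hypothetical.

```python
def recursion_names(chain, max_levels):
    """Name each function in a call chain by its recursion level.

    `chain` lists function names from outermost to innermost. `max_levels`
    maps a function name to the number of levels to distinguish; functions
    not listed are not separated at all.
    """
    seen = {}
    names = []
    for func in chain:
        seen[func] = seen.get(func, 0) + 1
        # Assumption: levels beyond the maximum share the last separate name.
        level = min(seen[func], max_levels.get(func, 1))
        names.append(func if level == 1 else f"{func}'{level}")
    return names


print(recursion_names(["main", "func", "func", "func", "func"], {"func": 3}))
```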
If you have call chains "A > B > C" and "A > C > B" in your program, you usually get a "false" cycle "B <> C". Use "--fn-caller2=B --fn-caller2=C", and functions "B" and "C" will be treated as different functions depending on the direct caller. Using the apostrophe for appending this "context" to the function name, you get "A > B'A > C'B" and "A > C'A > B'C", and there will be no cycle. Use "--fn-caller=3" to get a 2-caller dependency for all functions. Again, this will multiply the profile data size.
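The caller-dependent naming can be reproduced with a small Python sketch. Following the example above, a value of 2 appends one caller (the direct one) to the function name, so a value of N is taken to mean N-1 callers. The order of the appended callers for values above 2 (nearest caller first) is an assumption, and `caller_names` is a hypothetical helper.

```python
def caller_names(chain, depths):
    """Append caller context to function names in a call chain.

    `chain` lists function names from outermost to innermost. `depths` maps a
    function name to its separation value N; N - 1 callers are appended.
    Functions not listed keep their plain name.
    """
    chain = list(chain)
    names = []
    for i, func in enumerate(chain):
        n_callers = max(depths.get(func, 1) - 1, 0)
        callers = chain[max(0, i - n_callers):i]
        # Assumption: the nearest caller comes first in the mangled name.
        names.append("'".join([func] + callers[::-1]))
    return names


depths = {"B": 2, "C": 2}
print(" > ".join(caller_names(["A", "B", "C"], depths)))
print(" > ".join(caller_names(["A", "C", "B"], depths)))
```

Because "B'A" and "B'C" are now distinct names, the two chains no longer share a function and the false cycle disappears.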
--base=<prefix>
This option is especially useful if your application changes its working directory. Usually, the dump file is generated in the current working directory of the application at program termination. By giving an absolute path with the base specification, you can force a fixed directory for the dump files.
--separate-dumps=yes|no
--simulate-cache=yes|no
Note, however, that it is impossible to estimate how much real time your program will need from instruction read counts alone. Use this option if you want to find out how many times different functions are called and what their call relations are.
--collect-state=yes|no
To only look at parts of your program, you have two possibilities:
Collection state can be toggled on entering and leaving a given function with the option --toggle-collect=<function>. For this, collection state should be switched off at the beginning. Note that specifying --toggle-collect implicitly sets --collect-state=no.
Collection state can also be toggled by using a Valgrind User Request in your application. For this, include valgrind/calltree.h and place the macro CALLTREE_TOGGLE_COLLECT at the needed positions. This has an effect only if the program is run under supervision of the Calltree skin.
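The toggling behaviour described above can be modelled in a few lines of Python. This is a conceptual sketch only: the event format and the helper `collected_cost` are invented for illustration, and recursive calls of the toggling function are not handled specially (each enter or leave simply flips the state).

```python
def collected_cost(events, toggle_function):
    """Sum the cost that falls inside calls to `toggle_function`.

    `events` is a sequence of ("enter", name), ("leave", name) and
    ("cost", amount) tuples in execution order.
    """
    # Specifying --toggle-collect implies collection starts switched off.
    collecting = False
    total = 0
    for kind, value in events:
        if kind in ("enter", "leave") and value == toggle_function:
            collecting = not collecting
        elif kind == "cost" and collecting:
            total += value
    return total


events = [
    ("cost", 5),
    ("enter", "work"),
    ("cost", 7),
    ("enter", "helper"),
    ("cost", 2),
    ("leave", "helper"),
    ("leave", "work"),
    ("cost", 1),
]
print(collected_cost(events, "work"))
```

Only the cost between entering and leaving "work" is counted, including the cost of functions it calls.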
--skip-plt=no|yes
--fn-skip=<function>
Ignore calls to/from a given function. E.g. if you have a call chain A > B > C, and
you specify function B to be ignored, you will only see A > C.
This is very convenient for skipping functions that handle callback behaviour. E.g. for the SIGNAL/SLOT
mechanism in Qt, you only want to see the function emitting a signal calling the slots connected
to that signal. First, determine the real call chain to see which functions need to be skipped,
then use this option.
--fn-group<number>=<function>
Put a function into separation group number.
--fn-recursion<number>=<function>
Separate <number> recursions for <function>
--fn-caller<number>=<function>
Separate <number> callers for <function>
--dump-before=<function>
Dump when entering <function>
--zero-before=<function>
Zero all costs when entering <function>
--dump-after=<function>
Dump when leaving <function>
--toggle-collect=<function>
Toggle collection on enter/leave <function>
--fn-recursion=<level>
Separate function recursions, maximal <level> [2]
--fn-caller=<callers>
Separate functions by callers [0]
--mangle-names=no|yes
Mangle separation into names? [yes]
--dump-threads=no|yes
Dump traces per thread? [no]
--compress-strings=no|yes
Compress strings in profile dump? [yes]
--dump-bbs=no|yes
Dump basic block info? [no]. This needs an update of the KCachegrind importer!
--dumps=<count>
Dump trace each <count> basic blocks [0=never]
4. Profile data file format