How to profile my C++ application on linux

I would like to profile my C++ application on Linux. I would like to find out how much time my application spends on CPU processing vs. time spent blocked on I/O or being idle.

I know there is a profiling tool called valgrind on Linux. But it breaks down the time spent in each method, and it does not give me an overall picture of how much time is spent on CPU processing vs. idle. Or is there a way to do that with valgrind?

I can recommend valgrind's callgrind tool in conjunction with KCacheGrind for visualization. KCacheGrind makes it pretty easy to see where the hotspots are. Note: It's been too long since I used it, so I'm not sure if you'll be able to get I/O wait time out of that. Perhaps in conjunction with iostat or pidstat you'll be able to see where all the time was spent.


Check out oprofile. Also, for more system-level diagnostics, try systemtap.


You might want to check out Zoom, which is a lot more polished and full-featured than oprofile et al. It costs money ($199), although you can get a free 30-day evaluation licence.


LTTng is a good tool to use for full system profiling.


If your app simply runs "flat out" (i.e. it's either using CPU or waiting for I/O) until it exits, and there are no other processes competing, just do time myapp (or maybe /usr/bin/time myapp, which produces slightly different output from the shell builtin). This will give you something like:
real    0m1.412s
user    0m1.288s
sys     0m0.056s
In this case, user+sys (kernel) time accounts for almost all the real time and there's just 0.068s unaccounted for (probably time spent initially loading the app and its supporting libs). However, if you were to see:
real    0m5.732s
user    0m1.144s
sys     0m0.078s
then your app spent 4.51s not consuming CPU and was presumably blocked on I/O, which is the information I think you're looking for. However, where this simple analysis technique breaks down is:
  • Apps which wait on a timer/clock or other external stimulus (e.g. event-driven GUI apps). It can't distinguish time spent waiting on the clock from time spent waiting on disk/network.
  • Multithreaded apps, which need a bit more thought to interpret the numbers.
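
The same real/user/sys split can also be taken from inside the program around one section of code. What follows is only a minimal sketch, assuming a POSIX system with getrusage(); the names TimeSplit, measure and blocked_work are made up for illustration and are not part of any library. Because RUSAGE_SELF sums CPU time over every thread in the process, user+sys can exceed real in a multithreaded app, which is the same caveat as above.

```cpp
#include <sys/resource.h>  // getrusage, struct rusage (POSIX)
#include <sys/time.h>      // struct timeval
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper type for this sketch; not part of any library.
struct TimeSplit {
    double real_s;  // elapsed wall-clock time
    double user_s;  // CPU time spent in user mode
    double sys_s;   // CPU time spent in the kernel
    // Time not accounted for by CPU use: blocked on I/O, sleeping, waiting.
    double off_cpu_s() const { return real_s - (user_s + sys_s); }
};

static double to_seconds(const timeval& tv) {
    return static_cast<double>(tv.tv_sec) + static_cast<double>(tv.tv_usec) / 1e6;
}

// Runs `work` once and reports the same three figures that time(1) prints.
inline TimeSplit measure(const std::function<void()>& work) {
    assert(static_cast<bool>(work));
    rusage before{}, after{};
    getrusage(RUSAGE_SELF, &before);
    const auto t0 = std::chrono::steady_clock::now();
    work();
    const auto t1 = std::chrono::steady_clock::now();
    getrusage(RUSAGE_SELF, &after);

    TimeSplit s{};
    s.real_s = std::chrono::duration<double>(t1 - t0).count();
    s.user_s = to_seconds(after.ru_utime) - to_seconds(before.ru_utime);
    s.sys_s  = to_seconds(after.ru_stime) - to_seconds(before.ru_stime);
    return s;
}

// Demo workload (an assumption of this sketch): sleeping stands in for
// being blocked on I/O, since neither consumes CPU.
inline void blocked_work() {
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
}
```

Calling measure(blocked_work) should report real_s of roughly 0.2 with user_s + sys_s close to zero, so nearly all of the time shows up in off_cpu_s(). Like time itself, this cannot tell a sleep from a disk wait.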


callgrind is a very good tool, although I found OProfile to be more 'complete'. Also, it is the only one that lets you specify module and/or kernel source to allow deeper insight into your bottlenecks. The output is supposed to be able to interface with KCacheGrind, although I had trouble with this so I used Gprof2Dot instead. You can export your callgraph to a .png.
Edit: OProfile looks at the overall system, so the process will just be:
[setup oprofile]
opcontrol --init
opcontrol --vmlinux=/path/to/vmlinux     (or --no-vmlinux)
opcontrol --start
[run your app here].
opcontrol --stop   (or opcontrol --shutdown [man for difference])
Then, to start looking at the results, see the man page for opreport.


The lackey and/or helgrind tools in valgrind should allow you to do this.


google-perf-tools - a much faster alternative to callgrind (and it can generate output in the same format as callgrind, so you can use KCacheGrind).


See this post. And this post. Basically, between the time the program starts and when it finishes, it has a call stack. During I/O, the stack terminates in a system call. During computation, it terminates in a typical instruction. Either way, if you can sample the stack at random wall-clock times, you can see exactly why it's spending that time. The only remaining point is that thousands of samples might give a sense of confidence, but they won't tell you much more than 10 or 20 samples will.
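
One way to see why a handful of samples can be enough is a back-of-the-envelope calculation. This is a sketch under assumptions that are not in the answer above: samples are taken at independent random wall-clock times, and an activity counts as "found" once it shows up in at least two samples. The function name prob_seen_twice is made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Probability that an activity occupying `fraction` of wall-clock time
// appears in at least two of `samples` independent random stack samples.
// Binomial model: P(>=2 hits) = 1 - P(0 hits) - P(exactly 1 hit).
inline double prob_seen_twice(double fraction, int samples) {
    assert(fraction >= 0.0 && fraction <= 1.0 && samples >= 2);
    const double miss = 1.0 - fraction;
    const double none = std::pow(miss, samples);
    const double once = samples * fraction * std::pow(miss, samples - 1);
    return 1.0 - none - once;
}
```

Under these assumptions, something that takes 30% of the run time comes out at roughly 0.85 for 10 samples and roughly 0.99 for 20, so extra samples mostly refine the percentage rather than reveal the hotspot.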
