Debugging Performance Issues
Sirikata is large, complicated, and uses multiple languages. Debugging performance issues can be tricky. The following tries to describe the available tools, how to use them, and which is best for different situations.
If you're looking for hotspots in C++ code, this is the suggested tool. It works by having the kernel periodically trigger sampling, collecting stacktraces from Sirikata (and other applications). You run opcontrol to enable sampling, run the application, and then turn off opcontrol. A separate program, opreport, will report the collected data. See the OProfile documentation for details.
A simple wrapper script for whichever binary you're targeting can be very handy:
#!/bin/bash sudo opcontrol --no-vmlinux --start sudo opcontrol --reset sudo opcontrol --callgraph=6 ./cppoh_d "$@" sudo opcontrol --stop sudo opcontrol --shutdown
When you run opreport, you'll need to filter out non-Sirikata modules (because OProfile examines the entire system). Make sure you include Sirikata's libraries and plugins as well as the binary you're examining!
Google Perftools is already included as we also use it for Debugging Memory Issues. To use it, set the CMake variable GOOGLE_PERFTOOLS_SETTING=CPU and recompile. When you execute the program you want to examine, set the environment variable CPUPROFILE to a location to save profiling data to, e.g.,
> env CPUPROFILE=/tmp/cppoh.prof /home/me/sirikata/build/cmake/cppoh_d
Then run pprof to analyze the output.
You can build Sirikata with GNU Profiler settings. Set CMAKE_BUILD_TYPE=Profile. This is an instrumentation-based statistical profiler. It uses the same basic idea as OProfile and will report information in a similar manner. However, it's more intrusive and seems to have a higher impact on performance. Therefore, you should generally use it when you can't get the information you need from OProfile, or to double check OProfile's results.
See the GNU Profiler guide for more details.
Course Grained Profiling
For a few parts of code where performance issues might be expected, especially if they could result in long-running event handlers, TimeProfiler's are manually added around event handlers. This is much less heavyweight than a profiler like GProf and minimally invasive, so it is useful to ensure you're getting results that aren't skewed by the profiler.
To enable profilers, add --profile=true to the command line (this should work for all binaries). The statistics will be reported at exit:
[PROFILER] INFO: Profiler report: Object Host [PROFILER] INFO: Stage: Session Manager -- Avg: 2us Min: 2us Max:7us Sum: 38us Its: 13 [PROFILER] INFO: Stage: Handle Read Network -- Avg: 10us Min: 5us Max:1.045ms Sum: 125.492ms Its: 12018 [PROFILER] INFO: Stage: Handle Server Message -- Avg: 402us Min: 23us Max:4.346ms Sum: 1.8317s Its: 4550 [PROFILER] INFO: Stage: TimeSync -- Avg: 81us Min: 31us Max:173us Sum: 326us Its: 4 [PROFILER] INFO: Stage: OrphanLocUpdateManager -- Avg: 8us Min: 8us Max:9us Sum: 17us Its: 2
This data covers the entire run, so be wary of variation or spikes.
You can add your own 'Stage' by using TimeProfiler::addStage. All Contexts have a TimeProfiler in Context::profiler. A few classes (e.g. PollingService) will automatically add a Stage if you provide a name for it in their constructor. In fact, PollingService is a good example of how to add your own TimeProfiler Stage.
Fine-grained Event Loop Profiling
- Build with the CMake variable SIRIKATA_TRACK_EVENT_QUEUES enabled
- During performance issues, break and call Sirikata::Network::IOService::reportAllStats()