Wednesday, November 21, 2012

Don't Use Timing Functions for Profiling

One common technique for profiling programs is to use the gettimeofday system call (with code that looks something like this):

Example (incorrect) code that uses gettimeofday - click to view
#include <time.h>
#include <stdlib.h>
#include <stdio.h>
void function(void)
{
  struct timeval before;
  struct timeval after;
  gettimeofday(&before, NULL);
  codetoprofile();
  gettimeofday(&after, NULL); 
  time_t delta = after.tv_sec - before.tv_sec;
  printf("%ld\n",delta);
}

However, using gettimeofday(2) or time(3) or any function designed to get a time of day to obtain profiling information is wrong for many reasons:

  1. Time can go backwards. In a virtualized environment this can happen quite often. In non-virtualized environments this can happen due to time zones. Even passing CLOCK_MONOTONIC to clock(3) doesn't help as it can go backwards during a leap second expansion.
  2. Time can change drastically for no reason. Systems with NTP enabled periodically sync their time with a time source. This can cause the system time to change by minutes, hours, or even days!
  3. These functions measure Wall Clock time. Time spent on entirely unrelated processes is going to be included in the profiling data!
  4. Even if you have disabled everything else on the system[1] the delta computed above includes both of User time and System Time. If your algorithm is very fast but the kernel has a slow implementation of some system call you won't learn much.
  5. gettimeofday relies on the cpu clock which may differ across cores resulting in time skew.

So what should be used instead?

There isn't a good, portable, function to obtain profiling information. However there are options for those not tied to a particular system (or those willing to maintain multiple implementations for different systems.

The getrusage(2) system call is one option for profiling data. This provides different fields for user time (ru_utime) and system time (ru_stime) at a relatively high level of precision and accuracy.

Using DTraces profiling provider also seems to be a decent choice although I limited experience with it.

Finally, using APIs meant to access hardware specific features such as FreeBSD's hwpmc is likely to provide the best results at the cost of being the least portable. Linux has similar features such as oprofile and perf. Using dedicated profilers such as Intel's vtunes[2] may also be worthwhile.

  1. Including networking, background process swapping, cron, etc.
  2. A FreeBSD version is available.
update 2012-11-26: Include note about clock skew across cores.
Update 2013-02-13: Update and fix a massive error I had w.r.t. clock(3)