Larry Meadows of Intel Corporation has developed two simple tools, for the Intel® Xeon® processor line and the Intel® Xeon Phi™ coprocessor, that let users determine how well their application is using the machine.
Speedometer:
Speedometer measures the resource usage of a system while running an application and reports that usage as a percentage of the peak value of the corresponding resource. The resources that are tracked include memory bandwidth, instruction bandwidth, and vector or floating-point unit use. Average values for each resource are reported after the program executes. It is also possible to record the resource usage over time, and GUI tools are provided to plot such recordings. Speedometer is intended to give you a general idea of how well your code is using the system.
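To make the "percentage of peak" idea concrete, here is a minimal sketch of the arithmetic only. It is not Speedometer's code: the resource names, measured values, and peak values below are hypothetical numbers chosen for illustration, and a real tool would take its measurements from hardware counters.

```python
def percent_of_peak(measured, peak):
    """Return measured usage as a percentage of the peak value."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return 100.0 * measured / peak


# Hypothetical average measurements for one run (illustrative values only).
measured = {
    "memory_bandwidth_gb_s": 40.0,
    "instructions_per_cycle": 1.0,
    "vector_gflops": 50.0,
}

# Hypothetical peak values for the same resources on the target machine.
peak = {
    "memory_bandwidth_gb_s": 80.0,
    "instructions_per_cycle": 4.0,
    "vector_gflops": 500.0,
}

# One percentage per tracked resource.
report = {name: percent_of_peak(measured[name], peak[name]) for name in measured}

for name, pct in report.items():
    print(f"{name}: {pct:.1f}% of peak")
```

A low percentage across all resources suggests the code is not stressing any part of the machine, while a high percentage on one resource hints at where the bottleneck might be.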
Overhead:
Overhead uses statistical profiling to determine how the application's CPU time is allocated. The hardware periodically interrupts the application and saves the current instruction pointer for each thread. The instruction pointer may be in one of four places:
1. The OpenMP runtime
2. The MPI runtime
3. The kernel (vmlinux)
4. Elsewhere (assumed to be the application)
Overhead tracks the time spent in each of these four subsystems and, after the application exits, reports the average for each, both as a number of threads and as a percentage of CPU time. It can also record these times as the application runs so that they can be plotted after it completes.
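The classification described above can be sketched as follows. This is an illustration under stated assumptions, not Overhead's implementation: it assumes each sample has already been resolved to the name of the module containing the instruction pointer, the substring patterns used to recognize each runtime are hypothetical, and the "number of threads" figure is assumed to be the sample share multiplied by the thread count.

```python
from collections import Counter

# Hypothetical substrings used to recognize each subsystem by module name.
CATEGORY_PATTERNS = [
    ("openmp", ("libiomp", "libgomp")),
    ("mpi", ("libmpi",)),
    ("kernel", ("vmlinux",)),
]

CATEGORIES = ("openmp", "mpi", "kernel", "application")


def classify(module_name):
    """Map a module name to one of the four subsystems."""
    for category, patterns in CATEGORY_PATTERNS:
        if any(p in module_name for p in patterns):
            return category
    # Anything unrecognized is assumed to be the application.
    return "application"


def summarize(samples, num_threads):
    """Summarize samples as a percentage of CPU time and an average thread count.

    `samples` is an iterable of module names, one per sampled instruction pointer.
    """
    counts = Counter(classify(m) for m in samples)
    total = sum(counts.values())
    report = {}
    for category in CATEGORIES:
        share = counts[category] / total if total else 0.0
        report[category] = {
            "percent": 100.0 * share,
            "threads": share * num_threads,  # assumed interpretation
        }
    return report


# Usage with made-up samples from a 4-thread run.
samples = ["libiomp5.so", "libiomp5.so", "libmpi.so.12", "vmlinux",
           "a.out", "a.out", "a.out", "a.out"]
result = summarize(samples, num_threads=4)
for category in CATEGORIES:
    row = result[category]
    print(f"{category}: {row['percent']:.1f}% ({row['threads']:.2f} threads)")
```

Because sampling is statistical, the shares are estimates that become more reliable as the number of samples grows.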
Both of these tools are open source and can be found at https://01.org/simple-performance-tools.