L McVoy, C Staelin, lmbench: Portable tools for performance analysis, USENIX Annual Technical Conference, 1996
- lmbench provides a suite of benchmarks that attempt to measure common performance bottlenecks in a wide range of system applications.
- These bottlenecks have been identified, isolated, and reproduced in a set of small micro-benchmarks that measure the latency and bandwidth of data movement among the processor, cache, memory, network, file system, and disk.
- Does not take advantage of any multi-processor features.
- Memory bandwidth benchmark: libc bcopy and a hand-unrolled loop that loads and stores aligned 8-byte words.
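A minimal sketch of the hand-unrolled copy idea (not lmbench's code; the buffer size, unroll factor, and gettimeofday-based timing are illustrative choices):

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

#define BYTES (8 * 1024 * 1024)   /* 8 MB, larger than typical caches */

int main(void) {
    uint64_t *src = malloc(BYTES), *dst = malloc(BYTES);
    size_t words = BYTES / sizeof(uint64_t);
    struct timeval t0, t1;

    memset(src, 1, BYTES);        /* touch the pages before timing */
    memset(dst, 0, BYTES);

    gettimeofday(&t0, NULL);
    for (size_t i = 0; i + 8 <= words; i += 8) {   /* hand-unrolled 8x */
        dst[i]     = src[i];     dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2]; dst[i + 3] = src[i + 3];
        dst[i + 4] = src[i + 4]; dst[i + 5] = src[i + 5];
        dst[i + 6] = src[i + 6]; dst[i + 7] = src[i + 7];
    }
    gettimeofday(&t1, NULL);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("copy bandwidth: %.1f MB/s\n", BYTES / secs / 1e6);
    free(src); free(dst);
    return 0;
}
```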
- IPC bandwidth benchmark: Unix pipes (typically implemented as a bcopy from/to the kernel) and TCP sockets; loopback TCP bandwidth could be as fast as pipe bandwidth.
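A sketch of the pipe-bandwidth measurement, assuming a child writer and parent reader; the 64 KB chunk size and total volume are illustrative, not lmbench's parameters:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHUNK   (64 * 1024)
#define CHUNKS  1024            /* 64 MB total */

int main(void) {
    int fds[2];
    static char buf[CHUNK];
    pipe(fds);

    if (fork() == 0) {                        /* child: writer */
        close(fds[0]);
        for (int i = 0; i < CHUNKS; i++)
            write(fds[1], buf, CHUNK);
        _exit(0);
    }

    close(fds[1]);                            /* parent: reader */
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    ssize_t n;
    long long total = 0;
    while ((n = read(fds[0], buf, CHUNK)) > 0)
        total += n;
    gettimeofday(&t1, NULL);
    wait(NULL);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("pipe bandwidth: %.1f MB/s\n", total / secs / 1e6);
    return 0;
}
```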
- Cached I/O bandwidth benchmark: Performance of rereading data in the file system page cache through two interfaces, read and mmap.
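A sketch showing just the two interfaces (the timing harness is omitted, and "testfile" is a placeholder for an existing file already in the page cache): read() copies cached data into a user buffer, while mmap() lets the process touch the cached pages directly.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static volatile long sink;      /* keep the compiler from dropping the reads */

int main(void) {
    int fd = open("testfile", O_RDONLY);   /* placeholder file name */
    struct stat st;
    fstat(fd, &st);

    /* read() interface: copy data from the page cache into a buffer */
    char buf[64 * 1024];
    ssize_t n;
    long sum = 0;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i += 4096)
            sum += buf[i];

    /* mmap() interface: touch the cached pages directly, no copy */
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    for (off_t i = 0; i < st.st_size; i += 4096)
        sum += p[i];

    sink = sum;
    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```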
- Memory read latency benchmark: Back-to-back load latency is used.
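A sketch of the back-to-back load idea via pointer chasing, so each load depends on the previous one; the array size, stride, and load count are illustrative assumptions:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define SIZE   (16 * 1024 * 1024)     /* 16 MB, past the external cache */
#define STRIDE 128                    /* bytes between chained loads */
#define LOADS  (10 * 1000 * 1000)

int main(void) {
    char *arr = malloc(SIZE);

    /* Each element points STRIDE bytes ahead, wrapping at the end,
     * so every load depends on the result of the previous one. */
    for (size_t i = 0; i < SIZE; i += STRIDE)
        *(char **)(arr + i) = arr + ((i + STRIDE) % SIZE);

    char **p = (char **)arr;
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    for (long i = 0; i < LOADS; i++)
        p = (char **)*p;
    gettimeofday(&t1, NULL);

    double ns = ((t1.tv_sec - t0.tv_sec) * 1e9 +
                 (t1.tv_usec - t0.tv_usec) * 1e3) / LOADS;
    printf("load latency: %.1f ns (p=%p)\n", ns, (void *)p);
    free(arr);
    return 0;
}
```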
- Operating system entry latency benchmark: System call performance.
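A sketch of measuring system-call entry cost by averaging a cheap call in a loop; the choice of a one-byte write to /dev/null and the iteration count are assumptions for illustration:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

#define CALLS 1000000

int main(void) {
    int fd = open("/dev/null", O_WRONLY);
    char c = 0;
    struct timeval t0, t1;

    gettimeofday(&t0, NULL);
    for (int i = 0; i < CALLS; i++)
        write(fd, &c, 1);        /* cheap system call, repeated */
    gettimeofday(&t1, NULL);

    double us = ((t1.tv_sec - t0.tv_sec) * 1e6 +
                 (t1.tv_usec - t0.tv_usec)) / CALLS;
    printf("syscall: %.3f us\n", us);
    close(fd);
    return 0;
}
```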
- Signal handling latency benchmark: No context switch is needed since the signal goes to the same process that generated it.
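A sketch of the same-process signal measurement: the process sends SIGUSR1 to itself in a loop and the handler runs in the same process, so no context switch is involved. The iteration count is an illustrative choice.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

#define SIGNALS 100000

static volatile sig_atomic_t handled;

static void handler(int sig) { (void)sig; handled++; }

int main(void) {
    struct sigaction sa;
    sa.sa_handler = handler;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    for (int i = 0; i < SIGNALS; i++)
        kill(getpid(), SIGUSR1);   /* signal delivered to this same process */
    gettimeofday(&t1, NULL);

    double us = ((t1.tv_sec - t0.tv_sec) * 1e6 +
                 (t1.tv_usec - t0.tv_usec)) / SIGNALS;
    printf("signal handling: %.2f us (%d handled)\n", us, (int)handled);
    return 0;
}
```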
- Process creation latency benchmark: fork/wait, fork/execve/wait, and launching via popen, system, or execlp.
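A sketch of the simplest fork/wait case; the fork/execve variant would execve a trivial program in the child instead of exiting. The iteration count is illustrative.

```c
#include <stdio.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

#define FORKS 1000

int main(void) {
    struct timeval t0, t1;

    gettimeofday(&t0, NULL);
    for (int i = 0; i < FORKS; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);            /* child exits immediately */
        waitpid(pid, NULL, 0);   /* parent waits for it */
    }
    gettimeofday(&t1, NULL);

    double us = ((t1.tv_sec - t0.tv_sec) * 1e6 +
                 (t1.tv_usec - t0.tv_usec)) / FORKS;
    printf("fork+exit+wait: %.1f us\n", us);
    return 0;
}
```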
- Context switching latency benchmark: Implemented as a ring of processes connected with Unix pipes. A token is passed from process to process, forcing context switches. To simulate active processes, an artificial variable-size "cache footprint" is introduced, deliberately measuring the effectiveness of caches across process context switches.
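A simplified two-process version of the pipe-ring idea: a token bounces between parent and child through two pipes, forcing a context switch on every hop. lmbench uses a ring of N processes and adds the variable-size cache footprint; both are omitted here.

```c
#include <stdio.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

#define ROUNDS 10000

int main(void) {
    int p2c[2], c2p[2];          /* parent->child and child->parent pipes */
    char token = 't';
    pipe(p2c);
    pipe(c2p);

    if (fork() == 0) {                       /* child: echo the token back */
        for (int i = 0; i < ROUNDS; i++) {
            read(p2c[0], &token, 1);
            write(c2p[1], &token, 1);
        }
        _exit(0);
    }

    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    for (int i = 0; i < ROUNDS; i++) {       /* parent: send and wait */
        write(p2c[1], &token, 1);
        read(c2p[0], &token, 1);
    }
    gettimeofday(&t1, NULL);
    wait(NULL);

    /* Each round trip is two context switches (plus pipe overhead). */
    double us = ((t1.tv_sec - t0.tv_sec) * 1e6 +
                 (t1.tv_usec - t0.tv_usec)) / (2.0 * ROUNDS);
    printf("per switch (incl. pipe overhead): %.2f us\n", us);
    return 0;
}
```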
- IPC latency benchmark: Pipe, TCP, UDP, and RPC round trips, over both loopback and the network, plus TCP connection establishment.
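A sketch of a loopback UDP round trip as one instance of this family: a child echoes a small datagram back to the parent over 127.0.0.1. The port number, message count, and the crude sleep-based startup synchronization are all illustrative assumptions.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

#define PORT   9876
#define ROUNDS 10000

int main(void) {
    struct sockaddr_in srv = { .sin_family = AF_INET,
                               .sin_port = htons(PORT) };
    inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);

    if (fork() == 0) {                        /* child: UDP echo server */
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        bind(s, (struct sockaddr *)&srv, sizeof srv);
        char buf[64];
        struct sockaddr_in cli;
        socklen_t len = sizeof cli;
        for (int i = 0; i < ROUNDS; i++) {
            ssize_t n = recvfrom(s, buf, sizeof buf, 0,
                                 (struct sockaddr *)&cli, &len);
            sendto(s, buf, n, 0, (struct sockaddr *)&cli, len);
        }
        _exit(0);
    }

    sleep(1);                                 /* crude wait for the server */
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    char msg[] = "ping";
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    for (int i = 0; i < ROUNDS; i++) {
        sendto(s, msg, sizeof msg, 0, (struct sockaddr *)&srv, sizeof srv);
        recv(s, msg, sizeof msg, 0);          /* wait for the echo */
    }
    gettimeofday(&t1, NULL);
    wait(NULL);

    double us = ((t1.tv_sec - t0.tv_sec) * 1e6 +
                 (t1.tv_usec - t0.tv_usec)) / ROUNDS;
    printf("UDP loopback round trip: %.1f us\n", us);
    return 0;
}
```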
- File system latency benchmark: Time required to create or delete a zero-length file.
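A sketch of the create/delete measurement: make a batch of zero-length files in the current directory, then remove them, reporting the average cost per operation. The file count and naming scheme are illustrative.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

#define FILES 1000

int main(void) {
    char name[64];
    struct timeval t0, t1;

    gettimeofday(&t0, NULL);
    for (int i = 0; i < FILES; i++) {
        snprintf(name, sizeof name, "lat_fs_%d", i);
        close(creat(name, 0644));     /* create a zero-length file */
    }
    gettimeofday(&t1, NULL);
    double create_us = ((t1.tv_sec - t0.tv_sec) * 1e6 +
                        (t1.tv_usec - t0.tv_usec)) / FILES;

    gettimeofday(&t0, NULL);
    for (int i = 0; i < FILES; i++) {
        snprintf(name, sizeof name, "lat_fs_%d", i);
        unlink(name);                 /* delete it again */
    }
    gettimeofday(&t1, NULL);
    double delete_us = ((t1.tv_sec - t0.tv_sec) * 1e6 +
                        (t1.tv_usec - t0.tv_usec)) / FILES;

    printf("create: %.1f us, delete: %.1f us\n", create_us, delete_us);
    return 0;
}
```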
- Disk latency benchmark: Measures seek time with small reads to locations spread across the raw disk device, bypassing the file system cache.
- Future work: MP benchmarks, and automatic sizing of the benchmarks based on detecting the size of the external cache.
- Conclusion: Good cache/memory subsystems are at least as important as the processor speed.