Trying to keep posts without real data as short as possible, so here is a quick update on the main factors that can cause retrograde throughput in load tests, especially at the database layer, and that are worth checking quickly with the aid of various tools as the case demands.
Sometimes no single tool or methodology will get you to a logical conclusion when measuring the metrics that matter in analyzing why, in a given setup, throughput falls off or turns retrograde so badly against expectations. Main causes:
- Unix/OS syscall code path
  - Recent 64-bit OS versions on x86 hardware have a faster syscall entry into kernel mode (SYSCALL/SYSENTER rather than a software interrupt), which addresses this to some extent; see the first sketch after this list.
- CPU cache-to-cache communication
  - This is where the hardware coherency factor shows up, e.g. CPU cross-calls and CPU interconnects play an important role here.
- Main memory to cache data movement/transfer
  - Cache misses and memory stalls play an important role here. cpustat, DTrace (cpc provider probes/PAPI metrics), and Linux perf hardware events can aid the analysis. Measuring CPI (cycles per instruction) is a useful metric, along with the length of your critical code path; see the second sketch after this list.
- Waiting on I/O DMA requests
  - Time spent waiting for a DMA request to complete shows up here.
- Spinning on a latch/mutex
  - Adaptive/ticket spinlocks used to acquire a mutex play an important role here; see the third sketch after this list.
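To put a rough number on the syscall code path, here is a minimal sketch that times a cheap syscall in a tight loop on Linux. The choice of getpid (invoked via syscall(2) to bypass any libc caching) and the iteration count are arbitrary assumptions, not a definitive benchmark.

```c
/* Minimal sketch: time the raw syscall entry/exit path on Linux by
 * issuing a cheap syscall in a tight loop. Build with: gcc -O2 */
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    const long iters = 10 * 1000 * 1000;   /* arbitrary iteration count */
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        syscall(SYS_getpid);               /* forces a real kernel entry */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("avg syscall cost: %.1f ns\n", ns / iters);
    return 0;
}
```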
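For the CPI measurement mentioned above, a minimal sketch using Linux perf_event_open(2) hardware counters could look like the following, assuming counter access is permitted (perf_event_paranoid) and the hardware exposes these events. The array-summing loop is a hypothetical stand-in for your critical code path.

```c
/* Minimal sketch: measure CPI (cycles per instruction) for a code
 * region with Linux perf_event_open(2) hardware counters. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static int open_counter(__u64 config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    /* perf_event_open has no glibc wrapper; call it via syscall(2) */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

int main(void)
{
    int cyc = open_counter(PERF_COUNT_HW_CPU_CYCLES);
    int ins = open_counter(PERF_COUNT_HW_INSTRUCTIONS);
    if (cyc < 0 || ins < 0) { perror("perf_event_open"); return 1; }

    enum { N = 1 << 20 };
    static long a[N];
    long sum = 0;

    ioctl(cyc, PERF_EVENT_IOC_ENABLE, 0);
    ioctl(ins, PERF_EVENT_IOC_ENABLE, 0);
    for (int i = 0; i < N; i++)        /* stand-in critical code path */
        sum += a[i];
    ioctl(cyc, PERF_EVENT_IOC_DISABLE, 0);
    ioctl(ins, PERF_EVENT_IOC_DISABLE, 0);

    long long cycles = 0, instrs = 0;
    read(cyc, &cycles, sizeof(cycles));
    read(ins, &instrs, sizeof(instrs));
    printf("cycles=%lld instructions=%lld CPI=%.2f (sum=%ld)\n",
           cycles, instrs, instrs ? (double)cycles / instrs : 0.0, sum);
    return 0;
}
```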
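And a minimal sketch of the ticket-spinlock idea: each acquirer takes a ticket and spins until the "now serving" counter reaches it, so the lock is granted in FIFO order and a hot spinner cannot barge past earlier waiters. The sched_yield() backoff is a simplification; real adaptive locks typically spin with a pause first and may fall back to sleeping.

```c
/* Minimal sketch of a C11 ticket spinlock granting the lock in
 * strict FIFO ticket order. */
#include <stdatomic.h>
#include <sched.h>

typedef struct {
    atomic_uint next;    /* next ticket to hand out */
    atomic_uint owner;   /* ticket currently being served */
} ticket_lock_t;

#define TICKET_LOCK_INIT { 0, 0 }

static void ticket_lock(ticket_lock_t *l)
{
    unsigned me = atomic_fetch_add(&l->next, 1);   /* take a ticket */
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        sched_yield();   /* crude backoff; adaptive locks spin/pause first */
}

static void ticket_unlock(ticket_lock_t *l)
{
    /* advance "now serving"; releases the critical section's writes */
    atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}
```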