Your storage environment is one of the most critical resources in your vSphere environment… no doubt about it. Setting up storage properly to ensure good performance is key to the long-term viability of the environment. Otherwise, it is entirely plausible that a single misbehaving virtual machine can severely degrade the performance of every other virtual machine it shares storage with.
What, then, do you do when the performance of your virtual machines takes a dive? Where do you look?
VMware provides some really useful tools for looking into the storage environment. Most people are familiar with the vSphere Client "Performance" tab. Many are familiar with esxtop (or resxtop for users of the remote tools). However, one utility that sneaks under the radar is vscsiStats.
This nifty little utility analyzes storage usage at the level of the individual VM disk. So NFS, FC, iSCSI, and local datastores all report the same statistics, regardless of the protocol or connection behind the VMDKs.
There is a caveat, though… This requires ESX (3.5 – 4.1) or ESXi (4.1), as you need to log into a remote session on the server. People have found ways to get vscsiStats running on ESXi 4.0, but that is something you can research on your own.
1) SSH into the ESX host and switch to the ‘root’ user
2) cd /usr/lib/vmware/bin
3) List the VMs running on your ESX host: ./vscsiStats -l
– Make note of the "worldGroupID" of the VM you are interested in
4) Start gathering statistics for that VM: ./vscsiStats -w your_world_group_id -s
5) Wait for a period of time for the statistics to be gathered. The longer you wait, the more telling the statistics will be. vscsiStats stops collecting on its own after around 30 minutes.
6) Check out the statistics. In this example, we will look at the latency for read operations, write operations, and overall operations: ./vscsiStats -w your_world_group_id -p latency
7) Other operations exist (all, ioLength, seekDistance, outstandingIOs, interarrival). Play around with them at your leisure.
8) When you have finished, stop the statistics gathering: ./vscsiStats -x -w your_world_group_id
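If you find yourself repeating steps 4 through 8 often, they can be wrapped in a small shell function. This is only a rough sketch: the function name collect_latency and the VSCSISTATS variable are made up for illustration, and it assumes the binary lives in /usr/lib/vmware/bin as described in step 2. It uses only the -w, -s, -p, and -x options shown in the steps above.

```shell
# Sketch only: wraps the manual start / wait / print / stop steps in one function.
# VSCSISTATS and collect_latency are illustrative names, not part of the tool itself.
VSCSISTATS="${VSCSISTATS:-/usr/lib/vmware/bin/vscsiStats}"

collect_latency() {
    wgid="$1"        # worldGroupID noted from the "-l" listing (step 3)
    wait_secs="$2"   # how long to gather statistics before printing them

    "$VSCSISTATS" -w "$wgid" -s || return 1   # step 4: start gathering
    sleep "$wait_secs"                        # step 5: let the samples accumulate
    "$VSCSISTATS" -w "$wgid" -p latency       # step 6: print the latency histograms
    "$VSCSISTATS" -x -w "$wgid"               # step 8: stop gathering
}
```

For example, collect_latency your_world_group_id 600 would gather for ten minutes before printing. Keep the wait comfortably under the point at which collection stops on its own.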
Check out the example below:
In the histogram above, we see that the majority of the operations (reads and writes combined) complete in around 1 ms (1,000 µs). However, we have some stragglers in the 15-30 ms range. All in all, that is not too bad. A significant share of operations in the 100+ ms range, on the other hand, would be a very telling sign that you have some I/O problems to address.
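To put a number on that rule of thumb, you can work out what share of the operations landed above a latency limit. The sketch below does not parse the tool's raw output. It assumes you have already reduced the histogram to a simplified, hypothetical two-column form: the bucket's upper bound in microseconds, then the count for that bucket. The function name slow_share is also made up for illustration.

```shell
# Sketch only: prints the percentage of I/Os in buckets above a latency limit.
# Input on stdin: "<bucket upper bound in us> <count>", one bucket per line.
# This two-column layout is an assumed, simplified format, not the tool's own output.
slow_share() {
    awk -v limit="$1" '
        { total += $2; if ($1 > limit) slow += $2 }   # count buckets beyond the limit
        END {
            if (total > 0) printf "%.1f\n", 100 * slow / total
            else           print "0.0"
        }
    '
}
```

Calling slow_share 100000 on such a file reports the percentage of operations that fell in buckets above 100 ms.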