I have spent the last couple of days reading the forums on tuning StarWind for VMware. I have picked up a lot, but I have also noticed that much of the advice is out of date, so I'm hoping someone can help me get the best possible performance using current information. We are coming off an OpenFiler system where performance was always lacking, but I always assumed that was as good as it could get. Now that I have seen the power of StarWind, I am somewhat obsessed with squeezing every last IOPS I can out of this system.
The eventual goal is a 2 node HA setup. We have already built the two boxes and purchased the licensing, but right now I am just trying to squeeze the max performance out of a standalone target before throwing in the HA variable.
Testing setup:
Dell server with 8 x 7200 RPM 750GB drives in RAID 50.
Windows Server 2008 R2 SP1 x64
4 core Xeon with 14GB RAM
2 NICs for management, and a 4 port Intel Pro 1000 ET for storage, bonded with dynamic aggregation mode
Cisco 2960 switch stack with LACP EtherChannel (mode active) configured for both SAN nodes.
SAN is on a separate VLAN from management traffic
Jumbo frames enabled from end to end
3 x ESXi 5 hosts with 2 management NICs and a quad port Intel Pro 1000 in each.
In the ESXi hosts I have created a vSwitch with 4 VMkernel ports, each with exactly 1 NIC mapped. I have then bound each VMkernel port to the iSCSI HBA as discussed in all of the multipathing guides. I have changed my path management to round robin and set iops=1.
I am using DiskTT as the benchmarking tool because it is quick and easy and doesn't require a PhD to run.

When I run DiskTT directly on the storage server, against the drive that holds the virtual disk I am presenting, I am seeing speeds in excess of 200MB/s. When I create an image file, mount the VMFS datastore, and Storage vMotion a VM into that datastore, the best I've been able to squeeze out of the VM is about 55MB/s. This doesn't even saturate one gigabit link, much less the 4 bonded links, so I think there is room for improvement.
I have not made any of the Windows TCP tweaks yet because one of the articles I read said that with 2008 and newer, the only thing that needs to be set is Jumbo Frames. Can anyone help me with some tweaks to bring the VM performance closer to the performance on the storage server itself?
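To put those numbers in perspective, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: it assumes DiskTT's "MB" means 10^6 bytes, it compares against raw gigabit line rate with no allowance for Ethernet, TCP/IP or iSCSI overhead, and the function and variable names are just mine for illustration.

```python
# Rough link utilisation check. Assumptions: 1 MB = 10^6 bytes, and raw
# line rate is used as capacity (no Ethernet/TCP/iSCSI overhead deducted).
GIGABIT_BYTES_PER_SEC = 1_000_000_000 / 8  # 1 Gbit/s expressed in bytes/s


def link_utilisation(throughput_mb_s: float, links: int = 1) -> float:
    """Return throughput as a fraction of the raw capacity of `links` gigabit links."""
    capacity_mb_s = links * GIGABIT_BYTES_PER_SEC / 1_000_000
    return throughput_mb_s / capacity_mb_s


vm_speed = 55.0      # MB/s, best result measured inside the VM
local_speed = 200.0  # MB/s, lower bound measured locally on the storage server

print(f"VM vs one link:    {link_utilisation(vm_speed, 1):.0%}")
print(f"VM vs four links:  {link_utilisation(vm_speed, 4):.0%}")
print(f"Local vs one link: {link_utilisation(local_speed, 1):.0%}")
```

By this estimate the VM is using well under half of a single gigabit link, while the local disk result is more than one link could carry on its own.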
Thanks!