Slow read performance from Hyper-V cluster VM for large blocks
Posted: Wed Feb 15, 2012 6:27 pm
I've done a lot of optimizing to get the most out of limited resources, and the results are pretty good, but something is still troubling me a bit.
On each host I've got an Intel 82576 NIC team (ALB with TLB/RLB) carrying the cluster traffic, one Intel NIC doing iSCSI only (bound to TCP/IPv4 only), another Intel NIC doing iSCSI only for redundancy, and two Intel NICs in a VMLB team with VMDq bound to Hyper-V, no management. One set goes to switch 1, the other to switch 2, and the two switches are linked with a 2-port LACP trunk as crossover. Jumbo frames are enabled everywhere: NICs, Hyper-V switch, vNICs, and the switches. All NICs have the TcpAckFrequency tweak applied. Storage sits on an LSI 2108 controller with a throughput ceiling of about 500 MB/s.

From within a VM I benchmark with ATTO. The StarWind server exports a 16 GB RAM disk, formatted NTFS with 4K blocks; the VHD inside the VM is also formatted with 4K blocks (which gives the best performance for me), no cache. The results: about 500 MB/s max on the host running the StarWind server (local). After moving the VM to the other host, so that all traffic has to go over the iSCSI NICs, I get a maximum of about 250 MB/s (2x iSCSI wire speed) from within the VM. So far so good; these speeds apply to both reads and writes. I even bumped it up further, closer to the SAS controller's speed, by adding the cluster team to iSCSI as well at 50% weight.
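For completeness, the TcpAckFrequency tweak I mean is just the per-interface registry value that disables delayed ACK. Roughly like this (a simplified sketch in Python, run as administrator; in practice I would only set it on the interface GUIDs belonging to the iSCSI NICs, not on every interface, and it typically needs a reboot before it takes effect):

```python
# Sketch: set TcpAckFrequency=1 (disable delayed ACK) on TCP/IP interfaces.
# Enumerating ALL interfaces here is a simplification; normally you only
# touch the interface GUIDs of the iSCSI NICs.
import winreg

IF_KEY = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces"

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, IF_KEY) as interfaces:
    subkey_count = winreg.QueryInfoKey(interfaces)[0]   # number of interface subkeys
    for i in range(subkey_count):
        guid = winreg.EnumKey(interfaces, i)             # e.g. {XXXXXXXX-...}
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
                            IF_KEY + "\\" + guid, 0,
                            winreg.KEY_SET_VALUE) as key:
            # 1 = acknowledge every segment immediately (no delayed ACK)
            winreg.SetValueEx(key, "TcpAckFrequency", 0, winreg.REG_DWORD, 1)
```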
Now the odd thing. With a RAM disk target I clearly get the maximum the hardware can deliver. Then I switch to what you would normally use: an .img device with a 65536 header on a 4.5 TB RAID5 array, 4K blocks, no cache. Suddenly my reads drop to half my writes. RAID5 is supposed to give good read and average write performance; I get the opposite. If I give the .img a cache (4 GB, write-back, 5 s flush), the speeds of course race past 2 GB/s, which isn't very realistic but is logical.

Why do my reads fall apart with an .img compared to a RAM disk? Moreover, with the cache enabled, the last three ATTO transfer sizes (2, 4 and 8 MB) completely "collapse" to roughly SAS speed. Is that simply because the cache can't handle these large block sizes and just passes them straight through?
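If it helps to reproduce the pattern, this is the kind of rough sequential read/write check I could run inside the VM alongside ATTO (a sketch only; the file path is just an example, and the read pass goes through the Windows file cache, unlike ATTO's direct I/O, so its numbers will be optimistic):

```python
# Rough sequential read/write check with block sizes similar to ATTO's
# transfer sizes. Writes a 1 GiB file on the iSCSI-backed VHD and times
# a write pass and a read pass per block size.
import os
import time

PATH = r"D:\bench.tmp"          # file on the iSCSI-backed VHD (example path)
SIZE = 1 * 1024**3              # 1 GiB test file
BLOCKS = [4 * 1024, 64 * 1024, 1024**2, 8 * 1024**2]

for bs in BLOCKS:
    buf = os.urandom(bs)

    # write pass (flushed and fsync'ed so the data really leaves the host)
    t0 = time.perf_counter()
    with open(PATH, "wb", buffering=0) as f:
        for _ in range(SIZE // bs):
            f.write(buf)
        os.fsync(f.fileno())
    wr = SIZE / (time.perf_counter() - t0) / 1024**2

    # read pass (goes through the Windows file cache, so treat as optimistic)
    t0 = time.perf_counter()
    with open(PATH, "rb", buffering=0) as f:
        while f.read(bs):
            pass
    rd = SIZE / (time.perf_counter() - t0) / 1024**2

    print(f"{bs // 1024:>5} KB  write {wr:6.0f} MB/s  read {rd:6.0f} MB/s")

os.remove(PATH)
```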