iSCSI Hyper-V performance

matty-ct
Posts: 13
Joined: Tue Oct 06, 2009 6:09 pm

Fri Jul 20, 2012 8:40 pm

Hi all,

Any guidance or opinions are greatly appreciated. I am currently planning a virtualization migration for a customer. I'm using top-of-the-line hardware everywhere, but I'm a bit concerned about disk throughput with the iSCSI SAN. How should I benchmark (ahead of purchase!) whether the solution can accommodate the required performance? For example, I'm proposing to build an iSCSI SAN on HP DL370s: twelve 600GB 6Gbps 15K SAS drives in a RAID 5 array. ***edit: RAID 10 if recommended*** So that's 6TB or so on RAID 5 with the hot spare, much less on RAID 10. iSCSI traffic will go over quad bonded Gigabit NICs, or 10GbE NICs if highly recommended instead. The two or three cluster nodes are DL380 G8s, also with 15K SAS drives, probably a small RAID 10 array each for the OS. A total of 20 to 40 VMs will eventually be hosted in the failover cluster.

Some VMs will host SQL Server 2008, some will be terminal servers, and some will be basic Windows appliance servers. They might add an Oracle server VM or two down the road. I know the new E5-2650 CPUs are sufficient for the VM load; it's the iSCSI throughput I worry about. I can't have these guys buy $70K of hardware and services only to find that the VM load is too great for the disk array.

Would it be better, performance-wise, to break up the iSCSI storage into smaller arrays using a separate controller or controller channel? If so, I lose a drive's worth of storage for each RAID 5 array I create.

Also, there will be a StarWind HA failover node. Can we use less expensive SATA drives for the replication partner, or will that substantially slow down the replication process? Or worse, slow down the primary iSCSI LUNs?

Thanks for any thoughts!

Matt
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Mon Jul 23, 2012 10:32 am

Hi! Let's go through this one by one.
I'm using top-of-the-line hardware everywhere, but I'm a bit concerned about disk throughput with the iSCSI SAN. How should I benchmark (ahead of purchase!) whether the solution can accommodate the required performance? For example, I'm proposing to build an iSCSI SAN on HP DL370s: twelve 600GB 6Gbps 15K SAS drives in a RAID 5 array. ***edit: RAID 10 if recommended*** So that's 6TB or so on RAID 5 with the hot spare, much less on RAID 10. iSCSI traffic will go over quad bonded Gigabit NICs, or 10GbE NICs if highly recommended instead. The two or three cluster nodes are DL380 G8s, also with 15K SAS drives, probably a small RAID 10 array each for the OS. A total of 20 to 40 VMs will eventually be hosted in the failover cluster.
First of all, I'd strongly recommend taking a look at our Benchmarking Guide.

Also, please note that the recommended RAID levels for an HA setup are RAID 1, 0, or 10; RAID 5 and 6 are not recommended due to their low write performance.
The performance of a RAID array depends directly on the stripe size used. There is no exact recommendation for which stripe size to use; it is a test-based choice. As a best practice, first set the value recommended by the vendor and run tests, then set a bigger value and test again, then a smaller value and test once more. These three results should point you to the optimal stripe size. In some configurations a smaller stripe size such as 4K or 8K gives better performance, while in others 64K, 128K or even 256K will perform better.
The performance of the HA device will depend on the performance of the underlying RAID array, and it's ultimately up to you to determine the optimal stripe size.
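
In pseudo-code terms, that test loop looks something like this (a minimal sketch, not a StarWind tool; the candidate stripe sizes and the run_benchmark placeholder are just illustrations of the procedure, to be replaced by re-creating the array and running your real benchmark at each step):

```python
# Minimal sketch of the stripe-size test loop described above.
# run_benchmark() is a placeholder: in practice you would re-create the
# RAID array at each stripe size and run your I/O benchmark (IOMeter,
# ATTO, etc.) against a test LUN, then record the numbers.

CANDIDATE_STRIPES_KB = [4, 8, 64, 128, 256]  # vendor default plus smaller/larger values

def run_benchmark(stripe_kb: int) -> dict:
    """Placeholder for the real test run at the given stripe size."""
    # Replace with real measurements; dummy values keep the sketch runnable.
    return {"seq_mb_s": 0.0, "rand_iops": 0.0}

results = {kb: run_benchmark(kb) for kb in CANDIDATE_STRIPES_KB}

# Choose the stripe size that wins on the metric your workload cares about,
# e.g. random IOPS for a mixed VM / SQL Server load:
best_kb = max(results, key=lambda kb: results[kb]["rand_iops"])
print(f"Best stripe size for this workload: {best_kb} KB")
```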

If you use RAID 10, I can guarantee that you will need 10 GbE NICs. They are also a good choice because your system will hopefully grow, and the hardware requirements will grow with it, so one day you will need 10Gbps instead of 4x1Gbps cards anyway.
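
Just to show the back-of-the-envelope math behind that recommendation (the per-disk throughput figure below is an assumption for 15K SAS drives, not a measurement of your hardware):

```python
# Rough sanity check: can the NICs keep up with the array?
# The ~150 MB/s per-disk sequential figure is an assumption for 15K SAS,
# not a measured value for these specific drives.

disks = 12
per_disk_mb_s = 150                               # assumed sequential MB/s per 15K SAS disk
raid10_read_mb_s = disks * per_disk_mb_s          # reads can hit all spindles
raid10_write_mb_s = (disks // 2) * per_disk_mb_s  # writes hit half (mirrored pairs)

quad_gbe_mb_s = 4 * 1_000 / 8 * 0.9   # 4x1GbE bond, ~90% efficiency after overhead
ten_gbe_mb_s = 10_000 / 8 * 0.9       # single 10GbE port, same assumption

print(f"RAID 10 sequential read  ~{raid10_read_mb_s} MB/s")
print(f"RAID 10 sequential write ~{raid10_write_mb_s} MB/s")
print(f"4x1GbE bond              ~{quad_gbe_mb_s:.0f} MB/s")
print(f"10GbE                    ~{ten_gbe_mb_s:.0f} MB/s")
# => 12x 15K SAS in RAID 10 can easily saturate 4x1GbE on sequential I/O,
#    which is why 10GbE is the safer choice.
```
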
Some VMs will host SQL Server 2008, some will be terminal servers, and some will be basic Windows appliance servers. They might add an Oracle server VM or two down the road. I know the new E5-2650 CPUs are sufficient for the VM load; it's the iSCSI throughput I worry about. I can't have these guys buy $70K of hardware and services only to find that the VM load is too great for the disk array.
This CPU looks fine to me. But the CPU is not the only thing you should look at very closely; the motherboard is also very important. In my experience, I have seen a motherboard be the reason a system delivered 10 Gbps throughput for reads but only 6-7 Gbps for writes. It was simply the hardware design, and we wasted a lot of time finding out that the motherboard was the culprit, not the HDDs, not the CPU, not anything else.
Would it be better, performance-wise, to break up the iSCSI storage into smaller arrays using a separate controller or controller channel? If so, I lose a drive's worth of storage for each RAID 5 array I create.
I think a few smaller volumes would be a good choice: disk requests will not load the entire volume, and, as you mentioned, performance will not be significantly degraded during a RAID rebuild.
Also, there will be a StarWind HA failover node. Can we use less expensive SATA drives for the replication partner, or will that substantially slow down the replication process? Or worse, slow down the primary iSCSI LUNs?
If you use write-through caching, having one slow server in the HA configuration will not slow down the system until the cache is overfilled. Also, as you may know, any cluster requires homogeneity, and StarWind HA is a storage cluster, so we do recommend having more or less identical boxes.
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
dtrounce
Posts: 10
Joined: Wed Sep 05, 2012 5:34 pm

Thu Sep 06, 2012 4:34 pm

Related question. I want to configure two hosts in two separate data centers for HA/DR. The data centers are linked by a private GbE metro fibre line (PL), and they are on the same subnet, on a VLAN that spans the two DCs. Each host will be 2-socket, 6-8 cores/socket, 144-384GB RAM, with 24x 300GB 10K SAS drives on a single controller, running Windows Server 2012.

Each host will run StarWind HA targets with write-back caching. I'm thinking I will configure the disks as a 3.3TB RAID 10 over 22 drives, which should do 500-1,500MB/sec depending on the read/write and sequential/random patterns.
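
As a sanity check on my own numbers (the per-disk figures below are assumptions for 10K SAS, not measurements of this particular box):

```python
# Quick check of the proposed array. Per-disk figures are assumptions for
# 10K SAS drives, not measurements of this hardware.

drives_in_raid10 = 22
disk_gb = 300

usable_tb = drives_in_raid10 // 2 * disk_gb / 1000   # mirrored pairs
print(f"Usable capacity: ~{usable_tb:.1f} TB")        # ~3.3 TB, as stated

per_disk_rand_mb_s = 45    # assumed small-block/random throughput per disk
per_disk_seq_mb_s = 130    # assumed large-block sequential throughput per disk

low_end = drives_in_raid10 // 2 * per_disk_rand_mb_s   # random writes, half the spindles
high_end = drives_in_raid10 * per_disk_seq_mb_s        # sequential reads, all spindles
print(f"Spindle-limited range: ~{low_end}-{high_end} MB/s")
# ~495-2,860 MB/s on paper; in practice the controller, PCIe slot and cache
# policy usually cap the top end well below the spindle total, so the
# 500-1,500 MB/s estimate above looks reasonable.
```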

Each host will be a Hyper-V server using Windows Failover Clustering, with CSV caching. I want the VMs to access the local partner, using the cross-DC link primarily for sync. I will run the heartbeat over a separate subnet routed via a VPN over the WAN connection. I can use iSCSI MPIO to prefer the local target; I don't think I can do this with SOFS.

Will I be limited to 125MB/sec on an iSCSI connection to the StarWind target on the same host if I only have GbE network adapters, even when all the traffic stays local? Or do I need an InfiniBand RDMA/54Gb adapter (I'd like to skip 10Gb)? That would also allow me to scale out to more hosts in each DC through IB/RDMA.

Edit: I've discovered I can get close to 400MB/sec using loopback on a GbE adapter on a dev server. This looks like it could go higher (850MB/sec) on a better server, so I'm not limited to wire speed for loopback. See also here: http://social.technet.microsoft.com/For ... 191bc777ac

The main constraint I can't work around seems to be the GbE private line. I want this setup for both HA and DR. During maintenance, I will live-migrate VMs to the other host, transferring just RAM across the PL; in most cases the StarWind host and the Hyper-V host will be down at the same time. In other cases, all iSCSI client traffic will have to go over the PL, but that will be for failover situations only, e.g. StarWind host maintenance. Obviously I need to be able to use fast syncs and avoid full syncs wherever possible.

I was thinking of using de-duplicated targets, but if that will cause a large performance degradation, I might be better off using fixed targets. This seems particularly true when doing a full resync: the resync happens at ~4MB/sec of disk activity (vs. 80MB/sec for non-de-duplicated targets), and the responsiveness to clients during the resync is awful, at least with the default priority. Changing the priority to total client responsiveness improves things a great deal.
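
To put those rates in perspective, here is the rough arithmetic I'm working from (assuming a ~3.3TB image, as above, and a constant rate for the whole resync):

```python
# Rough full-sync time estimates for a ~3.3 TB HA image at the rates
# observed above. Purely arithmetic; assumes the rate stays constant.

image_gb = 3300

rates_mb_s = {
    "GbE private line (wire limit)": 115,
    "non-deduplicated target":       80,
    "de-duplicated target":          4,
}

for label, mb_s in rates_mb_s.items():
    hours = image_gb * 1000 / mb_s / 3600
    print(f"{label:32s} ~{hours:,.1f} h")

# => roughly 8 h at wire speed, ~11.5 h at 80 MB/s, and on the order of
#    ten days at 4 MB/s - which is why avoiding full syncs (and perhaps
#    avoiding dedup for these targets) matters so much here.
```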

Comments? Suggestions on caching performance tuning, hardware, network adapters, etc.? Are there benchmarks for StarWind's limits, given sufficient underlying hardware?
Max (staff)
Staff
Posts: 533
Joined: Tue Apr 20, 2010 9:03 am

Mon Sep 10, 2012 8:34 am

Hi David,
The synchronization channel will be the bottleneck of this configuration, not only limiting the throughput to ~115-120MB/s but also increasing the latency due to link and partner SAN response time.
Two solutions will soon be available to solve the MAN cluster problem:
1. Deduplication with the offsite replication functionality
2. 3 Node Active<->Active->Passive HA scenario with the 3rd node kept offsite.
Stay tuned, we will post more information on our website really soon.
Max Kolomyeytsev
StarWind Software