Advice/Comments on proposed host/storage deployment
Posted: Tue Apr 02, 2013 1:14 am
by Caillin
We are currently due for a refresh of our SAN/Hyper-V host infrastructure, and I am looking for comments and advice regarding a potential design.
We currently run 5 x DL380 G5 servers as Hyper-V cluster hosts. These servers are connected over a 2Gb FC storage fabric to an IBM DS4300 with 15k FC disks and an EXP810 with 2TB SATA disks.
We have around 30 virtual servers running a fairly typical SME workload.
Rather than go with the typical option of replacing the SAN with a newer, faster IBM or EqualLogic unit, I've been researching the idea of using StarWind Native SAN for Hyper-V with local storage instead.
My current specification for this setup would be:
3 x Dell 720xd Servers
2 x Xeon E5-2680 CPU (total 16 cores per physical box)
128GB Memory
16 x 1TB nearline SAS 7200RPM drives in RAID 0
2 x 500GB nearline SAS drives in RAID 1
1 x 400GB Intel DC S3700 enterprise SSD for read-only CacheCade
Thoughts behind this configuration:
16 nearline SAS drives in RAID 0 really makes me cringe, but it gives good throughput and capacity.
The memory configuration gives us the ability to run the entire 30-guest VM environment on a single node if necessary, giving N+2 redundancy.
A 3-node HA configuration with StarWind would require separate drive failures on all three nodes, each before the resync from the previous failure could complete, for us to suffer total data loss (rough numbers sketched after this list).
As I understand it, StarWind can use physical memory as a read/write cache in front of the local disk arrays, giving very fast response times and IOPS for in-cache reads.
Adding CacheCade for reads should mean that any data not in StarWind's cache will be read from the 400GB SSD, still providing very high IOPS and low read latency. The CacheCade documentation shows it is very competent at pulling hot data into cache and keeping it there.
Our physical footprint would shrink from the current 16RU to 6RU.
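To put a rough number on that triple-failure scenario, here is a back-of-envelope script. The drive AFR and the 12-hour full-resync window are my own assumptions for illustration, not vendor figures, and failures are treated as independent (which ignores same-batch or shared-backplane correlation):

```python
import math

# Assumed inputs (illustrative, not vendor specs)
AFR_DRIVE = 0.009          # annual failure rate per NL-SAS drive
DRIVES_PER_NODE = 16       # a RAID 0 array dies when any one drive dies
RESYNC_HOURS = 12.0        # assumed time for a full node resync
HOURS_PER_YEAR = 8766.0

# Failure rate of one whole 16-drive RAID 0 array, per hour
lam_array = DRIVES_PER_NODE * AFR_DRIVE / HOURS_PER_YEAR

# P(a given surviving node's array also fails within the resync window)
p_second = 1 - math.exp(-lam_array * RESYNC_HOURS)

# Expected first-array failures per year across the 3 nodes, then both
# remaining arrays failing inside the same resync window
arrays_failing_per_year = 3 * DRIVES_PER_NODE * AFR_DRIVE
p_total_loss_per_year = arrays_failing_per_year * p_second ** 2

print(f"array failures/year across cluster: {arrays_failing_per_year:.2f}")
print(f"P(total data loss per year) ~ {p_total_loss_per_year:.1e}")
```

Under those assumptions the annual total-loss probability comes out on the order of 10^-8, which is why the RAID 0 idea doesn't scare me as much as it first sounds.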
As this seems to be a fairly uncommon server/SAN configuration, I'd really like to get some advice and feedback on any major flaws or aspects I have failed to consider. From where I'm sitting, it looks like we could get a significant increase in speed and reliability (redundancy), plus a much smaller footprint, all for less than half the cost of an equivalent product from a traditional SAN vendor. Has anybody else used a similar configuration in production? I'd be interested to hear.
Re: Advice/Comments on proposed host/storage deployment
Posted: Tue Apr 02, 2013 10:50 am
by Max (staff)
Hi Caillin,
I really liked the first part of the story; the general idea is exactly what you need to build a failover cluster using StarWind Native SAN.
Have you already decided which networking components you would like to use?
With a scenario like this, the network becomes a very important factor.
Re: Advice/Comments on proposed host/storage deployment
Posted: Wed Apr 03, 2013 3:19 am
by Caillin
Thanks for the reply Max. I would be looking at using 2 x 4-port 10GbE NICs in each node. This should give enough scope to cover the VM network, CSV, Live Migration, cluster heartbeat and StarWind sync traffic.
One of my concerns, though, is running RAID 0 on the storage volume. If one of the drives fails, the entire array will need to be rebuilt and then resynced with the other two nodes. How automated is this process within the StarWind software? Obviously the sync time would depend on the volume of data being synced as well as the bandwidth and NIC configuration, but how much of a pain would it be to replace the drive, rebuild the array and kick off the resync from within StarWind?
Bearing this in mind, what are your thoughts on a 3-node configuration with RAID 0 versus a 2-node configuration with RAID 10?
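For my own reference, the quick capacity arithmetic behind that question (drive count and size as per the spec above, nothing vendor-specific):

```python
DRIVE_TB, DRIVES = 1.0, 16

# Option A: 3 nodes, each 16 x 1TB in RAID 0, 3-way StarWind replication
a_usable = DRIVES * DRIVE_TB          # 16 TB usable (one replicated pool)
a_raw = 3 * DRIVES * DRIVE_TB         # 48 TB raw across the cluster

# Option B: 2 nodes, each 16 x 1TB in RAID 10, 2-way replication
b_usable = DRIVES * DRIVE_TB / 2      # 8 TB usable (mirrored pairs)
b_raw = 2 * DRIVES * DRIVE_TB         # 32 TB raw

print(f"A: {a_raw:.0f} TB raw -> {a_usable:.0f} TB usable, "
      f"but one dead drive takes a whole node's array offline")
print(f"B: {b_raw:.0f} TB raw -> {b_usable:.0f} TB usable, "
      f"and a node survives one drive failure per mirror pair")
```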
Re: Advice/Comments on proposed host/storage deployment
Posted: Wed Apr 03, 2013 12:31 pm
by Max (staff)
4-port 10 GbE cards? Curious to know who the manufacturer is :)
Please note that you need to be running at least 2 synchronisation channels for iSCSI if all 3 nodes run VMs simultaneously.
This way you'll be able to get the maximum iSCSI performance, i.e. 2/3 of the 10 GbE link speed.
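For reference, that ceiling in the units most benchmark tools report (plain arithmetic, nothing more):

```python
# Two thirds of a 10 GbE link, per node
link_gbit = 10
ceiling_gbit = link_gbit * 2 / 3
print(f"{ceiling_gbit:.2f} Gbit/s ~ {ceiling_gbit * 1000 / 8:.0f} MB/s")
```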
As for the RAID 0 as your storage volume - you can split it into smaller volumes. This has 3 benefits:
1. You get a shorter recovery time (see the quick arithmetic below).
2. Failure of a drive does not affect the whole storage.
3. You can individually provision cache at both the RAID controller and HA device level. This will allow you to group VMs by typical load type to optimize storage use.
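A rough illustration of point 1, assuming one 10 GbE sync link running at about 70% efficiency (a placeholder figure, not a StarWind benchmark): resync time scales with the size of the HA device, not the size of the whole array.

```python
# Effective sync throughput under the assumed 70%-efficient 10 GbE link
SYNC_GBIT = 10 * 0.7
SYNC_TB_PER_HOUR = SYNC_GBIT / 8 * 3600 / 1000   # Gbit/s -> TB/h

for volume_tb in (16, 4, 1):
    hours = volume_tb / SYNC_TB_PER_HOUR
    print(f"{volume_tb:>2} TB HA device -> full resync in ~{hours:.1f} h")
```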
Re: Advice/Comments on proposed host/storage deployment
Posted: Thu Apr 04, 2013 1:54 am
by Caillin
Ah, you're right, I was getting mixed up with the quad-port 1GbE cards. The 720xd has quite a number of PCIe slots, so the required combination for sync channels as well as Hyper-V traffic should be doable.
Could you give a bit of insight into the process of resyncing a node after an array failure, Max? Is this a fairly straightforward management procedure once the drive is replaced and the new array is brought back online?
Re: Advice/Comments on proposed host/storage deployment
Posted: Mon Apr 08, 2013 8:35 am
by Max (staff)
After the new array is brought online, the remaining nodes will synchronize the data to the new volume. In the meantime, VMs on this node remain operational since they still have access to the HA storage pool. The manual part of the procedure takes about 3-5 minutes (remove the information about the failed node, then re-point StarWind to the rebuilt array); after that, StarWind does everything automatically.
Re: Advice/Comments on proposed host/storage deployment
Posted: Mon Apr 08, 2013 12:34 pm
by jeddyatcc
The only thing I will add is that you will be in a degraded state, as any LUNs on the failed storage will only have 2 paths. Also, after the array rebuilds, all of the data on the failed LUN will be copied from one of the good StarWind servers, which can affect performance as well (not much, I admit, but 5-10%, along with the 3rd path being down). Since you are using 1TB drives you should seriously look into RAID 5. I use RAID 60 (3TB drives and RAID 5 are not really a good idea together) and have had very good performance and redundancy.
I use basically the same setup, but went with 3TB hard drives. I would skip the CacheCade setup and get the 1GB NV Cache RAID adapter from Dell (my opinion). If you want SSD, I would recommend getting a PCIe SSD and using something like VeloBit until StarWind supports SSD offloading. I have had awesome performance doing this. Just remember that more RAM = more caching, so 128GB sounds like a lot, but in this case more = better.
Re: Advice/Comments on proposed host/storage deployment
Posted: Tue Apr 09, 2013 6:10 am
by Caillin
jeddyatcc wrote: The only thing I will add is that you will be in a degraded state, as any LUNs on the failed storage will only have 2 paths. Also, after the array rebuilds, all of the data on the failed LUN will be copied from one of the good StarWind servers, which can affect performance as well (not much, I admit, but 5-10%, along with the 3rd path being down). Since you are using 1TB drives you should seriously look into RAID 5. I use RAID 60 (3TB drives and RAID 5 are not really a good idea together) and have had very good performance and redundancy.
I use basically the same setup, but went with 3TB hard drives. I would skip the CacheCade setup and get the 1GB NV Cache RAID adapter from Dell (my opinion). If you want SSD, I would recommend getting a PCIe SSD and using something like VeloBit until StarWind supports SSD offloading. I have had awesome performance doing this. Just remember that more RAM = more caching, so 128GB sounds like a lot, but in this case more = better.
Thanks for the info on the paths, that's pretty much what I expected based on the StarWind literature.
I don't want to use a parity-based RAID array, as the write penalty under a mixed guest VM load would hurt performance badly.
I'm not sure what you are suggesting with the 1GB NV cache RAID adapter. That IS the PERC H710 adapter I was referring to, and it also supports the CacheCade read cache. So, in decreasing order of speed, you would have (toy model sketched below):
StarWind RAM cache - around 100GB planned for my deployment
CacheCade SSD read cache - 400GB, non-volatile, for hot data
Bare-metal disk - cold data coming off 16 x 1TB spindles at RAID 0 speeds.
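A toy model of that read path, just to make the tiering concrete. Capacities and latencies are illustrative placeholders, not measurements of StarWind or CacheCade, and the real products certainly use smarter promotion policies than plain LRU:

```python
from collections import OrderedDict

class CacheTier:
    """A simple LRU tier: fixed capacity in blocks, fixed service latency."""
    def __init__(self, name, capacity, latency_us):
        self.name, self.capacity, self.latency_us = name, capacity, latency_us
        self.blocks = OrderedDict()

    def hit(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)   # refresh LRU recency
            return True
        return False

    def insert(self, block):
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used

RAM = CacheTier("StarWind RAM cache", capacity=100, latency_us=5)
SSD = CacheTier("CacheCade SSD", capacity=400, latency_us=120)
DISK_LATENCY_US = 8000                       # 7.2k spindle seek, assumed

def read(block):
    """Serve from the fastest tier holding the block; promote on a miss."""
    if RAM.hit(block):
        return RAM.latency_us
    if SSD.hit(block):
        RAM.insert(block)                    # promote hot block into RAM
        return SSD.latency_us
    RAM.insert(block); SSD.insert(block)     # cold read warms both caches
    return DISK_LATENCY_US

assert read(42) == DISK_LATENCY_US           # first touch goes to spindles
assert read(42) == RAM.latency_us            # re-read is served from RAM
```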
After the additional info on the array resync process, I'm more confident now that RAID 0 is easily workable. It's effectively RAID 0+1 once you count the node-level mirroring anyway. Enterprise drive MTBF is more than long enough that we shouldn't have to worry about losing arrays on three separate nodes within hours of each other.
It will be interesting to see how well the StarWind SSD caching works out, but from the whitepapers and performance benchmarks I've seen of the CacheCade algorithms, it is a pretty amazing addition to any mechanical disk array.
Re: Advice/Comments on proposed host/storage deployment
Posted: Tue Apr 23, 2013 3:29 pm
by Max (staff)
Caillin,
I totally agree with you on the RAID side.
As for the array solid-state cache - we're developing our own.
So even if the built-in one doesn't work well, there will still be the option of getting the SSD cache and SAN software from a single vendor.
Re: Advice/Comments on proposed host/storage deployment
Posted: Wed May 01, 2013 8:59 pm
by epalombizio
Caillin,
I was faced with a very similar decision recently and decided to go with StarWind in a 2-node HA setup in each of two offices.
I'm currently using CacheCade for R/W caching, but am considering switching to read-only so I can move to RAID 0 instead of RAID 1 and double my available capacity. Most of my latency is on the read side anyway, so it would seem like the way to go, but I figured I'd give CacheCade a few days to learn the data patterns before making that call.
HA performance is flawless. We have a 9271-8i on each storage server. I had an issue where removing a drive from an optimal virtual drive would cause the card to issue a reset and StarWind would lose sync for a moment... the servers didn't skip a beat, and the fast sync back from the other HA server completed in less than a minute over dual 10GbE links.
StarWind,
What happens if we are using WB cache and one of the servers goes down hard (bluescreen, motherboard failure, etc.)? Since the WB cache is a R/W cache, do we risk causing harm to the underlying volume?
Thanks,
Elvis
Re: Advice/Comments on proposed host/storage deployment
Posted: Tue May 14, 2013 4:53 pm
by Max (staff)
Elvis,
You're not risking your data - every write is only confirmed once it's cached on both nodes.
So if one node goes down hard, the last write command is either re-routed to the remaining SAN or has already been written to the remaining SAN.
As soon as one SAN figures out that the partner is dead, the cache is immediately flushed to disk. Then the node switches to write-through mode to guarantee data consistency.
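A minimal sketch of that acknowledgement rule in Python, as described in this thread; it is an illustration of the behaviour, not actual StarWind code:

```python
class SanNode:
    def __init__(self, name):
        self.name = name
        self.cache = []                # unflushed write-back data in RAM
        self.disk = []                 # durable storage
        self.write_through = False

    def accept_write(self, data):
        if self.write_through:
            self.disk.append(data)     # straight to disk, no WB exposure
        else:
            self.cache.append(data)    # held in cache until flushed

    def flush(self):
        self.disk.extend(self.cache)   # commit everything pending
        self.cache.clear()

def ha_write(node_a, node_b, data):
    """A write is acknowledged only once BOTH partners hold a copy."""
    node_a.accept_write(data)
    node_b.accept_write(data)
    return "ACK"                       # two independent copies now exist

def on_partner_failure(survivor):
    """Partner died hard: flush the cache, then run write-through."""
    survivor.flush()                   # pending writes hit the disk now
    survivor.write_through = True      # consistency over speed from here

# Example: SAN-A bluescreens after an acknowledged write.
a, b = SanNode("SAN-A"), SanNode("SAN-B")
ha_write(a, b, "block-17")
on_partner_failure(b)                  # A is gone; B flushes and degrades
assert "block-17" in b.disk            # the acknowledged write survived
```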