Slow Performance HyperConverged Setup on VMware vSphere

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

crafzman
Posts: 4
Joined: Fri Nov 21, 2014 11:04 pm

Sat Dec 13, 2014 3:52 am

Completed the setup outlined in: http://www.starwindsoftware.com/styles- ... Sphere.pdf

I have a 1TB HA volume set up between the two StarWind controller VMs. Jumbo frames are enabled on the NICs of both StarWind controller VMs for SYNC, iSCSI, and local iSCSI, as well as on the ESXi host VMkernel ports used for iSCSI communication. Jumbo frames are also enabled on the Cisco 2960 connecting these hosts. All SYNC and iSCSI traffic runs over the Intel NICs. The Broadcom NICs are used only for VM and management traffic.

Underlying hardware for each host:
1 x Dell 2950s - 16GB RAM / 2 x Intel(R) Xeon(R) CPU E5405 @ 2.00GHz
2 x Broadcom Corporation Broadcom NetXtreme II BCM5708 1000Base-T
2 x Intel Corporation 82571EB Gigabit Ethernet Controller
8 x 1TB SATA 7200RPM (In RAID-10 Configuration)
Running VMware ESXi 5.5 Update 2

HA Datastore is Thick Provisioned
iSCSI paths in VMware are set to Fixed, with local_iSCSI (internal-only vSwitch) as the preferred path and the non-internal path as backup.

I ran an IOMeter test (4K, 100% read) on the Windows Server 2012 R2 machine running as the StarWind controller VM and got about 23000 IOPS at 96 MB/s, with a 0.67 ms average response time and a 34 ms maximum response time. This is the volume that stores the StarWind HA volumes.
I ran a second IOMeter test (4K, 100% read) on a Windows Server 2012 R2 VM running on the HA datastore and got about 3000 IOPS at 12.25 MB/s, with a 5.34 ms average response time and a 94.6 ms maximum response time.
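
The two results above can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming that 4K means 4096-byte transfers and that the response times are in milliseconds (the unit Iometer reports); the function names are only illustrative.

```python
# Sanity-check the two Iometer runs quoted above.
# Assumptions: 4K = 4096-byte transfers; response times are in milliseconds.

BLOCK_BYTES = 4096


def throughput_mb_per_s(iops, block_bytes=BLOCK_BYTES):
    """Throughput in decimal megabytes per second."""
    return iops * block_bytes / 1_000_000


def outstanding_ios(iops, avg_latency_ms):
    """Little's law: average I/Os in flight = completion rate x average latency."""
    return iops * avg_latency_ms / 1000.0


# (IOPS, average response time in ms) as reported in the post
runs = {
    "controller VM (local RAID-10)": (23000, 0.67),
    "guest VM on HA datastore": (3000, 5.34),
}

for name, (iops, latency_ms) in runs.items():
    print(f"{name}: {throughput_mb_per_s(iops):.1f} MB/s, "
          f"~{outstanding_ios(iops, latency_ms):.1f} I/Os in flight")
```

Both runs work out to roughly 15 to 16 I/Os in flight, which suggests they used the same queue depth and that the IOPS gap mostly reflects the higher per-I/O latency on the HA datastore.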

Using netperf, I found that the StarWind controller VMs can communicate with each other at a full 1000 Mbit/s over the network connections used for SYNC and iSCSI. I have not measured the speed between the VMs and the VMkernel ports, but I assume it is the same.
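
To put the 1000 Mbit/s figure in context, the sketch below estimates how many 4K I/Os a single gigabit link could carry and what share of that link the 3000 IOPS result would use. It assumes 4096-byte transfers and ignores Ethernet, TCP, and iSCSI header overhead, so the real ceiling is somewhat lower; the function names are made up for illustration.

```python
# Rough upper bound on 4K IOPS over one gigabit link.
# Assumptions: 4096-byte payloads, no protocol overhead counted.

LINK_MBIT = 1000
BLOCK_BYTES = 4096


def max_iops_for_link(link_mbit, block_bytes):
    """IOPS at which the payload alone would saturate the link."""
    bytes_per_second = link_mbit * 1_000_000 / 8
    return bytes_per_second / block_bytes


def link_utilisation(iops, block_bytes, link_mbit):
    """Fraction of the link bandwidth consumed by the payload."""
    return iops * block_bytes * 8 / (link_mbit * 1_000_000)


ceiling = max_iops_for_link(LINK_MBIT, BLOCK_BYTES)
used = link_utilisation(3000, BLOCK_BYTES, LINK_MBIT)
print(f"payload-only ceiling: {ceiling:.0f} IOPS")
print(f"3000 IOPS uses {used:.1%} of the link")
```

By this estimate the HA datastore result uses about a tenth of one gigabit link, so raw bandwidth alone would not explain the gap.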

Am I missing something or should the speed be better on the VM running within the HA Datastore? Please let me know of whatever I can try to improve this performance. There is only 1 VM running on this HA Datastore and I fear as we add more the performance will continue to degrade...

Thanks in advance for any help and let me know if you have any questions with what I state above!

Jonathan VDZ
crafzman
Posts: 4
Joined: Fri Nov 21, 2014 11:04 pm

Sun Dec 14, 2014 12:05 am

I have been testing further, and it looks like the issue may be due to my L1 cache being set at 128MB. I found a recommendation in another StarWind guide to use 1GB of L1 cache per 1TB, so I created a new device that way. The performance is better. I will test more and report the results.
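
As a quick illustration of that rule of thumb, here is a small sketch that compares the configured L1 cache with the 1GB-per-1TB guideline mentioned above. The helper name is hypothetical and the ratio is passed in as a parameter, since the guideline comes from the guide cited in the post rather than from anything verified here.

```python
# Compare a configured L1 cache size against a GB-per-TB rule of thumb.

def recommended_l1_cache_mb(device_size_tb, gb_per_tb=1.0):
    """L1 cache size in MB for a device of the given size."""
    return device_size_tb * gb_per_tb * 1024


device_size_tb = 1.0   # the 1TB HA volume from the first post
configured_mb = 128    # the L1 cache size that was originally set

recommended_mb = recommended_l1_cache_mb(device_size_tb)
print(f"recommended: {recommended_mb:.0f} MB, configured: {configured_mb} MB "
      f"(guideline is {recommended_mb / configured_mb:.0f}x larger)")
```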
thefinkster
Posts: 46
Joined: Thu Sep 18, 2014 7:15 pm

Mon Dec 15, 2014 1:55 pm

Also, if you have turned on Level 2 cache, disable it by commenting it out of the configuration. Level 2 cache has weird performance issues (sometimes it's near 100%, other times I get only 25%).
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Mon Dec 15, 2014 3:34 pm

A lot depends on the workload. If the cache has nothing to cache, then enabling it will slow performance down. In any case, to troubleshoot the issue, support will ask you to turn both L1 and L2 off completely to see how the underlying storage performs with NO caching enabled (equivalent to a 100% cache-miss situation), which is the worst-case scenario.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software
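
The point about a cache with nothing to cache can be illustrated with a simple average-latency model. This is only a sketch: the latencies and the per-request lookup overhead are hypothetical numbers, not StarWind measurements, and the function name is made up for illustration.

```python
# Toy model: every request pays a cache lookup; misses also pay the backend.
# All numbers below are hypothetical and chosen only to show the shape.

def effective_latency_ms(hit_ratio, cache_ms, backend_ms, lookup_overhead_ms):
    """Average latency per request for a given cache hit ratio."""
    hit = lookup_overhead_ms + cache_ms
    miss = lookup_overhead_ms + backend_ms
    return hit_ratio * hit + (1.0 - hit_ratio) * miss


BACKEND_MS = 5.0    # hypothetical uncached storage latency
CACHE_MS = 0.05     # hypothetical latency of a cache hit
OVERHEAD_MS = 0.2   # hypothetical cost of checking the cache

print(f"no cache at all: {BACKEND_MS:.2f} ms")
for hit_ratio in (0.0, 0.5, 0.9):
    latency = effective_latency_ms(hit_ratio, CACHE_MS, BACKEND_MS, OVERHEAD_MS)
    print(f"cache on, {hit_ratio:.0%} hits: {latency:.2f} ms")
```

In this model, a cache with a 0% hit ratio is slower than no cache at all, because every request still pays the lookup cost. That is why testing with caching disabled gives a clean baseline for the underlying storage.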

crafzman
Posts: 4
Joined: Fri Nov 21, 2014 11:04 pm

Mon Dec 15, 2014 6:52 pm

thefinkster wrote:Also, if you have turned on Level 2 cache, disable it by commenting it out of the configuration. Level 2 cache has weird performance issues (sometimes it's near 100%, other times I get only 25%).
Can you tell me exactly where the configuration file is and what the process is for making these changes (which services need to be restarted, etc.)?

Thanks,
JV
crafzman
Posts: 4
Joined: Fri Nov 21, 2014 11:04 pm

Mon Dec 15, 2014 6:59 pm

anton (staff) wrote:A lot depends on the workload. If the cache has nothing to cache, then enabling it will slow performance down. In any case, to troubleshoot the issue, support will ask you to turn both L1 and L2 off completely to see how the underlying storage performs with NO caching enabled (equivalent to a 100% cache-miss situation), which is the worst-case scenario.
Can you give me a description of when nothing would be cached? As of now I am not using an L2 cache, just an L1. I would like to increase or eliminate my L1 cache without needing to recreate my device. Is that also done through a config file, as thefinkster mentions above?

I have a call with an engineer tomorrow so I will see what information they can provide.
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Fri Dec 19, 2014 4:57 pm

crafzman wrote:
thefinkster wrote:Also, if you have turned on Level 2 cache, disable it by commenting it out of the configuration. Level 2 cache has weird performance issues (sometimes it's near 100%, other times I get only 25%).
Can you tell me exactly where the configuration file is and what the process is for making these changes (which services need to be restarted, etc.)?

Thanks,
JV
You are running everything in a test environment, so I think it is better to create new targets without HA than to edit the configuration files, which may be risky.
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Fri Dec 19, 2014 5:01 pm

crafzman wrote:
anton (staff) wrote:A lot depends on the workload. If the cache has nothing to cache, then enabling it will slow performance down. In any case, to troubleshoot the issue, support will ask you to turn both L1 and L2 off completely to see how the underlying storage performs with NO caching enabled (equivalent to a 100% cache-miss situation), which is the worst-case scenario.
Can you give me a description of when nothing would be cached? As of now I am not using an L2 cache, just an L1. I would like to increase or eliminate my L1 cache without needing to recreate my device. Is that also done through a config file, as thefinkster mentions above?

I have a call with an engineer tomorrow so I will see what information they can provide.
OK, please refer to "How can I change the cache settings?" in our FAQ:
http://www.starwindsoftware.com/starwind-faq

Please disable the L1/L2 cache on all the StarWind nodes and try again. If that does not help, we will ask you to run the tests described in our Benchmarking Guide:
http://www.starwindsoftware.com/starwin ... ice-manual

I hope that helps.
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com