StarWind Virtual SAN Free - Performance problems

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

hoba
Posts: 28
Joined: Mon Sep 18, 2017 6:44 am

Mon Sep 18, 2017 7:37 am

Hi,

the first time I tried your vSAN, I set up 2 VMs with Server 2012 R2 Core on the same ESXi host and created a pair of HA nodes. This worked pretty well right out of the box. Performance was OK on this test ESXi box. I did some HA failover tests and everything looked good so far, so I ordered a pair of Supermicro servers to run it on real hardware. I don't want to run it hyperconverged; I want the 2 nodes on dedicated physical machines. However, the performance is not very good (less than half of what my virtualized testbed delivered). Maybe someone can point me in the right direction to tune it.

When running an ATTO benchmark directly on the vSAN server, I see reads around 1560 MB/s and writes around 800 MB/s. I created a single Server 2012 R2 machine on an ESXi host on the vSAN datastore, with VMware Tools installed. When I run the ATTO benchmark inside this VM, I only see about 500 MB/s reads and 470 MB/s writes. When I create another VM and benchmark both concurrently, their combined throughput still doesn't exceed the 500/470 MB/s. Peaks on the vSAN NIC reach about 4 Gbit/s during the tests.

At the moment I only have node 1 in service. Node 2 will be added later, as I first have to move a datastore over to the vSAN so I can take the HDDs from the local datastore of one of the ESXi hosts and put them into the 2nd vSAN node.

Setup Details:
Server (per Node):
- Supermicro X10DRI-T
- 2x Xeon E5-2603v4
- 32 GB RAM
- Avago MegaRAID SAS 9361-8i
- 2x 300 GB SAS RAID 1 for OS
- 2x 250 GB SSD RAID 1 for L2 cache
- 8x 6 TB SAS RAID 10 for SAN storage
- 2x Intel 10 Gbit/s X540-AT2 (iSCSI and Sync)
- 2x Intel 1 Gbit/s CT (Heartbeat and Management)
- Windows Server 2012r2 Core including latest updates
- Drivers and BIOS versions of all components at the latest release

The switch used for the iSCSI segment is a Netgear XS716T. I also attached the ESXi host and the vSAN machine directly, without the switch in between, but that doesn't change anything. Jumbo frames are enabled and I can ping with big payloads in both directions (ESXi and vSAN machine).
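
For reference, a jumbo frame ping check looks roughly like this (assuming a 9000-byte MTU; 8972 bytes leaves room for the IP/ICMP headers, and the addresses are placeholders):

vmkping -d -s 8972 <vsan-iscsi-ip>     # on the ESXi host; -d sets the don't-fragment flag
ping -f -l 8972 <esxi-vmkernel-ip>     # on the Windows vSAN node; -f = don't fragment, -l = payload size in bytes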

The VMware servers are connected via 10 Gbit/s NICs. I have also tested 2 different ESXi hosts with different hardware and different NIC vendors, so the problem seems to be on the vSAN machine.

I've already followed all the guides, googled, and searched the forum for hints, but haven't found anything that improves the performance yet. Any kind of help is greatly appreciated.

Thanks,
Holger
Sergey (staff)
Staff
Posts: 86
Joined: Mon Jul 17, 2017 4:12 pm

Mon Sep 18, 2017 4:03 pm

Hello Holger! Try to do the following:

1) Make sure that StarWind VMs use Thick Provision Eager Zeroed virtual disks.

2) Under Advanced Settings in the vSphere Client, go to the Disk settings and set the Disk.DiskMaxIOSize option to 512 (see the command sketch after this list).

3) Try to remove StarWind L2 cache. It operates in write-through mode only, so it doesn't boost writes at all. You can do it according to this guide:
https://knowledgebase.starwindsoftware. ... dance/661/
Make sure to stop the StarWind service (StarWindService) beforehand.
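
As a sketch, the Disk.DiskMaxIOSize change can also be made from the ESXi shell with esxcli (the value is in KB):

esxcli system settings advanced set -o /Disk/DiskMaxIOSize -i 512    # limit the max I/O size ESXi passes to storage to 512 KB
esxcli system settings advanced list -o /Disk/DiskMaxIOSize          # verify the current value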

I hope this will help.
hoba
Posts: 28
Joined: Mon Sep 18, 2017 6:44 am

Tue Sep 19, 2017 11:09 am

Hi Sergey,

thanks for your quick reply!

1) This was already the case with my tests
2) That setting doesn't seem to make any difference; however, I can't reboot the ESXi host at the moment. According to the VMware docs, a reboot doesn't seem to be necessary for this anyway.
3) I was already testing without L2 cache. The numbers in my previous post were actually taken with the L2 cache disabled.

I have now set up the machine completely from scratch, this time with Server 2016, as it is now listed as supported with the latest StarWind vSAN v8. After the complete reinstallation I ended up with the same numbers. However, I have found that the performance can be tweaked using the Intel PROSet utility. The default profile was "Storage Server". If you change it to "Low Latency", both numbers (reads and writes) go up. If you then disable the "priority and vlan" option, which is enabled by default, you end up with writes of 1172 MB/s and reads of 716 MB/s. So I can now easily saturate the downstream on the 10 Gig vSAN node; however, the upstream is still not maxed out, although the system should be able to push that load (CPU, memory, disks...).
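
For anyone who wants to script this instead of clicking through the ProSet GUI, a rough equivalent with the built-in PowerShell cmdlets might look like the following (the adapter name "iSCSI1" and the exact property display names are assumptions and vary by driver version):

Get-NetAdapterAdvancedProperty -Name "iSCSI1"    # list the advanced properties to find the exact display names
Set-NetAdapterAdvancedProperty -Name "iSCSI1" -DisplayName "Packet Priority & VLAN" -DisplayValue "Packet Priority & VLAN Disabled"    # turn the priority/VLAN tagging feature off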

Any other ideas on how to push the upstream towards 10 Gbit/s?

Thank you,
Holger
Sergey (staff)
Staff
Posts: 86
Joined: Mon Jul 17, 2017 4:12 pm

Wed Sep 20, 2017 4:03 pm

Thank you for your reply!
According to the VMware docs, changing the Disk.DiskMaxIOSize option to 512 is necessary in some cases:
https://kb.vmware.com/selfservice/micro ... Id=1003469

Sector size should be 512 bytes if you are using ESXi. You can set it in the StarWind Management Console when you create the StarWind device.
Additionally, consider adding resources to the StarWind VM. Check our System Requirements: https://www.starwindsoftware.com/system-requirements
Also, try to measure network performance with iPerf (is the 10 Gb channel fully loaded?) and disk performance with the diskspd utility as an alternative to the ATTO benchmark (see the diskspd sketch below).
Are you using StarWind L1 cache?
Please note that StarWind adds an additional virtualization layer to your ESXi infrastructure, so a 10-15% performance difference is considered acceptable.
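As a sketch, a diskspd run against a test file on the StarWind-backed disk could look like this (block size, duration, and the target path are just example values):

diskspd.exe -c10G -b64K -d60 -t4 -o8 -r -w30 -Sh -L D:\test\testfile.dat    # 10 GB test file, 64K blocks, 60 s, 4 threads, 8 outstanding I/Os, 30% writes, caching disabled, latency stats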
Last edited by Sergey (staff) on Thu Sep 21, 2017 4:36 pm, edited 2 times in total.
hoba
Posts: 28
Joined: Mon Sep 18, 2017 6:44 am

Wed Sep 20, 2017 6:31 pm

Hi Sergey,

thank you for your reply.

I'm not running StarWind as a VM. It's a bare-metal install on the server described in my first post. It's not a hyperconverged setup.

The sector size on the StarWind device is indeed 512 bytes.

I'm using an L1 cache of 16 GB for the 21 TB StarWind device. The L2 cache on SSD didn't make any difference to the values I see.

I can easily saturate the 10 Gbit/s link with writes in the VM using the StarWind datastore; however, reads only use about 5-6 Gbit/s.

Interestingly, someone there has almost the same issues as I do, with a nearly identical setup (Intel 10 Gbit/s NICs, Avago RAID controller, though I'm using a RAID 10 and he's using a RAID 5, ...):
https://forums.starwindsoftware.com/vie ... f=5&t=4698

As he has scheduled a support session with StarWind for tomorrow, I hope the solution they come up with will help with my setup too :-)

Regards
Holger
Sergey (staff)
Staff
Posts: 86
Joined: Mon Jul 17, 2017 4:12 pm

Thu Sep 21, 2017 3:53 pm

Yes, Holger, the case is very similar. I advise you to add the partner node to your system, create StarWind HA and run the test from the HA device and from 2 VMs simultaneously. Make sure to have a dedicated 10Gb/s network interface, used for synchronization only. Do not use it for iSCSI and Sync simultaneously.

Also, check this guide, starting from page 44:
https://www.starwindsoftware.com/techni ... Sphere.pdf

These tweaks should also help once you add the partner node:
https://forums.starwindsoftware.com/vie ... 8&start=15

Update for the community:
Read performance should improve after adding the second node and additional network connections for iSCSI traffic (in case the existing one is fully loaded). Also, take into account that a 10-15% performance decrease is considered acceptable because of the additional virtualization layer.
hoba
Posts: 28
Joined: Mon Sep 18, 2017 6:44 am

Fri Sep 22, 2017 8:03 am

Hi Sergey,

Thanks for your support!

I have ordered 2 more dual-port Intel 10 Gbit/s NICs. I'll try to set up the 2-node HA cluster next week when they arrive. I'll then have 2x 10 Gbit/s for sync and heartbeat between the nodes and 2x 10 Gbit/s for iSCSI (per node). I'll run the tests again and post the results with that setup.

Have a nice weekend,
Holger
Sergey (staff)
Staff
Posts: 86
Joined: Mon Jul 17, 2017 4:12 pm

Fri Sep 22, 2017 8:24 am

Thank you, Holger! Wish you a great weekend too :)
hoba
Posts: 28
Joined: Mon Sep 18, 2017 6:44 am

Thu Sep 28, 2017 5:17 pm

Hi Sergey,

I have upgraded the machines with the additional NICs, as you suggested, and I can now saturate the 10 Gbit/s iSCSI link on the VMware server for both reads and writes. MPIO is working fine too. Each SAN server now has a dual 10 Gbit/s NIC for iSCSI and a dual 10 Gbit/s NIC for sync. I also have an additional 1 Gbit/s NIC in there for management and pure heartbeat purposes. I guess I can start migrating machines from the local datastores to the vSAN now.

Thanks for the great support!

Btw, I have installed the StarWind Manager beta. It looks very promising! Is there a timeline for the version with vSAN management functionality? Also, will it be a free product, or will it become paid software once it is out of beta?

Thanks again,
Holger
hoba
Posts: 28
Joined: Mon Sep 18, 2017 6:44 am

Fri Sep 29, 2017 7:11 am

Hi again,

I have moved everything to the rack now, and things just got worse in the final test. Writes are fine, but reads drop to 170 MB/s when both nodes are up and in sync. If I shut down one node, I get the performance I saw earlier. Writes are never an issue, but reads are still not behaving as expected.

Description of the modified setup:

vsan01:
NIC iSCSI1 10.254.254.1 (10gbit/s)
NIC iSCSI2 10.254.253.1 (10gbit/s)
NIC sync1 10.254.250.1 (10gbit/s)
NIC sync2 10.254.251.1 (10gbit/s)
NIC Management 192.168.181.50 (1gbit/s, also used for Heartbeat only)

vsan02:
NIC iSCSI1 10.254.254.2 (10gbit/s)
NIC iSCSI2 10.254.253.2 (10gbit/s)
NIC sync1 10.254.250.2 (10gbit/s)
NIC sync2 10.254.251.2 (10gbit/s)
NIC Management 192.168.181.50 (1gbit/s, also used for Heartbeat only)

vmwareserver:
vSwitch iSCSI connected to a dedicated 10 Gbit/s NIC
vmkernel1 10.254.254.11
vmkernel2 10.254.253.11
Both vmkernel ports are on the same vSwitch/dedicated NIC, as I don't have another 10 Gbit/s interface for this at the moment.
The VMware server sees 4 paths and shows all of them as active. Round Robin is used for MPIO, with the IOPS limit set to 1.
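For reference, setting the Round Robin IOPS limit to 1 from the ESXi shell looks roughly like this (the naa identifier is a placeholder for the StarWind device):

esxcli storage nmp psp roundrobin deviceconfig set -d naa.xxxxxxxxxxxxxxxx --type=iops --iops=1    # switch to a new path after every I/O
esxcli storage nmp psp roundrobin deviceconfig get -d naa.xxxxxxxxxxxxxxxx                         # verify the device's Round Robin settings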

Everything besides the sync NICs is connected to a Netgear XS716T 10 Gbit/s switch. The sync interfaces are directly patched.

The only thing that has changed from the previous test is that I added the additional iSCSI2 interfaces and the vmkernel2 interface. I know that I can't really use the 4x 10 Gbit/s from the single VMware server, but 2 more servers will be added, so I'll have 3x 10 Gbit/s from the 3 VMware servers hitting 4x 10 Gbit/s iSCSI on the 2 SAN nodes.

Any recommendations appreciated.

Thanks,
Holger
Sergey (staff)
Staff
Posts: 86
Joined: Mon Jul 17, 2017 4:12 pm

Fri Sep 29, 2017 11:33 am

Hello, Holger! Let me give you a few bits of advice here:
1. What RAID setting do you use? We recommend the following (for SAS):
RAID 10
Disk cache: Default
Write policy: Write Back
Read policy: Read ahead

2. The results of your previous tests (writes of 1172 MB/s and reads of 716 MB/s) look inflated. Could you try to rerun the storage test with the DISKSPD utility? You can download it from the attachment to this post. Launch the storage test on 2 VMs simultaneously, then sum the results. Make sure that the tested .vmdk virtual disks are located on the StarWind HA datastore. Also, it makes sense to run the storage test on the disk where the StarWind image files are located.
Try to connect the StarWind disk to your compute nodes using the MS iSCSI Initiator (see the PowerShell sketch after the attachment below). Is there any difference in performance on the StarWind disk? Please use the DISKSPD utility (attached) for all storage tests.

3. What StarWind build do you use?
4. Run an iPerf test to make sure that the network interfaces are OK.
Attachments
StorageTest_v0.8.zip
(95.42 KiB) Downloaded 3596 times
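
As a rough sketch, connecting the StarWind target directly from a Windows host with the Microsoft iSCSI Initiator via PowerShell could look like this (the portal address is a placeholder for one of the vSAN iSCSI IPs):

Start-Service MSiSCSI                                      # make sure the initiator service is running
New-IscsiTargetPortal -TargetPortalAddress 10.254.254.1    # register the StarWind portal
Get-IscsiTarget | Connect-IscsiTarget                      # connect to the discovered target(s)
Get-Disk | Where-Object BusType -eq iSCSI                  # the StarWind disk should show up here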
hoba
Posts: 28
Joined: Mon Sep 18, 2017 6:44 am

Fri Sep 29, 2017 1:49 pm

Just a short update on this:

If I disable the additional 10.254.253.x iSCSI paths on the VMware server, I get 1172 MB/s writes and 1052 MB/s reads. It's not a problem with the tool I'm using to measure the performance. It seems to be a problem with MPIO of some kind...
Sergey (staff)
Staff
Posts: 86
Joined: Mon Jul 17, 2017 4:12 pm

Fri Sep 29, 2017 4:54 pm

What is the version of StarWind VSAN? I kindly advise you to run an iPerf test on both iSCSI network interfaces.
hoba
Posts: 28
Joined: Mon Sep 18, 2017 6:44 am

Fri Sep 29, 2017 8:00 pm

Can you please tell me the exact iPerf syntax that you want to see?

The build number and version is:
StarWind Virtual SAN v8.0.0 (Build 11456, [SwSAN], Win64)
Built Jul 27 2017 18:00:03

Some more observations:
If I use the 2x 10 Gbit/s NICs on vsan01 only, I get decent performance.
If I use the 2x 10 Gbit/s NICs on vsan02 only, I get decent performance.
If I use 1x 10 Gbit/s NIC of vsan01 and 1x 10 Gbit/s NIC of vsan02, I get decent performance, no matter which of the interfaces I use.
If I use the 2x 10 Gbit/s NICs of vsan01 and the 2x 10 Gbit/s NICs of vsan02 concurrently, the reads go down the drain.

It looks like the problems start when I use more than 2 MPIO paths.

Thanks for your continued support!
Holger
Sergey (staff)
Staff
Posts: 86
Joined: Mon Jul 17, 2017 4:12 pm

Mon Oct 02, 2017 10:13 am

Dear Holger, you can start the iPerf server side with this command: .\iperf.exe -s
You can use this command for the client: .\iperf.exe -c 172.16.10.2 -t 150 -i 15
Just replace the IP address with the correct one.
Locate the StarWind Virtual SAN configuration file and open it using Notepad. The default path is:
C:\Program Files\StarWind Software\StarWind\StarWind.cfg
Find the string <iScsiDiscoveryListInterfaces value="0"/> and change the value to 1 (it should look as follows: <iScsiDiscoveryListInterfaces value="1"/>). Save the changes and exit Notepad. Note that if there are any issues with saving the changed *.cfg file, launch Notepad with Administrator rights and then load the StarWind.cfg file to make the modifications.
Restart the StarWind Virtual SAN service after the *.cfg modification.
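
A minimal follow-up sketch for that last step from an elevated PowerShell prompt (the service name StarWindService is the one mentioned earlier in this thread):

Select-String -Path "C:\Program Files\StarWind Software\StarWind\StarWind.cfg" -Pattern "iScsiDiscoveryListInterfaces"    # confirm the value is now 1
Restart-Service -Name StarWindService                                                                                     # restart the service so the change takes effect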