Software-based VM-centric and flash-friendly VM storage + free version
-
hoba
- Posts: 28
- Joined: Mon Sep 18, 2017 6:44 am
Wed Oct 04, 2017 8:03 am
Hi Sergey,
I ran the iperf tests. I can get 10 Gbit/s in both directions between all 10GbE interfaces of the vSAN nodes, and I can also push 10 Gbit/s from the VMware server to the interfaces of the vSAN nodes. However, I only get around 4 Gbit/s from any of the vSAN nodes towards the VMware server. Sounds like some kind of settings problem on the VMware server? Do you have any advice here? The NICs are Intel X540-AT2:
Code: Select all
Name PCI Driver Link Speed Duplex MAC Address MTU Description
vmnic1 0000:03:00.1 ixgbe Up 10000Mbps Full 0c:c4:7a:df:0f:7f 9000 Intel(R) Ethernet Controller X540-AT2
The changes in the starwind.cfg had no effect. I still see massively degraded performance when adding more than 2 MPIO paths.
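In case it helps anyone reproducing this, here is roughly how I measured the two directions (a minimal sketch using iperf3, assuming it is available on both ends; on ESXi you may need to run it from a VM instead, and the address is just a placeholder for a vSAN node interface):
Code: Select all
# on the receiving side (e.g. a vSAN node), start a server
iperf3 -s

# from the VMware side, measure the push direction towards the node
iperf3 -c 192.168.10.11 -P 4 -t 30

# same connection, opposite direction (-R reverses the data flow)
iperf3 -c 192.168.10.11 -P 4 -t 30 -R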
Thanks
Holger
-
Delo123
- Posts: 30
- Joined: Wed Feb 12, 2014 5:15 pm
Wed Oct 11, 2017 6:18 am
Any updates on this? I never trusted those X540-(A)T2 NICs. I remember some very old driver seemed to work much better on the VMware side, but I can't remember exactly when that was...
-
hoba
- Posts: 28
- Joined: Mon Sep 18, 2017 6:44 am
Wed Oct 11, 2017 6:43 am
I have created a ticket with VMware. The driver is neither developed nor supported by Intel; VMware builds and compiles it on their own from Intel's open-source code. I'll let you know once they have looked into it and hopefully have a solution. Not sure if the MPIO problem (issues when adding more than 4 paths) is related too, but I'll have them check that one as well.
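In the meantime, this is how I checked which ixgbe build the host is actually running, in case someone wants to compare versions (vmnic1 is just my uplink name):
Code: Select all
# driver and firmware version for this uplink
esxcli network nic get -n vmnic1

# installed ixgbe driver package
esxcli software vib list | grep ixgbe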
-
Delo123
- Posts: 30
- Joined: Wed Feb 12, 2014 5:15 pm
Wed Oct 11, 2017 6:50 am
Ok, thanks... As said, I remember there were some issues with these NICs, so we abandoned them for exactly the reason you are referring to.
Maybe the LUN is trying to use both vSAN nodes as active optimized paths? Ideally only one node should be active optimized for a single LUN, no? Not sure where to check that in recent builds.
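On the ESXi side the per-path state should show up with something like this (the device identifier is just an example, use the one of your HA device):
Code: Select all
# list all paths for the LUN, including their group state (active / active unoptimized)
esxcli storage nmp path list -d naa.60000000000000000000000000000001

# show the path selection policy the device is using
esxcli storage nmp device list -d naa.60000000000000000000000000000001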
-
hoba
- Posts: 28
- Joined: Mon Sep 18, 2017 6:44 am
Thu Dec 21, 2017 4:38 pm
Hi,
after my ticket at VMware was lost on the desk of an employee who went on vacation, and after several steps of escalation, I at least got a hint from the next support technician. Performance is now actually fine. It all comes down to not configuring two vmkernel interfaces on the same vSwitch hitting a single physical NIC. I was doing this to be able to use MPIO with all four paths, and to still have two paths if one of the vSAN nodes fails. After adding another NIC to the VMware server and putting a vmkernel interface on each of the NICs, I see decent performance, though I wonder why I see writes above 10 Gbit/s while reads are maxed out at 10 Gbit/s. However, if I run tests from two VMware servers against two SAN nodes, I see the full performance on both nodes (no degradation while running disk benchmarks on both VMware servers inside VMs with disk files hosted on the vSAN).
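For reference, this is roughly what the separation looks like on the ESXi side once every iSCSI vmkernel port group is pinned to its own uplink (port group and vmnic names are just examples from my setup, not official guidance):
Code: Select all
# give each iSCSI port group exactly one active uplink
esxcli network vswitch standard portgroup policy failover set -p iSCSI-A -a vmnic1
esxcli network vswitch standard portgroup policy failover set -p iSCSI-B -a vmnic2

# check which port group each vmkernel interface sits in
esxcli network ip interface list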
However, during migration I will have to set up a single vSAN node without HA first and add the HA node later, syncing the data from the first, non-HA node. I'm going to open another post for this, as it is not related to this thread.
Thanks so far!
Holger
-
Boris (staff)
- Staff
- Posts: 805
- Joined: Fri Jul 28, 2017 8:18 am
Thu Dec 21, 2017 5:39 pm
Holger,
thank you for getting back to us on this matter. At least now we can see it was a misconfiguration on the VMware side.
-
Mitmont
- Posts: 4
- Joined: Thu Sep 29, 2016 6:22 pm
Fri Jan 05, 2018 7:58 pm
Hello, if I may add to this thread. In our case it was 2012R2 with the UI, on Dell servers with high-throughput NICs. What we came across was that once we hit a certain number of VMs, we started to receive tons of excessive Flush Cache messages; we saw this using Wireshark on one of the members. We saw the same thing in 2012R2 Clustered Storage Spaces but didn't know how to handle it back in 2014. We did come across a fix that resolved our issue: there's an article by Ard-Jan Barnas called Tuning Windows 2012 - File System Part 1, and we chose to make the following change per that article: TreatHostAsStableStorage. That was the biggest one. We also found some other tuning specific to our environment that we continue to deploy each time we create a new SW HA node. I'm adding that this is what worked for us... it may or may not be the correct action for every situation. Proceed at your own risk.
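For completeness, the change itself is a single registry value. If I remember correctly it sits under the LanmanServer parameters key, but please double-check the exact location against the article before applying it, and expect to restart the Server service (or reboot) for it to take effect. Again, at your own risk:
Code: Select all
rem treat the host as stable storage so flush requests are not forced down to disk
reg add "HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters" /v TreatHostAsStableStorage /t REG_DWORD /d 1 /f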