ESXi 4.1u1, MPIO & Starwind slow startup/storage rescan

mkaishar
Posts: 32
Joined: Mon Mar 01, 2010 8:04 pm

Mon May 23, 2011 5:57 pm

We have an interesting issue. A support call has been opened with VMware, but I would like some input from StarWind as well.

We run three ESXi 4.1 Enterprise servers, each with four 1GbE NICs for the SAN, using VMware's software iSCSI initiator
StarWind on W2K8R2 on Dell hardware with two 10GbE NICs for the SAN (each NIC has two IP addresses)
Four networks for the SAN (10.0.0.0/24, 10.0.1.0/24, 10.0.2.0/24, 10.0.3.0/24)
MPIO with Round Robin enabled on VMware

This is my Round Robin script in rc.local:

# Set every device whose ID starts with "eui" (the StarWind LUNs here) to Round Robin,
# switching paths every 3 I/Os
esxcli nmp device list | grep '^eui' |
while read -r device ; do
        esxcli nmp device setpolicy --psp VMW_PSP_RR --device "${device}"
        esxcli nmp roundrobin setconfig --type "iops" --iops 3 --device "${device}"
done

In VMware, under Dynamic Discovery, I entered the four IP addresses of the StarWind storage server (10.0.0.220, 10.0.1.220, 10.0.2.220, 10.0.3.220).

When I do a Rescan All and view the messages log on the VMware server, I notice that each VMkernel port tries to connect to each target, whether or not the target is on that port's network.

This takes a long time, 30 minutes or more depending on the number of targets and the number of VMkernel ports.
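
To make the pattern concrete, here is a small shell sketch (an illustration, not a VMware tool) that pairs each VMkernel address with each portal and flags whether the two share a /24 network. It assumes /24 masks, as in the four 10.0.x.0/24 SAN networks listed above, and borrows the 10.0.x.18 VMkernel addresses from the log posted later in the thread; the helper names net24, classify_pair and classify_all are hypothetical. Assuming the SAN networks are not routed to each other, the cross-subnet pairs are the logins expected to fail.

```shell
#!/bin/sh
# Sketch: flag which VMkernel-port/portal pairs share a /24 network.
# ASSUMPTIONS: /24 masks on all SAN networks; VMkernel IPs taken from the
# log later in the thread. net24/classify_pair/classify_all are hypothetical.

# Print the /24 prefix (first three octets) of a dotted IPv4 address.
net24() {
    echo "$1" | cut -d. -f1-3
}

# Label one initiator -> portal pair as same-subnet or cross-subnet.
classify_pair() {
    if [ "$(net24 "$1")" = "$(net24 "$2")" ]; then
        echo "$1 -> $2 same-subnet"
    else
        echo "$1 -> $2 cross-subnet"
    fi
}

VMK_IPS="10.0.0.18 10.0.1.18 10.0.2.18 10.0.3.18"
PORTALS="10.0.0.220 10.0.1.220 10.0.2.220 10.0.3.220"

# Dynamic Discovery: every VMkernel port is tried against every portal.
# The lists are left unquoted on purpose so the shell splits them on spaces.
classify_all() {
    for vmk in $VMK_IPS; do
        for portal in $PORTALS; do
            classify_pair "$vmk" "$portal"
        done
    done
}

classify_all
```

Running it prints 16 lines: 4 same-subnet pairs and 12 cross-subnet pairs.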

Have other StarWind/VMware customers run into the same issue?
Constantin (staff)

Sun May 29, 2011 3:12 pm

It's OK that the VMkernel tries to reach all available IP addresses. For higher redundancy I would rather recommend the following:
a) run all 4 VMkernel ports and the StarWind NICs in one subnet;
b) manually assign the VMNICs to the SW iSCSI HBA using the ESX(i) CLI.
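
For step b), here is a sketch of the port-binding commands on ESX(i) 4.x (esxcli swiscsi nic add / list). It only prints the commands so they can be reviewed before being run on the host. The adapter name vmhba33 and the port names vmk1..vmk4 are assumptions; substitute the names your own host reports.

```shell
#!/bin/sh
# Sketch: print the ESX(i) 4.x CLI commands that bind each VMkernel port
# to the software iSCSI adapter. Nothing is executed; review, then run on the host.
# ASSUMPTIONS: adapter vmhba33 and ports vmk1..vmk4 are hypothetical names.

SW_HBA="vmhba33"
VMK_PORTS="vmk1 vmk2 vmk3 vmk4"

print_bind_commands() {
    # One binding per VMkernel port (list left unquoted to split on spaces).
    for vmk in $VMK_PORTS; do
        echo "esxcli swiscsi nic add -n $vmk -d $SW_HBA"
    done
    # List the bindings afterwards to confirm every port is attached.
    echo "esxcli swiscsi nic list -d $SW_HBA"
}

print_bind_commands
```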
mkaishar

Sun May 29, 2011 4:54 pm

Constantin (staff) wrote:It's OK that the VMkernel tries to reach all available IP addresses. For higher redundancy I would rather recommend the following:
a) run all 4 VMkernel ports and the StarWind NICs in one subnet;
b) manually assign the VMNICs to the SW iSCSI HBA using the ESX(i) CLI.
I am already assigning vmnics to swiscsi using the CLI... but your other recommendation, running all VMkernel ports and StarWind in one subnet, completely contradicts StarWind's documentation on setting up MPIO, and that documentation is the only reason we use multiple subnets.

Has something changed in StarWind's ability to work with MPIO on a single subnet? If I only need one subnet I can easily rework the entire setup, but I need explicit confirmation from StarWind that running one subnet for our iSCSI and MPIO would still work.

Here is StarWind's basic how-to on MPIO: http://www.starwindsoftware.com/images/ ... d_MPIO.pdf

Please advise...

Thanks,
Mark
Constantin (staff)

Sun May 29, 2011 5:39 pm

We'll update the best practices in our documentation soon. These recommendations are based on the recommendations in VMware technical papers.
mkaishar

Sun May 29, 2011 6:20 pm

Constantin (staff) wrote:We'll update the best practices in our documentation soon. These recommendations are based on my personal experience gained during VMware Ready testing.
Again... I am not getting any specific answers from StarWind, just generalizations that still leave the customer wondering. We are a StarWind customer, so why not just tell me specifically whether I need one subnet or several for MPIO to work?

I can tell you for a fact that EqualLogic requires only a single subnet for MPIO to work, but I cannot say the same for StarWind, because your documentation specifies multiple subnets :)

Let me explain further: we use VMware ESXi to mount datastores, but we also have VM guests mounting iSCSI volumes through the Microsoft iSCSI Initiator and connecting directly to StarWind.

Thanks,
Mark
Constantin (staff)

Mon May 30, 2011 4:31 am

Currently I'm out of the office, so I can't go to our tech department responsible for the tech papers and ask them to update the papers (and why they haven't been).
Also, in your case I don't see any problem either: all VMkernel ports and VMs can be in one subnet. You simply need to use ACLs in StarWind for LUN masking; that way you'll hide the datastore targets from the VMs and the VMs' disks from ESX(i).
mkaishar

Mon May 30, 2011 5:29 pm

Constantin (staff) wrote:Currently I'm out of the office, so I can't go to our tech department responsible for the tech papers and ask them to update the papers (and why they haven't been).
Also, in your case I don't see any problem either: all VMkernel ports and VMs can be in one subnet. You simply need to use ACLs in StarWind for LUN masking; that way you'll hide the datastore targets from the VMs and the VMs' disks from ESX(i).
Hello Constantin,

We use ACLs in StarWind to allow only certain initiators access to certain targets, but...
ACLs in StarWind do not prevent VMware VMkernel ports from scanning targets on different iSCSI networks.
ACLs in StarWind do prevent VMware VMkernel ports from accessing restricted targets.
ACLs in StarWind do prevent other initiators from accessing restricted targets.

I put in a support ticket with VMware asking why it takes 20-30 minutes for our VMware servers to start, and included a short overview of our iSCSI setup.

We have four VMkernel ports and four subnets for iSCSI. I told them that during startup the logs show each VMkernel port accessing targets in its own subnet, but also trying to access targets in the other subnets and failing, and that this is what causes the host startup delay.

- vmk1/10.0.0.18 connects to 10.0.0.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) successfully
- vmk1/10.0.0.18 tries to connect to 10.0.1.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails
- vmk1/10.0.0.18 tries to connect to 10.0.2.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails
- vmk1/10.0.0.18 tries to connect to 10.0.3.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails

- vmk2/10.0.1.18 connects to 10.0.1.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) successfully
- vmk2/10.0.1.18 tries to connect to 10.0.0.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails
- vmk2/10.0.1.18 tries to connect to 10.0.2.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails
- vmk2/10.0.1.18 tries to connect to 10.0.3.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails

- vmk3/10.0.2.18 connects to 10.0.2.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) successfully
- vmk3/10.0.2.18 tries to connect to 10.0.0.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails
- vmk3/10.0.2.18 tries to connect to 10.0.1.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails
- vmk3/10.0.2.18 tries to connect to 10.0.3.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails

- vmk4/10.0.3.18 connects to 10.0.3.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) successfully
- vmk4/10.0.3.18 tries to connect to 10.0.0.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails
- vmk4/10.0.3.18 tries to connect to 10.0.1.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails
- vmk4/10.0.3.18 tries to connect to 10.0.2.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails
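
The log above works out to a simple count per host and per rescan. The sketch below assumes one StarWind portal per subnet and that a VMkernel port can only log in to the portal in its own subnet, which is what the log shows; count_attempts is a hypothetical helper, not part of any VMware tooling.

```shell
#!/bin/sh
# Sketch: count iSCSI login attempts per host for one rescan.
# ASSUMPTIONS: one portal per subnet, and a VMkernel port only succeeds
# against the portal in its own subnet. count_attempts is hypothetical.

# $1 = VMkernel ports, $2 = portals, $3 = targets behind each portal
count_attempts() {
    total=$(( $1 * $2 * $3 ))   # every port tries every target on every portal
    ok=$(( $1 * $3 ))           # only the same-subnet portal answers
    echo "total=$total ok=$ok failed=$(( total - ok ))"
}

# vmk1..vmk4, portals 10.0.0.220..10.0.3.220, vmvol00iso + vmvol01..vmvol07
count_attempts 4 4 8
```

With the numbers from the log this prints total=128 ok=32 failed=96, i.e. 96 failed logins on every host for every rescan.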

So VMware support came back, told me this is standard behavior for the VMware software iSCSI initiator, and provided the KB below:
http://kb.vmware.com/selfservice/micros ... Id=1024476

So... if we can run MPIO with VMware, and MPIO with VM guests using the MS iSCSI Initiator, on one iSCSI subnet instead of four, then I can scale back our network.

When you have time, let me know if this is feasible. If not, we will have to live with the current limitations: StarWind requiring multiple subnets for MPIO to work, and VMware's software iSCSI initiator scanning multiple networks and timing out.

Thanks,
Mark
Constantin (staff)

Mon May 30, 2011 6:03 pm

mkaishar wrote:ACLs in StarWind do not prevent VMware VMkernel ports from scanning targets on different iSCSI networks.
ACLs in StarWind do prevent VMware VMkernel ports from accessing restricted targets.
ACLs in StarWind do prevent other initiators from accessing restricted targets.
You can prevent it by using Static Discovery instead of Dynamic Discovery. The reason is that ACLs in StarWind work on the server side, not the client side, which is why ACLs can't prevent rescanning from a different subnet.

You can email support to get confirmation that we FULLY support a configuration with one subnet, and even recommend it.
mkaishar

Mon May 30, 2011 6:16 pm

Constantin (staff) wrote:You can prevent it by using Static Discovery instead of Dynamic Discovery. The reason is that ACLs in StarWind work on the server side, not the client side, which is why ACLs can't prevent rescanning from a different subnet.

You can email support to get confirmation that we FULLY support a configuration with one subnet, and even recommend it.
I tested with Static Discovery: same results. I will test using one subnet today and provide feedback.

Thanks,
Mark
mkaishar

Mon May 30, 2011 8:08 pm

It works with a single subnet, as you recommended... why is this information not posted anywhere?

StarWind needs a real knowledge base that is constantly updated. Your tech papers/white papers are archaic!

BTW... thank you, your recommendation did resolve my problem.

Thanks,
Mark
Constantin (staff)

Tue May 31, 2011 3:51 pm

Anytime. We are working hard on a new KB, and if you have any problems with VMware and StarWind, contact me; my special skill is VMware. :)
mkaishar

Wed Jun 01, 2011 3:04 pm

Wanted to give you a heads-up... while MPIO works with StarWind and VMware in a single subnet, MPIO does NOT work with OS iSCSI initiators like MS Windows or Linux: although the MPIO config does do RR, data only travels over the primary NIC.

So we still need multiple subnets to perform MPIO from within operating-system initiators.

Thanks,
Mark
kmax
Posts: 47
Joined: Thu Nov 04, 2010 3:37 pm

Wed Jun 01, 2011 11:17 pm

mkaishar wrote:Wanted to give you a heads-up... while MPIO works with StarWind and VMware in a single subnet, MPIO does NOT work with OS iSCSI initiators like MS Windows or Linux: although the MPIO config does do RR, data only travels over the primary NIC. So we still need multiple subnets to perform MPIO from within operating-system initiators.
Can you expand on this with regard to Windows iSCSI initiator support and round-robin?
mkaishar

Wed Jun 01, 2011 11:25 pm

kmax wrote:
mkaishar wrote:Wanted to give you a heads-up... while MPIO works with StarWind and VMware in a single subnet, MPIO does NOT work with OS iSCSI initiators like MS Windows or Linux: although the MPIO config does do RR, data only travels over the primary NIC. So we still need multiple subnets to perform MPIO from within operating-system initiators.
Can you expand on this with regard to Windows iSCSI initiator support and round-robin?
Are you asking about the load-balance policy used when mounting iSCSI targets within the OS?

[Attachment: rr.png]
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Thu Jun 02, 2011 8:03 pm

This is very valuable information! Thank you very much for your investigation! It should save quite a lot of time for both support staff and StarWind customers.
mkaishar wrote:Wanted to give you a heads up...while MPIO works with Starwind and VMware in a single subnet, MPIO does NOT work with OS iSCSI Initiators like MS Windows or Linux, although the MPIO config does do RR, data only travels over the primary NIC.

So we still need multiple subnets to perform MPIO from within operating system initiators.

Thanks,
Mark
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software
