Horribly slow connection to ESX

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

ghost
Posts: 11
Joined: Wed Nov 16, 2011 9:11 pm

Wed Nov 16, 2011 9:24 pm

Hi

I installed StarWind at home, everything on VMware Workstation running on Windows 7. Inside it I run a virtualized ESX5i and another VM with Windows 2003 and the free version of StarWind. Everything was fine: I was getting copy speeds of around 30-50 MB/s, which is more than OK because everything is on one desktop HDD with additional virtualization layers.

Then I installed the same configuration on physical hardware: a Dell T310 (ESX5i) and a Dell SC440 (Windows 2003 R2 with StarWind).

The big problem: when I run ghettoVCB to the iSCSI target, I get around 3 MB/s. Yes, that's right, THREE.
Also, dd from /dev/zero with 1 MB chunks takes around 10 times longer than on my test environment.

I played around with the NIC settings on Windows (Broadcom; jumbo frames do not seem to be available), but it made no difference.
I disabled TCP delayed ACK on both ESX and Windows and also disabled Chimney and RSS on Windows, but that made no difference either.

Both servers have Broadcom NICs, and they are directly connected to each other by a CAT6 cable.

I don't see any errors in vmkernel.log.

Looking at the graphs, the speed seems to bounce around a lot.

I don't even know where to look next ...

Or is it just a broken NIC or cable?
Attachment: iscsiesx.jpg
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Wed Nov 16, 2011 9:51 pm

Do you happen to have any raw TCP performance charts as well?
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

ghost

Wed Nov 16, 2011 10:15 pm

Hello Anton

The only thing I found is the NIC statistics.

I also found that this NIC shares the same IRQ (16) as the SAS controller.

Thomas
Attachment: broadcom.jpg
anton (staff)

Wed Nov 16, 2011 10:26 pm

NIC stats don't help much (except that we now know for sure there are no transmit errors / collisions). Also, it's absolutely OK for PCI/PCI-X/PCIe devices to share an IRQ. What you need to run are some TCP tests from the place where you see the slow network results: NTttcp and/or iperf. Is that possible?
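
Where neither NTttcp nor iperf is at hand, the idea of a raw TCP test can be approximated with a short script: push a fixed amount of data over one TCP connection and divide by the elapsed time. The following is only a minimal sketch, not a replacement for those tools. The function names (`start_sink`, `send_bytes`, `loopback_throughput`) and the default sizes are assumptions made for illustration, and the built-in demo runs over loopback, which exercises the code but says nothing about a physical NIC or cable.

```python
import socket
import threading
import time

CHUNK = b"\0" * (1024 * 1024)  # 1 MiB payload, reused for every send


def start_sink(bind_host: str = "127.0.0.1", port: int = 0):
    """Listen on bind_host:port, accept one connection, and discard its data.

    Returns (thread, actual_port). Port 0 lets the OS pick a free port.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((bind_host, port))
    server.listen(1)
    actual_port = server.getsockname()[1]

    def drain() -> None:
        conn, _ = server.accept()
        with conn, server:
            # recv() returns b"" once the sender closes the connection.
            while conn.recv(65536):
                pass

    thread = threading.Thread(target=drain, daemon=True)
    thread.start()
    return thread, actual_port


def send_bytes(host: str, port: int, total_bytes: int) -> float:
    """Send at least total_bytes and return sender-side MB/s (1 MB = 10**6 bytes)."""
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        while sent < total_bytes:
            client.sendall(CHUNK)
            sent += len(CHUNK)
    elapsed = time.perf_counter() - start
    return sent / elapsed / 1e6


def loopback_throughput(total_bytes: int = 64 * 1024 * 1024) -> float:
    """Self-test: run sink and sender in one process over 127.0.0.1."""
    thread, port = start_sink()
    rate = send_bytes("127.0.0.1", port, total_bytes)
    thread.join(timeout=10)  # let the sink finish draining
    return rate


if __name__ == "__main__":
    print(f"loopback: {loopback_throughput():.0f} MB/s")
```

Between two machines, the receiver would call `start_sink("0.0.0.0", 5001)` and join the returned thread, while the sender calls `send_bytes(receiver_ip, 5001, ...)`; port 5001 is an arbitrary choice here. The figure is measured on the sender side, so it slightly overstates the rate for small transfers because data still sitting in socket buffers is counted as sent.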
ghost

Wed Nov 16, 2011 11:20 pm

Another strange thing I just found: if I copy a very big file from the VM to Windows over CIFS, it starts fast but after a couple of seconds drops to 300 KB/s. But these are completely different NICs, connected through a switch.

I only have Windows clients and ESX, so I ran iperf between a Windows VM and the Windows machine with StarWind and got around 550 Mbit/s. But how does this test the iSCSI network?

I also ran ATTO on the old server; everything was OK.

Maybe it's just the network cables?
ghost

Thu Nov 17, 2011 5:53 pm

I further tracked down that on both CIFS and iSCSI the download was getting worse while the upload was fine. So I checked in Performance Monitor and saw that the disk queue length was still at 100% when it was copying at only 300 KB/s.

So it seems I have found the problem at the controller or the disks :?
anton (staff)

Thu Nov 17, 2011 5:58 pm

Does using another disk solve the issue? If it's a SATA disk, what does the SMART monitor say about I/O errors?
ghost

Thu Nov 17, 2011 7:05 pm

I did not try another disk because I have none.
These are 2 brand-new SATA disks on a Dell PERC 5/iR SAS controller.

When I run ATTO alone with a 2 GB size, everything is OK. But as soon as there is also network traffic that involves disk writes, it gets very slow.

That probably also explains why the NFS server on Windows was so slow (2 MB/s).
anton (staff)

Thu Nov 17, 2011 9:26 pm

Can you turn ACPI off and try moving the hard disk and network controllers to other IRQs? Or use another NIC for now?
ghost

Mon Nov 21, 2011 3:24 pm

I found out that the disk cache was disabled, but this controller also has no BBU. After enabling the cache, I no longer had slowdowns when writing.
It could still be faster: 20 MB/s over iSCSI, but that is better than the 3 MB/s before.

What is strange is that ATTO showed no slowdowns. But when I copied a big file locally, it was also slowing down.
anton (staff)

Mon Nov 21, 2011 3:27 pm

20 MB/sec still sucks.

Synthetic tests have their own issues. Make sure you run a bunch of them (add Iometer, SQLIO, etc.) to pin down the numbers.
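
Alongside the tools named above, a crude sequential write test is easy to script, which helps when a benchmark and a real file copy disagree. The sketch below writes 1 MiB chunks (the same chunk size as the dd run mentioned earlier in the thread) and can optionally force every chunk to disk with `os.fsync`, so buffered and synchronous behaviour can be compared. The function name `sequential_write_mib_per_s` and its parameters are assumptions made for illustration. It writes zeros, so storage that compresses or deduplicates data would report flattering numbers.

```python
import os
import tempfile
import time


def sequential_write_mib_per_s(path: str, total_mib: int = 256,
                               sync_each_chunk: bool = False) -> float:
    """Write total_mib chunks of 1 MiB to path and return the rate in MiB/s.

    With sync_each_chunk=True every chunk is flushed and fsync'ed before the
    next one is written, which takes the OS write cache out of the picture.
    """
    chunk = b"\0" * (1024 * 1024)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total_mib):
            f.write(chunk)
            if sync_each_chunk:
                f.flush()
                os.fsync(f.fileno())
        # Always sync once at the end so the timing includes the real write.
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return total_mib / elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "seqwrite.bin")
        print(f"buffered: {sequential_write_mib_per_s(target):.1f} MiB/s")
        print(f"synced:   {sequential_write_mib_per_s(target, sync_each_chunk=True):.1f} MiB/s")
```

Pointing `path` at the volume under test and comparing the two modes gives a rough idea of how much the result depends on caching. It is one more data point next to the dedicated tools, not a substitute for them.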
ghost

Fri Dec 02, 2011 10:18 am

What is strange is that the speed varies a lot. One time it took 800 minutes for a 140 GB snapshot, and the next time it took only 60 minutes.

And when it is slow, I can see the NIC copy speed going up and down all the time between 10 and 20 MB/s.

Does this still sound like an IRQ issue?

I am a bit limited with tests because only ESX has access to that storage.
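
Converting the two snapshot runs to average throughput makes the spread easier to compare with the other figures in the thread. This is a small sketch that assumes decimal units (1 GB = 1000 MB, 8 Mbit = 1 MB), which the posts do not specify; the function names are illustrative.

```python
def throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average throughput in MB/s, taking 1 GB = 1000 MB."""
    return size_gb * 1000 / (minutes * 60)


def mbit_to_mb_per_s(mbit_per_s: float) -> float:
    """Convert a line rate in Mbit/s to MB/s (8 bits per byte)."""
    return mbit_per_s / 8


slow_run = throughput_mb_per_s(140, 800)  # 140 GB in 800 minutes
fast_run = throughput_mb_per_s(140, 60)   # 140 GB in 60 minutes
iperf_rate = mbit_to_mb_per_s(550)        # iperf result reported earlier

print(f"slow run: {slow_run:.1f} MB/s")    # slow run: 2.9 MB/s
print(f"fast run: {fast_run:.1f} MB/s")    # fast run: 38.9 MB/s
print(f"iperf:    {iperf_rate:.2f} MB/s")  # iperf:    68.75 MB/s
```

Under these assumptions the slow run averages about 3 MB/s, the same order as the ghettoVCB figure from the first post, while the fast run averages roughly 39 MB/s, still below the roughly 69 MB/s that the iperf result would allow.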
anton (staff)

Fri Dec 02, 2011 12:10 pm

I would blame the hardware. Can you put the NIC into another slot?
ghost

Fri Dec 02, 2011 4:48 pm

I could if I were physically there ...

I guess next time I will swap the cables between the NICs and use the other NIC for iSCSI. I have a feeling the NIC is defective.
anton (staff)

Fri Dec 02, 2011 10:34 pm

Yes, please use another route for iSCSI (with as many different components as possible) and let us know. Thank you!