Randomly occurring connection resets...

qwertz
Posts: 38
Joined: Wed Dec 12, 2012 3:47 pm

Sun Jan 13, 2013 3:22 pm

Hi there,
I'm using the free version of StarWind iSCSI SAN in my home lab.
The target software is StarWind iSCSI SAN v6.0.0 (Build 20121115, [SwSAN], Win64) on 2008 R2;
the initiator is a Windows Server 2012 machine (Microsoft software initiator).
The network connecting the initiator to the target is InfiniBand (old 10 Gbit gear) with IPoIB.
The initiator machine is a Hyper-V hypervisor and an SMB file server.
As long as I don't get the errors described below, it works very nicely.
I can reach speeds of up to 800 MB/s (benchmarked with ATTO) on the hypervisor, and all VMs run smoothly.
(Only ~80 MB/s read or write inside a single VM, but that's another construction site...)
Sometimes (often), when I produce some load, I get something like a "lag" (for example, when I'm unzipping a file on the SMB volume).

When I look at the event log on the hypervisor, I find:
Error: Source: iScsiPrt, Event ID: 7: The initiator could not send an iSCSI PDU (I've translated this error message from German).

When I look at the StarWind event log, I find (not ordered by time, just examples):
LIN: recvData returned 10054
LIN: *** 'recv' thread: recv failed 10058.
LIN: WSASend() returned 10054!

and things like that.
I'm not sure whether the two are correlated, but it looks like it.
I read in another thread NOT to include logs, but I think (hope) these small extracts are OK and help to clarify things.
I don't know what's going on there or how to debug it. I would be grateful for some help.
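
For anyone decoding the numbers in the StarWind log lines above: they are standard Winsock error codes. 10054 is WSAECONNRESET (connection reset by peer) and 10058 is WSAESHUTDOWN (a send or receive was attempted after the socket had been shut down). A minimal Python sketch for annotating such lines could look like this; the helper name `explain_winsock_line` and the regular expression are illustrative assumptions, not part of StarWind or of any log tooling.

```python
import re

# Standard Winsock error codes (a small subset relevant to dropped TCP connections).
WINSOCK_ERRORS = {
    10053: "WSAECONNABORTED: software caused connection abort",
    10054: "WSAECONNRESET: connection reset by peer",
    10058: "WSAESHUTDOWN: cannot send after socket shutdown",
    10060: "WSAETIMEDOUT: connection timed out",
}

# Assumption: the code shows up as a standalone five-digit number starting with 100.
_CODE_RE = re.compile(r"\b(100\d{2})\b")


def explain_winsock_line(line):
    """Return a description of the first known Winsock code in a log line, or None."""
    for match in _CODE_RE.finditer(line):
        description = WINSOCK_ERRORS.get(int(match.group(1)))
        if description:
            return description
    return None


if __name__ == "__main__":
    samples = (
        "LIN: recvData returned 10054",
        "LIN: *** 'recv' thread: recv failed 10058.",
        "LIN: WSASend() returned 10054!",
    )
    for sample in samples:
        print(sample, "->", explain_winsock_line(sample))
```

A reset (10054) means the connection was torn down abruptly, so it is the 10054 lines that are worth lining up with the iScsiPrt events on the initiator.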

Kind regards

EDIT:
Added logfile
Attachments
starwind_log.rar
Logfile compressed with winrar
(42.19 KiB) Downloaded 1270 times
Last edited by qwertz on Sun Jan 13, 2013 8:19 pm, edited 1 time in total.
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Sun Jan 13, 2013 5:53 pm

Please attach logs to your messages or send them zipped to support@starwindsoftware.com rather than pasting them into message bodies; that's the point.

It looks like the TCP connection gets dropped for some reason. We need the logs to figure out why. Also, can you run a stress test over the TCP connection (iperf or NTttcp under heavy load) for some time to see whether you get random disconnects or performance drops there as well?
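
To illustrate what such a stress test checks, here is a rough Python sketch: one side sends data continuously for a fixed time, the other counts what arrives, and any connection error is recorded. This is not iperf or NTttcp and is no substitute for them; all function names are hypothetical, and the demo runs over loopback only.

```python
import socket
import threading
import time


def count_received(listener, result):
    """Accept one connection and count bytes until EOF or a connection error."""
    conn, _ = listener.accept()
    total = 0
    with conn:
        while True:
            try:
                chunk = conn.recv(65536)
            except ConnectionError as exc:  # e.g. ConnectionResetError
                result["error"] = repr(exc)
                break
            if not chunk:  # orderly shutdown by the sender
                break
            total += len(chunk)
    result["received"] = total


def send_for(host, port, seconds, block_size=65536):
    """Send zero-filled blocks for `seconds`; return (bytes_sent, error_or_None)."""
    payload = bytes(block_size)
    sent = 0
    deadline = time.monotonic() + seconds
    with socket.create_connection((host, port)) as sock:
        while time.monotonic() < deadline:
            try:
                sock.sendall(payload)
            except ConnectionError as exc:
                return sent, repr(exc)
            sent += block_size
    return sent, None


def loopback_demo(seconds=1.0):
    """Run sender and receiver on 127.0.0.1; return (sent, received, error)."""
    result = {"received": 0, "error": None}
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        worker = threading.Thread(target=count_received, args=(listener, result))
        worker.start()
        sent, send_error = send_for("127.0.0.1", port, seconds)
        worker.join()
    return sent, result["received"], send_error or result["error"]


if __name__ == "__main__":
    sent, received, error = loopback_demo(1.0)
    print(f"sent={sent} received={received} error={error}")
```

On a healthy link the sent and received byte counts match and no error is reported; a reset in the middle of the run would show up as a recorded `ConnectionError`.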
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

qwertz
Posts: 38
Joined: Wed Dec 12, 2012 3:47 pm

Sun Jan 13, 2013 8:31 pm

Hi there,
thanks for your quick reply! I've attached the compressed logfiles to the initial post.
I've changed the names and addresses in there, as well the license owner.
I'm going to perform some stress tests tomorrow evening, thanks for your help!
Have a nice evening.
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Sun Jan 13, 2013 9:25 pm

Thank you! Please give us some time to investigate the issue.
Max (staff)
Staff
Posts: 533
Joined: Tue Apr 20, 2010 9:03 am

Mon Jan 14, 2013 9:04 am

So far it looks like a network problem. The StarWind log file doesn't seem to contain anything useful (recv 10058 errors are OK; they can be seen in any StarWind log file).
For now I would like to wait for the link stability results and continue from there.
Try these iperf settings:
Client:
iperf.exe -c 192.168.60.224 --port 911 --parallel 16 -w 256K -l 64K -t
Server:
iperf.exe -s --port 911 -w 512K
The test should be performed in both directions (in some scenarios the performance problem appears only in one direction).
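
Because the test has to run in both directions, each host acts once as server and once as client. A small Python sketch that assembles the two sets of command lines, using only the flags quoted above, might look like this. The function names are hypothetical, and `seconds` is an assumption, since the value after `-t` is not given in the post.

```python
def iperf_server_cmd(port=911, window="512K"):
    """Command line for the receiving side."""
    return ["iperf.exe", "-s", "--port", str(port), "-w", window]


def iperf_client_cmd(server_ip, seconds, port=911, streams=16,
                     window="256K", length="64K"):
    """Command line for the sending side; `seconds` fills in the -t value."""
    return [
        "iperf.exe", "-c", server_ip,
        "--port", str(port),
        "--parallel", str(streams),
        "-w", window,
        "-l", length,
        "-t", str(seconds),
    ]


def both_directions(host_a, host_b, seconds):
    """Return (server_host, client_host, client_cmd) for each direction."""
    return [
        (host_a, host_b, iperf_client_cmd(host_a, seconds)),
        (host_b, host_a, iperf_client_cmd(host_b, seconds)),
    ]


if __name__ == "__main__":
    # Placeholder addresses for illustration only.
    for server, client, cmd in both_directions("10.0.0.1", "10.0.0.2", 60):
        print(f"on {server}: {' '.join(iperf_server_cmd())}")
        print(f"on {client}: {' '.join(cmd)}")
```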
Max Kolomyeytsev
StarWind Software
qwertz
Posts: 38
Joined: Wed Dec 12, 2012 3:47 pm

Tue Jan 15, 2013 8:41 am

Hi there,
Sorry, I didn't manage to test the network yesterday. Let's hope there is some spare time today.
I'll post results as soon as I've done the testing.
Kind regards!
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Fri Jan 18, 2013 9:10 am

Any news? Did you manage to run performance & stability tests?
qwertz
Posts: 38
Joined: Wed Dec 12, 2012 3:47 pm

Sat Jan 19, 2013 3:08 pm

Nope, sorry!
I'd like to, but I can't. There are a few virtual machines running on the HV which I can't stop at the moment.
Next Friday is a project deadline, and until then the VMs are needed for development.
Also, I've ordered some spare parts to swap hardware (an additional HBA and an additional cable).
In case some hardware is broken, I'd like to find and replace it during this "scheduled downtime".
The parts are currently on their way from the USA to Germany...
Kind regards, and again, thanks for your help!
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Mon Jan 21, 2013 10:00 am

NP, take your time!

PS: Not sure why you can't bring another physical host into the cluster, move the VMs there and take the current node down. I thought that's one of the ideas behind having a hypervisor VM cluster :)
qwertz
Posts: 38
Joined: Wed Dec 12, 2012 3:47 pm

Thu Jan 24, 2013 12:06 pm

You are absolutely right ;) (That's where I want to get to.)
But right now I have only 2 IB cables... so there is no cable left to connect the second hypervisor ;)
I ordered one 2 weeks ago; it can't take much longer... hopefully.
Also, I've only got one storage machine, and we are going to stress-test the only connection to it. If this connection fails, no hypervisor will be able to serve any VM ;)
By "fails" I mean more than those resets; they haven't had any impact on VM stability. MPIO would also be great, but... it's my home lab, self-funded... and I'm a student!

Greetings from Germany!
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Thu Jan 24, 2013 12:16 pm

Nice home lab. Please keep us updated once you have your cables.
qwertz
Posts: 38
Joined: Wed Dec 12, 2012 3:47 pm

Mon Feb 11, 2013 1:23 pm

Hi there!
Finally... I've received the spare parts and am starting to test the network stability.
Sorry for the long delay; I'm also writing exams at the moment.
I did some quick tests in both directions (-t 60) and got:

from sanbox(client) to hv(server):
3.84 Gbits/sec with the settings:
"iperf.exe -c 192.168.200.200 --port 911 --parallel 16 -w 256K -l 64K -t 60"

from hv(client) to sanbox(server):
5.88 Gbits/sec with the settings:
"iperf.exe -c 192.168.200.100 --port 911 --parallel 16 -w 256K -l 64K -t 60"

I'm a bit puzzled that I got less throughput than I benchmarked with ATTO on the RAM disk...
Also, what would be a representative duration for the iperf tests?

Edit:
Oh, I forgot to mention that the server was started with the parameters:
"iperf.exe -s --port 911 -w 512k"
Max (staff)
Staff
Posts: 533
Joined: Tue Apr 20, 2010 9:03 am

Wed Feb 13, 2013 10:40 am

Hi qwertz,
That's strange,
I mean, if the RAM disk results are higher (by the way, what are the latest results you've got?), then there should be a corresponding increase in the iperf results.
Is there any chance you could change the --parallel to 8?
qwertz
Posts: 38
Joined: Wed Dec 12, 2012 3:47 pm

Sat Feb 16, 2013 11:43 am

Hi there,
today I found some time to run a couple of benchmarks.
First I did some tests with iperf (--parallel 8 and 16; -w 256 and 512).
The results were more or less the same as those reported a few days ago, so they seem legit.
To verify those results, I created a 2 GB RAM disk and benchmarked it via ATTO on hv1.
I was quite astonished by the values... but now they correlate (more or less) with those from iperf:
read max ~564 MB/s --> ~4.5 Gbit/s (sanboxx is sending)
write max ~718 MB/s --> ~5.7 Gbit/s (hv is sending)
I'm sorry, the last ATTO benchmark did show higher results, I think because there was no other load on the sanboxx.
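
The MB/s to Gbit/s conversion used for the read and write maxima above can be checked with a few lines of Python. This sketch assumes ATTO's MB/s means decimal megabytes (10^6 bytes), which matches the rounded figures in the post; the function name is hypothetical, and the iperf numbers are the ones reported earlier in the thread.

```python
def mb_per_s_to_gbit_per_s(mb_per_s):
    """Convert decimal megabytes per second to gigabits per second."""
    return mb_per_s * 8 / 1000


# ATTO maxima on the RAM disk (MB/s) and the earlier iperf results (Gbit/s).
# "read" means sanboxx is sending; "write" means hv is sending.
atto_mb_per_s = {"read": 564, "write": 718}
iperf_gbit_per_s = {"read": 3.84, "write": 5.88}

if __name__ == "__main__":
    for direction, mb in atto_mb_per_s.items():
        gbit = mb_per_s_to_gbit_per_s(mb)
        print(f"{direction}: ATTO {mb} MB/s = {gbit:.2f} Gbit/s, "
              f"iperf {iperf_gbit_per_s[direction]} Gbit/s")
```

By the same conversion, the 800 MB/s from the first post corresponds to 6.4 Gbit/s.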
As it's my home lab, I am trying to optimize power consumption, meaning I am using things like C-states... but recently I did a little research on that.
I've found "some" articles about problems with Hyper-V and C-states.
http://workinghardinit.wordpress.com/tag/c-states/

And even one mentioning problems with IPoIB:
http://www.janssenjones.com/blog/2011/5 ... tates.html

... :idea: ...
Now I'm going to shut everything down, turn them (C-states) off, and give it a try...
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Wed Mar 13, 2013 6:09 pm

Any news with the most recent mods?