Slow synching, but fast mirroring?

lvp1138
Posts: 9
Joined: Mon Aug 08, 2011 5:19 am

Thu Aug 25, 2011 10:05 pm

Hi guys,

With the latest build 5.17.1733, I have created a 3.5TB HA setup for testing performance, reliability, etc., with a 20GB write-back cache. Both servers have 24 drives (however, the test target is on a 10-drive RAID6 disk), with a gigabit sync channel and a gigabit H/B. All recommended registry tweaks are done. The server runs Windows 2008 R2 Enterprise with all the latest updates installed. It is just for testing at the moment; there is no other iSCSI traffic on it.

There are some issues I have noticed during this testing. Here they are listed:

- While the HA setup is fine (both main and partner online and synchronized), when I make changes on one disk (mounted via the MS iSCSI initiator), I can see the sync channel saturated at 1 Gbps as StarWind syncs the changes to the partner. However, after simulating failovers (by abruptly killing the server itself, or the service), sometimes, when the partner goes into "Synchronizing", the sync process is "slow". Looking at Windows Resource Monitor, the sync channel barely goes above 20-30 Mbps. Why such a difference? Why won't it saturate the sync channel?

- Stopping the StarWind service on the partner usually fails; Windows times out trying to stop it. When I do that, I lose the connection in the StarWind console and cannot reconnect. It seems, however, that StarWind is still running, since I can see disk activity on BOTH servers when changes are made on the test target.

- Sometimes, when shutting down Windows, it takes 15-20 minutes to stop the StarWind service. This issue, and the one above, seem to happen only on the partner side.

- After successfully force-removing a device or target, I cannot delete the .img file itself, as Windows reports that the file is locked by StarWind. I have waited hours and it seems StarWind never releases the file. This happens randomly, though.

Has anyone else seen similar issues?

Peter :)
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Fri Aug 26, 2011 10:49 am

1) Because if synchronization took ALL the resources, the StarWind target itself would become unresponsive and, for example, ESX connections would start timing out. A system with one node down should have degraded performance (obviously) but not limited functionality.

2) StarWind does not stop immediately. We need to perform a long sequence of actions (flushing the cache and writing consistency points, for example) before we can go down gracefully.

3) The same as 2).

4) This one looks like a bug. We'll double-check it.

Thank you!
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

Alex (staff)
Staff
Posts: 177
Joined: Sat Jun 26, 2004 8:49 am

Fri Aug 26, 2011 1:37 pm

Peter,
As for 4): we have a problem with deleting a device that has active connections.
It will be fixed in 5.8 or in an upcoming 5.7 update.
Best regards,
Alexey.
lvp1138
Posts: 9
Joined: Mon Aug 08, 2011 5:19 am

Fri Aug 26, 2011 3:10 pm

Thanks for the reply Alex. So, this is normal behavior then?

I would prefer to have both nodes in sync faster. Any way to tune it? In the case of our hardware, it can easily handle syncing at a faster speed.

Right now, at around 30 megabits per second, it is only 3% synced, and it has been more than a day. At that pace, it will probably take around 30 days to finish syncing. And this server will have 16 TB of total storage, not the 3.5 TB I am testing with right now, so it will take even longer when this happens.

It's too risky to wait that long. Something could happen to the active server in that time period. Any suggestions? I don't remember 5.6 being so slow at syncing...

Thank you :)
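
The figures in this post can be sanity-checked with some quick arithmetic. The sketch below is not from the thread: it assumes decimal units (1 TB = 10^12 bytes), a steady transfer rate, and takes the reported 3% after one day at face value. The function names are made up for illustration.

```python
SECONDS_PER_DAY = 86400


def days_to_sync_at_rate(size_tb: float, rate_mbps: float) -> float:
    """Days needed to copy size_tb terabytes at a steady rate_mbps (megabits/s)."""
    total_bits = size_tb * 1e12 * 8
    return total_bits / (rate_mbps * 1e6) / SECONDS_PER_DAY


def days_to_sync_from_progress(percent_done: float, days_elapsed: float) -> float:
    """Total days implied by extrapolating the observed progress linearly."""
    return days_elapsed * 100.0 / percent_done


def implied_rate_mbps(size_tb: float, percent_done: float, days_elapsed: float) -> float:
    """Average rate (megabits/s) implied by the observed progress."""
    bits_done = size_tb * 1e12 * 8 * percent_done / 100.0
    return bits_done / (days_elapsed * SECONDS_PER_DAY) / 1e6


if __name__ == "__main__":
    print(f"3.5 TB at 30 Mbps: {days_to_sync_at_rate(3.5, 30):.1f} days")
    print(f"16 TB at 30 Mbps:  {days_to_sync_at_rate(16, 30):.1f} days")
    print(f"3% after 1 day:    {days_to_sync_from_progress(3, 1):.1f} days total")
    print(f"Implied rate:      {implied_rate_mbps(3.5, 3, 1):.1f} Mbps")
```

Under these assumptions, a steady 30 Mbps would finish 3.5 TB in roughly 11 days, while 3% per day extrapolates to roughly 33 days. The two do not agree, which suggests the average rate was closer to 10 Mbps than to the 30 Mbps peak seen in Resource Monitor.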
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Sat Aug 27, 2011 9:04 pm

No, it's not normal. It's a known minor issue in V5.7, and we're not going to back-port the fix from V5.8, but V5.8 itself is immune to this issue by design.

It's too slow. Synchronization should take ~50% of the network bandwidth, to keep the targets free to serve incoming requests. Is there any chance we could jump on your system remotely to check what's wrong with it?
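
To put the ~50% figure in perspective, here is a rough estimate of how long a full synchronization would take at that share of the link. It is only a sketch: it assumes the 1 Gbit sync channel from the original post, decimal units (1 TB = 10^12 bytes), and no protocol or disk overhead. `sync_hours` is a hypothetical helper, not a StarWind tool.

```python
def sync_hours(size_tb: float, link_gbps: float = 1.0, share: float = 0.5) -> float:
    """Hours to copy size_tb terabytes using `share` of a link_gbps link."""
    usable_bits_per_second = link_gbps * 1e9 * share
    total_bits = size_tb * 1e12 * 8
    return total_bits / usable_bits_per_second / 3600


if __name__ == "__main__":
    for size_tb in (3.5, 16):
        print(f"{size_tb} TB at 50% of 1 Gbit: {sync_hours(size_tb):.1f} hours")
```

Under these assumptions the 3.5 TB test target would resync in under 16 hours and a 16 TB target in about three days, versus the weeks implied by the observed rate.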
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

lvp1138
Posts: 9
Joined: Mon Aug 08, 2011 5:19 am

Sun Aug 28, 2011 3:03 am

Will PM the login info :)
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Mon Aug 29, 2011 1:46 pm

We'll wait :D
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
lvp1138
Posts: 9
Joined: Mon Aug 08, 2011 5:19 am

Mon Aug 29, 2011 3:35 pm

I already PM'd that info to you...
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Tue Aug 30, 2011 2:31 pm

That was me, and my fault: I have just forwarded your account info to the support staff for further processing. Sorry for the delay; we were moving to a new, bigger office :)
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

lvp1138
Posts: 9
Joined: Mon Aug 08, 2011 5:19 am

Tue Aug 30, 2011 4:28 pm

No problem :)
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Tue Aug 30, 2011 10:06 pm

Thank you!
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software
