The replication partner is not synchronised

ehinkle29
Posts: 6
Joined: Thu Mar 13, 2014 8:34 pm

Tue May 27, 2025 12:27 pm

I am getting an error that one of my LUNs is not able to sync with the replication partner. If I look via the web GUI, I get the following error for both nodes. I am running a two-node cluster using virtual machines.

The replication partner is not synchronised.

If I open the SW management console, node 2 shows "Device is not synchronized" and node 1 shows "Partner 'iqn.2008....' is not synchronized". Below are some errors from the two nodes. I think something is going on with node 1. I shut down node 1, expecting everything to keep working with just node 2, but devices using lun1 still had issues. I suspect something happened to node 1's disk, although I have run fsck on the system. If that is the problem, how can I take lun1 offline on node 1 and force a replication from node 2? My thinking is that after a disk failure on one node, everything should run fine on the other node until I shut down, replace the drive, and resync.

Node 1
5/27 12:17:08.214853 59 Srv: TelnetListener::listenConnections: Accepted control connection from 127.0.0.1:41290.
5/27 12:17:08.215073 e3 Srv: ControlConnection::processConnection: start processing 127.0.0.1:41290
5/27 12:17:08.215679 e3 FileBrowser: CFileBrowser::command: Return param: 'DiskSize' = '30837178368'
5/27 12:17:08.215859 e3 FileBrowser: CFileBrowser::command: Return param: 'FreeSpace' = '19769409536'
5/27 12:17:08.216242 59 Srv: TelnetListener::listenConnections: Accepted control connection from 127.0.0.1:41294.
5/27 12:17:08.216468 e4 Srv: ControlConnection::processConnection: start processing 127.0.0.1:41294
5/27 12:17:08.218559 e4 Srv: ControlConnection::processConnection: finish 127.0.0.1:41294
5/27 12:17:08.218570 e4 Srv: ControlConnection::stop: 127.0.0.1:41294
5/27 12:17:08.219110 e3 FileBrowser: CFileBrowser::command: Return param: 'DiskSize' = 'Failed: Specified file can not be opened'
5/27 12:17:08.219355 e3 FileBrowser: CFileBrowser::command: Return param: 'DiskSize' = 'Failed: Specified file can not be opened'
5/27 12:17:08.219474 e3 Srv: ControlConnection::processConnection: finish 127.0.0.1:41290
5/27 12:17:08.219484 e3 Srv: ControlConnection::stop: 127.0.0.1:41290

On node 2 I am seeing this:
5/27 12:20:04.794893 d4 Tgt: *** iScsiTarget::openSession: iqn.2008-08.com.starwindsoftware:10.35.15.15-lun1: can't register session. The device 'HAImage2' is not ready.
5/27 12:20:04.794897 d4 T[f2a7,1]: ***iScsiTask::startLoginPhase: *ERROR* Login request: device open failed.
5/27 12:20:04.795084 1c4 C[f2a7], IN_LOGIN: iScsiConnection::doTransition: Event - LOGIN_REJECT.
5/27 12:20:04.795422 d4 C[f2a7], IN_LOGIN: iScsiConnection::recvData: Recv - peer shutdown
5/27 12:20:04.795453 d4 C[f2a7], IN_LOGIN: iScsiConnection::receive: recvData returned error 10058 (0x274a)!
5/27 12:20:04.795700 1c4 Srv: *** SwSocket::Shutdown: shutdown() failed with error 10058!
5/27 12:20:04.795808 d4 Srv: *** SwSocket::Shutdown: shutdown() failed with error 10058!
5/27 12:20:04.795970 d4 Srv: *** SwSocket::Shutdown: shutdown() failed with error 10058!
5/27 12:20:04.796051 d4 Srv: *** SwSocket::Shutdown: shutdown() failed with error 10058!
5/27 12:20:04.796186 1e6 S[f2a7]: iScsiSession::~iScsiSession: ~Session

How can I get lun1 working again?
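
For anyone reading excerpts like the ones above, here is a minimal Python sketch that picks out and counts the error lines. It assumes the line layout seen in these excerpts (date, time, thread id, component, message); the marker list and the function name `summarize_errors` are my own illustration, not part of any StarWind tooling.

```python
import re
from collections import Counter

# Assumed layout, inferred only from the excerpts in this thread:
# "<M/D> <HH:MM:SS.ffffff> <thread id> <component>: <message>"
LINE_RE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}) (?P<time>[\d:.]+) (?P<thread>[0-9a-f]+) "
    r"(?P<component>[^:]+): (?P<message>.*)$"
)

# Hypothetical markers for lines worth a closer look; adjust as needed.
ERROR_MARKERS = ("***", "*ERROR*", "Failed:", "returned error")


def summarize_errors(lines):
    """Count how often each (component, message) error pair appears."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that does not follow the assumed layout
        message = match.group("message")
        if any(marker in message for marker in ERROR_MARKERS):
            counts[(match.group("component"), message)] += 1
    return counts


# Usage (the file name is a placeholder):
# with open("starwind.log") as fh:
#     for (component, message), n in summarize_errors(fh).most_common():
#         print(n, component, message)
```

Grouping by message makes repeated lines, such as the four identical shutdown() failures, collapse into a single entry, so the "device is not ready" line stands out.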
yaroslav (staff)
Staff
Posts: 3598
Joined: Mon Nov 18, 2019 11:11 am

Tue May 27, 2025 12:48 pm

Did you get it while creating LUNs?
Please check the MTU values and set them to 1514.
Also, to force synchronization, try manual synchronization: in SyncHaDevice.ps1, comment out $device.Synchronize([SwHaSyncType]::SW_HA_SYNC_FULL, "") and uncomment #$device.MarkAsSynchronized()
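
A quick way to review the MTU values, as a minimal sketch: it assumes the node is Linux-based and exposes interface settings under /sys/class/net, which is standard Linux sysfs. The function names and the comparison against 1514 are only an illustration of the check suggested above.

```python
from pathlib import Path


def read_mtus(sysfs_root="/sys/class/net"):
    """Return {interface name: MTU} for every interface found under sysfs."""
    mtus = {}
    for iface_dir in sorted(Path(sysfs_root).iterdir()):
        mtu_file = iface_dir / "mtu"
        if mtu_file.is_file():  # skips entries that are not interfaces
            mtus[iface_dir.name] = int(mtu_file.read_text().strip())
    return mtus


def mismatched(mtus, expected):
    """Return only the interfaces whose MTU differs from the expected value."""
    return {name: mtu for name, mtu in mtus.items() if mtu != expected}


# Usage on the node itself (1514 is the value suggested in this reply):
# print(mismatched(read_mtus(), expected=1514))
```

Run it on both nodes and compare the output for the data and replication interfaces; the sketch only reads values and changes nothing.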
ehinkle29
Posts: 6
Joined: Thu Mar 13, 2014 8:34 pm

Tue May 27, 2025 9:17 pm

No, the LUN was working fine for a day or so, and I had 4 VMs running on it before this issue appeared.
yaroslav (staff)
Staff
Posts: 3598
Joined: Mon Nov 18, 2019 11:11 am

Tue May 27, 2025 10:12 pm

Please tell me more about the system.
1. Underlying storage.
2. Networking configuration.
3. Is it a CVM or a Windows-based service?
4. Did you do anything specific before the synchronization dropped?

The replication partners return to sync automatically unless something significant has happened.
ehinkle29
Posts: 6
Joined: Thu Mar 13, 2014 8:34 pm

Wed May 28, 2025 12:37 pm

It is a CVM running on Proxmox. I have 3 networks configured: 1 for management, 1 for data, and 1 for replication. The disks are assigned to the CVM on the appliance. I think disk 1 was having issues because I accidentally added a file to the underlying directory that the VM disk was on. I was only using about 120 GB of an 800 GB disk, but I accidentally placed a VM disk on the underlying disk instead of the CVM disk. So I removed that disk, and I think I may need to resync from the other node.
yaroslav (staff)
Staff
Posts: 3598
Joined: Mon Nov 18, 2019 11:11 am

Wed May 28, 2025 1:29 pm

Hi,

Thanks for your update! Do you need to re-add the replication partner? If so, you will need to run RemoveHAPartner to remove the replication partner and then add it again with AddHaPartner.
ehinkle29
Posts: 6
Joined: Thu Mar 13, 2014 8:34 pm

Wed May 28, 2025 6:29 pm

I don't think I can remove the replication partner node because I have two LUNs. One of them, lun0, is working fine; lun1 is the one I am having the issue with.
ehinkle29
Posts: 6
Joined: Thu Mar 13, 2014 8:34 pm

Wed May 28, 2025 6:38 pm

I ended up shutting down the VMs that were on it, and it looks like it eventually fixed itself. I was just getting ready to run the sync script, and it is healthy now. How can I monitor replication and the system? I am not sure whether it is a limitation of the free version, but I can't see IO stats in the dashboard.
yaroslav (staff)
Staff
Posts: 3598
Joined: Mon Nov 18, 2019 11:11 am

Wed May 28, 2025 9:02 pm

It is great to read that the issue got fixed by a restart.
You can see the sync progress bar under LUNs.
The missing IO stats are not a limitation of the Free version but a limitation of the CVM in Proxmox. I have logged an internal case for it and hope it will be fixed at some point.