I am getting an error that one of my LUNs is not able to sync with its replication partner. I am running a two-node cluster on virtual machines. In the web GUI I get the following error for both nodes:
The replication partner is not synchronised.
If I open the StarWind management console, node 2 shows "Device is not synchronized" and node 1 shows "Partner 'iqn.2008....' is not synchronized". Below are some errors from the two nodes. I think something is wrong with node 1. I shut down node 1, expecting lun1 to keep working from node 2 alone, but devices using lun1 still had issues. I suspect something happened to node 1's disk, although I have run fsck on the system. If that is the problem, how can I take lun1 offline on node 1 and force a replication from node 2? My thinking is that if a disk fails on one node, the LUN should keep running on the other node until I shut down, replace the drive, and resync.
Node 1
5/27 12:17:08.214853 59 Srv: TelnetListener::listenConnections: Accepted control connection from 127.0.0.1:41290.
5/27 12:17:08.215073 e3 Srv: ControlConnection::processConnection: start processing 127.0.0.1:41290
5/27 12:17:08.215679 e3 FileBrowser: CFileBrowser::command: Return param: 'DiskSize' = '30837178368'
5/27 12:17:08.215859 e3 FileBrowser: CFileBrowser::command: Return param: 'FreeSpace' = '19769409536'
5/27 12:17:08.216242 59 Srv: TelnetListener::listenConnections: Accepted control connection from 127.0.0.1:41294.
5/27 12:17:08.216468 e4 Srv: ControlConnection::processConnection: start processing 127.0.0.1:41294
5/27 12:17:08.218559 e4 Srv: ControlConnection::processConnection: finish 127.0.0.1:41294
5/27 12:17:08.218570 e4 Srv: ControlConnection::stop: 127.0.0.1:41294
5/27 12:17:08.219110 e3 FileBrowser: CFileBrowser::command: Return param: 'DiskSize' = 'Failed: Specified file can not be opened'
5/27 12:17:08.219355 e3 FileBrowser: CFileBrowser::command: Return param: 'DiskSize' = 'Failed: Specified file can not be opened'
5/27 12:17:08.219474 e3 Srv: ControlConnection::processConnection: finish 127.0.0.1:41290
5/27 12:17:08.219484 e3 Srv: ControlConnection::stop: 127.0.0.1:41290
On node 2 I am seeing this:
5/27 12:20:04.794893 d4 Tgt: *** iScsiTarget::openSession: iqn.2008-08.com.starwindsoftware:10.35.15.15-lun1: can't register session. The device 'HAImage2' is not ready.
5/27 12:20:04.794897 d4 T[f2a7,1]: ***iScsiTask::startLoginPhase: *ERROR* Login request: device open failed.
5/27 12:20:04.795084 1c4 C[f2a7], IN_LOGIN: iScsiConnection::doTransition: Event - LOGIN_REJECT.
5/27 12:20:04.795422 d4 C[f2a7], IN_LOGIN: iScsiConnection::recvData: Recv - peer shutdown
5/27 12:20:04.795453 d4 C[f2a7], IN_LOGIN: iScsiConnection::receive: recvData returned error 10058 (0x274a)!
5/27 12:20:04.795700 1c4 Srv: *** SwSocket::Shutdown: shutdown() failed with error 10058!
5/27 12:20:04.795808 d4 Srv: *** SwSocket::Shutdown: shutdown() failed with error 10058!
5/27 12:20:04.795970 d4 Srv: *** SwSocket::Shutdown: shutdown() failed with error 10058!
5/27 12:20:04.796051 d4 Srv: *** SwSocket::Shutdown: shutdown() failed with error 10058!
5/27 12:20:04.796186 1e6 S[f2a7]: iScsiSession::~iScsiSession: ~Session
How can I get lun1 working again?