iSCSI reconnecting repeatedly
Posted: Tue Aug 16, 2011 1:40 pm
Hi...
Max is looking into this problem with me but we seem to be at a standstill at the moment, so I wondered if anyone else might be able to comment or make suggestions.
I have a two-node Hyper-V cluster on Windows Server 2008 R2 SP1, with StarWind running on the same two boxes.
I tried to upgrade server A from 5.6.1690 to 5.7.1721, but the installer hung, so we had to kill it. Max then directed me to install 5.7.1727, which fixes issues with two-node Hyper-V clusters. It installed OK on both server A and server B.
The problem is that the iSCSI connections are repeatedly reconnecting on both servers.
The Windows event log shows event 20: "Connection to the target was lost. The initiator will attempt to retry the connection." followed by event 34: "A connection to the target was lost, but Initiator successfully reconnected to the target. Dump data contains the target name." This is happening for all 3 of my targets, and the event log is filled with literally thousands of these entries.
I am also getting a smaller number of other related errors (some of which I know are just a consequence of the two above):
- Connection to the target was lost. The initiator will attempt to retry the connection.
- Target failed to respond in time to a Task Management request.
- Target sent an invalid iSCSI PDU. Dump data contains the entire iSCSI header.
- Initiator sent a task management command to reset the target. The target name is given in the dump data.
- The initiator could not send an iSCSI PDU. Error status is given in the dump data.
- Target failed to respond in time for a login request.
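To put numbers on how often the connections flap, the event 20 / event 34 pairs could be tallied with something like the rough Python sketch below. It assumes the events have already been exported from the event log into (timestamp, event ID) pairs sorted by time; the function name `summarize` and the sample data are made up for illustration and are not taken from the real logs.

```python
from collections import Counter
from datetime import datetime

# Event IDs quoted in the post; any other IDs are only counted.
LOST, RECONNECTED = 20, 34

def summarize(events):
    """events: iterable of (timestamp, event_id) tuples sorted by time.

    Returns (counts per event ID, list of outage lengths in seconds),
    where an outage runs from an event 20 to the next event 34.
    """
    counts = Counter()
    outages = []
    lost_at = None  # time of the earliest unmatched "connection lost" event
    for ts, event_id in events:
        counts[event_id] += 1
        if event_id == LOST and lost_at is None:
            lost_at = ts
        elif event_id == RECONNECTED and lost_at is not None:
            outages.append((ts - lost_at).total_seconds())
            lost_at = None
    return counts, outages

# Hypothetical sample data, only to show the shape of the input.
sample = [
    (datetime(2011, 8, 16, 13, 0, 0), 20),
    (datetime(2011, 8, 16, 13, 0, 5), 34),
    (datetime(2011, 8, 16, 13, 1, 0), 20),
    (datetime(2011, 8, 16, 13, 1, 2), 34),
]
counts, outages = summarize(sample)
print(counts[20], counts[34], outages)
```

Running this per target would show whether all three targets drop at the same moments, which might help tell a network problem apart from a problem on the target side.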
One other detail, whose impact I am not sure of: on server A, the MPIO panel shows only one device, whereas server B shows two. The missing one is "Vendor 8Product 16". It was there when 5.6 was running. The strange part is that Max checked the connections, and it appears that MPIO is actually being used for the partner targets.
I sent the StarWind and Windows logs to Max for R&D to look at. No word back yet.
This is really holding up my project work, and my VMs are at risk while this continues. Any advice would be appreciated. I thought about going back to 5.6 - is that even possible? My gut feeling is that something is corrupted somewhere (maybe even in Windows itself?), and I'm not sure going back to 5.6 would solve it.
Graham