After Node fail and resync: iscsi devices are different


atlhivemind
Posts: 3
Joined: Wed Dec 03, 2014 2:31 am

Mon Jan 19, 2015 2:20 pm

A RAID controller reconfiguration gone wrong required one of my two HA nodes to be rebuilt almost from bare metal (the OS and StarWind install survived, but three data-storing RAID arrays were fried, oops).
Working from NODE2 (the surviving good node) toward NODE1, I re-created NODE2's HA replication partners using the replication manager and did a full sync of the data. Everything seemed fine.

Fast forward to yesterday, when I went to get VMware reconnected to NODE1. Rescanned the iSCSI bus... nothing.
* All paths to NODE1 are connected, but there is no I/O.
* NODE1's targets show sessions for all of my VM hosts.

I isolated one host and ditched the multipathing, connecting it only to NODE1.
A scan shows all 3 iSCSI disks, reading out as "STARWIND ISCSI Disk (eui.yyyyyyyyy)". They show up as BLANK disks.
A working VM host shows the same HA device as "ROCKET ISCSI DISK (eui.xxxxxxxx)".

NODE1 and NODE2 were originally v6 installs, upgraded to v8 recently. In the view of an HA disk image on NODE2, there is an "HA Image" section with a serial-id and a virtual disk path pointing to the thick-provisioned disk image file. On NODE1, there is an "HA Image" section with the same serial-id, but the virtual disk points to "imagefile1". Below it is a section called "STORAGE" that defines "imagefile1" with the path to the virtual disk.

Two of the three disks on NODE1 were created using the replication node wizard. The third was created manually using "add device (advanced)" and then added as a partner by hand. All three show the same result.

I am running off a single storage node and, with VMware thinking one half of a supposedly synchronized cluster is blank, this is a very bad thing.
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Wed Jan 21, 2015 5:40 pm

Hi! Sorry to hear about your trouble.
Such issues occur when adding/changing an HA partner for a v8 device originally created on v6.
The original device and its new partner end up with different eui numbers, so ESX cannot recognize them as the same device.
After an HBA rescan, ESX will prompt you to re-format the drive. Please perform the following steps to fix this problem:
1. Open the HA headers of both the original and the partner devices using WordPad.
2. Headers originally created on v6 will have the following tags present in their .swdsk files:
<serial_id>e506e991-76bd-405c-a308-866e98</serial_id>
<vendor id="ROCKET "/>
<product id="IMAGEFILE " revision="0001"/>
<eui_64>EBA23949A64D3F74</eui_64>
3. Headers on the new HA partner will have the following tags present in their .swdsk files:
<serial_id> e506e991-76bd-405c-a308-866e98</serial_id>
<vendor id="STARWIND"/>
<product id="STARWIND " revision="0001"/>
<eui_64>DD3284A8BFC1B075</eui_64>
4. Modify the eui_64 number, vendor id, product id, and serial id (if they are not already identical) in the new headers to match those of the ROCKET headers.
5. Restart the StarWind service on the partner.
6. Re-scan the HBAs on both ESX(i) hosts.
7. If this does not help, re-adding the IP to dynamic discovery should fix it.
8. Check the “manage paths” tab of the datastore properties in vCenter.
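
If you would rather script step 4 than edit by hand, here is a rough Python sketch of the idea. It only assumes that the four tags look the way they do in the samples above; the rest of the .swdsk layout is not shown in this thread, so the code works on the header text with regular expressions instead of parsing the whole file. The function names (`extract_identity`, `align_partner_header`) are made up for illustration, and the sample strings are just the snippets from steps 2 and 3. Try it on a copy of the header first.

```python
import re

# The four identity tags that step 4 says must match on both nodes.
IDENTITY_TAGS = {
    "serial_id": re.compile(r"<serial_id>.*?</serial_id>", re.S),
    "vendor": re.compile(r"<vendor\b[^>]*/>"),
    "product": re.compile(r"<product\b[^>]*/>"),
    "eui_64": re.compile(r"<eui_64>.*?</eui_64>", re.S),
}


def extract_identity(header_text):
    """Return the raw text of each identity tag found in a header."""
    found = {}
    for name, pattern in IDENTITY_TAGS.items():
        match = pattern.search(header_text)
        if match:
            found[name] = match.group(0)
    return found


def align_partner_header(original_text, partner_text):
    """Copy the identity tags of the original (v6-created) header
    into the partner header text and return the result."""
    source = extract_identity(original_text)
    result = partner_text
    for name, pattern in IDENTITY_TAGS.items():
        if name in source:
            # A function replacement keeps the tag text literal.
            result = pattern.sub(lambda _m, s=source[name]: s, result, count=1)
    return result


# Sample snippets taken from steps 2 and 3 (not complete header files).
original = """<serial_id>e506e991-76bd-405c-a308-866e98</serial_id>
<vendor id="ROCKET "/>
<product id="IMAGEFILE " revision="0001"/>
<eui_64>EBA23949A64D3F74</eui_64>"""

partner = """<serial_id> e506e991-76bd-405c-a308-866e98</serial_id>
<vendor id="STARWIND"/>
<product id="STARWIND " revision="0001"/>
<eui_64>DD3284A8BFC1B075</eui_64>"""

fixed = align_partner_header(original, partner)
print(fixed)
```

Comparing `extract_identity()` of both headers before and after the change is a quick way to confirm that nothing except those four tags was touched. Steps 5 to 8 still have to be done afterwards.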