danisoto wrote:Hi,
After testing the latest release of the iSCSI target with HA support (great! 128 GB for free is good for our tests), I found that the current implementation is not sufficient for fault-tolerant services.
We ran several tests, but this one is the most illustrative (Build 20121115):
* Two nodes (A and B) with an HA iSCSI target, fully synchronized. A: master; B: slave.
* "Controlled shutdown" of node A.
* Node B runs and the iSCSI target still works.
* "Controlled shutdown" of node B.
* Obviously, the iSCSI target disappears.
* After restarting node B, the iSCSI target does not start and the volume remains out of sync forever.
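The failure sequence above can be modelled as a tiny state sketch. This is my own construction, not StarWind's actual implementation; all names are hypothetical. It reproduces the reported behaviour: once a node has been down it can no longer vouch for its copy, so a node restarting with no healthy partner stays out of sync.

```python
# Hypothetical model of the reported behaviour (names are mine, not StarWind's):
# a stopped node is conservatively marked "not synchronized" and can only be
# re-marked synchronized by a full sync from a healthy, running partner.

class Node:
    def __init__(self, name):
        self.name = name
        self.running = True
        self.synchronized = True   # in sync while serving alongside its partner

    def shutdown(self):
        self.running = False
        self.synchronized = False  # a stopped copy may be stale; it cannot vouch for itself

    def restart(self, partner):
        self.running = True
        if partner.running and partner.synchronized:
            self.synchronized = True   # full sync from the healthy partner
        # Otherwise stay out of sync: there is no partner to sync from,
        # and the software refuses to guess whose data is newest.

    def target_online(self):
        return self.running and self.synchronized

a, b = Node("A"), Node("B")
a.shutdown()                 # controlled shutdown of node A
print(b.target_online())     # True: node B keeps serving the target
b.shutdown()                 # controlled shutdown of node B
b.restart(partner=a)         # node A is still down
print(b.target_online())     # False: target stays offline, volume out of sync
```

Under this model the behaviour danisoto complains about is not a bug but a deliberately conservative rule: the restarted node has no way to prove its copy is the newest.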
Please, this behaviour is completely unacceptable! With CONTROLLED shutdowns it is impossible to have any data discrepancy. The system must be able to auto-enable the target volume when no data has been lost. As it stands, if both nodes are powered off (via a clean shutdown), the system NEVER auto-synchronizes!
I suggest re-reading my past thread about an automatic timeout for auto-sync in case of power failure:
http://www.starwindsoftware.com/forums/ ... t2670.html
I would prefer a system that does not need ANY kind of user intervention in case of errors. We can tolerate some errors, as we run a fault-tolerant system on top of the shared volume. But the system needs to restart the target in ANY situation.
Please, can you improve this?
Thank you!
danisoto wrote:Hi Anton,
This is shocking news for us! In that case we will need to look for other software.
Why not make these functions optional?
Regards!
danisoto wrote:Hi Lohelle,
lohelle wrote:danisoto:
If you shut down node A and run on node B for a few minutes, and then you shut down node B as well:
Scenario 1: You turn on only node A. You really do not want node A to start serving clients automatically, as it does not hold the most recent data.
Scenario 2: You turn on only node B. This "should" be the most recent data, but the software does not know whether you forced node A to be "synchronized" and it then crashed again (or was shut down).
Scenario 3: Both nodes start, and when they can exchange "info" the sync process from the correct partner starts.
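The three scenarios above reduce to one conservative decision rule, which can be sketched as follows. This is my own illustration with hypothetical names, not StarWind's API: a restarting node may only bring its target online once it can learn from its partner whose data is newest.

```python
# Hypothetical sketch of the decision rule behind the three scenarios
# (my own naming, not StarWind's): a lone node never serves automatically.

def may_serve(self_state, partner_reachable, partner_state=None):
    """Decide whether a restarting node may bring its target online.

    self_state / partner_state: 'current' if that node was the last one
    serving before the cluster went down, 'stale' otherwise. A node on
    its own cannot know this for certain, hence the conservative rule.
    """
    if not partner_reachable:
        # Scenarios 1 and 2: a lone node cannot prove its data is newest,
        # so it must wait for its partner or for an operator.
        return False
    # Scenario 3: both nodes are up, they exchange info, and the node
    # with the newest data becomes the sync source.
    return self_state == 'current' or partner_state == 'current'

# Scenario 1: only node A (stale) comes back -> must not serve.
print(may_serve('stale', partner_reachable=False))           # False
# Scenario 2: only node B comes back; even if its data "should" be
# newest, it cannot verify that alone -> must not serve.
print(may_serve('current', partner_reachable=False))         # False
# Scenario 3: both up, A learns B has the newest data -> sync, then serve.
print(may_serve('stale', True, partner_state='current'))     # True
```

Note that scenario 2 refuses service even when the lone node's data really is the newest; that is exactly the trade-off the rest of this thread argues about.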
Our concern is about both nodes running AND both marked as "not synchronized".
In this case we need deterministic behaviour: "always mark the PRIMARY as SYNCHRONIZED".
We can't accept any user intervention just to carry out a STATIC algorithm like "both nodes up, both out of sync, then set the primary as synchronized".
Offering this behaviour as an option:
1) It is very easy to implement at this point.
2) It does not disturb any client.
3) It is a simple way to enable a basic FT service.
I know there are no consistency guarantees, but our filesystem on top of the iSCSI volume can tolerate errors!
Please, reconsider your thinking.
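The optional rule danisoto is asking for could be sketched like this. All names here are hypothetical, not StarWind's API, and, as he concedes himself, the rule trades consistency for availability: it is only safe if the filesystem on top can tolerate a possibly older replica winning.

```python
# Sketch of the OPTIONAL static tie-break danisoto requests (names are mine):
# if both nodes are up and both are flagged "not synchronized", deterministically
# pick the configured PRIMARY as the sync source instead of waiting for a human.

def pick_sync_source(primary, secondary, auto_resolve=False):
    """Return the node to mark SYNCHRONIZED, or None if a human must decide.

    primary / secondary are dicts like {'name': 'A', 'synchronized': bool}.
    """
    for node in (primary, secondary):
        if node['synchronized']:
            return node          # normal case: sync from the healthy copy
    if auto_resolve:
        return primary           # the proposed static rule: primary always wins
    return None                  # default behaviour: require operator action

a = {'name': 'A', 'synchronized': False}
b = {'name': 'B', 'synchronized': False}
print(pick_sync_source(a, b))                              # None -> manual intervention
print(pick_sync_source(a, b, auto_resolve=True)['name'])   # A -> primary wins
```

The danger, as the staff replies below make clear, is that the primary is not necessarily the node holding the newest data.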
After SEVERAL tests, the results are inconsistent:
danisoto wrote:Hi Lohelle,
lohelle wrote:danisoto:
If you shut down node A and run on node B for a few minutes, and then you shut down node B as well...
Scenario 3: Both nodes start, and when they can exchange "info" the sync process from the correct partner starts.
Our concern is about both nodes running AND both marked as "not synchronized".
In this case we need deterministic behaviour: "always mark the PRIMARY as SYNCHRONIZED".
Hi Anton,
anton (staff) wrote:Instead of looking at other software you need to re-think what you're doing. If you allow all nodes to be down at once, there is no way to know whose data is the most recent. You cannot trust the time recorded in the log: if the cluster went down, the clocks may have gone out of sync as well. You need to power up the cluster, manually mount the volumes in read-only mode, and check what is there to figure out whose data is the most recent. A human operator can do that; a machine cannot.
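Anton's point about log timestamps can be shown with a toy example (my own construction): if a node's clock drifts while the cluster is down, the write carrying the latest wall-clock timestamp is not necessarily the write that actually happened last.

```python
# Toy illustration of why "trust the later timestamp" fails under clock skew.
from datetime import datetime, timedelta

real_order = ["write on A", "write on B"]   # B's write actually happened last

clock_skew_on_a = timedelta(minutes=5)      # suppose A's clock runs 5 minutes fast
log_a = ("write on A", datetime(2012, 11, 15, 12, 0) + clock_skew_on_a)  # logged 12:05
log_b = ("write on B", datetime(2012, 11, 15, 12, 2))  # truly later, logged 12:02

# Naive rule: trust whichever log entry carries the later timestamp.
winner = max([log_a, log_b], key=lambda entry: entry[1])
print(winner[0])         # "write on A" -- wrong: the timestamps lied
print(real_order[-1])    # "write on B" -- the write that was actually last
```

This is why the staff position in this thread is that only a human, inspecting the mounted volumes, can decide which replica to declare authoritative after a full-cluster outage.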