[Linux-cluster] DLM behavior after lockspace recovery

We're having a problem with the lack of lock state indication
for user programs following the failure of a cluster member.
During lock recovery on VMS, locks with value blocks that might be
in question are marked as 'value block not valid' in the resource
database.

Additionally, the value of the lock value block is not zeroed.
Presumably, if one of the surviving nodes was the master for the
lock, its copy of the lock value block is used during recovery.
If none of the surviving nodes was mastering the lock, I'm not sure
which value is used for the lock value block. Perhaps it's arbitrary,
or perhaps there is some bookkeeping to track which value is the most
recent. There is a field named RSB$L_VALSEQNUM associated with each
resource on VMS. I'm not sure what it is used for, but it suggests that
there is some bookkeeping related to lock value block version numbers.

When the value-block-not-valid flag is set, any subsequent lock request
that reads the lock value block completes with a value-block-not-valid
status. This is a success status, not an error. It continues to be
returned until the value block is written or the lock is forgotten
because it has been released by all cluster members.
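
To make the gap concrete, here's a minimal sketch against the libdlm
userspace API (the lockspace and resource names are made up, and it
assumes a running cluster, the dlm kernel module, and sufficient
privileges): after a failed node's locks have been recovered, the read
below still completes with sb_status == 0, and nothing in the lksb
tells the caller that the value block contents may be stale.

/* Sketch only: shows the current lack of any staleness indication
 * after recovery.  Build with -ldlm. */
#include <stdio.h>
#include <string.h>
#include <libdlm.h>

#define LVB_LEN 32                      /* default DLM value block size */

int main(void)
{
        dlm_lshandle_t ls;
        struct dlm_lksb lksb;
        char lvb[LVB_LEN];
        int rv;

        memset(&lksb, 0, sizeof(lksb));
        lksb.sb_lvbptr = lvb;

        ls = dlm_create_lockspace("example-ls", 0600);
        if (!ls)
                return 1;

        /* Read lock with a value block.  If a node holding a PW or EX
         * lock on this resource died and recovery ran, the LVB we get
         * back may be stale, but sb_status is still 0 and nothing else
         * in the lksb says so. */
        rv = dlm_ls_lock_wait(ls, LKM_PRMODE, &lksb, LKF_VALBLK,
                              "example-resource",
                              strlen("example-resource"),
                              0, NULL, NULL, NULL);
        if (rv || lksb.sb_status) {
                fprintf(stderr, "lock failed: %d/%d\n",
                        rv, lksb.sb_status);
                dlm_release_lockspace("example-ls", ls, 1);
                return 1;
        }

        /* ... consume lvb[] here; there is no way to tell whether it
         * is the most recently written value ... */

        dlm_ls_unlock_wait(ls, lksb.sb_lkid, 0, &lksb);
        dlm_release_lockspace("example-ls", ls, 1);
        return 0;
}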

I looked into modifying the DLM recovery code to support this behavior,
and it doesn't look all that difficult. My concern, though, is that this
would probably break GFS, since it introduces a new return status. I'm
not sure that implementing this as a flag returned in the lock value
block is a good idea either, as it would mean that interested
applications would have to perform an extra memory reference for a
fairly uncommon situation. Should this be implemented as a property
of the lockspace, defined when the lockspace is created?
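
For what it's worth, here is a rough sketch of how a flag-style
indication might look to an interested application, assuming the
information were surfaced as a bit in the lksb's sb_flags field rather
than inside the value block itself (the name DLM_SBF_VALNOTVALID below
is my assumption, not an existing interface). sb_status would remain a
plain success, so callers that never look at sb_flags (GFS, presumably)
would be unaffected.

/* Sketch only: how a caller might consume a value-block-validity flag
 * in the lksb, if one were added.  DLM_SBF_VALNOTVALID is an assumed
 * name, defined here so the example compiles on its own. */
#include <string.h>
#include <libdlm.h>

#ifndef DLM_SBF_VALNOTVALID
#define DLM_SBF_VALNOTVALID 0x02        /* assumed sb_flags bit */
#endif

#define LVB_LEN 32

/* Called after a successful lock or convert that passed LKF_VALBLK. */
void use_value_block(struct dlm_lksb *lksb, char *out)
{
        if (lksb->sb_flags & DLM_SBF_VALNOTVALID) {
                /* Recovery may have lost the most recent LVB contents.
                 * Rebuild application state from scratch; rewriting the
                 * LVB from a PW or EX lock would mark it valid again. */
                memset(out, 0, LVB_LEN);
                return;
        }
        /* Flag clear: the LVB is the most recently written value. */
        memcpy(out, lksb->sb_lvbptr, LVB_LEN);
}

Whether such a bit would always be reported, or only when the
application opts in at lockspace creation time, is a separate question
from the check itself.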


A description of lock value block invalidation on Tru64, from:
http://h30097.www3.hp.com/docs/cluster_doc/cluster_51A/HTML/MAN/MAN4/0004____.HTM

> The DLM will mark a resource's lock value block as invalid if:
> 
>     ·  A process holding a PW or EX mode lock on the resource calls the
>        dlm_unlock() function to dequeue this lock, and specifies the flag
>        DLM_INVVALBLK in the flags parameter.
> 
>     ·  Any process holding a PW or EX mode lock on a resource terminates
>        abnormally.
> 
>     ·  A node joins or leaves the cluster and the resource has only NL or CR
>        mode locks. The reason is because a process might have been holding a
>        PW or EX mode lock on the resource when the cluster membership
>        changed.
> 
>   A process holding an NL or CR mode lock on a resource must be able to
>   handle arbitrary invalidates of the lock value block.
> 
>   If the caller has requested a lock value block and the lock value block
>   that is read is marked as valid, the DLM returns either DLM_SUCCESS or
>   DLM_SYNC completion status (as long as the operation itself is successful).
>   If the lock value block that is read is marked as invalid, the DLM returns
>   a completion status of DLM_SUCCVALNOTVALID or DLM_SYNCVALNOTVALID.
> 
>   The DLM_SUCCVALNOTVALID and DLM_SYNCVALNOTVALID condition values are
>   warning messages, not error messages. The request has completed
>   successfully; the function grants the requested lock and returns this
>   warning on all subsequent calls to dlm_lock(), dlm_locktp(), dlm_quelock(),
>   dlm_quelocktp(), dlm_cvt(), and dlm_quecvt() until either a new lock value
>   block is written to the lock database (causing the lock value block to be
>   marked as valid) or the resource is deleted. Resource deletion occurs when
>   the last lock on the resource is deleted with a dlm_unlock() function call.
> 
>   The DLM returns the current copy of the resource's lock value block. The
>   DLM_SUCCVALNOTVALID and DLM_SYNCVALNOTVALID condition values indicate that
>   this may not be the most recent value that has existed. The initial value
>   of a lock value block prior to its first update is always zero (0).



