On 3/27/19 12:55 PM, Xavi Hernandez wrote:
Hi Raghavendra,
On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa
<rgowdapp@xxxxxxxxxx> wrote:
All,
Glusterfs cleans up the POSIX locks held on an fd when the client/mount
through which those locks were taken disconnects from the bricks/server.
This keeps Glusterfs from running into stale-lock problems later (for
example, if the application unlocks while the connection is still down).
However, it also means the lock is no longer exclusive, as other
applications/clients can acquire the same lock. To communicate that the
locks are no longer valid, we are planning to mark the fd (which has
POSIX locks) bad on disconnect, so that any future operation on that fd
fails, forcing the application to re-open the fd and re-acquire the
locks it needs [1].
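As an illustration, a minimal sketch of the recovery an application
would need to do on such a bad fd; it assumes the application itself
remembers the path and the byte range it locked, and all names below
are illustrative only, not a Gluster API:

/* Sketch of application-side recovery on a bad fd. Assumes the
 * application remembers the path and the byte range it had locked;
 * the names are illustrative, not a Gluster API. */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

static int reopen_and_relock(const char *path, int *fdp, off_t start, off_t len)
{
    struct flock fl = {
        .l_type   = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start  = start,
        .l_len    = len,
    };

    close(*fdp);                   /* drop the fd that was marked bad */
    *fdp = open(path, O_RDWR);
    if (*fdp < 0)
        return -1;
    /* Another client may have taken the lock while we were out, so the
     * application must also re-validate anything it cached under it. */
    return fcntl(*fdp, F_SETLKW, &fl);
}

/* Usage: if (write(fd, buf, n) < 0 && (errno == EBADFD || errno == EBADF))
 *            reopen_and_relock(path, &fd, lock_start, lock_len); */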
Wouldn't it be better to retake the locks when the brick is reconnected,
if they are still in use?
BTW, the referenced bug is not public. Should we open another bug to
track this?
Note that with AFR/replicate in the picture, we can prevent errors to
the application as long as a quorum of children has "never ever" lost
its connection to the bricks after the locks were acquired. I use the
term "never ever" because locks are not healed back after re-connection,
so the first disconnect would have marked the fd bad and it stays bad
even after the connection is re-established. So it is not just a quorum
of children "currently online", but a quorum of children that "never
disconnected from the bricks after the locks were acquired".
I think this requirement is not feasible. In a distributed file system,
sooner or later all bricks will be disconnected. It could be because of
failures or because an upgrade is done, but it will happen.
The difference here is how long fds are kept open. If applications open
and close files frequently enough (i.e. the fd is not kept open longer
than it takes for more than a quorum of bricks to disconnect), then
there's no problem. The problem can only appear with applications that
keep files open for a long time and also use POSIX locks. In this case,
the only good solution I see is to retake the locks on brick
reconnection.
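As a rough illustration of that idea (hypothetical names, and with a
plain fcntl() standing in for the lock request that would really be
re-sent to the brick), the client side would keep a record of every
granted lock and try to replay it on reconnection:

/* Sketch only, not Gluster xlator code: replay recorded POSIX locks
 * when the connection to a brick comes back. */
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>

struct held_lock {
    int               fd;
    struct flock      fl;     /* type and range as originally granted */
    struct held_lock *next;
};

/* Called from the reconnect notification. Returns false if any lock
 * could not be retaken; in that case the fd must still be marked bad,
 * because another client may now hold a conflicting lock. */
static bool replay_locks(struct held_lock *list)
{
    bool all_ok = true;

    for (struct held_lock *l = list; l != NULL; l = l->next) {
        /* Non-blocking F_SETLK: if someone grabbed the lock while we
         * were disconnected, failing fast is the only safe answer. */
        if (fcntl(l->fd, F_SETLK, &l->fl) != 0)
            all_ok = false;
    }
    return all_ok;
}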
However, this use case is not affected if the application doesn't
acquire any POSIX locks. So, I am interested in knowing:
* Do your use cases use POSIX locks?
* Is it feasible for your applications to re-open fds and re-acquire
locks on seeing EBADFD errors?
I think that many applications are not prepared to handle that.
+1 to all the points mentioned by Xavi. This has been a day-1 issue for
all the applications using locks (like NFS-Ganesha and Samba). Not many
applications re-open fds and re-acquire locks; on receiving EBADFD, the
error is most likely propagated to the application's clients.
I agree with Xavi that it is better to heal/re-acquire the locks on
brick reconnect, before the brick accepts any fresh requests. I also
suggest making this healing mechanism generic enough (if possible) to
heal any server-side state (like upcall, leases, etc.).
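A rough sketch of what such a generic mechanism could look like (all
names here are hypothetical; none of this exists in Gluster today):
each subsystem registers a replay callback that is run on reconnect,
before fresh requests are let through.

/* Hypothetical interface, for illustration only. */
#include <stdbool.h>
#include <stddef.h>

typedef bool (*state_replay_fn)(void *subsystem_data);

struct state_healer {
    const char      *name;     /* e.g. "posix-locks", "leases", "upcall" */
    state_replay_fn  replay;   /* re-establishes this subsystem's state */
    void            *data;
};

/* Returns true only if every subsystem healed its state; otherwise the
 * caller keeps the affected fds marked bad (or fences the client). */
static bool heal_all_state(struct state_healer *healers, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (!healers[i].replay(healers[i].data))
            return false;
    }
    return true;
}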
Thanks,
Soumya
Xavi
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7
regards,
Raghavendra
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users