Re: Locking behavior vs rmdir/unlink of a directory/file

On Tuesday 25 August 2015 12:33 PM, Raghavendra Gowdappa wrote:


----- Original Message -----
From: "Vijay Bellur" <vbellur@xxxxxxxxxx>
To: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
Cc: "Sakshi Bansal" <sabansal@xxxxxxxxxx>
Sent: Monday, August 24, 2015 3:52:09 PM
Subject: Re:  Locking behavior vs rmdir/unlink of a directory/file

On Thursday 20 August 2015 10:24 AM, Raghavendra Gowdappa wrote:
Hi all,

Most of the code currently treats the inode table (and the dentry structure
associated with it) as the correct representation of the underlying backend
file-system. While this is correct in most cases, the representation might
be out of sync for small time windows (e.g., a file deleted on disk whose
dentry and inode are not yet removed from our inode table). While working
on locking directories in dht for better consistency, we ran into one such
issue. The issue is basically to make rmdir and directory creation during
dht-selfheal mutually exclusive. The idea is to take a blocking inodelk on
the inode before proceeding with rmdir or directory self-heal. However,
consider the following scenario:

1. (dht_)rmdir acquires a lock.
2. lookup-selfheal tries to acquire a lock, but is blocked on the lock
acquired by rmdir.
3. rmdir deletes the directory and releases its lock. The inode can remain
in the inode table, searchable through its gfid, as long as there is a
positive reference count on it. Here, the lock request (by lookup) and the
granted lock (to rmdir) keep the inode in the inode table even after rmdir.
4. The lock request issued by lookup is granted.

Note that at step 4, it's still possible that rmdir is in progress from
dht's perspective (it has only completed on one node). This is precisely
the situation we wanted to avoid, i.e., we wanted to block and fail
dht-selfheal instead of allowing it to proceed.
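The reference-counting behavior behind steps 3 and 4 can be sketched with a toy model (not GlusterFS code; names like toy_inode are hypothetical): the pending lock request itself holds a reference, so the inode stays resolvable by gfid after the backend directory is gone.

```c
/* Toy model of an inode-table entry: granted locks and pending lock
 * requests each hold a reference; gfid resolution works while any
 * reference is held, even after the backend entry is removed. */
#include <stddef.h>

typedef struct toy_inode {
    int gfid;        /* stand-in for the 16-byte gfid */
    int refcount;    /* locks (granted or pending) each hold one ref */
    int on_disk;     /* 1 while the backend directory still exists */
} toy_inode;

void toy_ref(toy_inode *i)   { i->refcount++; }
void toy_unref(toy_inode *i) { if (i->refcount > 0) i->refcount--; }

/* step 3: rmdir removes the backend directory and drops its lock's ref */
void toy_rmdir(toy_inode *i) { i->on_disk = 0; toy_unref(i); }

/* gfid search succeeds while any ref remains, regardless of on_disk */
int toy_find_by_gfid(const toy_inode *i, int gfid)
{
    return (i->refcount > 0 && i->gfid == gfid);
}
```

With rmdir's granted lock and lookup's blocked request each holding a reference, dropping rmdir's reference still leaves the inode findable, which is why the lookup's lock is then granted on a deleted directory.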

In this scenario, at step 4 the directory is removed on the backend
file-system, but its representation is still present in the inode table. We
tried to solve this by doing a lookup on the gfid before granting a lock
[1]. However, because of [1]:

1. we no longer treat the inode table as the source of truth, unlike the
rest of the non-lookup code;
2. there is a performance hit in the form of a lookup on the backend
file-system for _every_ granted lock. This may not be a big hit,
considering that no network call is involved.
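The lookup-before-grant idea reduces to a precheck along these lines (a hedged sketch, not the actual patch in [1]; grant_precheck and the error choice are assumptions): stat the backend entry just before granting, and fail the lock if the entry has vanished.

```c
/* Sketch: before granting a lock, verify the backend entry still exists.
 * Returns 0 if the grant may proceed, -ESTALE if the inode-table entry
 * is stale because the backend file/directory is gone. */
#include <sys/stat.h>
#include <errno.h>

int grant_precheck(const char *backend_path)
{
    struct stat st;

    if (stat(backend_path, &st) != 0)
        return -ESTALE;   /* backend entry vanished: fail the lock */
    return 0;
}
```

Since this runs on the brick, the stat is a local syscall rather than a network round trip, which is why the per-grant cost may be tolerable.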


Can we not mark the in-memory inode as having been unlinked in
posix_rmdir() and use this information to determine whether a lock
request can be processed?

Yes. Nithya suggested the same. But this seemed like a hacky fix. The reason is:

1. Currently, inode management (including dentry management) is agnostic to inode type. With this fix we would introduce an explicit differentiation. Note that only for directories can we maintain such a flag indicating that the inode has been removed on the backend. Since files have hardlinks, it would be difficult (if not impossible, as unlink_cbk doesn't carry an iatt to tell whether the current unlink is on the last link) to know when to set it. This makes the solution applicable only to directories. For lock requests on files we would still need a lookup on the backend (for our use-case this is fine, since we are not locking on files). Not a show-stopper, but a concern in terms of aesthetics.
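The flag-based alternative, and its directory-only limitation, can be sketched as follows (hypothetical names, not the GlusterFS inode API): rmdir sets a flag on the in-memory inode, and the lock path consults the flag, but for files the flag cannot be maintained reliably.

```c
/* Sketch of the flag-based approach: posix_rmdir() marks the in-memory
 * inode as unlinked; the lock path checks the flag instead of the backend.
 * Only reliable for directories, since a file's last-link status is not
 * visible in unlink_cbk. */
#include <errno.h>

enum { TOY_DIR = 1, TOY_FILE = 2 };

typedef struct {
    int type;      /* TOY_DIR or TOY_FILE */
    int unlinked;  /* set by rmdir; meaningful only for directories */
} toy_inode2;

void toy_mark_rmdir(toy_inode2 *i)
{
    if (i->type == TOY_DIR)
        i->unlinked = 1;   /* a directory has exactly one parent dentry */
    /* files: can't tell whether this unlink removed the last link */
}

/* 0 if the lock may be granted, -ENOENT if the directory is known gone */
int toy_lock_allowed(const toy_inode2 *i)
{
    if (i->type == TOY_DIR && i->unlinked)
        return -ENOENT;
    return 0;   /* files would still need a backend lookup */
}
```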


On the contrary, the idea of doing a lookup from the locks translator looks like a hack to me. Locks is designed to synchronize over an in-memory data structure, and it should not be looking beyond that to make decisions. With the current inode management code, we already have cases in inode_link() where we look at the ia_type of an inode to make decisions.

I had an offline conversation with Raghavendra, and the core problem seems to stem from an expectation in dht that a path exists if an INODELK is granted. As our inode state may not exactly match the on-disk state, we will evaluate the possibility of conditionally altering INODELK semantics to guarantee path existence. That way, granted INODELK requests would not need to perform additional stat calls. If we go down this route, we should document this behavior so that expectations from INODELK are clear.


The behavior of locks during/after removal of a file/directory seems not well defined as of now, IMHO:
      a. We can acquire a lock because the inode exists in the inode table.
      b. The inode exists in the inode table because some locks on it hold a reference.

    Of course, this circularity will be fixed by maintaining a flag indicating file/directory removal (or by doing a lookup). But some clarification is needed: if we are going to record that a file/directory is removed, what should be the behavior of future lock/unlock calls? Should we fail them? Lock calls we can fail. But for unlock there are two choices:
      a. Let the consumer send an unlock even after the remove, or
      b. Clear out the locks during unlink/rmdir.

     I prefer approach a. here. Comments?
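Approach a. amounts to the following asymmetry (a toy sketch with hypothetical names, not the locks translator's actual structures): once the removal is recorded, new lock requests fail, but unlocking an already-held lock still succeeds so the consumer can release what it acquired.

```c
/* Sketch of approach a.: after removal, new locks fail but the holder
 * can still unlock; cleanup stays with the consumer rather than being
 * forced during unlink/rmdir (approach b.). */
#include <errno.h>

typedef struct {
    int removed;   /* file/directory deleted on the backend */
    int locked;    /* a consumer currently holds the lock */
} toy_lockable;

int toy_lock(toy_lockable *l)
{
    if (l->removed)
        return -ENOENT;   /* fail new lock requests after removal */
    if (l->locked)
        return -EAGAIN;   /* would block in the real blocking-lock case */
    l->locked = 1;
    return 0;
}

int toy_unlock(toy_lockable *l)
{
    /* allowed even after removal: the consumer unwinds its own state */
    if (!l->locked)
        return -EINVAL;
    l->locked = 0;
    return 0;
}
```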


Yes, approach a. would be desirable. The onus should usually be on the consumer to clean up resources acquired.

-Vijay
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel


