Re: Non-blocking lock for renames

Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx> · Thu, 4 Feb 2016 00:58:29 -0500 (EST)

----- Original Message -----
> From: "Vijay Bellur" <vbellur@xxxxxxxxxx>
> To: "Shyamsundar Ranganathan" <srangana@xxxxxxxxxx>, "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> Sent: Thursday, February 4, 2016 9:55:04 AM
> Subject: Non-blocking lock for renames
> 
> DHT developers,
> 
> With 3.6, we introduced a non-blocking lock prior to a rename operation
> in dht, and we fail the rename if the lock acquisition is not successful.
> I ran into a user on IRC yesterday who is affected by this behavior change:
> 
> "We're seeing a behavior in Gluster 3.7.x that we did not see in 3.4.x
> and we're not sure how to fix it. When multiple processes are attempting
> to rename a file to the same destination at once, we're now seeing
> "Device or resource busy" and "Stale file handle" errors. Here's the
> command to replicate it: cd /mnt/glustermount; while true; do
> FILE=$RANDOM; touch $FILE; mv $FILE file-fv; done. The above command
> would be run on two or three servers within the same gluster cluster. In
> the output, one would always be successful in the rename, while the
> other two would fail with the above error."
> 
> The use case for concurrent renames was described as:
> 
> "we generate files and push them to the gluster cluster. Some are
> generated multiple times and end up being pushed to the cluster at the
> same time by different data generators; resulting in the 'rename
> collision'. We also use the cluster.extra-hash-regex to make sure the
> data is written in place. And this does the rename."
> 
> Is a non-blocking lock essential? Can we not use a blocking lock instead
> of a non-blocking lock or fall back to a blocking lock if the original
> non-blocking lock acquisition fails?

This lock synchronizes:
1. a rename from the application with file migration by the rebalance process [1].
2. multiple renames from the application on the same file.

I think the lock is still required for case 1. However, since migration can potentially take a long time, we chose a non-blocking lock to make sure the application is not blocked for a long period.

Case 2 is what is causing the issue mentioned in this thread. We did see some files being removed by parallel renames on the same file. However, by the time we had identified that it is a bug in 'mv', we had already introduced synchronization between renames. (mv issues an unlink on src if src and dst happen to be hardlinks [2], but the hardlink check and the unlink are not atomic, and DHT breaks a rename into a series of links and unlinks.) So, we have two options:
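
To make the window concrete, here is a rough sketch in Python. It is not DHT's actual sequence and not mv's actual code; rename_as_link_unlink and demo_intermediate_state are hypothetical names used only for illustration. It assumes a rename implemented as link followed by unlink, and shows that an observer running between the two steps sees src and dst as hardlinks of the same file, which is the condition under which mv would go on to unlink src.

```python
import os
import tempfile


def rename_as_link_unlink(src, dst, between_steps=None):
    """Hypothetical rename done as link + unlink (assumes dst does not exist)."""
    os.link(src, dst)          # step 1: dst now names the same inode as src
    if between_steps is not None:
        between_steps()        # another client could run its checks here
    os.unlink(src)             # step 2: drop the old name


def demo_intermediate_state():
    """Record what a concurrent observer would see in the middle of the rename."""
    seen = {}
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src")
        dst = os.path.join(d, "dst")
        open(src, "w").close()

        def observer():
            # The same kind of check mv relies on: do src and dst
            # refer to the same file?
            seen["same_file_mid_rename"] = os.path.samefile(src, dst)

        rename_as_link_unlink(src, dst, observer)
        seen["src_exists_after"] = os.path.exists(src)
        seen["dst_exists_after"] = os.path.exists(dst)
    return seen


if __name__ == "__main__":
    print(demo_intermediate_state())
```

The point is only that the "are they hardlinks?" answer can be true transiently, so a decision taken on it can be stale by the time the unlink is issued.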

1. Use different lock domains for use cases 1 and 2 above. With different domains, use case 2 can be changed to use blocking locks. It might not be advisable to use blocking locks for use case 1.
2. Since we identified that the issue is with mv (I couldn't find the bug we filed on mv, but [2] is close to it), we probably don't need locking for case 2 at all.
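
For option 1, a minimal sketch of the idea, in Python rather than the actual inodelk code. DomainLocks and rename_with_domains are hypothetical names, and plain mutexes stand in for the real lock domains. It assumes renames serialize on their own domain with a blocking lock, and only the migration domain is tried non-blocking, so a concurrent rename waits while an ongoing migration still yields EBUSY.

```python
import errno
import threading


class DomainLocks:
    """Two independent lock domains (hypothetical stand-ins)."""

    def __init__(self):
        self.migration = threading.Lock()  # use case 1: rename vs. rebalance
        self.rename = threading.Lock()     # use case 2: rename vs. rename


def rename_with_domains(locks, do_rename):
    """Run do_rename() under both domains; return 0 or -EBUSY."""
    # Blocking lock: concurrent renames queue up instead of failing.
    with locks.rename:
        # Non-blocking lock: do not stall the application behind a
        # migration that may take a long time.
        if not locks.migration.acquire(blocking=False):
            return -errno.EBUSY
        try:
            do_rename()
            return 0
        finally:
            locks.migration.release()


if __name__ == "__main__":
    locks = DomainLocks()
    results = []
    workers = [
        threading.Thread(
            target=lambda: results.append(rename_with_domains(locks, lambda: None))
        )
        for _ in range(3)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(results)
```

Taking the rename domain first means renames never contend with each other on the migration domain; only a migrator holding that domain can cause the rename to fail.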

Suggestions?

[1] https://bugzilla.redhat.com/show_bug.cgi?id=969298#c8
[2] https://bugzilla.redhat.com/show_bug.cgi?id=438076

regards,
Raghavendra
> 
> Thanks,
> Vijay