----- Original Message -----
> From: "Prashanth Pai" <ppai@xxxxxxxxxx>
> To: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> Cc: "Vijay Bellur" <vbellur@xxxxxxxxxx>, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>, "Thiago da Silva" <thiago@xxxxxxxxxx>
> Sent: Thursday, February 4, 2016 3:53:22 PM
> Subject: Re: Non-blocking lock for renames
>
> ----- Original Message -----
> > From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> > To: "Vijay Bellur" <vbellur@xxxxxxxxxx>
> > Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> > Sent: Thursday, February 4, 2016 6:58:29 AM
> > Subject: Re: Non-blocking lock for renames
> >
> > ----- Original Message -----
> > > From: "Vijay Bellur" <vbellur@xxxxxxxxxx>
> > > To: "Shyamsundar Ranganathan" <srangana@xxxxxxxxxx>, "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> > > Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> > > Sent: Thursday, February 4, 2016 9:55:04 AM
> > > Subject: Non-blocking lock for renames
> > >
> > > DHT developers,
> > >
> > > With 3.6 we introduced a non-blocking lock prior to a rename operation
> > > in dht, and we fail the rename if the lock acquisition is not
> > > successful. I ran into a user on IRC yesterday who is affected by this
> > > behavior change:
> > >
> > > "We're seeing a behavior in Gluster 3.7.x that we did not see in 3.4.x
> > > and we're not sure how to fix it. When multiple processes attempt to
> > > rename a file to the same destination at once, we now see 'Device or
> > > resource busy' and 'Stale file handle' errors. Here's the command to
> > > replicate it:
> > >
> > >     cd /mnt/glustermount; while true; do FILE=$RANDOM; touch $FILE; mv $FILE file-fv; done
> > >
> > > The above command would be run on two or three servers within the same
> > > gluster cluster. In the output, one would always be successful in the
> > > rename, while the two others would fail with the above errors."
> > >
> > > The use case for concurrent renames was described as:
> > >
> > > "we generate files and push them to the gluster cluster. Some are
> > > generated multiple times and end up being pushed to the cluster at the
> > > same time by different data generators, resulting in the 'rename
> > > collision'. We also use the cluster.extra-hash-regex option to make
> > > sure the data is written in place. And this does the rename."
> > >
> > > Is a non-blocking lock essential? Can we not use a blocking lock
> > > instead of a non-blocking one, or fall back to a blocking lock if the
> > > non-blocking lock acquisition fails?
> >
> > This lock synchronizes:
> > 1. a rename from the application with file migration by the rebalance
> > process [1].
> > 2. multiple renames from the application on the same file.
>
> Hi,
>
> We've seen this behavior very recently when we had multiple instances of
> object servers on different nodes, each with its own FUSE mount. During
> our tests, we often see many object PUTs fail because rename() throws
> EBUSY or ESTALE (which we don't catch as of today). I'm certain that
> there was no rebalance happening at the time, and we don't use the "mv"
> command for renames. The object server does a series of mkdirs(),
> followed by creation of a unique temp file and finally a rename(). In our
> particular test, the final file path was also unique, so it's not
> multiple renames on the "same file". I'll try to reproduce this later
> and provide logs.

That's strange. At least in dht, the lock is acquired only on the file
being renamed.

> >
> > I think the lock is still required for 1. However, since migration can
> > potentially take a long time, we chose a non-blocking lock to make sure
> > the application is not blocked for a long period.
> >
> > Case 2 is what is causing the issue mentioned in this thread. We did
> > see some files being removed with parallel renames on the same file.
> > But by the time we had identified that it is a bug in 'mv' (mv issues
> > an unlink on src if src and dst happen to be hardlinks [2], but the
> > test for the hardlink and the unlink are not atomic, and dht breaks a
> > rename into a series of links and unlinks), we had already introduced
> > synchronization between renames. So we have two options:
> >
> > 1. Use different domains for use cases 1 and 2 above. With different
> > domains, use case 2 can be changed to use blocking locks. It might not
> > be advisable to use blocking locks for use case 1.
> > 2. Since we identified that the issue is with mv (I couldn't find the
> > bug we filed on mv, but [2] is close to it), we probably don't need
> > locking for case 2 at all.
> >
> > Suggestions?
> >
> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=969298#c8
> > [2] https://bugzilla.redhat.com/show_bug.cgi?id=438076
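
To make the race in [2] concrete: the hardlink test and the unlink are
separate system calls, so a concurrent rename (which dht decomposes into
link and unlink operations) can change dst between the two. A rough sketch
of the pattern in C follows; this is illustrative only, not the actual
coreutils code, and the function names are made up:

    /* Illustration only, not coreutils source.  If src and dst are
     * hardlinks to the same inode, mv treats the rename as already
     * done and simply unlinks src. */
    #include <stdio.h>      /* rename() */
    #include <sys/stat.h>   /* stat()   */
    #include <unistd.h>     /* unlink() */

    static int same_file(const char *src, const char *dst)
    {
        struct stat a, b;

        if (stat(src, &a) != 0 || stat(dst, &b) != 0)
            return 0;
        return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
    }

    int do_move(const char *src, const char *dst)
    {
        if (same_file(src, dst))
            /* RACE WINDOW: dst may no longer refer to src's inode by
             * now; unlinking src can then remove the last remaining
             * name for the file's data. */
            return unlink(src);

        return rename(src, dst);
    }

If another client's rename replaces dst after the stat() comparison but
before the unlink(), the file's data can be lost, which matches the
removed-files symptom described above.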
> >
> > regards,
> > Raghavendra
> >
> > > Thanks,
> > > Vijay
> >
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel@xxxxxxxxxxx
> > http://www.gluster.org/mailman/listinfo/gluster-devel

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
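
On Vijay's question about falling back to a blocking lock when the
non-blocking attempt fails: the try-then-wait shape looks roughly like
the following, shown here with plain POSIX record locks purely as an
illustration (dht's inodelk machinery is different):

    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>     /* SEEK_SET */

    int lock_whole_file(int fd)
    {
        struct flock fl = {
            .l_type   = F_WRLCK,
            .l_whence = SEEK_SET,
            .l_start  = 0,
            .l_len    = 0,   /* zero length locks the whole file */
        };

        /* Fast path: try to take the lock without blocking. */
        if (fcntl(fd, F_SETLK, &fl) == 0)
            return 0;

        /* EAGAIN/EACCES mean the lock is held by someone else. */
        if (errno != EAGAIN && errno != EACCES)
            return -1;

        /* Contended: wait for the lock instead of failing. */
        return fcntl(fd, F_SETLKW, &fl);
    }

How long the caller may end up waiting depends on the current lock holder
(for case 1 that could be a lengthy file migration), which is why a
non-blocking lock was chosen there in the first place.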