On Fri, Sep 4, 2015 at 2:35 PM, Prashanth Pai <ppai@xxxxxxxxxx> wrote:
We *may* have hit this once earlier, when we had multiple instances of the object-expirer daemon deleting a huge number of objects (files).
----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> To: gluster-devel@xxxxxxxxxxx
> Cc: gluster-users@xxxxxxxxxxx
> Sent: Friday, September 4, 2015 12:43:09 PM
> Subject: [Gluster-devel] [posix-compliance] unlink and access to file through open fd
>
> All,
>
> POSIX allows access to a file through open fds even if the name associated
> with the file is deleted. While this works in glusterfs for most cases, there
> are some corner cases where we fail.
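A minimal local-filesystem sketch of the POSIX behavior described above, using only the Python standard library (the function name `read_after_unlink` is hypothetical). The file's only name is removed and its link count drops to zero, yet reads through the already-open fd still return the data until the last close.

```python
import os
import tempfile


def read_after_unlink(data: bytes) -> tuple[bytes, int]:
    """Write `data` to a new file, unlink its only name, then read it back via the fd.

    Returns the bytes read and the link count fstat() reports after the unlink.
    """
    with tempfile.TemporaryDirectory() as dirpath:
        path = os.path.join(dirpath, "file")
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
        try:
            os.write(fd, data)
            os.unlink(path)                        # the name is gone...
            nlink = os.fstat(fd).st_nlink          # ...and the inode has no links left,
            os.lseek(fd, 0, os.SEEK_SET)
            return os.read(fd, len(data)), nlink   # but the data is still readable
        finally:
            os.close(fd)                           # last close: now the inode can be freed
```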
>
> 1. Reboot of brick:
> ===================
>
> With a reboot of the brick, the fd is lost. unlink would've deleted both the
> gfid and path links to the file, and we would lose the file. As a solution,
> perhaps we should create a hardlink to the file (say in .glusterfs) which gets
> deleted only when the last fd is closed?
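The hardlink idea above could look roughly like the following sketch. Everything here is an assumption for illustration: `UnlinkHolder`, `reboot_demo`, the `hold_dir` layout and the gfid strings are hypothetical stand-ins, not the posix translator or the real .glusterfs layout. The point is only that an extra link taken at unlink time keeps the inode reachable by name if the fd itself is lost, and that the link is dropped on the last close.

```python
import os
import tempfile


class UnlinkHolder:
    """Hypothetical: keep unlinked-but-open files reachable through a hardlink."""

    def __init__(self, hold_dir: str):
        self.hold_dir = hold_dir   # stands in for a directory under .glusterfs
        self.open_count = {}       # gfid -> number of open fds
        self.held = set()          # gfids whose names were unlinked while open

    def _hold_path(self, gfid: str) -> str:
        return os.path.join(self.hold_dir, gfid)

    def open(self, path: str, gfid: str) -> int:
        fd = os.open(path, os.O_RDWR)
        self.open_count[gfid] = self.open_count.get(gfid, 0) + 1
        return fd

    def unlink(self, path: str, gfid: str) -> None:
        if self.open_count.get(gfid, 0) > 0 and gfid not in self.held:
            os.link(path, self._hold_path(gfid))   # keep one link alive
            self.held.add(gfid)
        os.unlink(path)

    def close(self, fd: int, gfid: str) -> None:
        os.close(fd)
        self.open_count[gfid] -= 1
        if self.open_count[gfid] == 0:
            del self.open_count[gfid]
            if gfid in self.held:
                os.unlink(self._hold_path(gfid))   # last close: drop the hold
                self.held.discard(gfid)


def reboot_demo() -> tuple[bool, bytes]:
    """Open, unlink, then lose the fd without close bookkeeping (a 'reboot')."""
    with tempfile.TemporaryDirectory() as d:
        hold_dir = os.path.join(d, "hold")
        os.mkdir(hold_dir)
        path = os.path.join(d, "file")
        with open(path, "wb") as f:
            f.write(b"data")
        holder = UnlinkHolder(hold_dir)
        fd = holder.open(path, "gfid-1")
        holder.unlink(path, "gfid-1")
        os.close(fd)   # fd is lost; the holder never sees a close
        with open(os.path.join(hold_dir, "gfid-1"), "rb") as f:
            return os.path.exists(path), f.read()
```

In a real brick the open counts would live in memory and vanish on reboot, so held links would also need to be reclaimed after a restart; this sketch does not attempt that.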
>
> 2. Graph switch:
> =================
>
> The issue is captured in bz 1259995 [1]. Pasting the content from bz
> verbatim:
> Consider the following sequence of operations:
> 1. fd = open ("/mnt/glusterfs/file");
> 2. unlink ("/mnt/glusterfs/file");
> 3. Do a graph-switch, let's say by adding a new brick to the volume.
> 4. Migration of the fd to the new graph fails. This is because as part of
> migration we do a lookup and open. But the lookup fails as the file is already
> deleted, and hence migration fails and the fd is marked bad.
>
> In fact this test case is already present in our regression tests, though the
> test only checks whether the fd is marked as bad. But the expectation in
> filing this bug is that migration should succeed. This is possible since
> there is an fd opened on the brick through the old graph, and hence it can be
> duped using the dup syscall.
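The difference between migrating by lookup and migrating by dup can be shown on a local filesystem. This is a single-process analogy, assuming the fd held through the old graph behaves like an ordinary POSIX fd; `migrate_by_lookup`, `migrate_by_dup` and `graph_switch_demo` are hypothetical names, and the numbered comments follow the bz sequence.

```python
import os
import tempfile


def migrate_by_lookup(path: str) -> int:
    """Re-open the file by name, as a lookup + open based migration would."""
    return os.open(path, os.O_RDWR)  # raises FileNotFoundError once the name is unlinked


def migrate_by_dup(old_fd: int) -> int:
    """Duplicate the fd the old graph already holds; no name lookup is needed."""
    return os.dup(old_fd)


def graph_switch_demo(data: bytes = b"payload") -> tuple[bool, bytes]:
    with tempfile.TemporaryDirectory() as dirpath:
        path = os.path.join(dirpath, "file")
        old_fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)  # 1. open
        os.write(old_fd, data)
        os.unlink(path)                                         # 2. unlink

        # 3./4. graph switch: try to migrate the fd to the "new graph"
        try:
            os.close(migrate_by_lookup(path))
            lookup_ok = True
        except FileNotFoundError:
            lookup_ok = False                                   # fd would be marked bad

        new_fd = migrate_by_dup(old_fd)
        os.close(old_fd)                                        # old graph is torn down
        try:
            os.lseek(new_fd, 0, os.SEEK_SET)  # dup'd fds share one file offset
            return lookup_ok, os.read(new_fd, len(data))
        finally:
            os.close(new_fd)
```

Lookup-based migration fails for the same reason as in the bz, while the dup'd fd still reads the data after the original fd is closed.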
>
> Of course the solution outlined here doesn't cover the case where the file is
> not present on the brick at all. For example, a new brick was added to the
> replica set and that new brick doesn't contain the file. Now, since the file is
> deleted, how does replicate heal that file to the other brick, etc.?
>
> But at least this can be solved for those cases where the file was present on
> a brick and an fd was already opened.
>
> 3. Open-behind and unlink from a different client:
> ==================================================
>
> While open-behind handles unlink from the same client (through which the open
> was performed), if unlink and open are done from two different clients, the
> file is lost. I cannot think of any good solution for this.
This was only observed at scale, when deleting a million objects. Our user-space application flow was roughly as follows:
fd = open(...)
s = fstat(fd)
fgetxattr(fd, ....)
In our case, open() and fstat() succeeded but fgetxattr() failed with ENOENT (many times with ESTALE too), probably because some other client
had already done an unlink() on the file name. Is this behavior normal?
It's possible (maybe not normal, since we are being non-POSIX compliant here :)).
1. Open might've been serviced by open-behind (faking it).
2. fstat might've been served from md-cache (If it had hit open-behind, it would've done an open before fstat is completed).
3. If fgetxattr hits open-behind and the file has already been deleted from some other client, fgetxattr will fail with ESTALE (not ENOENT: since the open is done on the gfid and the gfid cannot be looked up, the server resolver sends out ESTALE).
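A toy model of the three steps above, useful for seeing why fstat can succeed while fgetxattr fails. It is not GlusterFS code: `Brick`, `OpenBehindFd` and the dict-based inode table are hypothetical, and the only behaviors modeled are a deferred (faked) open, an md-cache style fstat, and a gfid-based open that reports ESTALE when the gfid is gone.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class Brick:
    """Toy brick: inodes addressed by gfid (no directory tree is modeled)."""
    inodes: dict = field(default_factory=dict)  # gfid -> {"xattrs": {...}}

    def open_by_gfid(self, gfid: str) -> dict:
        if gfid not in self.inodes:
            # the gfid can no longer be resolved on the server side
            raise OSError(errno.ESTALE, "Stale file handle", gfid)
        return self.inodes[gfid]

    def unlink(self, gfid: str) -> None:
        del self.inodes[gfid]  # already-opened handles keep their reference, like real fds


class OpenBehindFd:
    """Toy client fd: open() is faked and deferred, fstat() is answered from a cache."""

    def __init__(self, brick: Brick, gfid: str, cached_stat: dict):
        self.brick = brick
        self.gfid = gfid
        self.cached_stat = cached_stat  # what md-cache already holds for this file
        self.inode = None               # no real open has reached the brick yet

    def _real_open(self) -> dict:
        if self.inode is None:
            self.inode = self.brick.open_by_gfid(self.gfid)
        return self.inode

    def fstat(self) -> dict:
        return self.cached_stat         # no brick round trip, so it cannot fail here

    def fgetxattr(self, name: str) -> bytes:
        return self._real_open()["xattrs"][name]  # forces the deferred open
```

If fgetxattr() (or anything else that forces the real open) runs before the other client's unlink, the toy fd keeps working afterwards; only the window between the faked open and the first real brick operation is exposed.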
@Thiago: Remember this one?
http://paste.openstack.org/show/357414/
https://gist.github.com/thiagodasilva/491e405a3385f0e85cc9
>
> I wanted to know whether these problems are real enough to channel our
> efforts to fix these issues. Comments are welcome in terms of solutions or
> other possible scenarios which can lead to this issue.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1259995
>
> regards,
> Raghavendra.
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
--
Raghavendra G
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users