Re: handling open fds and graph switches

On 08/06/2013 04:22 AM, Raghavendra Bhat wrote:
> 
> Hi,
> 
> As of now, there is a problem when the following set of operations is
> performed on a file:
> 
> open () => unlink () => do a graph change (not reconfigure) => fop on
> the opened fd (e.g. a write)
> 
> In the above set of operations, the fop performed on the fd after the
> graph switch fails with EBADFD (which should not happen). This is because
> when the file is unlinked (assume there are no other hardlinks for the
> file), the gfid handle present in the .glusterfs directory of the brick
> is removed. When a graph change happens, all fds have to be migrated
> to the new graph. Before that, a nameless lookup is sent on the gfid
> (to build the new inode in the new graph). The nameless lookup operates
> on the gfid handle. But since the gfid handle was removed upon receiving
> the unlink, the nameless lookup fails, which fails the fd migration to
> the new graph, and fops on the fd fail as well.
> 
> A patch has been sent to handle this
> (http://review.gluster.org/#/c/5428/), where the gfid handle is
> removed only when the last reference to the file is removed. That is,
> upon getting the unlink, it also checks whether there are any open fds
> on the inode. If so, the gfid handle is not removed; it is removed when
> the release on that fd is received. But that approach might lead to gfid
> handle leaks: if the last entry is unlinked while there are open fds,
> the gfid handle is kept, and if glusterfsd then crashes before the
> release arrives, the gfid handle for that file is leaked.
> 
> Another approach might be to make posix_lookup do a stat on one of the
> fds present on the inode when it has to build an INODE HANDLE (which
> happens as part of nameless lookup). The nameless lookup then succeeds
> and the new inode is looked up in the new graph for the client. But
> after that, there are 2 more issues.
> 
> 1) After successful completion of the nameless lookup, the file has to
> be opened in the new graph. So a syncop_open is sent on the new graph
> for the gfid. In posix_open, the posix xlator again tries to open the
> file using the gfid handle. But since the gfid handle is removed, the
> open fails and the file is not opened (thus fd migration fails again).
> We can search the list of fds for the inode, find the right fd that the
> fuse client is trying to migrate and return that fd. But finding the
> right fd is a hard task. (What if a fuse client has opened 2 fds with
> the same flags?)
> 
> 2) Another problem is open-behind. After the nameless lookup, the fuse
> xlator sends syncop_open to migrate the fds. Once the syncop_open is
> complete and the fds are migrated, a PARENT_DOWN event is sent on the
> old graph and the client xlator sends release on all the fds (if the
> previous syncop_open was successful, then it's safe to send release
> from the old graph, as the fd would have been migrated to the new
> graph, with a corresponding fd present in the brick). But before that,
> in syncop_open, open-behind might have returned success to fuse without
> actually winding the open call to the xlators below it. Now fuse gets
> success for the open and sends PARENT_DOWN to the old graph, which sends
> release on the fd. Thus even though an fd is present from the
> application's point of view, there is no mechanism to access the file
> (as the fds and gfid handles have already been removed).
> 
> Please provide feedback on the above issues.
> 

Hi Raghavendra,

I'm not intimately familiar with all of the interactions of this
problem, but the first thing that comes to mind when reading this is how
a traditional filesystem would manage this in-between state of an inode
using the notion of an "unlinked list." In other words, an inode is
placed on some internal list to reflect the period of time when it is no
longer linked in the namespace, but has not been completely deallocated
due to open references. This helps prevent losing track of it completely
until it can be cleaned up.
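
To make that in-between state concrete, here is a small standalone sketch
in plain POSIX C (not gluster code). The function name and both paths are
hypothetical; "handle" is just a hard link standing in for the gfid handle
under .glusterfs. Once both names are gone, a path-based lookup fails with
ENOENT, while the already-open fd keeps working, which is roughly the gap
the nameless lookup falls into.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns 0 if, after removing every name of an open
 * file, the path lookup fails with ENOENT but the open fd is still usable.
 * "handle" is a hard link playing the role of the gfid handle.
 */
int unlinked_fd_demo(const char *name, const char *handle)
{
    struct stat st;
    int ret = -1;

    /* Start from a clean slate; errors here are not interesting. */
    unlink(name);
    unlink(handle);

    int fd = open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    /* Second name for the same inode, like the gfid handle. */
    if (link(name, handle) != 0)
        goto out;

    /* Remove both names while the fd is still open. */
    if (unlink(name) != 0 || unlink(handle) != 0)
        goto out;

    /* Path-based access (what a lookup by handle relies on) now fails. */
    if (stat(handle, &st) == 0 || errno != ENOENT)
        goto out;

    /* fd-based access still works: the inode lives until the last close. */
    if (fstat(fd, &st) != 0)
        goto out;
    if (write(fd, "x", 1) != 1)
        goto out;

    ret = 0;
out:
    close(fd);
    /* Best-effort cleanup in case we bailed out early. */
    unlink(name);
    unlink(handle);
    return ret;
}
```

Calling unlinked_fd_demo() with two paths on the same local filesystem
should return 0. The kernel is tracking the inode for us here; the brick
has no equivalent once the handle is gone.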

Could we incorporate such a mechanism, and does it help solve the
problem here? For example, tag an inode somehow to reflect that it's
hidden from "normal" access, or create a special unlinked subdirectory
somewhere under .glusterfs to keep the gfid and inode around until it's
no longer referenced.

Provided that is possible/useful, I guess we'd also need some way to
specify an internally generated open request should refer to the
"unlinked list" (xdata option?), but perhaps that's getting too far
ahead. Thoughts?

Brian

> 
> Regards,
> Raghavendra Bhat
> 
> 
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> https://lists.nongnu.org/mailman/listinfo/gluster-devel



