Re: Report ESTALE as ENOENT

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 






On Thu, Feb 22, 2018 at 1:17 PM, Raghavendra G <raghavendra@xxxxxxxxxxx> wrote:


On Wed, Oct 11, 2017 at 7:32 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
On Wed, Oct 11, 2017 at 04:11:51PM +0530, Raghavendra G wrote:
> On Thu, Mar 31, 2016 at 1:22 AM, J. Bruce Fields <bfields@xxxxxxxxxxxx>
> wrote:
>
> > On Mon, Mar 28, 2016 at 04:21:00PM -0400, Vijay Bellur wrote:
> > > I would prefer to:
> > >
> > > 1. Return ENOENT for all system calls that operate on a path.
> > >
> > > 2. ESTALE might be ok for file descriptor based operations.
> >
> > Note that operations which operate on paths can fail with ESTALE when
> > they attempt to look up a component within a directory that no longer
> > exists.
> >
>
> But, "man 2 rmdir"  or "man 2 unlink" doesn't list ESTALE as a valid error.

In fact, almost no man pages list ESTALE as a valid error:

        [bfields@patate man-pages]$ git grep ESTALE
        Changes.old:        Change description for ESTALE
        man2/open_by_handle_at.2:.B ESTALE
        man2/open_by_handle_at.2:.B ESTALE
        man3/errno.3:.B ESTALE

Cc'ing Michael Kerrisk for advice.  Is there some reason for that, or
can we fix those man pages?

> Also rm doesn't seem to handle ESTALE too [3]
>
> [4] https://github.com/coreutils/coreutils/blob/master/src/remove.c#L305

I *think* that code is just deciding whether a given error should be
silently ignored in the rm -f case.  I don't think -ESTALE (indicating
the directory is bad) is such an error, so I think this code is correct.
But my understanding may be wrong.

For a local filesystem, we may not end up in ESTALE errors. But, when rmdir is executed from multiple clients of a network fs (like NFS, Glusterfs), unlink or rmdir can easily fail with ESTALE as the other rm invocation could've deleted it. I think this is what has happened in bugs like:

This in fact was the earlier motivation to convert ESTALE into ENOENT, so that rm would ignore it. Now that I reverted the fix, looks like the bug has promptly resurfaced :)

There is one glitch though. Bug 1245065 mentions that some parts of directory structure remain undeleted. From my understanding, atleast one instance of rm (which is racing ahead of all others causing others to fail), should've delted the directory structure completely. Though, I need to understand the directory traversal done by rm to find whether there are cyclic dependency between two rms causing both of them to fail.

Also note that VFS retries unlink and rmdir if there is an estale:

https://github.com/torvalds/linux/blob/master/fs/namei.c#L4056
https://github.com/torvalds/linux/blob/master/fs/namei.c#L3927

So, underlying fs like Glusterfs cannot mask ESTALE as it'll break some functionality.

In this scenario what are the ways to fix a failing rm when run from multiple mount points (Bugs I pointed out above)? I really can't think of a way other than fixing rm to ignore ESTALE error.



> > Maybe non-creating open("./foo") returning ENOENT would be reasonable in
> > this case since that's what you'd get in the local filesystem case, but
> > creat("./foo") returning ENOENT, for example, isn't something
> > applications will be written to handle.
> >
> > The Linux VFS will retry ESTALE on path-based systemcalls *one* time, to
> > reduce the chance of ESTALE in those cases.
>
>
> I should've anticipated bug [2] due to this comment. My mistake. Bug [2] is
> indeed due to kernel not retrying open on receiving an ENOENT error.
> Glusterfs sent ENOENT because file's inode-number/nodeid changed but same
> path exists. The correct error would've been ESTALE, but due to our
> conversion of ESTALE to ENOENT, the latter was sent back to kernel.
>
> Looking through kernel VFS code, only open *seems* to retry
> (do_filep_open). I couldn't find similar logic to other path based syscalls
> like rmdir, unlink, stat, chmod etc

I believe there is a retry in those cases, but I'm not sure exactly
where it is.  Looking around.... See the retry_estale() checks sprinkled
around namei.c, which were added by Jeff Layton a few years ago.

--b.
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel



--
Raghavendra G



--
Raghavendra G
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel

[Index of Archives]     [Gluster Users]     [Ceph Users]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux