Re: Report ESTALE as ENOENT

Vijay Bellur <vbellur@xxxxxxxxxx> · Mon, 28 Mar 2016 16:21:00 -0400

On 03/28/2016 09:34 AM, FNU Raghavendra Manjunath wrote:

I can understand the concern. But instead of converting all ESTALE
errors to ENOENT across the board, we should probably analyze the
errors that are generated by the lower layers (like posix).

Even the fuse kernel module sometimes returns ESTALE. (Well, I can see
it returning ESTALE errors in some cases in the code. Someone please
correct me if that's not the case.) Also, I am not sure whether
converting all the ESTALE errors to ENOENT is OK.

ESTALE in fuse is returned only for export_operations. fuse implements 
this for providing support to export fuse mounts as nfs exports. A 
cursory reading of the source seems to indicate that fuse returns ESTALE 
only in cases where filehandle resolution fails.

For fd based operations, I am not sure whether ENOENT can be sent,
because even after a file is unlinked it can still be accessed through
fds that were opened on it before the unlink.

ESTALE should be fine for fd based operations. It would be analogous to 
a filehandle resolution failing and should not be a common occurrence.

I feel we have to look at the individual code paths and check whether
ESTALE is the proper error for each of them. Also, if fixing a bug in
the lower layers can avoid the ESTALE errors, I feel that would be the
better option.

I would prefer to:

1. Return ENOENT for all system calls that operate on a path.

2. ESTALE might be ok for file descriptor based operations.
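
To make the two rules concrete, here is a minimal sketch in Python. The function name and the path_based flag are made up for illustration; nothing like this exists in the glusterfs tree as far as this thread says, and the real conversion would have to live wherever the fuse bridge hands errors back.

```python
import errno


def errno_for_application(op_errno: int, path_based: bool) -> int:
    """Return the errno to hand back to the application.

    Hypothetical helper: ESTALE from the lower layers becomes ENOENT only
    for operations that were issued on a path; fd based operations keep it.
    """
    if op_errno == errno.ESTALE and path_based:
        return errno.ENOENT
    return op_errno


# A path based call such as stat() would see ENOENT ...
print(errno.errorcode[errno_for_application(errno.ESTALE, path_based=True)])
# ... while an fd based call such as fstat() would still see ESTALE.
print(errno.errorcode[errno_for_application(errno.ESTALE, path_based=False)])
```

Every other errno passes through untouched, so only the ESTALE case changes for path based calls.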

NFS recommends that applications add special code for handling ESTALE 
[1]. Unfortunately changing application code is not easy and hence it 
does not come as a surprise that coreutils also does not accommodate 
ESTALE. I would not like to use NFS as a precedent for us to be commonly 
returning ESTALE back to applications.
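
For reference, this is roughly the kind of special handling an application would need to carry. It is only a sketch of one plausible shape (retry from the path when ESTALE shows up), not a quote of what the NFS FAQ prescribes; retry_on_estale and its parameters are hypothetical.

```python
import errno


def retry_on_estale(operation, path, retries=1):
    """Call operation(path), retrying when it fails with ESTALE.

    Hypothetical helper: every retry starts again from the path, so a
    stale handle cached from an earlier lookup is not reused.
    """
    for attempt in range(retries + 1):
        try:
            return operation(path)
        except OSError as exc:
            # Anything other than ESTALE, or ESTALE on the last attempt,
            # is passed on to the caller unchanged.
            if exc.errno != errno.ESTALE or attempt == retries:
                raise
```

Every call site that touches the mount would need a wrapper like this, which is exactly the burden on applications described above.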

Regards,
Vijay

[1] A10 of http://nfs.sourceforge.net/

On Mon, Mar 28, 2016 at 1:39 AM, Prashanth Pai <ppai@xxxxxxxxxx> wrote:

    TL;DR: +1 to report ESTALE as ENOENT

    While ESTALE is an acceptable errno for NFS clients, it's not so much
    for FUSE clients. Many applications that talk to a FUSE mount do not
    handle ESTALE and expect the behavior to be analogous to that of local
    filesystems such as XFS. While it's okay for a brick to send ESTALE to
    the glusterfs client stack, one has to be very careful about the errno
    returned by FUSE back to applications.

    For example, syscalls such as fgetxattr are not expected (at least
    per the manpage) to return ESTALE, but with glusterfs they do [1].
    Further, POSIX guarantees that once an application has a valid fd,
    operations like fgetxattr() on the fd should succeed even after
    another application (client) issues an unlink().
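
The local filesystem behavior that applications expect can be seen with a small Python sketch. It assumes a POSIX system and a local temporary directory, and it uses fstat() and read() rather than fgetxattr(), since extended attribute support varies between filesystems.

```python
import os
import tempfile

# Create a scratch file on a local filesystem and keep an fd open on it.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello")

# Remove the name; the open fd still refers to the file.
os.unlink(path)

# fd based calls keep working even though the path is gone.
size_after_unlink = os.fstat(fd).st_size
os.lseek(fd, 0, os.SEEK_SET)
data_after_unlink = os.read(fd, 5)
name_still_exists = os.path.exists(path)
os.close(fd)

print(size_after_unlink, data_after_unlink, name_still_exists)
```

An application written against this behavior has no reason to expect ESTALE from an fd it opened successfully.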

    [1]:http://paste.openstack.org/show/335506/

    Regards,
      -Prashanth Pai

    ----- Original Message -----
     > From: "FNU Raghavendra Manjunath" <rabhat@xxxxxxxxxx>
     > To: "Soumya Koduri" <skoduri@xxxxxxxxxx>
     > Cc: "Ira Cooper" <icooper@xxxxxxxxxx>, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
     > Sent: Thursday, March 24, 2016 8:11:19 PM
     > Subject: Re:  Report ESTALE as ENOENT
     >
     >
     > I would still prefer not converting all the ESTALE to ENOENT. I think
     > we need to understand this specific case of parallel rm -rfs getting
     > ESTALE errors and handle it accordingly.
     >
     > Regarding gfapi not honoring the ESTALE errors, I think it would be
     > better to do revalidates upon getting ESTALE.
     >
     > Just my 2 cents.
     >
     > Regards,
     > Raghavendra
     >
     >
     > On Thu, Mar 24, 2016 at 10:31 AM, Soumya Koduri <skoduri@xxxxxxxxxx> wrote:
     >
     >
     > Thanks for the information.
     >
     > On 03/24/2016 07:34 PM, FNU Raghavendra Manjunath wrote:
     >
     >
     >
     > Yes. I think the caching example mentioned by Shyam is a good example
     > of an ESTALE error. Also, User Serviceable Snapshots (USS) relies
     > heavily on ESTALE errors, because the files/directories from the
     > snapshots are assigned a virtual gfid on the fly when being looked up.
     > If those inodes are purged out of the inode table due to the lru list
     > becoming full, then an access to that gfid from the client will make
     > snapview-server send ESTALE, and either fuse (I think our fuse xlator
     > does a revalidate upon getting ESTALE) or the NFS client can
     > revalidate via path based resolution.
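
The purge-then-revalidate sequence described above can be modelled with a toy inode table. This is only an illustration of the idea: ToySnapshotServer, its lru_limit parameter, and its methods are invented here and are not snapview-server code.

```python
import errno
import uuid
from collections import OrderedDict


class ToySnapshotServer:
    """Toy model: virtual gfids live in an LRU-limited inode table."""

    def __init__(self, lru_limit):
        self.lru_limit = lru_limit
        self.inode_table = OrderedDict()  # virtual gfid -> path

    def lookup_path(self, path):
        # Path based lookup: assign a virtual gfid on the fly.
        gfid = uuid.uuid4().hex
        self.inode_table[gfid] = path
        # Purge the least recently used inodes once the table is full.
        while len(self.inode_table) > self.lru_limit:
            self.inode_table.popitem(last=False)
        return gfid

    def lookup_gfid(self, gfid):
        # A purged gfid is unknown to the server: report ESTALE.
        if gfid not in self.inode_table:
            raise OSError(errno.ESTALE, "virtual gfid purged")
        self.inode_table.move_to_end(gfid)
        return self.inode_table[gfid]


server = ToySnapshotServer(lru_limit=2)
old_gfid = server.lookup_path("/snap1/a")
server.lookup_path("/snap1/b")
server.lookup_path("/snap1/c")  # pushes "/snap1/a" out of the table

try:
    server.lookup_gfid(old_gfid)
except OSError as exc:
    assert exc.errno == errno.ESTALE
    # Revalidate via the path, which hands out a fresh virtual gfid.
    new_gfid = server.lookup_path("/snap1/a")
```

In this model ESTALE is the signal that tells the client to fall back to the path; replacing it with ENOENT would make the file look deleted.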
     >
     > So wouldn't it be wrong to map ESTALE to ENOENT instead of sending it
     > to NFS clients, as was intended in the original mail?
     >
     > The NFSv3 RFC [1] mentions that NFS3ERR_STALE is a valid error for the
     > REMOVE fop.
     >
     > Also (at least in gfapi) the resolve code path doesn't seem to be
     > honoring ESTALE errors - glfs_resolve_component(..),
     > glfs_refresh_inode_safe(..) etc. Do we need to fix them?
     >
     >
     > Thanks,
     > Soumya
     >
     > [1] https://www.ietf.org/rfc/rfc1813.txt (section# 3.3.12)
     >
     >
     >
     >
     > Regards,
     > Raghavendra
     >
     >
     > On Thu, Mar 24, 2016 at 9:51 AM, Shyam <srangana@xxxxxxxxxx> wrote:
     >
     > On 03/23/2016 12:07 PM, Ravishankar N wrote:
     >
     > On 03/23/2016 09:16 PM, Soumya Koduri wrote:
     >
     > If it occurs only when the file/dir is not actually present at the
     > back-end, shouldn't we fix the server to send ENOENT then?
     >
     > I never fully understood it, but here is the answer:
     > http://review.gluster.org/#/c/6318/
     >
     >
     > The intention of ESTALE is to state that the inode#/GFID used for an
     > operation is stale. IOW, we did not find the GFID in the backend, but
     > that does not mean the name is not present. Hence, if you have a
     > pGFID+bname, try resolving with that.
     >
     > For example, a client side cache can hang onto a GFID for a bname,
     > but another client could have, in the interim, unlinked the bname
     > and created a new file there.
     >
     > A presence test using the GFID by the client that cached the result
     > the first time gets ESTALE. But a resolution based on pGFID+bname
     > again by the same client would be a success.
     >
     > By extension, a GFID based resolution of something not really present
     > in the backend will return ESTALE. It could very well mean ENOENT, but
     > that has to be determined by the client again, if possible.
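
The scenario above (cached GFID, unlink and re-create by another client, then a fallback to pGFID+bname) can be sketched with a toy backend. ToyBackend and resolve() are hypothetical names used only for this illustration; they are not glusterfs APIs.

```python
import errno
import uuid


class ToyBackend:
    """Toy stand-in for a brick: GFID -> data, (pGFID, bname) -> GFID."""

    def __init__(self):
        self.inodes = {}
        self.dentries = {}

    def create(self, pgfid, bname, data):
        gfid = uuid.uuid4().hex
        self.inodes[gfid] = data
        self.dentries[(pgfid, bname)] = gfid
        return gfid

    def unlink(self, pgfid, bname):
        gfid = self.dentries.pop((pgfid, bname))
        del self.inodes[gfid]

    def lookup_gfid(self, gfid):
        # A GFID that is missing says nothing about the name: ESTALE.
        if gfid not in self.inodes:
            raise OSError(errno.ESTALE, "stale GFID")
        return gfid

    def lookup_name(self, pgfid, bname):
        # A name that is missing really is absent: ENOENT.
        if (pgfid, bname) not in self.dentries:
            raise OSError(errno.ENOENT, "no such entry")
        return self.dentries[(pgfid, bname)]


def resolve(backend, cached_gfid, pgfid, bname):
    """Try the cached GFID first, fall back to pGFID+bname on ESTALE."""
    try:
        return backend.lookup_gfid(cached_gfid)
    except OSError as exc:
        if exc.errno != errno.ESTALE:
            raise
        return backend.lookup_name(pgfid, bname)


backend = ToyBackend()
cached = backend.create("root", "file", b"old")  # client A caches this GFID
backend.unlink("root", "file")                   # client B unlinks the name
fresh = backend.create("root", "file", b"new")   # ... and creates it again
print(resolve(backend, cached, "root", "file") == fresh)  # prints True
```

If the name had not been re-created, the second lookup would fail with ENOENT, which is the point at which the client can say the file is really gone.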
     >
     > See "A10. What does it mean when my application fails because of an
     > ESTALE error?" for NFS here [1] and [2] (there may be better sources
     > for this information)
     >
     > [1] http://nfs.sourceforge.net/
     > [2] https://lwn.net/Articles/272684/
     >
     >
     >
     > _______________________________________________
     > Gluster-devel mailing list
     > Gluster-devel@xxxxxxxxxxx
     > http://www.gluster.org/mailman/listinfo/gluster-devel

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel