Re: [EXTERNAL] Re: Git clone fails in p9 file system marked with FANOTIFY

Amir Goldstein <amir73il@xxxxxxxxx> · Fri, 27 Sep 2024 10:35:17 +0200

>
> -----Original Message-----
> From: Amir Goldstein <amir73il@xxxxxxxxx>
> Sent: Friday, September 27, 2024 3:40 AM
> To: Krishna Vivek Vitta <kvitta@xxxxxxxxxxxxx>
> Cc: Jan Kara <jack@xxxxxxx>; linux-fsdevel@xxxxxxxxxxxxxxx; Eric Van Hensbergen <ericvh@xxxxxxxxxx>; Latchesar Ionkov <lucho@xxxxxxxxxx>; Dominique Martinet <asmadeus@xxxxxxxxxxxxx>; v9fs@xxxxxxxxxxxxxxx
> Subject: Re: [EXTERNAL] Re: Git clone fails in p9 file system marked with FANOTIFY
>
> On Thu, Sep 26, 2024 at 10:28 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> >
> > > > What would be the next steps for this investigation ?
> > > >
> > >
> > > I need to find some time and to debug the reason for 9p open failure
> > > so we can make sure the problem is in 9p code and report more
> > > details of the bug to 9p maintainers, but since a simple reproducer
> > > exists, they can also try to reproduce the issue right now.
> >
> > FWIW, the attached reproducer mimics git clone rename_over pattern closer.
> > It reproduces fanotify_read() errors sometimes, not always, with
> > either CLOSE_WRITE or OPEN_PERM | CLOSE_WRITE.
> > maybe CLOSE_WRITE alone has better odds - I'm not sure.
> >
>
> scratch that.
> I think the renames were just a destruction git clone events do not always fail on a close+rename pattern, they always fail on the close+unlink that follows the renames:
>
> 1776  openat(AT_FDCWD, "/vtmp/filebench/.git/tjEzMUw", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
> 1776  close(3)                          = 0
> 1776  unlink("/vtmp/filebench/.git/tjEzMUw") = 0
> 1776  symlink("testing", "/vtmp/filebench/.git/tjEzMUw") = 0
> 1776  lstat("/vtmp/filebench/.git/tjEzMUw", {st_mode=S_IFLNK|0777, st_size=7, ...}) = 0
> 1776  unlink("/vtmp/filebench/.git/tjEzMUw") = 0
>
> I know because I added this print:
>
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -275,6 +275,7 @@ static int create_fd(struct fsnotify_group *group, const struct path *path,
>                  */
>                 put_unused_fd(client_fd);
>                 client_fd = PTR_ERR(new_file);
> +               pr_warn("%s(%pd2): ret=%d\n", __func__, path->dentry,
> client_fd);
>         } else {
>
> We should consider adding it as is or maybe ratelimited.
>
> The trivial reproducer below fails fanotify_read() always with one try.
>
> Thanks,
> Amir.
>
> int main() {
>     const char *filename = "config.lock";
>     int fd;
>
>     // Create a new file
>     fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);
>     if (fd == -1) {
>         perror("Failed to create file");
>         return EXIT_FAILURE;
>     }
>     close(fd);
>
>     // Remove the file
>     if (unlink(filename) != 0) {
>         perror("Failed to unlink file");
>         return EXIT_FAILURE;
>     }
>
>     return EXIT_SUCCESS;
> }

On Fri, Sep 27, 2024 at 8:34 AM Krishna Vivek Vitta
<kvitta@xxxxxxxxxxxxx> wrote:
>
> Hi Amir

Hi Krishna,

Please do not "top post" on mailing lists.
It makes it very hard for people that are trying to follow the
conversation in mailing archives
or join in the middle of the conversation.

>
> Thanks for the experiment.
> Though reproducer program is succeeding but fanotify listener is terminating since its failing to read the event. Right ?

Right, the fanotify_example.c is programmed to stop on event read error.
Productions fanotify listeners should log the error or ignore it, but not abort.

>
> Can you elaborate on: " We should consider adding it as is or maybe ratelimited." ?

When fanotify fails to open an fd for reporting an event, we either
return the error to fanotify_read() or silently ignore the event.
What I meant is that we should add the pr_warn() in that case
to make the problem more visible to sysadmins.

>
> Does this mean there should be a fix at fanotify side ?

Yes, I think there should be a fix in fanotify.
9p is not doing something terribly wrong and as Jan
already wrote, the exact same case from nfs/cifs is handled
by ignoring the open error.

>
> Can you summarize the problem statement for once in larger interest of group.
>

Yes, I will try.

fanotify needs to report an open fd with the event (in some operation modes).
The open fd needs to be created and installed in the fd table of the
listener process
that is calling fanotify_read() to read the event.

By the time that the listener process wants to read the events, the
file in question
may have already been unlinked.

fanotify holds a reference on the dentry where an event happened while the
event is in the queue.
In local filesystems, it should not be a problem to open an unlinked file
from a referenced dentry.
For example, If adding sleep(1) before handle_events() then
fanotify_example will
print these messages from ext/xfs:

   FAN_CLOSE_WRITE: File /vdc/config.lock (deleted)

The problem is that remote/network filesystems cannot provide
full guarantee that an unlinked file can be opened even if the
dentry reference exists.

nfs/nfs4/cifs clients return -EOPENSTALE in that case, so vfs/nfsd
can retry the open and fanotify also recognizes this error and
silently drops the event.

But vfs/fanotify cannot guarantee that all the remote/network
filesystems will return this special error code and as the comment
in  create_fd() says, there are many other cases that open can fail:

/*
 * we still send an event even if we can't open the file.  this
 * can happen when say tasks are gone and we try to open their
 * /proc files or we try to open a WRONLY file like in sysfs
 * we just send the errno to userspace since there isn't much
 * else we can do.
 */

The problems are:
1. The first line of the comment is wrong - we are not sending the event
2. We only send the errno to userspace if this is the first event being read
3. Even if userspace gets the errno, there is very little visibility about the
    event that caused the error (*)

(*) This can be improved by the pr_warn() that I suggested

Frankly, I never understood why this situation is not handled
as the first line of the comment claims - that is to send the
event with event->fd = -errno.

This way userspace gets an event about every error and get
try to do something about it (start an investigation).
Getting the errno sometimes and ignoring it sometimes
is a very poor UAPI.

> I assume the whole of these experiments succeed in ext, xfs.
>

Yes. This specific experiment succeeds on ext/xfs.
However, other failures to open fd for event can also
happen on ext/xfs, for example if the listener is watching
FAN_CLOSE|FAN_OPEN and fanotify_init() uses O_WRONLY
as event_f_flags argument and the watched fs is mounted read-only,
then you would get an error like this in fanotify_example,
after opening a file for read on ext/xfs:

   read: Read-only file system

At the bottom line, this is a UAPI issue. The UAPI can be improved,
but we will probably have to make a new opt-in flag for a new UAPI.

But I must say that I do not fully understand your original report.
Your original report claimed that git clone fails when
MDE(Microsoft Defender for Endpoint) is watching CLOSE_WRITE
events on a 9p mount.

Failures to report a FAN_CLOSE_WRITE event should have never
failed git clone. Only failure to report FAN_OPEN_PERM could have
failed git clone operations, but in that case, open would have failed anyway.

My guess is that a would-be ENOENT open failure is replaced with
an EPERM open failure when fanotify is watching FAN_OPEN_PERM
events, but I have no confirmation to this assumption from your strace.

Unless I misunderstand the report, it sounds like the fix (to git clone
failure) should be in MDE software, because I could never reproduce
git clone failure with fanotify_example, no matter with events I tried.

Anyway, if I misunderstood something in the report, please clarify.

Thanks,
Amir.