Re: [PATCH v6 18/21] fanotify: Emit generic error info type for error event

Amir Goldstein <amir73il@xxxxxxxxx> · Wed, 18 Aug 2021 06:24:26 +0300

[...]

> > Just keep in mind that the current scheme pre-allocates the single event slot
> > on fanotify_mark() time and (I think) we agreed to pre-allocate
> > sizeof(fsnotify_error_event) + MAX_HANDLE_SZ.
> > If filesystems would want to store some variable length fs specific info,
> > a future implementation will have to take that into account.
>
> <nod> I /think/ for the fs and AG metadata we could preallocate these,
> so long as fsnotify doesn't free them out from under us.

fs won't get notified when the event is freed, so fsnotify must
take ownership of the data structure.
I was thinking more along the lines of limiting maximum size for fs
specific info and pre-allocating that size for the event.

> For inodes...
> there are many more of those, so they'd have to be allocated
> dynamically.

The current scheme is that the queue for error events has a size of
one and the single slot is pre-allocated.
The reason for pre-allocating is the assumption that fsnotify_error()
could be called from contexts where memory allocation would be
inconvenient.
Therefore, we can store the encoded file handle of the first erroneous
inode, but we do not store any more events until the user reads this
one event.

> Hmm.  For handling accumulated errors, can we still access the
> fanotify_event_info_* object once we've handed it to fanotify?  If the
> user hasn't picked up the event yet, it might be acceptable to set more
> bits in the type mask and bump the error count.  In other words, every
> time userspace actually reads the event, it'll get the latest error
> state.  I /think/ that's where the design of this patchset is going,
> right?

Sort of.
fsnotify does have a concept of "merging" a new event with an event
already in the queue.

With most fsnotify events, merge only happens if the info related
to the new event (e.g. sb,inode) is the same as that of the queued
event and the "merge" is only in the event mask
(e.g. FS_OPEN|FS_CLOSE).

However, the current scheme for "merge" of an FS_ERROR event is only
bumping err_count, even if the new reported error or inode do not
match the error/inode in the queued event.
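
To make the single-slot scheme and its "merge" concrete, here is a rough
user-space model. It is only a sketch, not the kernel code: every
sketch_* name and the 128-byte handle buffer are made up for
illustration. The first error fills the pre-allocated slot, later
errors only bump the count, and reading the event frees the slot again.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_HANDLE_SZ 128 /* assumption: stands in for MAX_HANDLE_SZ */

/* Hypothetical model of the single pre-allocated error event slot. */
struct sketch_error_slot {
	bool queued;            /* true while an unread event occupies the slot */
	int error;              /* first reported error */
	unsigned int err_count; /* errors reported since the last read */
	size_t fh_len;          /* length of the stored handle, 0 if none */
	unsigned char fh[SKETCH_MAX_HANDLE_SZ]; /* handle of first bad inode */
};

/* Report an error without allocating: fill the slot or bump the count. */
static void sketch_report_error(struct sketch_error_slot *slot, int error,
				const void *fh, size_t fh_len)
{
	if (slot->queued) {
		/* "merge": error/inode of the new report are not stored */
		slot->err_count++;
		return;
	}
	if (fh_len > sizeof(slot->fh))
		fh_len = 0; /* handle does not fit: report without it */
	slot->queued = true;
	slot->error = error;
	slot->err_count = 1;
	slot->fh_len = fh_len;
	if (fh_len)
		memcpy(slot->fh, fh, fh_len);
}

/* Reading the event copies it out and frees the slot for the next error. */
static bool sketch_read_event(struct sketch_error_slot *slot,
			      struct sketch_error_slot *out)
{
	if (!slot->queued)
		return false;
	*out = *slot;
	slot->queued = false;
	return true;
}
```

With this model, two reports followed by one read yield the first
error and first handle with err_count == 2.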

If we define error event subtypes (e.g. FS_ERROR_WRITEBACK,
FS_ERROR_METADATA), then the error event could contain
a field for a subtype mask and the user could read the subtype mask
along with the accumulated error count, but this cannot be
done by giving the filesystem access to modify an internal
fsnotify event, so those have to be generic UAPI defined subtypes.

If you think that would be useful, then we may want to consider
reserving the subtype mask field in fanotify_event_info_error in
advance.
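
A minimal sketch of what accumulating into such a mask could look like.
All of this is hypothetical: the flag values are made up, the
subtype_mask field does not exist in the patch set, and the other
fields are only assumed for illustration.

```c
#include <stdint.h>

/* Hypothetical generic subtypes; names from this thread, values made up. */
#define SKETCH_FS_ERROR_WRITEBACK 0x1u
#define SKETCH_FS_ERROR_METADATA  0x2u

/* Assumed shape of the error info with a reserved subtype mask. */
struct sketch_error_info {
	int32_t error;         /* first reported error */
	uint32_t error_count;  /* accumulated since the last read */
	uint32_t subtype_mask; /* hypothetical: OR of reported subtypes */
};

/* Merge a new report into the queued event: bump count, OR in subtype. */
static void sketch_merge_error(struct sketch_error_info *queued,
			       uint32_t subtype)
{
	queued->error_count++;
	queued->subtype_mask |= subtype;
}
```

The merge is done by fsnotify itself from a generic subtype argument,
so the filesystem never touches the queued event directly.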

>
> > > > 2) If a program written for today's notification events sees a
> > > > fanotify_event_info_header from future-XFS with a header length that is
> > > > larger than FANOTIFY_INFO_ERROR_LEN, will it be able to react
> > > > appropriately?  Which is to say, ignore it on the grounds that the
> > > > length is unexpectedly large?
> > >
> > > That is the expected behavior :). But I guess separate info type for
> > > fs-specific blob might be more foolproof in this sense - when parsing
> > > events, you are expected to just skip info_types you don't understand
> > > (based on 'len' and 'type' in the common header) and generally different
> > > events have different sets of infos attached to them so you mostly have to
> > > implement this logic to be able to process events.
> > >
> > > > It /looks/ like this is the case; really I'm just fishing around here
> > > > to make sure nothing in the design of /this/ patchset would make it Very
> > > > Difficult(tm) to add more information later.
> > > >
> > > > 3) Once we let filesystem implementations create their own extended
> > > > error notifications, should we have a "u32 magic" to aid in decoding?
> > > > Or even add it to fanotify_event_info_error now?
> > >
> > > If we go via the 'separate info type' route, then the magic can go into
> > > that structure and there's no great use for 'magic' in
> > > fanotify_event_info_error.
> >
> > My 0.02$:
> > With current patch set, filesystem reports error using:
> > fsnotify_sb_error(sb, inode, error)
> >
> > The optional @inode argument is encoded to a filesystem opaque
> > blob using  exportfs_encode_inode_fh(), recorded in the event
> > as a blob and reported to userspace as a blob.
> >
> > If filesystem would like to report a different type of opaque blob
> > (e.g. xfs_perag_info), the interface should be extended to:
> > fsnotify_sb_error(sb, inode, error, info, info_len)
> > and the 'separate info type' route seems like the best and most natural
> > way to deal with the case of information that is only emitted from
> > a specific filesystem with a specific feature enabled (online fsck).
>
> <nod> This seems reasonable to me.
>
> > IOW, there is no need for fanotify_event_info_xfs_perag_error
> > in fanotify UAPI if you ask me.
> >
> > Regarding 'magic' in fanotify_event_info_error, I also don't see the
> > need for that, because the event already has fsid which can be
> > used to identify the filesystem in question.
> >
> > Keep in mind that the value of handle_type inside struct file_handle
> > inside struct fanotify_event_info_fid is not a universal classifier.
> > Specifically, the type 0x81 means "XFS_FILEID_INO64_GEN"
> > only in the context of XFS and it can mean something else in the
> > context of another type of filesystem.
>
> Can you pass the handle into the kernel to open a fd to file mentioned
> in the report?  I don't think userspace is supposed to know what's
> inside a file handle, and it would be helpful if it didn't matter here
> either. :)
>

The user gets a file handle and can do whatever users can do with file
handles... that is, open_by_handle_at() (if the filesystem and inode are
still alive and healthy) and, for less privileged users, compare with
the result of name_to_handle_at() of another object.

Obviously, filesystem specialized tools could parse the file handle
to extract more information.

> > If we add a new info record fanotify_event_info_fs_private
> > it could even be an alias to fanotify_event_info_fid with the only
> > difference that the handle[0] member is not expected to be
> > struct file_handle, but some other fs private struct.
>
> I ... think I prefer it being a separate info blob.
>

Yes. That is what I meant.
Separate info record INFO_TYPE_ERROR_FS_DATA, whose info record
format is quite the same as that of INFO_TYPE_FID, but the blob is a
different type of blob.
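
For illustration, this is roughly how a reader could walk the info
records of one event and skip the types it does not understand, using
only the common header. It is a sketch under assumptions: the header
struct here just mirrors the layout of struct
fanotify_event_info_header, and the numeric value for the fs data info
type is invented.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the layout of struct fanotify_event_info_header. */
struct sketch_info_header {
	uint8_t info_type;
	uint8_t pad;
	uint16_t len; /* total record length, header included */
};

#define SKETCH_INFO_TYPE_FID     1   /* value chosen for this sketch */
#define SKETCH_INFO_TYPE_FS_DATA 200 /* hypothetical, not in any UAPI */

/*
 * Count the records of one known type in a buffer of info records.
 * Records of any other type are skipped based on hdr.len alone.
 */
static size_t sketch_count_known(const unsigned char *buf, size_t buf_len,
				 uint8_t known_type)
{
	size_t off = 0, found = 0;

	while (buf_len - off >= sizeof(struct sketch_info_header)) {
		struct sketch_info_header hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		/* Stop on a malformed or truncated record. */
		if (hdr.len < sizeof(hdr) || hdr.len > buf_len - off)
			break;
		if (hdr.info_type == known_type)
			found++;
		off += hdr.len; /* unknown types are simply stepped over */
	}
	return found;
}
```

A program that only knows about the FID record keeps working when a
filesystem starts attaching an extra fs private record to the event.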

Thanks,
Amir.