On Tue, Jun 29, 2021 at 10:13 PM Gabriel Krisman Bertazi <krisman@xxxxxxxxxxxxx> wrote: > > The FAN_FS_ERROR event is a new inode event used by filesystem wide > monitoring tools to receive notifications of type FS_ERROR_EVENT, > emitted directly by filesystems when a problem is detected. The error > notification includes a generic error descriptor and a FID identifying > the file affected. > > FID is sent for every FAN_FS_ERROR. Errors not linked to a regular inode > are reported against the root inode. > > An error reporting structure is attached per-mark, and only a single > error can be stored at a time. This is ok, since once an error occurs, > it is common for a stream of related errors to be reported. We only log > accumulate the total of errors occurred since the last notification. > > Signed-off-by: Gabriel Krisman Bertazi <krisman@xxxxxxxxxxxxx> > > --- > Changes since v2: > - Support and equire FID mode (amir) > - Goto error path instead of early return (amir) > - Simplify get_one_event (me) About that... > - Base merging on error_count > - drop fanotify_queue_error_event [...] > +static int fanotify_merge_error_event(struct fsnotify_group *group, > + struct fsnotify_event *event) > +{ > + struct fanotify_event *fae = FANOTIFY_E(event); > + > + /* > + * When err_count > 0, the reporting slot is full. Just account > + * the additional error and abort the insertion. > + */ > + if (atomic_fetch_inc(&FANOTIFY_EE(fae)->err_count) != 0) > + return 1; It feels a bit unsafe to bump err_count to 1 in merge() and set the error info in insert(). feels like looking for trouble. Maybe atomic_inc_not_zero() would have been better, but since I am going to argue against modifying err_count outside the notification_lock, it does not need to be atomic at all. > + > + return 0; > +} > + > +static void fanotify_insert_error_event(struct fsnotify_group *group, > + struct fsnotify_event *event, > + const void *data) > +{ > + const struct fsnotify_event_info *ei = > + (struct fsnotify_event_info *) data; > + struct fanotify_event *fae = FANOTIFY_E(event); > + const struct fs_error_report *report; > + struct fanotify_error_event *fee; > + struct inode *inode; > + int fh_len; > + > + /* This might be an unexpected type of event (i.e. overflow). */ > + if (!fanotify_is_error_event(fae->mask)) > + return; > + > + report = (struct fs_error_report *) ei->data; > + inode = report->inode ?: ei->sb->s_root->d_inode; As I commented on the cover letter, I think it would be better to encode a NULL-FID in case of NULL inode, e.g.: static int fanotify_encode_null_fh(struct fanotify_fh *fh) { fh->type = FILEID_ROOT; fh->len = 8; fh->flags = 0; memset(fh->buf, 0, 8); } But that API may be controversial, so wait for other voices before changing that. > + > + fee = FANOTIFY_EE(fae); > + fee->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR; > + fee->error = report->error; > + fee->fsid = fee->mark->connector->fsid; > + > + fsnotify_get_mark(fee->mark); > + > + /* > + * Error reporting needs to happen in atomic context. If this > + * inode's file handler is more than we initially predicted, > + * there is nothing better we can do than report the error with > + * a bad FH. > + */ > + fh_len = fanotify_encode_fh_len(inode); > + if (WARN_ON(fh_len > fee->max_fh_len)) > + return; > + > + fanotify_encode_fh(&fee->object_fh, inode, fh_len, NULL, 0); > +} > + [...] > +static size_t copy_error_info_to_user(struct fanotify_event *event, > + char __user *buf, int count) > +{ > + struct fanotify_event_info_error info; > + struct fanotify_error_event *fee = FANOTIFY_EE(event); > + > + info.hdr.info_type = FAN_EVENT_INFO_TYPE_ERROR; > + info.hdr.pad = 0; > + info.hdr.len = sizeof(struct fanotify_event_info_error); > + > + if (WARN_ON(count < info.hdr.len)) > + return -EFAULT; > + > + info.error = fee->error; > + > + /* This effectively releases the event for logging another error */ > + info.error_count = atomic_xchg(&fee->err_count, 0); > + it's a shame because the code is really simpler, but I am afraid it may not be correct, even without mentioning breaking the queue abstraction. While the read/insert of err_count is "atomic", read of FID is done after this point, not to mention that read of fee->error could be reordered without barriers. FID and error could be set by insert event after reading fee->err_count. You can either go back to copying the error report to stack on dequeue (though it was a bit ugly) under the group notification_lock or you can allocate a new initialized error event on dequeue and exchange it with the mark's event, without having to change the calling semantics of get_one_event(), e.g.: fsnotify_remove_first_event(group); if (fanotify_is_perm_event(event->mask)) FANOTIFY_PERM(event)->state = FAN_EVENT_REPORTED; if (fanotify_is_hashed_event(event->mask)) fanotify_unhash_event(group, event); + if (fanotify_is_error_event(event->mask)) + event = fanotify_recreate_fs_error_event(event); If allocation fails, this helper returns ERR_PTR(-EINVAL) and resets the info and err_count of the event in the mark. Thanks, Amir.