Hi Gal!

On Wed 08-06-22 15:41:45, Gal Rosen wrote:
> Thanks for the answer, just to make sure that I understand, if I see the
> EMFILE error then it was on the first event and no event was copied to the
> user buffer.
> If it happened on the second or later events then the user will not see the
> error, and will get length corresponding to the successfully formatted
> events.
> In both cases the events after the failure event will be saved in the
> kernel queue, and I can try to read them at the next read?

Yes.

> I want to understand if such a case is recoverable, because today at any
> FANOTIFY error we use a methodology of safe-mode, in which we shut down the
> FANOTIFY, kill our service and come up again.
> We do it because in the past we had some cases in which we did not write a
> response on some file events and it stuck the whole system.

So if you read a permission event and do not write a reply, it can indeed
stall the whole system. But in case the fanotify subsystem fails to create an
event (such as in the EMFILE error case), we do properly clean up the failed
event and the operation generating the event gets an automatic "denied"
response.

> We do it for safety because we thought that there might be some file events
> that we did not respond to in a case of error, but if you are saying that all
> events after the error event are still in the kernel queue and we do not
> need to respond to them, then I guess we can continue running without
> restarting our service.

Yes, you should be able to continue without restarting.

								Honza

> On Wed, Jun 8, 2022 at 2:57 PM Jan Kara <jack@xxxxxxx> wrote:
>
> > Hello,
> >
> > On Wed 08-06-22 14:33:47, Gal Rosen wrote:
> > > One more question, if I do get into a situation in which I reach the
> > > limit of the number of open files per my process, can I continue? Can I
> > > continue in my while loop and after a couple of microseconds for example
> > > try to re-read?
> > > If I get the error of EMFILE, it could be that some of the events
> > > were successfully read and are already in my user buffer, but I still
> > > get a return value of -1 on the read. Are all the successful events
> > > still in the kernel queue, and will they still be there for the next
> > > read?
> >
> > So if you get the EMFILE error, it means that we were not able to open file
> > descriptor for the first event you are trying to read. If the same error
> > happens for the second or later event copied into user provided buffer, we
> > return length corresponding to the successfully formatted events. Sadly, the
> > event for which we failed to open file will be silently dropped in that
> > case :-|. Amir, I guess we should at least report the event without the fd
> > in that case. What do you think?
> >
> >								Honza
> >
> > > On Wed, Jun 8, 2022 at 2:01 PM Gal Rosen <gal.rosen@xxxxxxxxxxxxxx>
> > > wrote:
> > >
> > > > Hi Amir,
> > > >
> > > > What do you mean by bumping the CAP_SYS_ADMIN limit?
> > > > You mean to increase the max open files for my process that watches
> > > > the FANOTIFY fd?
> > > > May I instead decrease the read buffer size?
> > > > My read buffer is 4096 * 6, the fanotify_event_metadata structure size
> > > > is 24 bytes, so it can hold 1024 file events at one read.
> > > > My process Max open files soft limit is 1024, so why do I get this
> > > > error?
> > > > Ohh, maybe because after reading the events I put them in a queue and
> > > > continue for the next read, so if file events still have not been
> > > > released by my application, then the next read can exceed 1024 files
> > > > opened.
> > > >
> > > > Yes, we use permission events. We watch on FAN_OPEN_PERM |
> > > > FAN_CLOSE_WRITE.
> > > > We also want to support the oldest kernels.
> > > >
> > > > BTW: What do you mean by "assuming that your process has
> > > > CAP_SYS_ADMIN"?
> > > >
> > > > Regarding the EPERM, how do we continue to investigate it?
> > > >
> > > > Thanks,
> > > > Gal.
> > > >
> > > > On Wed, Jun 8, 2022, 12:00, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > > >
> > > >> On Wed, Jun 8, 2022 at 11:31 AM Gal Rosen <gal.rosen@xxxxxxxxxxxxxx>
> > > >> wrote:
> > > >> >
> > > >> > Hi Jack,
> > > >> >
> > > >> > Can you provide details on the reason I sometimes get read errors on
> > > >> > events that I get from FANOTIFY?
> > > >> > My user space program watches on all mount points in the system and
> > > >> > sometimes when in parallel I run full scan with another application
> > > >> > on all my files in the endpoint, I get a read error when trying to
> > > >> > read from the FANOTIFY fd on a new event.
> > > >> > The errno is sometimes EPERM (Operation not permitted) and sometimes
> > > >> > EMFILE (Too many open files).
> > > >> >
> > > >>
> > > >> Hi Gal,
> > > >>
> > > >> EPERM is a bit surprising assuming that your process has
> > > >> CAP_SYS_ADMIN, so needs investigating, but EMFILE is quite obvious.
> > > >> Every event read needs to open a fd to place in event->fd.
> > > >> If you exceed your configured limit, this error is expected.
> > > >> You can bump the limit as CAP_SYS_ADMIN if that helps.
> > > >>
> > > >> > The last time I saw these errors, it was on RHEL 8.5, kernel
> > > >> > 4.18.0-348.23.1.el8_5.x86_64.
> > > >>
> > > >> Does your application even use permission events?
> > > >> If it doesn't then watching with a newer kernel (>5.1) and
> > > >> FAN_REPORT_FID is going to be more efficient in resources and you
> > > >> won't need to worry about open files limits.
> > > >>
> > > >> Thanks,
> > > >> Amir.
> > > >>
> > > >
> >
> > --
> > Jan Kara <jack@xxxxxxxx>
> > SUSE Labs, CR
> >

-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
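
For readers who want to see what the recovery discussed in this thread could
look like in code, here is a minimal C sketch. It is not taken from the thread.
It assumes the listener answers and closes each event fd right away, so that
queued events do not pile up against the open-files limit. EMFILE is treated
as retryable, as confirmed above; treating EINTR and EAGAIN as retryable and
everything else (including EPERM) as fatal is an assumption of this sketch.
The names classify_read, drain_events and watch_loop are hypothetical, and
always answering FAN_ALLOW is only a placeholder for a real verdict.

```c
#define _GNU_SOURCE /* expose nanosleep() under strict -std= modes */
#include <errno.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/fanotify.h>

/* What the caller should do after one read() on the fanotify fd. */
enum read_action {
    READ_PROCESS, /* n > 0: the buffer holds n bytes of complete events */
    READ_RETRY,   /* nothing usable was read: back off and read again */
    READ_FATAL    /* unexpected error: stop and investigate */
};

/* Classify a read() result; err is the errno value saved right after it. */
enum read_action classify_read(ssize_t n, int err)
{
    if (n > 0)
        return READ_PROCESS;
    if (n == 0)
        return READ_RETRY; /* nothing read: treated as retryable here */
    if (err == EMFILE || err == EINTR || err == EAGAIN)
        return READ_RETRY; /* EINTR/EAGAIN are this sketch's assumption */
    return READ_FATAL;     /* e.g. EPERM, which needs investigating */
}

/*
 * Walk the events in buf, answer permission events and close each event fd
 * immediately so that descriptors are not held across reads.
 * Returns the number of events handled, or -1 on failure.
 */
int drain_events(int fan_fd, char *buf, ssize_t len)
{
    struct fanotify_event_metadata *ev = (struct fanotify_event_metadata *)buf;
    int handled = 0;

    for (; FAN_EVENT_OK(ev, len); ev = FAN_EVENT_NEXT(ev, len)) {
        if (ev->vers != FANOTIFY_METADATA_VERSION)
            return -1; /* kernel and headers disagree on the layout */
        if (ev->fd >= 0) {
            if (ev->mask & (FAN_OPEN_PERM | FAN_ACCESS_PERM)) {
                /* Placeholder policy: always allow. */
                struct fanotify_response resp = {
                    .fd = ev->fd,
                    .response = FAN_ALLOW,
                };
                if (write(fan_fd, &resp, sizeof(resp)) != (ssize_t)sizeof(resp))
                    return -1;
            }
            close(ev->fd);
        }
        handled++;
    }
    return handled;
}

/* Read loop that keeps running across EMFILE instead of restarting. */
int watch_loop(int fan_fd)
{
    /* 1024 records of 24 bytes each, i.e. the 4096 * 6 byte buffer above. */
    static struct fanotify_event_metadata buf[1024];
    const struct timespec backoff = { .tv_sec = 0, .tv_nsec = 100 * 1000 };

    for (;;) {
        ssize_t n = read(fan_fd, buf, sizeof(buf));
        int err = errno;

        switch (classify_read(n, err)) {
        case READ_PROCESS:
            if (drain_events(fan_fd, (char *)buf, n) < 0)
                return -1;
            break;
        case READ_RETRY:
            nanosleep(&backoff, NULL); /* arbitrary 100 us back-off */
            break;
        case READ_FATAL:
            return -1;
        }
    }
}
```

A caller would obtain fan_fd from fanotify_init() and fanotify_mark(), which
typically require CAP_SYS_ADMIN, and then run watch_loop(fan_fd). If event fds
have to be handed to another thread or queue instead of being closed at once,
the number of fds in flight needs to stay below the RLIMIT_NOFILE soft limit.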