On Mon 19-02-18 21:07:28, Amir Goldstein wrote: > On Mon, Feb 19, 2018 at 3:50 PM, Jan Kara <jack@xxxxxxx> wrote: > [...] > > For fanotify without FAN_UNLIMITED_QUEUE the situation is similar as for > > inotify - IMO low practical impact, apps should generally handle queue > > overflow so I don't see a need for any opt in (more accurate memcg charging > > takes precedense over possibly broken apps). > > > > For fanotify with FAN_UNLIMITED_QUEUE the situation is somewhat different - > > firstly there is a practical impact (memory consumption is not limited by > > anything else) and secondly there are higher chances of the application > > breaking (no queue overflow expected) and also that this breakage won't be > > completely harmless (e.g., the application participates in securing the > > system). I've been thinking about this "conflict of interests" for some > > time and currently I think that the best handling of this is that by > > default events for FAN_UNLIMITED_QUEUE groups will get allocated with > > GFP_NOFAIL - such groups can be created only by global CAP_SYS_ADMIN anyway > > so it is reasonably safe against misuse (and since the allocations are > > small it is in fact equivalent to current status quo, just more explicit). > > That way application won't see unexpected queue overflow. The process > > generating event may be looping in the allocator but that is the case > > currently as well. Also the memcg with the consumer of events will have > > higher chances of triggering oom-kill if events consume too much memory but > > I don't see how this is not a good thing by default - and if such reaction > > is not desirable, there's memcg's oom_control to tune the OOM behavior > > which has capabilities far beyond of what we could invent for fanotify... > > > > What do you think Amir? > > > > If I followed all your reasoning correctly, you propose to change behavior to > always account events to group memcg and never fail event allocation, > without any change of API and without opting-in for new behavior? > I think it makes sense. I can't point at any expected breakage, > so overall, this would be a good change. > > I just feel sorry about passing an opportunity to improve functionality. > The fact that fanotify does not have a way for defining the events queue > size is a deficiency IMO, one which I had to work around in the past. > I find that assigning group to memgc and configure memcg to desired > memory limit and getting Q_OVERFLOW on failure to allocate event > is going to be a proper way of addressing this deficiency. So if you don't pass FAN_Q_UNLIMITED, you will get queue with a fixed size and will get Q_OVERFLOW if that is exceeded. So is your concern that you'd like some other fixed limit? Larger one or smaller one and for what reason? > But if you don't think we should bind these 2 things together, > I'll let Shakeel decide if he want to pursue the Q_OVERFLOW change > or not. So if there is still some uncovered use case for finer tuning of event queue length than setting or not setting FAN_Q_UNLIMITED (+ possibly putting the task to memcg to limit memory usage), we can talk about how to address that but at this point I don't see a strong reason to bind this to whether / how events are accounted to memcg... And we still need to make sure we properly do ENOMEM -> Q_OVERFLOW translation and use GFP_NOFAIL for FAN_Q_UNLIMITED groups before merging Shakeel's memcg accounting patches. But Shakeel does not have to be the one implementing that (although if you want to, you are welcome Shakeel :) - otherwise I hope I'll get to it reasonably soon). Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR