On Tue, Feb 20, 2018 at 4:43 AM, Jan Kara <jack@xxxxxxx> wrote: > On Mon 19-02-18 21:07:28, Amir Goldstein wrote: >> On Mon, Feb 19, 2018 at 3:50 PM, Jan Kara <jack@xxxxxxx> wrote: >> [...] >> > For fanotify without FAN_UNLIMITED_QUEUE the situation is similar as for >> > inotify - IMO low practical impact, apps should generally handle queue >> > overflow so I don't see a need for any opt in (more accurate memcg charging >> > takes precedense over possibly broken apps). >> > >> > For fanotify with FAN_UNLIMITED_QUEUE the situation is somewhat different - >> > firstly there is a practical impact (memory consumption is not limited by >> > anything else) and secondly there are higher chances of the application >> > breaking (no queue overflow expected) and also that this breakage won't be >> > completely harmless (e.g., the application participates in securing the >> > system). I've been thinking about this "conflict of interests" for some >> > time and currently I think that the best handling of this is that by >> > default events for FAN_UNLIMITED_QUEUE groups will get allocated with >> > GFP_NOFAIL - such groups can be created only by global CAP_SYS_ADMIN anyway >> > so it is reasonably safe against misuse (and since the allocations are >> > small it is in fact equivalent to current status quo, just more explicit). >> > That way application won't see unexpected queue overflow. The process >> > generating event may be looping in the allocator but that is the case >> > currently as well. Also the memcg with the consumer of events will have >> > higher chances of triggering oom-kill if events consume too much memory but >> > I don't see how this is not a good thing by default - and if such reaction >> > is not desirable, there's memcg's oom_control to tune the OOM behavior >> > which has capabilities far beyond of what we could invent for fanotify... >> > >> > What do you think Amir? >> > >> >> If I followed all your reasoning correctly, you propose to change behavior to >> always account events to group memcg and never fail event allocation, >> without any change of API and without opting-in for new behavior? >> I think it makes sense. I can't point at any expected breakage, >> so overall, this would be a good change. >> >> I just feel sorry about passing an opportunity to improve functionality. >> The fact that fanotify does not have a way for defining the events queue >> size is a deficiency IMO, one which I had to work around in the past. >> I find that assigning group to memgc and configure memcg to desired >> memory limit and getting Q_OVERFLOW on failure to allocate event >> is going to be a proper way of addressing this deficiency. > > So if you don't pass FAN_Q_UNLIMITED, you will get queue with a fixed size > and will get Q_OVERFLOW if that is exceeded. So is your concern that you'd > like some other fixed limit? Larger one or smaller one and for what > reason? > >> But if you don't think we should bind these 2 things together, >> I'll let Shakeel decide if he want to pursue the Q_OVERFLOW change >> or not. > > So if there is still some uncovered use case for finer tuning of event > queue length than setting or not setting FAN_Q_UNLIMITED (+ possibly > putting the task to memcg to limit memory usage), we can talk about how to > address that but at this point I don't see a strong reason to bind this to > whether / how events are accounted to memcg... > > And we still need to make sure we properly do ENOMEM -> Q_OVERFLOW > translation and use GFP_NOFAIL for FAN_Q_UNLIMITED groups before merging > Shakeel's memcg accounting patches. But Shakeel does not have to be the one > implementing that (although if you want to, you are welcome Shakeel :) - > otherwise I hope I'll get to it reasonably soon). > Thanks Jan & Amir for the help and explanation. I think, Jan, you can implement the "ENOMEM -> Q_OVERFLOW" and GFP_NOFAIL changes better than me. I will send out my patches with minor changes based on feedback but I will let Andrew know to keep my patches in mm tree and not send for upstream merge. Once Jan has added his patches, I will Andrew know to go forward with my patches. thanks, Shakeel -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html