On Tue, Feb 13, 2018 at 2:20 PM, Shakeel Butt <shakeelb@xxxxxxxxxx> wrote: > On Tue, Feb 13, 2018 at 1:54 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote: >> On Tue, Feb 13, 2018 at 11:10 PM, Shakeel Butt <shakeelb@xxxxxxxxxx> wrote: >>> On Mon, Feb 12, 2018 at 10:30 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote: >>>> On Thu, Jan 25, 2018 at 10:36 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote: >>>>> On Thu, Jan 25, 2018 at 10:20 PM, Shakeel Butt <shakeelb@xxxxxxxxxx> wrote: >>>>>> On Wed, Jan 24, 2018 at 11:51 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote: >>>>>>> >>>>>>> There is a nicer alternative, instead of failing the file access, >>>>>>> an overflow event can be queued. I sent a patch for that and Jan >>>>>>> agreed to the concept, but thought we should let user opt-in for this >>>>>>> change: >>>>>>> https://marc.info/?l=linux-fsdevel&m=150944704716447&w=2 >>>>>>> >>>>>>> So IMO, if user opts-in for OVERFLOW instead of ENOMEM, >>>>>>> charging the listener memcg would be non controversial. >>>>>>> Otherwise, I cannot say that starting to charge the listener memgc >>>>>>> for events won't break any application. >>>>>>> >>>>>> >>>> >>>> Shakeel, Jan, >>>> >>>> Reviving this thread and adding linux-api, because I think it is important to >>>> agree on the API before patches. >>>> >>>> The last message on the thread you referenced suggest an API change >>>> for opting in for Q_OVERFLOW on ENOMEM: >>>> https://marc.info/?l=linux-api&m=150946878623441&w=2 >>>> >>>> However, the suggested API change in in fanotify_mark() syscall and >>>> this is not the time when fsnotify_group is initialized. >>>> I believe for opting-in to accounting events for listener, you >>>> will need to add an opt-in flag for the fanotify_init() syscall. >>>> >>> >>> I thought the reason to opt-in "charge memory to listener" was the >>> risk of oom-killing the listener but it is now clear that there will >>> be no oom-kills on memcg hitting its limit (no oom-killing listener >>> risk). In my (not so strong) opinion we should only opt-in for >>> receiving the {FAN|IN}_Q_OVERFLOW event on ENOMEM but always charge >>> the memory for events to the listener's memcg if kmem accounting is >>> enabled. >>> >> >> I agree that charging listener's memcg is preferred, but it is still a change >> of behavior, because if attacker can allocate memory from listener's memcg, >> then attacker can force overflow and hide the traces of its own filesystem >> operations. >> > > ACK. > >>>> Something like FAN_GROUP_QUEUE (better name is welcome) >>>> which is mutually exclusive (?) with FAN_UNLIMITED_QUEUE. >>>> > > How about FAN_CHARGE_MEMCG? > Also should there be a similar flag for inotify_init1() as well? >>> >>> There is no need to make them mutually exclusive. One should be able >>> to request an unlimited queue limited by available memory on system >>> (with no kmem charging) or limited by limit of the listener's memcg >>> (with kmem charging). >> >> OK. >> >>> >>>> The question is, do we need the user to also explicitly opt-in for >>>> Q_OVERFLOW on ENOMEM with FAN_Q_ERR mark mask? >>>> Should these 2 new APIs be coupled or independent? >>>> >>> >>> Are there any error which are not related to queue overflows? I see >>> the mention of ENODEV and EOVERFLOW in the discussion. If there are >>> such errors and might be interesting to the listener then we should >>> have 2 independent APIs. >>> >> >> These are indeed 2 different use cases. >> A Q_OVERFLOW event is only expected one of ENOMEM or >> EOVERFLOW in event->fd, but other events (like open of special device >> file) can have ENODEV in event->fd. >> >> But I am not convinced that those require 2 independent APIs. >> Specifying FAN_Q_ERR means that the user expects to reads errors >> from event->fd. >> > > Can you please explain what you mean by 2 independent APIs? I thought > "no independent APIs" means FAN_Q_ERR can only be used with > FAN_Q_OVERFLOW and without FAN_Q_OVERFLOW, FAN_Q_ERR is ignored. Is > that right or I misunderstood? > >>>> Another question is whether FAN_GROUP_QUEUE may require >>>> less than CAP_SYS_ADMIN? Of course for now, this is only a >>>> semantic change, because fanotify_init() requires CAP_SYS_ADMIN >>>> but as the documentation suggests, this may be relaxed in the future. >>>> >>> >>> I think there is no need for imposing CAP_SYS_ADMIN for requesting to >>> charge self for the event memory. >>> >> >> Certainly. The question is whether the flag combination >> FAN_GROUP_QUEUE|FAN_UNLIMITED_QUEUE could relax the >> CAP_SYS_ADMIN requirement that is imposed by FAN_UNLIMITED_QUEUE >> by itself. >> > > Oh, I agree with relaxing CAP_SYS_ADMIN requirement if both flags are given. > >> Note that FAN_UNLIMITED_MARKS cannot relax CAP_SYS_ADMIN >> even though marks are already accounted to listener memcg. This is because >> most of the memory consumption in this case comes from marks pinning the >> watched inodes to cache and not from the marks themselves. >> > > thanks, > Shakeel