On Tue, Feb 13, 2018 at 1:54 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote: > On Tue, Feb 13, 2018 at 11:10 PM, Shakeel Butt <shakeelb@xxxxxxxxxx> wrote: >> On Mon, Feb 12, 2018 at 10:30 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote: >>> On Thu, Jan 25, 2018 at 10:36 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote: >>>> On Thu, Jan 25, 2018 at 10:20 PM, Shakeel Butt <shakeelb@xxxxxxxxxx> wrote: >>>>> On Wed, Jan 24, 2018 at 11:51 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote: >>>>>> >>>>>> There is a nicer alternative, instead of failing the file access, >>>>>> an overflow event can be queued. I sent a patch for that and Jan >>>>>> agreed to the concept, but thought we should let user opt-in for this >>>>>> change: >>>>>> https://marc.info/?l=linux-fsdevel&m=150944704716447&w=2 >>>>>> >>>>>> So IMO, if user opts-in for OVERFLOW instead of ENOMEM, >>>>>> charging the listener memcg would be non controversial. >>>>>> Otherwise, I cannot say that starting to charge the listener memgc >>>>>> for events won't break any application. >>>>>> >>>>> >>> >>> Shakeel, Jan, >>> >>> Reviving this thread and adding linux-api, because I think it is important to >>> agree on the API before patches. >>> >>> The last message on the thread you referenced suggest an API change >>> for opting in for Q_OVERFLOW on ENOMEM: >>> https://marc.info/?l=linux-api&m=150946878623441&w=2 >>> >>> However, the suggested API change in in fanotify_mark() syscall and >>> this is not the time when fsnotify_group is initialized. >>> I believe for opting-in to accounting events for listener, you >>> will need to add an opt-in flag for the fanotify_init() syscall. >>> >> >> I thought the reason to opt-in "charge memory to listener" was the >> risk of oom-killing the listener but it is now clear that there will >> be no oom-kills on memcg hitting its limit (no oom-killing listener >> risk). In my (not so strong) opinion we should only opt-in for >> receiving the {FAN|IN}_Q_OVERFLOW event on ENOMEM but always charge >> the memory for events to the listener's memcg if kmem accounting is >> enabled. >> > > I agree that charging listener's memcg is preferred, but it is still a change > of behavior, because if attacker can allocate memory from listener's memcg, > then attacker can force overflow and hide the traces of its own filesystem > operations. > ACK. >>> Something like FAN_GROUP_QUEUE (better name is welcome) >>> which is mutually exclusive (?) with FAN_UNLIMITED_QUEUE. >>> How about FAN_CHARGE_MEMCG? >> >> There is no need to make them mutually exclusive. One should be able >> to request an unlimited queue limited by available memory on system >> (with no kmem charging) or limited by limit of the listener's memcg >> (with kmem charging). > > OK. > >> >>> The question is, do we need the user to also explicitly opt-in for >>> Q_OVERFLOW on ENOMEM with FAN_Q_ERR mark mask? >>> Should these 2 new APIs be coupled or independent? >>> >> >> Are there any error which are not related to queue overflows? I see >> the mention of ENODEV and EOVERFLOW in the discussion. If there are >> such errors and might be interesting to the listener then we should >> have 2 independent APIs. >> > > These are indeed 2 different use cases. > A Q_OVERFLOW event is only expected one of ENOMEM or > EOVERFLOW in event->fd, but other events (like open of special device > file) can have ENODEV in event->fd. > > But I am not convinced that those require 2 independent APIs. > Specifying FAN_Q_ERR means that the user expects to reads errors > from event->fd. > Can you please explain what you mean by 2 independent APIs? I thought "no independent APIs" means FAN_Q_ERR can only be used with FAN_Q_OVERFLOW and without FAN_Q_OVERFLOW, FAN_Q_ERR is ignored. Is that right or I misunderstood? >>> Another question is whether FAN_GROUP_QUEUE may require >>> less than CAP_SYS_ADMIN? Of course for now, this is only a >>> semantic change, because fanotify_init() requires CAP_SYS_ADMIN >>> but as the documentation suggests, this may be relaxed in the future. >>> >> >> I think there is no need for imposing CAP_SYS_ADMIN for requesting to >> charge self for the event memory. >> > > Certainly. The question is whether the flag combination > FAN_GROUP_QUEUE|FAN_UNLIMITED_QUEUE could relax the > CAP_SYS_ADMIN requirement that is imposed by FAN_UNLIMITED_QUEUE > by itself. > Oh, I agree with relaxing CAP_SYS_ADMIN requirement if both flags are given. > Note that FAN_UNLIMITED_MARKS cannot relax CAP_SYS_ADMIN > even though marks are already accounted to listener memcg. This is because > most of the memory consumption in this case comes from marks pinning the > watched inodes to cache and not from the marks themselves. > thanks, Shakeel