On Thu, Feb 22, 2018 at 6:48 AM, Jan Kara <jack@xxxxxxx> wrote:
> On Thu 22-02-18 14:49:44, Michal Hocko wrote:
>> On Tue 20-02-18 19:01:01, Shakeel Butt wrote:
>> > A lot of memory can be consumed by the events generated for the huge or
>> > unlimited queues if there is either no or slow listener. This can cause
>> > system level memory pressure or OOMs. So, it's better to account the
>> > fsnotify kmem caches to the memcg of the listener.
>>
>> How much memory are we talking about here?
>
> 32 bytes per event (on 64-bit) which is small but the number of events is
> not limited in any way (if the creator uses a special flag and has
> CAP_SYS_ADMIN). In the thread [1] a guy from Alibaba wanted this feature so
> among cloud people there is apparently some demand to have a way to limit
> memory usage of such application...
>
>> > There are seven fsnotify kmem caches and among them allocations from
>> > dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and
>> > inotify_inode_mark_cachep happens in the context of syscall from the
>> > listener. So, SLAB_ACCOUNT is enough for these caches.
>> >
>> > The objects from fsnotify_mark_connector_cachep are not accounted as
>> > they are small compared to the notification mark or events and it is
>> > unclear whom to account connector to since it is shared by all events
>> > attached to the inode.
>> >
>> > The allocations from the event caches happen in the context of the event
>> > producer. For such caches we will need to remote charge the allocations
>> > to the listener's memcg. Thus we save the memcg reference in the
>> > fsnotify_group structure of the listener.
>>
>> Is it typical that the listener lives in a different memcg and if yes
>> then cannot this cause one memcg to OOM/DoS the one with the listener?
>
> We have been through these discussions already in [1] back in November :).
> I can understand the wish to limit memory usage of an application using
> unlimited fanotify queues. And yes, it may mean that it will be easier for
> an attacker to get it oom-killed (currently the malicious app would drive
> the whole system oom which will presumably take a bit more effort as there
> is more memory to consume). But then I expect this is what admin prefers
> when he limits memory usage of fanotify listener.
>

Just one clarification: currently the kernel does not trigger the
oom-killer for allocations that hit the memcg limit in the context of
syscalls; it returns ENOMEM instead (after trying memcg reclaim). Jan
has already posted a patch to handle those ENOMEMs.
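
To put a rough number on the "32 bytes per event" figure quoted above,
here is a back-of-the-envelope sketch in userspace C (not kernel code).
The helper name and the EVENT_BYTES constant are assumptions made for
illustration; the constant is simply the 64-bit figure from this thread,
not a value read from the kernel, and the result ignores any extra
per-event data beyond the event structure itself.

```c
#include <assert.h>
#include <stdint.h>

/* Per-event size on 64-bit as quoted in this thread (assumption,
 * not derived from kernel headers). */
#define EVENT_BYTES 32u

/* Hypothetical helper: lower bound, in bytes, on the memory held by
 * nr_events undelivered events sitting in a listener's queue. */
static uint64_t queued_event_bytes(uint64_t nr_events)
{
	/* Precondition: the product must fit in 64 bits. */
	assert(nr_events <= UINT64_MAX / EVENT_BYTES);
	return nr_events * EVENT_BYTES;
}
```

For example, queued_event_bytes(1u << 20) is 33554432, i.e. 32 MiB for
about a million undelivered events. With an unlimited queue nothing caps
nr_events, so a stalled listener keeps growing this until something
else gives.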