The patch titled Subject: memcg: make memcg->event_list_lock irqsafe has been added to the -mm tree. Its filename is memcg-make-memcg-event_list_lock-irqsafe.patch This patch should soon appear at https://ozlabs.org/~akpm/mmots/broken-out/memcg-make-memcg-event_list_lock-irqsafe.patch and later at https://ozlabs.org/~akpm/mmotm/broken-out/memcg-make-memcg-event_list_lock-irqsafe.patch Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next and is updated there every 3-4 working days ------------------------------------------------------ From: Shakeel Butt <shakeelb@xxxxxxxxxx> Subject: memcg: make memcg->event_list_lock irqsafe The memcg->event_list_lock is usually taken in the normal context but when the userspace closes the corresponding eventfd, eventfd_release through memcg_event_wake takes memcg->event_list_lock with interrupts disabled. This is not an issue on its own but it creates a nested dependency from eventfd_ctx->wqh.lock to memcg->event_list_lock. Independently, for unrelated eventfd, eventfd_signal() can be called in the irq context, thus making eventfd_ctx->wqh.lock an irq lock. For example, FPGA DFL driver, VHOST VPDA driver and couple of VFIO drivers. This will force memcg->event_list_lock to be an irqsafe lock as well. One way to break the nested dependency between eventfd_ctx->wqh.lock and memcg->event_list_lock is to add an indirection. However the simplest solution would be to make memcg->event_list_lock irqsafe. This is cgroup v1 feature, is in maintenance and may get deprecated in near future. So, no need to add more code. BTW this has been discussed previously [1] but there weren't irq users of eventfd_signal() at the time. [1] https://www.spinics.net/lists/cgroups/msg06248.html Link: https://lkml.kernel.org/r/20210830172953.207257-1-shakeelb@xxxxxxxxxx Signed-off-by: Shakeel Butt <shakeelb@xxxxxxxxxx> Cc: Tejun Heo <tj@xxxxxxxxxx> Cc: Johannes Weiner <hannes@xxxxxxxxxxx> Cc: Michal Hocko <mhocko@xxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- mm/memcontrol.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) --- a/mm/memcontrol.c~memcg-make-memcg-event_list_lock-irqsafe +++ a/mm/memcontrol.c @@ -4839,9 +4839,9 @@ static ssize_t memcg_write_event_control vfs_poll(efile.file, &event->pt); - spin_lock(&memcg->event_list_lock); + spin_lock_irq(&memcg->event_list_lock); list_add(&event->list, &memcg->event_list); - spin_unlock(&memcg->event_list_lock); + spin_unlock_irq(&memcg->event_list_lock); fdput(cfile); fdput(efile); @@ -5268,12 +5268,12 @@ static void mem_cgroup_css_offline(struc * Notify userspace about cgroup removing only after rmdir of cgroup * directory to avoid race between userspace and kernelspace. */ - spin_lock(&memcg->event_list_lock); + spin_lock_irq(&memcg->event_list_lock); list_for_each_entry_safe(event, tmp, &memcg->event_list, list) { list_del_init(&event->list); schedule_work(&event->remove); } - spin_unlock(&memcg->event_list_lock); + spin_unlock_irq(&memcg->event_list_lock); page_counter_set_min(&memcg->memory, 0); page_counter_set_low(&memcg->memory, 0); _ Patches currently in -mm which might be from shakeelb@xxxxxxxxxx are writeback-memcg-simplify-cgroup_writeback_by_id.patch memcg-switch-lruvec-stats-to-rstat.patch memcg-infrastructure-to-flush-memcg-stats.patch memcg-infrastructure-to-flush-memcg-stats-v5.patch memcg-cleanup-racy-sum-avoidance-code.patch memcg-make-memcg-event_list_lock-irqsafe.patch