On Tue, 2020-05-05 at 13:54 +0200, SeongJae Park wrote:
> CC-ing stable@xxxxxxxxxxxxxxx and adding some more explanations.
>
> On Tue, 5 May 2020 10:10:33 +0200 SeongJae Park <sjpark@xxxxxxxxxx>
> wrote:
>
> > From: SeongJae Park <sjpark@xxxxxxxxx>
> >
> > The commit 6d7855c54e1e ("sockfs: switch to ->free_inode()") made
> > the deallocation of 'socket_alloc' asynchronous using RCU, the same
> > as for 'sock.wq'.  The following commit 333f7909a857 ("coallocate
> > socket_wq with socket itself") then gave the two the same life
> > cycle.
> >
> > The changes made the code much simpler, but also made
> > 'socket_alloc' live longer than before.  For this reason, user
> > programs that intensively repeat allocations and deallocations of
> > sockets could cause memory pressure on recent kernels.
>
> I found this problem on a production virtual machine utilizing 4GB
> memory while running lebench[1].  The 'poll big' test of lebench opens
> 1000 sockets, polls them, and closes them.  This test is repeated
> 10,000 times.  Therefore it should consume only 1000 'socket_alloc'
> objects at once.  As the size of socket_alloc is about 800 Bytes,
> that is only about 800 KiB.  However, on the recent kernels, it could
> consume up to 10,000,000 objects (about 8 GiB).  On the test machine,
> I confirmed that it consumed about 4GB of the system memory and
> resulted in OOM.
>
> [1] https://github.com/LinuxPerfStudy/LEBench
>
> >
> > To avoid the problem, this commit reverts the changes.
>
> I also tried to make a fixup rather than reverting, but I couldn't
> easily find a simple one.  As the commits 6d7855c54e1e and
> 333f7909a857 were for code refactoring rather than performance
> optimization, I thought introducing a complex fixup for this problem
> would make no sense.  Meanwhile, the memory pressure regression could
> affect real machines.  To this end, I decided to quickly revert the
> commits first and consider better refactoring later.
>

While lebench might be exercising a rather pathological case, the
increase in memory pressure is real. I am concerned that the OOM killer
is actually engaging and killing off processes when there are lots of
resources already marked for release. This might be true for other
lazy/delayed resource deallocation, too. This has obviously just become
too lazy currently.

So for both reverts:

Reviewed-by: Stefan Nuernberger <snu@xxxxxxxxxx>

>
> Thanks,
> SeongJae Park
>
> >
> > SeongJae Park (2):
> >   Revert "coallocate socket_wq with socket itself"
> >   Revert "sockfs: switch to ->free_inode()"
> >
> >  drivers/net/tap.c      |  5 +++--
> >  drivers/net/tun.c      |  8 +++++---
> >  include/linux/if_tap.h |  1 +
> >  include/linux/net.h    |  4 ++--
> >  include/net/sock.h     |  4 ++--
> >  net/core/sock.c        |  2 +-
> >  net/socket.c           | 23 ++++++++++++++++-------
> >  7 files changed, 30 insertions(+), 17 deletions(-)
> >

Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
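
For reference, the memory figures quoted in this thread can be reproduced with a few lines of C. This is only a sketch of the arithmetic, not code from the patches: the function and macro names are hypothetical, and the 800-byte object size is the approximate value given in the message, not a measured sizeof(struct socket_alloc).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Figures quoted in the thread. The object size is approximate
 * ("about 800 Bytes"); the macro names are illustrative only. */
#define SOCKETS_PER_ITERATION 1000ULL
#define ITERATIONS            10000ULL
#define SOCKET_ALLOC_BYTES    800ULL

/* Bytes held if every object is freed before the next iteration starts. */
uint64_t expected_bytes(uint64_t sockets, uint64_t obj_size)
{
	return sockets * obj_size;
}

/* Upper bound if no deferred free completes before the run ends. */
uint64_t worst_case_bytes(uint64_t sockets, uint64_t iterations,
			  uint64_t obj_size)
{
	return sockets * iterations * obj_size;
}

/* Print both figures in bytes and in binary units. */
void print_footprint(void)
{
	uint64_t expected = expected_bytes(SOCKETS_PER_ITERATION,
					   SOCKET_ALLOC_BYTES);
	uint64_t worst = worst_case_bytes(SOCKETS_PER_ITERATION, ITERATIONS,
					  SOCKET_ALLOC_BYTES);

	printf("expected:   %" PRIu64 " bytes (%.2f KiB)\n",
	       expected, (double)expected / 1024.0);
	printf("worst case: %" PRIu64 " bytes (%.2f GiB)\n",
	       worst, (double)worst / (1024.0 * 1024.0 * 1024.0));
}
```

Calling print_footprint() from a main() gives 800,000 bytes (about 781 KiB) for the expected case and 8,000,000,000 bytes (8 GB, or about 7.45 GiB) for the worst case, so the "800 KiB" and "8 GiB" in the message are rounded. The worst case is a bound that assumes none of the deferred frees run during the test; the roughly 4GB observed before OOM on the test machine is below that bound.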