On Tue, May 14, 2024 at 06:19:53PM +0000, Haakon Bugge wrote:
> Hi Jason,
>
>
> > On 14 May 2024, at 01:03, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> >
> > On Mon, May 13, 2024 at 02:53:40PM +0200, Håkon Bugge wrote:
> >> This series enables RDS and the RDMA stack to be used as a block I/O
> >> device. This to support a filesystem on top of a raw block device
> >> which uses RDS and the RDMA stack as the network transport layer.
> >>
> >> Under intense memory pressure, we get memory reclaims. Assume the
> >> filesystem reclaims memory, goes to the raw block device, which calls
> >> into RDS, which calls the RDMA stack. Now, if regular GFP_KERNEL
> >> allocations in RDS or the RDMA stack require reclaims to be fulfilled,
> >> we end up in a circular dependency.
> >>
> >> We break this circular dependency by:
> >>
> >> 1. Force all allocations in RDS and the relevant RDMA stack to use
> >>    GFP_NOIO, by means of a parenthetic use of
> >>    memalloc_noio_{save,restore} on all relevant entry points.
> >
> > I didn't see an obvious explanation why each of these changes was
> > necessary. I expected this:
> >
> >> 2. Make sure work-queues inherits current->flags
> >>    wrt. PF_MEMALLOC_{NOIO,NOFS}, such that work executed on the
> >>    work-queue inherits the same flag(s).
>
> When the modules initialize, it does not help to have 2., unless
> PF_MEMALLOC_NOIO is set in current->flags. That is most probably not
> set, e.g. considering modprobe. That is why we have these steps in
> all the five modules. During module initialization, work queues are
> allocated in all mentioned modules. Therefore, the module
> initialization functions need the parenthetic use of
> memalloc_noio_{save,restore}.

And why would I need these work queues to have noio? They are never
called under a filesystem.

You need to explain in every single case how something in a NOIO
context becomes entangled with the unrelated thing you are tagging
NOIO.

Historically when we've tried to do this we gave up because the entire
subsystem ends up being NOIO.

> > And further, is there any validation of this? There is some lockdep
> > tracking of reclaim, I feel like it should be more robustly hooked up
> > in RDMA if we expect this to really work..
>
> Oracle is about to launch a product using this series, so the
> techniques used have been thoroughly validated, although on an
> older kernel version.

That doesn't really help keep it working. I want to see some kind of
lockdep scheme to enforce this that can validate without ever
triggering reclaim.

Jason
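
For readers unfamiliar with the "parenthetic" save/restore pattern and the
flag inheritance discussed above, here is a minimal userspace sketch in C.
It is not kernel code and not taken from the series: every model_* name,
the MODEL_PF_MEMALLOC_NOIO bit, and the struct model_work type are
hypothetical stand-ins. It only illustrates how a scoped save/restore pair
nests, and how a work item could carry the submitter's flag to the context
that later runs it.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-task PF_MEMALLOC_NOIO bit. */
#define MODEL_PF_MEMALLOC_NOIO 0x1u

/* Models current->flags; one copy per thread, initially clear. */
static _Thread_local unsigned int model_task_flags;

/* Enter a NOIO scope; return the previous state of the bit. */
static unsigned int model_noio_save(void)
{
    unsigned int old = model_task_flags & MODEL_PF_MEMALLOC_NOIO;

    model_task_flags |= MODEL_PF_MEMALLOC_NOIO;
    return old;
}

/* Leave a NOIO scope; put the bit back to what save() returned. */
static void model_noio_restore(unsigned int old)
{
    model_task_flags = (model_task_flags & ~MODEL_PF_MEMALLOC_NOIO) | old;
}

/* Would an allocation made right now be allowed to start I/O? */
static bool model_alloc_may_do_io(void)
{
    return !(model_task_flags & MODEL_PF_MEMALLOC_NOIO);
}

/* Hypothetical work item that remembers the submitter's NOIO bit. */
struct model_work {
    unsigned int inherited;  /* bit captured at queue time */
    bool io_allowed_in_work; /* what the work function observed */
};

/* Queue time: capture the submitter's bit. */
static void model_queue_work(struct model_work *w)
{
    w->inherited = model_task_flags & MODEL_PF_MEMALLOC_NOIO;
}

/* Run time: apply the captured bit around the work function. */
static void model_run_work(struct model_work *w)
{
    unsigned int saved = model_task_flags;

    model_task_flags = (model_task_flags & ~MODEL_PF_MEMALLOC_NOIO) |
                       w->inherited;
    w->io_allowed_in_work = model_alloc_may_do_io();
    model_task_flags = saved;
}
```

In this model, work queued from outside any NOIO scope runs with I/O
allowed, which is the case raised for module initialization under
modprobe. The sketch says nothing about which call paths actually need the
scope, which is the question asked above.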