On Thu 09-04-20 22:17:20, colyli@xxxxxxx wrote:
> From: Coly Li <colyli@xxxxxxx>
>
> In raid5.c:resize_chunk(), scribble_alloc() is called with GFP_NOIO
> flag, then it is sent into kvmalloc_array() inside scribble_alloc().
>
> The problem is kvmalloc_array() eventually calls kvmalloc_node() which
> does not accept non GFP_KERNEL compatible flag like GFP_NOIO, then
> kmalloc_node() is called indeed to allocate physically continuous
> pages. When system memory is under heavy pressure, and the requesting
> size is large, there is high probability that allocating continuous
> pages will fail.
>
> But simply using GFP_KERNEL flag to call kvmalloc_array() is also
> problematic. In the code path where scribble_alloc() is called, the
> raid array is suspended, if kvmalloc_node() triggers memory reclaim I/Os
> and such I/Os go back to the suspended raid array, deadlock will happen.
>
> What is desired here is to allocate non-physically (a.k.a virtually)
> continuous pages and avoid memory reclaim I/Os. Michal Hocko suggests
> to use the memalloc scope APIs to restrict memory reclaim I/O in
> allocating context, specifically to call memalloc_noio_save() when
> suspending the raid array and to call memalloc_noio_restore() when
> resuming the raid array.
>
> This patch adds the memalloc scope APIs in mddev_suspend() and
> mddev_resume(), to restrict memory reclaim I/Os while the raid array
> is suspended. The benefit of adding the memalloc scope API in the
> unified entry point mddev_suspend()/mddev_resume() is, no matter which
> md raid array type (personality), we are sure the deadlock by recursive
> memory reclaim I/O won't happen on the suspending context.

I am not familiar with the mdraid code, so I cannot really judge the
correctness here, but if mddev_suspend really acts as a potential reclaim
recursion deadlock entry then this is the right way to use the API.
Essentially, all the allocations in that scope will have implicit NOIO
semantics.
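
To make the "implicit NOIO" point concrete, here is a small user-space
model of how a scope of this kind behaves. It is only a sketch: every
model_* and MODEL_* name is hypothetical and made up for illustration,
the flag values are arbitrary, and the masking rule is a simplification.
In the kernel the corresponding calls are memalloc_noio_save() and
memalloc_noio_restore().

```c
#include <assert.h>

/* Hypothetical stand-ins for allocation and task flags (values arbitrary). */
#define MODEL_GFP_IO     0x1u
#define MODEL_GFP_FS     0x2u
#define MODEL_GFP_KERNEL (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_PF_NOIO    0x1u

/* Stands in for the per-task flags word of the allocating context. */
static unsigned int task_flags;

/* Enter a NOIO scope and return the previous state of the NOIO bit. */
static unsigned int model_noio_save(void)
{
	unsigned int old = task_flags & MODEL_PF_NOIO;

	task_flags |= MODEL_PF_NOIO;
	return old;
}

/* Leave a NOIO scope by putting the NOIO bit back to its saved state. */
static void model_noio_restore(unsigned int old)
{
	task_flags = (task_flags & ~MODEL_PF_NOIO) | old;
}

/*
 * Flags an allocator would effectively honour: inside a scope the
 * caller's mask loses the bits that allow reclaim to start I/O.
 */
static unsigned int model_effective_gfp(unsigned int gfp)
{
	if (task_flags & MODEL_PF_NOIO)
		gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
	return gfp;
}

/* A caller that always passes MODEL_GFP_KERNEL, in and out of a scope. */
static void model_demo(void)
{
	unsigned int saved;

	assert(model_effective_gfp(MODEL_GFP_KERNEL) == MODEL_GFP_KERNEL);

	saved = model_noio_save();
	/* Same caller-side flags, but I/O-triggering reclaim is masked. */
	assert(model_effective_gfp(MODEL_GFP_KERNEL) == 0);
	model_noio_restore(saved);

	assert(model_effective_gfp(MODEL_GFP_KERNEL) == MODEL_GFP_KERNEL);
}
```

The caller does not have to change its own gfp flags; the restriction
comes from the context it runs in, which is why the scope has to cover
every allocation that could recurse into the suspended array.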

One thing to be careful about is to make sure that mddev_suspend cannot
be nested, and also that there are no callers of scribble_alloc outside
of the mddev_suspend scope which would be prone to the reclaim deadlock.
If there are, their scope should be handled in a similar way.

Thanks!
-- 
Michal Hocko
SUSE Labs