Re: [PATCH v3 1/4] md: use memalloc scope APIs in mddev_suspend()/mddev_resume()

On Thu 09-04-20 22:17:20, colyli@xxxxxxx wrote:
> From: Coly Li <colyli@xxxxxxx>
> 
> In raid5.c:resize_chunk(), scribble_alloc() is called with the GFP_NOIO
> flag, which is then passed to kvmalloc_array() inside scribble_alloc().
> 
> The problem is that kvmalloc_array() eventually calls kvmalloc_node(),
> which does not accept flags that are not GFP_KERNEL compatible, such as
> GFP_NOIO, so kmalloc_node() is called instead to allocate physically
> contiguous pages. When system memory is under heavy pressure and the
> requested size is large, there is a high probability that allocating
> contiguous pages will fail.
> 
> But simply calling kvmalloc_array() with the GFP_KERNEL flag is also
> problematic. In the code path where scribble_alloc() is called, the
> raid array is suspended; if kvmalloc_node() triggers memory reclaim I/Os
> and such I/Os go back to the suspended raid array, a deadlock will happen.
> 
> What is desired here is to allocate non-physically (a.k.a. virtually)
> contiguous pages and avoid memory reclaim I/Os. Michal Hocko suggests
> using the memalloc scope APIs to restrict memory reclaim I/O in the
> allocating context, specifically calling memalloc_noio_save() when
> suspending the raid array and memalloc_noio_restore() when resuming
> the raid array.
> 
> This patch adds the memalloc scope APIs to mddev_suspend() and
> mddev_resume() to restrict memory reclaim I/Os while the raid array
> is suspended. The benefit of adding the memalloc scope APIs at the
> unified entry points mddev_suspend()/mddev_resume() is that, no matter
> which md raid array type (personality) is used, we are sure the deadlock
> caused by recursive memory reclaim I/O won't happen in the suspending
> context.

I am not familiar with the mdraid code, so I cannot really judge the
correctness here, but if mddev_suspend really acts as a potential reclaim
recursion deadlock entry, then this is the right way to use the API.
Essentially, all the allocations in that scope will have implicit NOIO
semantics.
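
To illustrate what "implicit NOIO" means, here is a minimal user-space
model of the scope API. It is only a sketch: every model_* name and every
MODEL_* constant is made up for this example (the bit values are not the
kernel's), and model_task_flags stands in for the flags of the current
task. It assumes the scope works by setting a per-task bit that later
allocations consult to drop their I/O and FS reclaim permissions.

```c
#include <assert.h>

/* Model-only flag bits; the values are arbitrary and not the kernel's. */
#define MODEL_GFP_RECLAIM 0x1u
#define MODEL_GFP_IO      0x2u
#define MODEL_GFP_FS      0x4u
#define MODEL_GFP_KERNEL  (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOIO    (MODEL_GFP_RECLAIM)

#define MODEL_PF_MEMALLOC_NOIO 0x1u

/* Stands in for the flags word of the running task. */
static unsigned int model_task_flags;

/* Enter a NOIO scope; return the previous state of the NOIO bit. */
static unsigned int model_noio_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_MEMALLOC_NOIO;

	model_task_flags |= MODEL_PF_MEMALLOC_NOIO;
	return old;
}

/* Leave the scope by putting the NOIO bit back to its saved state. */
static void model_noio_restore(unsigned int old)
{
	/* The saved value may only ever carry the NOIO bit. */
	assert((old & ~MODEL_PF_MEMALLOC_NOIO) == 0);
	model_task_flags = (model_task_flags & ~MODEL_PF_MEMALLOC_NOIO) | old;
}

/* Flags an allocation effectively runs with in the current context. */
static unsigned int model_effective_gfp(unsigned int gfp)
{
	if (model_task_flags & MODEL_PF_MEMALLOC_NOIO)
		gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
	return gfp;
}
```

In this model, a caller that passes MODEL_GFP_KERNEL between
model_noio_save() and model_noio_restore() effectively allocates with
MODEL_GFP_NOIO, without the caller having to change its flags.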

One thing to be careful about is to make sure that mddev_suspend cannot
be nested, and also that there are no callers of scribble_alloc outside
of the mddev_suspend scope which would be prone to reclaim deadlock. If
there are, their scope should be handled in a similar way.
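
The nesting concern can be shown with a small user-space model. This is a
sketch under assumptions, not the md code: struct model_dev, its fields,
and all model_*, naive_* and guarded_* functions are hypothetical, and the
device is assumed to keep a single slot for the saved flag. If every
suspend call overwrites that slot, a nested suspend/resume pair leaves the
NOIO bit set after the outermost resume. Letting only the outermost pair
enter and leave the scope avoids this.

```c
#include <assert.h>

#define MODEL_PF_MEMALLOC_NOIO 0x1u

/* Stands in for the flags word of the running task. */
static unsigned int model_task_flags;

static unsigned int model_noio_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_MEMALLOC_NOIO;

	model_task_flags |= MODEL_PF_MEMALLOC_NOIO;
	return old;
}

static void model_noio_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_MEMALLOC_NOIO) | old;
}

/* Hypothetical device: one saved-flag slot plus a suspend depth counter. */
struct model_dev {
	int suspended;
	unsigned int noio_flag;
};

/* Naive variant: every call overwrites the single saved-flag slot. */
static void naive_suspend(struct model_dev *dev)
{
	dev->suspended++;
	dev->noio_flag = model_noio_save();
}

static void naive_resume(struct model_dev *dev)
{
	assert(dev->suspended > 0);
	dev->suspended--;
	model_noio_restore(dev->noio_flag);
}

/* Guarded variant: only the outermost suspend/resume pair touches the scope. */
static void guarded_suspend(struct model_dev *dev)
{
	if (dev->suspended++)
		return;
	dev->noio_flag = model_noio_save();
}

static void guarded_resume(struct model_dev *dev)
{
	assert(dev->suspended > 0);
	if (--dev->suspended)
		return;
	model_noio_restore(dev->noio_flag);
}

/* Run suspend, suspend, resume, resume; return 1 if the NOIO bit leaked. */
static int model_nested_leaks(void (*suspend)(struct model_dev *),
			      void (*resume)(struct model_dev *))
{
	struct model_dev dev = { 0, 0 };

	model_task_flags = 0;
	suspend(&dev);
	suspend(&dev);
	resume(&dev);
	resume(&dev);
	return (model_task_flags & MODEL_PF_MEMALLOC_NOIO) != 0;
}
```

In the naive variant the inner suspend stores an already-set NOIO bit over
the outer saved value, so the final restore sets the bit again instead of
clearing it. Whether the real mddev_suspend() can be reached in a nested
way is something only a review of the md call sites can answer.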

Thanks!
-- 
Michal Hocko
SUSE Labs


