On Wed, Sep 11, 2019 at 8:32 AM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>
> On Tue, Sep 10, 2019 at 05:17:48PM +0200, Ilya Dryomov wrote:
> > The vmalloc allocator doesn't fully respect the specified gfp mask:
> > while the actual pages are allocated as requested, the page table pages
> > are always allocated with GFP_KERNEL. ceph_kvmalloc() may be called
> > with GFP_NOFS and GFP_NOIO (for ceph and rbd respectively), so this may
> > result in a deadlock.
> >
> > There is no real reason for the current PAGE_ALLOC_COSTLY_ORDER logic,
> > it's just something that seemed sensible at the time (ceph_kvmalloc()
> > predates kvmalloc()). kvmalloc() is smarter: in an attempt to reduce
> > long term fragmentation, it first tries to kmalloc non-disruptively.
> >
> > Switch to kvmalloc() and set the respective PF_MEMALLOC_* flag using
> > the scope API to avoid the deadlock. Note that kvmalloc() needs to be
> > passed GFP_KERNEL to enable the fallback.
>
> If you can please just stop using GFP_NOFS altogether and set
> PF_MEMALLOC_* for the actual contexts.

Hi Christoph,

ceph_kvmalloc() is indirectly called from dozens of places, everywhere
a new RPC message is allocated. Some of them are used for client
setup and don't need a scope (GFP_KERNEL is fine), but the vast
majority do. I don't think wrapping each call is practical.

As for getting rid of GFP_NOFS and GFP_NOIO entirely (i.e. dropping
the gfp mask from all libceph APIs and using scopes instead), it's
something that I have had in the back of my head for a while now
because we cheat in a few places and hard-code GFP_NOIO as the lowest
common denominator instead of properly propagating the gfp mask. It's
more of a project though, and won't be backportable.

Thanks,

Ilya