On Wed, Aug 15, 2018 at 10:15:39AM +0200, Michal Hocko wrote:
> On Wed 15-08-18 09:36:49, Mike Rapoport wrote:
> > (this time with the subject, sorry for the noise)
> > 
> > On Wed, Aug 15, 2018 at 09:34:47AM +0300, Mike Rapoport wrote:
> > > As Vlastimil mentioned at [1], it would be nice to have some guide about
> > > memory allocation. I've drafted an initial version that tries to summarize
> > > "best practices" for allocation functions and GFP usage.
> > > 
> > > [1] https://www.spinics.net/lists/netfilter-devel/msg55542.html
> > > 
> > > From 8027c0d4b750b8dbd687234feda63305d0d5a057 Mon Sep 17 00:00:00 2001
> > > From: Mike Rapoport <rppt@xxxxxxxxxxxxxxxxxx>
> > > Date: Wed, 15 Aug 2018 09:10:06 +0300
> > > Subject: [RFC PATCH] docs/core-api: add memory allocation guide
> > > 
> > > Signed-off-by: Mike Rapoport <rppt@xxxxxxxxxxxxxxxxxx>
> > > ---
> > >  Documentation/core-api/gfp_mask-from-fs-io.rst |   2 +
> > >  Documentation/core-api/index.rst               |   1 +
> > >  Documentation/core-api/memory-allocation.rst   | 117 +++++++++++++++++++++++++
> > >  Documentation/core-api/mm-api.rst              |   2 +
> > >  4 files changed, 122 insertions(+)
> > >  create mode 100644 Documentation/core-api/memory-allocation.rst
> > > 
> > > diff --git a/Documentation/core-api/gfp_mask-from-fs-io.rst b/Documentation/core-api/gfp_mask-from-fs-io.rst
> > > index e0df8f4..e7c32a8 100644
> > > --- a/Documentation/core-api/gfp_mask-from-fs-io.rst
> > > +++ b/Documentation/core-api/gfp_mask-from-fs-io.rst
> > > @@ -1,3 +1,5 @@
> > > +.. _gfp_mask_from_fs_io:
> > > +
> > >  =================================
> > >  GFP masks used from FS/IO context
> > >  =================================
> > > diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
> > > index cdc2020..8afc0da 100644
> > > --- a/Documentation/core-api/index.rst
> > > +++ b/Documentation/core-api/index.rst
> > > @@ -27,6 +27,7 @@ Core utilities
> > >     errseq
> > >     printk-formats
> > >     circular-buffers
> > > +   memory-allocation
> > >     mm-api
> > >     gfp_mask-from-fs-io
> > >     timekeeping
> > > diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst
> > > new file mode 100644
> > > index 0000000..b1f2ad5
> > > --- /dev/null
> > > +++ b/Documentation/core-api/memory-allocation.rst
> > > @@ -0,0 +1,117 @@
> > > +=======================
> > > +Memory Allocation Guide
> > > +=======================
> > > +
> > > +Linux supplies a variety of APIs for memory allocation. You can allocate
> > > +small chunks using the `kmalloc` or `kmem_cache_alloc` families, large
> > > +virtually contiguous areas using `vmalloc` and its derivatives, or
> > > +you can directly request pages from the page allocator with
> > > +`__get_free_pages`. It is also possible to use more specialized
> 
> I would rather not mention __get_free_pages. alloc_pages is a more
> generic API and less subtle one. If you want to mention __get_free_pages
> then please make sure to mention the subtlety (namely that it can
> allocate only lowmem memory).
> 
> > > +allocators, for instance `cma_alloc` or `zs_malloc`.
> > > +
> > > +Most of the memory allocation APIs use GFP flags to express how that
> > > +memory should be allocated. The GFP acronym stands for "get free
> > > +pages", the underlying memory allocation function.
> > > +
> > > +Diversity of the allocation APIs combined with the numerous GFP flags
> > > +makes the question "How should I allocate memory?" not that easy to
> > > +answer, although very likely you should use
> > > +
> > > +::
> > > +
> > > +	kzalloc(<size>, GFP_KERNEL);
> > > +
> > > +Of course there are cases when other allocation APIs and different GFP
> > > +flags must be used.
> > > +
> > > +Get Free Page flags
> > > +===================
> > > +
> > > +The GFP flags control the allocator's behavior. They tell what memory
> > > +zones can be used, how hard the allocator should try to find free
> > > +memory, whether the memory can be accessed by the userspace etc. The
> > > +:ref:`Documentation/core-api/mm-api.rst <mm-api-gfp-flags>` provides
> > > +reference documentation for the GFP flags and their combinations and
> > > +here we briefly outline their recommended usage:
> > > +
> > > + * Most of the time ``GFP_KERNEL`` is what you need. Memory for the
> > > +   kernel data structures, DMAable memory, inode cache, all these and
> > > +   many other allocation types can use ``GFP_KERNEL``. Note that
> > > +   using ``GFP_KERNEL`` implies ``__GFP_RECLAIM``, which means that
> > > +   direct reclaim may be triggered under memory pressure; the calling
> > > +   context must be allowed to sleep.
> > > + * If the allocation is performed from an atomic context, e.g. an
> > > +   interrupt handler, use ``GFP_ATOMIC``.
> 
> GFP_NOWAIT please. GFP_ATOMIC should be only used if accessing memory
> reserves is justified. E.g. fallback allocation would be too costly. It
> should be also noted that these allocations are quite likely to fail
> especially under memory pressure.

How about:

 * If the allocation is performed from an atomic context, e.g. an
   interrupt handler, use ``GFP_NOWAIT``. This flag prevents direct
   reclaim and IO or filesystem operations. Consequently, under memory
   pressure ``GFP_NOWAIT`` allocation is likely to fail.
 * If you think that accessing memory reserves is justified and the
   kernel will be stressed unless allocation succeeds, you may use
   ``GFP_ATOMIC``.
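Maybe with a short example right after these two bullets to make the tradeoff concrete? Something like the sketch below — the handler, the event structure and the list are invented names, only the choice of GFP flags in atomic context is the point:

```c
#include <linux/interrupt.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* Hypothetical per-interrupt event record, for illustration only */
struct my_event {
	struct list_head list;
	u32 status;
};

static LIST_HEAD(my_event_list);
static DEFINE_SPINLOCK(my_event_lock);

static irqreturn_t my_irq_handler(int irq, void *dev_id)
{
	struct my_event *ev;

	/*
	 * Interrupt context: sleeping is forbidden, so GFP_KERNEL cannot
	 * be used here. GFP_NOWAIT performs no direct reclaim and no IO
	 * or filesystem operations, and may therefore fail under memory
	 * pressure -- the failure must be handled.
	 */
	ev = kzalloc(sizeof(*ev), GFP_NOWAIT);
	if (!ev)
		return IRQ_HANDLED;	/* dropping the event is tolerable here */

	/*
	 * Only if losing this allocation would really stress the kernel
	 * would dipping into the memory reserves be justified:
	 *
	 *	ev = kzalloc(sizeof(*ev), GFP_ATOMIC);
	 */
	ev->status = 0;
	spin_lock(&my_event_lock);
	list_add_tail(&ev->list, &my_event_list);
	spin_unlock(&my_event_lock);

	return IRQ_HANDLED;
}
```

The failure handling is the important part: with ``GFP_NOWAIT`` the allocation simply returns NULL under pressure, and the caller has to cope with that rather than sleep.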
> > > + * Untrusted allocations triggered from userspace should be subject
> > > +   to kmem accounting and must have the ``__GFP_ACCOUNT`` bit set.
> > > +   There is the handy ``GFP_KERNEL_ACCOUNT`` shortcut for
> > > +   ``GFP_KERNEL`` allocations that should be accounted.
> > > + * Userspace allocations should use either of the ``GFP_USER``,
> > > +   ``GFP_HIGHUSER`` and ``GFP_HIGHUSER_MOVABLE`` flags. The longer
> > > +   the flag name the less restrictive it is.
> > > +
> > > +   ``GFP_HIGHUSER_MOVABLE`` does not require that the allocated
> > > +   memory will be directly accessible by the kernel or the hardware
> > > +   and implies that the data may move.
> 
> s/may move/is movable/

Ok

> > > +   ``GFP_HIGHUSER`` means that the allocated memory is not
> > > +   movable, but it is not required to be directly accessible by the
> > > +   kernel or the hardware. An example may be a hardware allocation
> > > +   that maps data directly into userspace but has no addressing
> > > +   limitations.
> > > +
> > > +   ``GFP_USER`` means that the allocated memory is not movable
> > > +   and it must be directly accessible by the kernel or the
> > > +   hardware. It is typically used for hardware buffers that are
> > > +   mapped to userspace (e.g. graphics) and that the hardware still
> > > +   must DMA to.
> > > +
> > > +You may notice that quite a few allocations in the existing code
> > > +specify ``GFP_NOIO`` or ``GFP_NOFS``. Historically, they were used to
> > > +prevent recursion deadlocks caused by direct memory reclaim calling
> > > +back into the FS or IO paths and blocking on already held
> > > +resources. Since 4.12 the preferred way to address this issue is to
> > > +use the new scope APIs described in
> > > +:ref:`Documentation/core-api/gfp_mask-from-fs-io.rst <gfp_mask_from_fs_io>`.
> > > +
> > > +Other legacy GFP flags are ``GFP_DMA`` and ``GFP_DMA32``. They are
> > > +used to ensure that the allocated memory is accessible by hardware
> > > +with limited addressing capabilities. So unless you are writing a
> > > +driver for a device with such restrictions, avoid using these flags.
> 
> And even with HW with restrictions it is preferable to use dma_alloc*
> APIs

Will add.

> Looks nice otherwise. Thanks! With the above changes feel free to add
> Acked-by: Michal Hocko <mhocko@xxxxxxxx>

Thanks!

> -- 
> Michal Hocko
> SUSE Labs

-- 
Sincerely yours,
Mike.