On Wed, Aug 15, 2018 at 10:15:39AM +0200, Michal Hocko wrote:
> On Wed 15-08-18 09:36:49, Mike Rapoport wrote:
> > (this time with the subject, sorry for the noise)
> > 
> > On Wed, Aug 15, 2018 at 09:34:47AM +0300, Mike Rapoport wrote:
> > > As Vlastimil mentioned at [1], it would be nice to have some guide about
> > > memory allocation. I've drafted an initial version that tries to summarize
> > > "best practices" for allocation functions and GFP usage.
> > > 
> > > [1] https://www.spinics.net/lists/netfilter-devel/msg55542.html
> > > 
> > > From 8027c0d4b750b8dbd687234feda63305d0d5a057 Mon Sep 17 00:00:00 2001
> > > From: Mike Rapoport <rppt@xxxxxxxxxxxxxxxxxx>
> > > Date: Wed, 15 Aug 2018 09:10:06 +0300
> > > Subject: [RFC PATCH] docs/core-api: add memory allocation guide
> > > 
> > > Signed-off-by: Mike Rapoport <rppt@xxxxxxxxxxxxxxxxxx>
> > > ---
> > >  Documentation/core-api/gfp_mask-from-fs-io.rst |   2 +
> > >  Documentation/core-api/index.rst               |   1 +
> > >  Documentation/core-api/memory-allocation.rst   | 117 +++++++++++++++++++++++++
> > >  Documentation/core-api/mm-api.rst              |   2 +
> > >  4 files changed, 122 insertions(+)
> > >  create mode 100644 Documentation/core-api/memory-allocation.rst
> > > 
> > > diff --git a/Documentation/core-api/gfp_mask-from-fs-io.rst b/Documentation/core-api/gfp_mask-from-fs-io.rst
> > > index e0df8f4..e7c32a8 100644
> > > --- a/Documentation/core-api/gfp_mask-from-fs-io.rst
> > > +++ b/Documentation/core-api/gfp_mask-from-fs-io.rst
> > > @@ -1,3 +1,5 @@
> > > +.. _gfp_mask_from_fs_io:
> > > +
> > >  =================================
> > >  GFP masks used from FS/IO context
> > >  =================================
> > > diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
> > > index cdc2020..8afc0da 100644
> > > --- a/Documentation/core-api/index.rst
> > > +++ b/Documentation/core-api/index.rst
> > > @@ -27,6 +27,7 @@ Core utilities
> > >     errseq
> > >     printk-formats
> > >     circular-buffers
> > > +   memory-allocation
> > >     mm-api
> > >     gfp_mask-from-fs-io
> > >     timekeeping
> > > diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst
> > > new file mode 100644
> > > index 0000000..b1f2ad5
> > > --- /dev/null
> > > +++ b/Documentation/core-api/memory-allocation.rst
> > > @@ -0,0 +1,117 @@
> > > +=======================
> > > +Memory Allocation Guide
> > > +=======================
> > > +
> > > +Linux supplies a variety of APIs for memory allocation. You can allocate
> > > +small chunks using the `kmalloc` or `kmem_cache_alloc` families, large
> > > +virtually contiguous areas using `vmalloc` and its derivatives, or
> > > +you can directly request pages from the page allocator with
> > > +`__get_free_pages`. It is also possible to use more specialized
> 
> I would rather not mention __get_free_pages. alloc_pages is a more
> generic API and less subtle one. If you want to mention __get_free_pages
> then please make sure to mention the subtlety (namely that it can
> allocate only lowmem memory).
> 
> > > +allocators, for instance `cma_alloc` or `zs_malloc`.
> > > +
> > > +Most of the memory allocation APIs use GFP flags to express how that
> > > +memory should be allocated. The GFP acronym stands for "get free
> > > +pages", the underlying memory allocation function.
> > > +
> > > +Diversity of the allocation APIs combined with the numerous GFP flags
> > > +makes the question "How should I allocate memory?" not that easy to
> > > +answer, although very likely you should use
> > > +
> > > +::
> > > +
> > > +	kzalloc(<size>, GFP_KERNEL);
> > > +
> > > +Of course there are cases when other allocation APIs and different GFP
> > > +flags must be used.
> > > +
> > > +Get Free Page flags
> > > +===================
> > > +
> > > +The GFP flags control the allocator's behavior. They tell what memory
> > > +zones can be used, how hard the allocator should try to find free
> > > +memory, whether the memory can be accessed by the userspace etc. The
> > > +:ref:`Documentation/core-api/mm-api.rst <mm-api-gfp-flags>` provides
> > > +reference documentation for the GFP flags and their combinations and
> > > +here we briefly outline their recommended usage:
> > > +
> > > + * Most of the time ``GFP_KERNEL`` is what you need. Memory for the
> > > +   kernel data structures, DMAable memory, inode cache, all these and
> > > +   many other allocation types can use ``GFP_KERNEL``. Note that
> > > +   using ``GFP_KERNEL`` implies ``__GFP_RECLAIM``, which means that
> > > +   direct reclaim may be triggered under memory pressure; the calling
> > > +   context must be allowed to sleep.
> > > + * If the allocation is performed from an atomic context, e.g. an
> > > +   interrupt handler, use ``GFP_ATOMIC``.
> 
> GFP_NOWAIT please. GFP_ATOMIC should be only used if accessing memory
> reserves is justified. E.g. fallback allocation would be too costly. It
> should be also noted that these allocations are quite likely to fail
> especially under memory pressure.

How about:

 * If the allocation is performed from an atomic context, e.g. an
   interrupt handler, use ``GFP_NOWAIT``. This flag prevents direct
   reclaim and IO or filesystem operations. Consequently, under memory
   pressure ``GFP_NOWAIT`` allocation is likely to fail.
 * If you think that accessing memory reserves is justified and the
   kernel will be stressed unless allocation succeeds, you may use
   ``GFP_ATOMIC``.
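Maybe with a short example right after these two bullets to make the tradeoff concrete? Something like the sketch below — the handler, the event structure and the list are invented names, only the choice of GFP flags in atomic context is the point:

```c
#include <linux/interrupt.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* Hypothetical per-interrupt event record, for illustration only */
struct my_event {
	struct list_head list;
	u32 status;
};

static LIST_HEAD(my_event_list);
static DEFINE_SPINLOCK(my_event_lock);

static irqreturn_t my_irq_handler(int irq, void *dev_id)
{
	struct my_event *ev;

	/*
	 * Interrupt context: sleeping is forbidden, so GFP_KERNEL cannot
	 * be used here. GFP_NOWAIT performs no direct reclaim and no IO
	 * or filesystem operations, and may therefore fail under memory
	 * pressure -- the failure must be handled.
	 */
	ev = kzalloc(sizeof(*ev), GFP_NOWAIT);
	if (!ev)
		return IRQ_HANDLED;	/* dropping the event is tolerable here */

	/*
	 * Only if losing this allocation would really stress the kernel
	 * would dipping into the memory reserves be justified:
	 *
	 *	ev = kzalloc(sizeof(*ev), GFP_ATOMIC);
	 */
	ev->status = 0;
	spin_lock(&my_event_lock);
	list_add_tail(&ev->list, &my_event_list);
	spin_unlock(&my_event_lock);

	return IRQ_HANDLED;
}
```

The failure handling is the important part: with ``GFP_NOWAIT`` the allocation simply returns NULL under pressure, and the caller has to cope with that rather than sleep.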
> > > + * Untrusted allocations triggered from userspace should be subject
> > > +   to kmem accounting and must have the ``__GFP_ACCOUNT`` bit set.
> > > +   There is the handy ``GFP_KERNEL_ACCOUNT`` shortcut for
> > > +   ``GFP_KERNEL`` allocations that should be accounted.
> > > + * Userspace allocations should use either of the ``GFP_USER``,
> > > +   ``GFP_HIGHUSER`` and ``GFP_HIGHUSER_MOVABLE`` flags. The longer
> > > +   the flag name the less restrictive it is.
> > > +
> > > +   ``GFP_HIGHUSER_MOVABLE`` does not require that the allocated
> > > +   memory will be directly accessible by the kernel or the hardware
> > > +   and implies that the data may move.
> 
> s/may move/is movable/

Ok

> > > +   ``GFP_HIGHUSER`` means that the allocated memory is not
> > > +   movable, but it is not required to be directly accessible by the
> > > +   kernel or the hardware. An example may be a hardware allocation
> > > +   that maps data directly into userspace but has no addressing
> > > +   limitations.
> > > +
> > > +   ``GFP_USER`` means that the allocated memory is not movable
> > > +   and it must be directly accessible by the kernel or the
> > > +   hardware. It is typically used for hardware buffers that are
> > > +   mapped to userspace (e.g. graphics) and that the hardware still
> > > +   must DMA to.
> > > +
> > > +You may notice that quite a few allocations in the existing code
> > > +specify ``GFP_NOIO`` or ``GFP_NOFS``. Historically, they were used to
> > > +prevent recursion deadlocks caused by direct memory reclaim calling
> > > +back into the FS or IO paths and blocking on already held
> > > +resources. Since 4.12 the preferred way to address this issue is to
> > > +use the new scope APIs described in
> > > +:ref:`Documentation/core-api/gfp_mask-from-fs-io.rst <gfp_mask_from_fs_io>`.
> > > +
> > > +Other legacy GFP flags are ``GFP_DMA`` and ``GFP_DMA32``. They are
> > > +used to ensure that the allocated memory is accessible by hardware
> > > +with limited addressing capabilities. So unless you are writing a
> > > +driver for a device with such restrictions, avoid using these flags.
> 
> And even with HW with restrictions it is preferable to use dma_alloc*
> APIs

Will add.

> Looks nice otherwise. Thanks! With the above changes feel free to add
> Acked-by: Michal Hocko <mhocko@xxxxxxxx>

Thanks!

> -- 
> Michal Hocko
> SUSE Labs

-- 
Sincerely yours,
Mike.