Re: [PATCH 0/3] Chunk Heap Support on DMA-HEAP

Cho KyongHo <pullip.cho@xxxxxxxxxxx> · Wed, 19 Aug 2020 12:46:26 +0900

On Tue, Aug 18, 2020 at 11:55:57AM +0100, Brian Starkey wrote:
> Hi,
> 
> On Tue, Aug 18, 2020 at 05:04:12PM +0900, Hyesoo Yu wrote:
> > These patch series to introduce a new dma heap, chunk heap.
> > That heap is needed for special HW that requires bulk allocation of
> > fixed high order pages. For example, 64MB dma-buf pages are made up
> > to fixed order-4 pages * 1024.
> > 
> > The chunk heap uses alloc_pages_bulk to allocate high order page.
> > https://lore.kernel.org/linux-mm/20200814173131.2803002-1-minchan@xxxxxxxxxx
> > 
> > The chunk heap is registered by device tree with alignment and memory node
> > of contiguous memory allocator(CMA). Alignment defines chunk page size.
> > For example, alignment 0x1_0000 means chunk page size is 64KB.
> > The phandle to memory node indicates contiguous memory allocator(CMA).
> > If device node doesn't have cma, the registration of chunk heap fails.
> 
> This reminds me of an ion heap developed at Arm several years ago:
> https://protect2.fireeye.com/v1/url?k=aceed8af-f122140a-acef53e0-0cc47a30d446-0980fa451deb2df6&q=1&e=a58a9bb0-a837-4fc5-970e-907089bfe25e&u=https%3A%2F%2Fgit.linaro.org%2Flanding-teams%2Fworking%2Farm%2Fkernel.git%2Ftree%2Fdrivers%2Fstaging%2Fandroid%2Fion%2Fion_compound_page.c
> 
> Some more descriptive text here:
> https://protect2.fireeye.com/v1/url?k=83dc3e8b-de10f22e-83ddb5c4-0cc47a30d446-a406aa201ca7dddc&q=1&e=a58a9bb0-a837-4fc5-970e-907089bfe25e&u=https%3A%2F%2Fgithub.com%2FARM-software%2FCPA
> 
> It maintains a pool of high-order pages with a worker thread to
> attempt compaction and allocation to keep the pool filled, with high
> and low watermarks to trigger freeing/allocating of chunks.
> It implements a shrinker to allow the system to reclaim the pool under
> high memory pressure.
> 
> Is maintaining a pool something you considered? From the
> alloc_pages_bulk thread it sounds like you want to allocate 300M at a
> time, so I expect if you tuned the pool size to match that it could
> work quite well.
> 
> That implementation isn't using a CMA region, but a similar approach
> could definitely be applied.
> 
I have seriously considered CPA in our product but we developed our own
because of the pool in CPA.
The high-order pages are required by some specific users like Netflix
app. Moreover required number of bytes are dramatically increasing
because of high resolution videos and displays in these days.

Gathering lots of free high-order pages in the background during
run-time means reserving that amount of pages from the entier available
system memory. Moreover the gathered pages are soon reclaimed whenever
the system is sufferring from memory pressure (i.e. camera recording,
heavy games). So we had to consider allocating hundreds of megabytes at
at time. Of course we don't allocate all buffers by a single call to
alloc_pages_bulk(). But still a buffer is very large.
A single frame of 8K HDR video needs 95MB (7680*4320*2*1.5). Even a
single frame of HDR 4K video needs 24MB and 4K HDR is now popular in
Netflix, YouTube and Google Play video.

> Thanks,
> -Brian

Thank you!

KyongHo