Re: [RFC 0/7] Support high-order page bulk allocation

On 17.08.20 17:27, Minchan Kim wrote:
> On Sun, Aug 16, 2020 at 02:31:22PM +0200, David Hildenbrand wrote:
>> On 14.08.20 19:31, Minchan Kim wrote:
>>> Some special HW requires bulk allocation of high-order pages,
>>> for example 4800 * order-4 pages.
>>>
>>> To meet the requirement, an option is to use a CMA area, because
>>> the page allocator with compaction under memory pressure
>>> easily fails to meet the requirement and is too slow for 4800
>>> calls. However, CMA also has the following drawbacks:
>>>
>>>  * 4800 cma_alloc calls for order-4 pages are too slow
>>>
>>> To avoid the slowness, we could try to allocate 300M of contiguous
>>> memory at once and then split it into order-4 chunks.
>>> The problem with this approach is that the CMA allocation fails if
>>> any one of the pages in that range cannot be migrated out, which
>>> happens easily with fs writes under memory pressure.
>>
>> Why not choose a value in between? Like try to allocate MAX_ORDER - 1
>> chunks and split them. That would already heavily reduce the call frequency.
> 
> I think you meant this:
> 
>     alloc_pages(GFP_KERNEL|__GFP_NOWARN, MAX_ORDER - 1)
> 
> It would work if the system has lots of non-fragmented free memory.
> However, once memory is fragmented, it doesn't work. That's why we have
> easily seen even order-4 allocation failures in the field, and that's why
> CMA was there.
> 
> CMA has more logic to isolate the memory during allocation/freeing, as
> well as fragmentation avoidance, so its memory has less chance of being
> stolen by others and allocations have a higher success ratio. That's why
> I want this API to be used with CMA or the movable zone.

I was talking about doing MAX_ORDER - 1 CMA allocations instead of one
big 300M allocation. As you correctly note, memory placed into CMA
should be movable, except for (short-/long-term) pinnings. In these
cases, doing allocations smaller than 300M and splitting them up should
be good enough to reduce the call frequency, no?
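
To put rough numbers on that, here is a minimal sketch of the arithmetic.
It assumes 4 KiB base pages and MAX_ORDER == 11 (so MAX_ORDER - 1 is
order-10), neither of which is stated in this thread; the SKETCH_* macros
and helper names are made up for illustration and are not kernel APIs.

```c
#include <assert.h>

/* Assumptions (not from the thread): 4 KiB base pages, MAX_ORDER == 11. */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_MAX_ORDER  11u

/* Bytes covered by a single allocation of the given order. */
static unsigned long order_bytes(unsigned int order)
{
	return 1ul << (SKETCH_PAGE_SHIFT + order);
}

/* Number of order-`small` blocks obtained by splitting one order-`big` chunk. */
static unsigned long blocks_per_chunk(unsigned int big, unsigned int small)
{
	assert(big >= small);	/* a chunk cannot be split into larger blocks */
	return 1ul << (big - small);
}

/* Number of order-`big` allocations needed to yield `nr` order-`small` blocks,
 * rounded up. */
static unsigned long chunks_needed(unsigned long nr, unsigned int big,
				   unsigned int small)
{
	unsigned long per = blocks_per_chunk(big, small);

	return (nr + per - 1) / per;
}
```

Under those assumptions, 4800 * order_bytes(4) is exactly 300 MiB, one
order-(SKETCH_MAX_ORDER - 1) chunk is 4 MiB and splits into 64 order-4
blocks, and chunks_needed(4800, SKETCH_MAX_ORDER - 1, 4) is 75, i.e. 75
allocation calls instead of 4800.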

> 
> One use case is that a device can set up an exclusive CMA area when the
> system boots. When the device needs 4800 * order-4 pages, it could call
> this bulk API against that area, so that it is effectively guaranteed to
> allocate fast enough.

Just wondering:

a) Why does it have to be fast?
b) Why does it need that many order-4 pages?
c) How dynamic is the device's need at runtime?
d) Would it be reasonable in your setup to mark a CMA region in a way
such that it will never be used for other (movable) allocations,
guaranteeing that you can immediately allocate it? Something like,
reserving a region during boot you know you'll immediately need later
completely for a device?


-- 
Thanks,

David / dhildenb





