On 16.12.21 03:54, Aisheng Dong wrote:
>> From: David Hildenbrand <david@xxxxxxxxxx>
>> Sent: Wednesday, December 15, 2021 8:31 PM
>>
>> On 15.12.21 09:02, Dong Aisheng wrote:
>>> We sometimes see dma_alloc_coherent() fail when running an 8-way
>>> parallel VPU decoder test on an MX6Q SDB board.
>>>
>>> Error log:
>>> cma: cma_alloc: linux,cma: alloc failed, req-size: 148 pages, ret: -16
>>> cma: number of available pages:
>>> 3@125+20@172+12@236+4@380+32@736+17@2287+23@2473+20@36076+99@40477+
>>> 108@40852+44@41108+20@41196+108@41364+108@41620+108@42900+108@43156+
>>> 483@44061+1763@45341+1440@47712+20@49324+20@49388+5076@49452+
>>> 2304@55040+35@58141+20@58220+20@58284+7188@58348+84@66220+7276@66452+
>>> 227@74525+6371@75549
>>> => 33161 free of 81920 total pages
>>>
>>> When the issue happened, there were still 33161 pages (129M) of free
>>> CMA memory and plenty of free slots in the CMA bitmap large enough
>>> for the 148 pages we wanted to allocate.
>>>
>>> Dumping memory info showed that ~342M of normal memory was also
>>> free, but only 1352K of CMA memory was left in the buddy system,
>>> while a lot of pageblocks were isolated.
>>>
>>> Memory info log:
>>> Normal free:351096kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB
>>>   active_anon:98060kB inactive_anon:98948kB active_file:60864kB inactive_file:31776kB
>>>   unevictable:0kB writepending:0kB present:1048576kB managed:1018328kB mlocked:0kB
>>>   bounce:0kB free_pcp:220kB local_pcp:192kB free_cma:1352kB
>>> lowmem_reserve[]: 0 0 0
>>> Normal: 78*4kB (UECI) 1772*8kB (UMECI) 1335*16kB (UMECI) 360*32kB (UMECI) 65*64kB (UMCI)
>>>   36*128kB (UMECI) 16*256kB (UMCI) 6*512kB (EI) 8*1024kB (UEI) 4*2048kB (MI) 8*4096kB (EI)
>>>   8*8192kB (UI) 3*16384kB (EI) 8*32768kB (M) = 489288kB
>>>
>>> The root cause of this issue is that since commit a4efc174b382
>>> ("mm/cma.c: remove redundant cma_mutex lock"), CMA supports
>>> concurrent memory allocation. The pageblock that process A tries to
>>> allocate may already have been isolated by process B's allocation
>>> during memory migration.
>>>
>>> When multiple processes allocate CMA memory in parallel, it is
>>> likely that the remaining pageblocks have also been isolated, so the
>>> CMA allocation finally fails during the first scan of the whole
>>> available CMA bitmap.
>>
>> I already raised in a different context that we should most probably
>> convert that -EBUSY to -EAGAIN -- to differentiate an actual
>> migration problem from a simple "concurrent allocations that target
>> the same MAX_ORDER - 1 range".
>>
>
> Thanks for the info. Is there a patch under review?

No, I have been too busy so far to send it out.

> BTW, I wonder whether that makes much difference for my patch, since
> we may prefer to retry the next pageblock rather than busy-wait on the
> same isolated pageblock.

Makes sense. BUT as of now we isolate not only a pageblock but a
MAX_ORDER - 1 page (e.g., 2 pageblocks on x86-64 (!)). So you'll have
the same issue in that case.

-- 
Thanks,

David / dhildenb
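
For readers following the thread: below is the scan loop of cma_alloc()
from mm/cma.c (v5.15-era, trimmed), plus a hypothetical -EAGAIN branch
sketching what the -EBUSY/-EAGAIN split suggested above could look like
from the caller's side. Only the -EAGAIN branch and its skip distance
are new, and they are an illustrative assumption, not merged kernel
code.

        /*
         * Scan loop of cma_alloc() (mm/cma.c, v5.15-era, trimmed).
         * Everything here is upstream code except the -EAGAIN branch,
         * which is a HYPOTHETICAL sketch: it assumes
         * alloc_contig_range() were changed to return -EAGAIN when the
         * range is transiently isolated by a concurrent allocation,
         * keeping -EBUSY for real migration failures.
         */
        for (;;) {
                spin_lock_irq(&cma->lock);
                bitmap_no = bitmap_find_next_zero_area_off(cma->bitmap,
                                bitmap_maxno, start, bitmap_count, mask,
                                offset);
                if (bitmap_no >= bitmap_maxno) {
                        spin_unlock_irq(&cma->lock);
                        break;  /* one full scan of the bitmap failed */
                }
                /* Reserve the slot; cleared below if migration fails. */
                bitmap_set(cma->bitmap, bitmap_no, bitmap_count);
                spin_unlock_irq(&cma->lock);

                pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
                ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
                                GFP_KERNEL |
                                (no_warn ? __GFP_NOWARN : 0));
                if (ret == 0) {
                        page = pfn_to_page(pfn);
                        break;
                }

                cma_clear_bitmap(cma, pfn, count);

                if (ret == -EAGAIN) {
                        /*
                         * HYPOTHETICAL: a concurrent cma_alloc() holds
                         * this range isolated.  Isolation covers a
                         * whole MAX_ORDER - 1 block, so skip past that
                         * block instead of retrying neighbouring
                         * pageblocks inside it.  Assumes cma->base_pfn
                         * is MAX_ORDER - 1 aligned.
                         */
                        start = ALIGN(bitmap_no + mask + 1,
                                      MAX_ORDER_NR_PAGES >>
                                      cma->order_per_bit);
                        continue;
                }
                if (ret != -EBUSY)
                        break;

                /* try again with a bit different memory target */
                start = bitmap_no + mask + 1;
        }

Note that the upstream -EBUSY fallthrough advances only to the next
bitmap slot, which, as pointed out above, can still sit inside the very
MAX_ORDER - 1 block a concurrent allocation has isolated.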
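
To make the final point about isolation granularity concrete, here is a
small self-contained userspace C sketch mimicking the
pfn_max_align_down()/pfn_max_align_up() rounding that
alloc_contig_range() applied before isolating a range in kernels of
this era. The constants assume x86-64 defaults (4 KiB pages,
MAX_ORDER = 11, pageblock_order = 9); the 148-page request and the pfn
offset are borrowed from the error log above, everything else is purely
illustrative.

        #include <stdio.h>

        /*
         * Userspace illustration (not kernel code): how a small CMA
         * request is widened to MAX_ORDER - 1 alignment before
         * isolation.  Constants model x86-64 defaults.
         */
        #define MAX_ORDER           11
        #define MAX_ORDER_NR_PAGES  (1UL << (MAX_ORDER - 1)) /* 1024 pages, 4 MiB */
        #define PAGEBLOCK_NR_PAGES  (1UL << 9)               /*  512 pages, 2 MiB */

        static unsigned long max_align_down(unsigned long pfn)
        {
                return pfn & ~(MAX_ORDER_NR_PAGES - 1);
        }

        static unsigned long max_align_up(unsigned long pfn)
        {
                return (pfn + MAX_ORDER_NR_PAGES - 1) &
                       ~(MAX_ORDER_NR_PAGES - 1);
        }

        int main(void)
        {
                /* 148-page request at a free slot taken from the log */
                unsigned long start = 49452, count = 148;
                unsigned long iso_start = max_align_down(start);
                unsigned long iso_end   = max_align_up(start + count);

                printf("requested: [%lu, %lu) = %lu pages\n",
                       start, start + count, count);
                printf("isolated:  [%lu, %lu) = %lu pages (%lu pageblocks)\n",
                       iso_start, iso_end, iso_end - iso_start,
                       (iso_end - iso_start) / PAGEBLOCK_NR_PAGES);
                return 0;
        }

This prints that the 148-page (592 KiB) request isolates a full
1024-page (4 MiB) span, i.e. two x86-64 pageblocks, which is exactly
why skipping to the next pageblock is not enough to get out of a
concurrently isolated range.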