On 08/04/22 at 02:01pm, Michal Hocko wrote: ...snip... > > > > > > > Thinking about it. We should get a warning when the actual allocation > > > > > > > from the pool fails no? That would be more useful information than the > > > > > > > pre-allocation failure when it is not really clear whether anybody is > > > > > > > ever going to consume it. > > > > > > > > > > > > Hi Michal, > > > > > > > > > > > > You haven't told on which ARCH you met this issue, is it x86_64? > > > > > > > > > > yes x86_64, so a small 16MB DMA zone. > > > > > > > > Yeah, the 16M DMA zone is redicilous and exists only for hardly seen > > > > ISA-style devices support. Haven't prepared the log well. > > > > > > Agreed on that! I would essentially suggest to completely ignore pool > > > pre-allocation failures for the small DMA zone. There is barely anything > > > to be ever consuming it. > > > > I would personally suggest to keep it. W/o that, we even don't know the > > issue we are talking about now. I see below commit as a workaround, and > > have been trying to fix it finally with a better solution. > > > > commit c4dc63f0032c ("mm/page_alloc.c: do not warn allocation failure on zone DMA if no managed pages") > > This will not help in any case but an empty DMA zone. As you can see > this is not the case in my example. Yeah, it's different case. > > > After attempts, I realize it's time to let one zone DMA or DMA32 cover > > the whole low 4G memory on x86_64. That's the real fix. The tiny 16M DMA > > on 64bit system is root cause. > > Yes, I would agree with that. This means DMA zone is gone completely. > > [...] > > > This also mostly papers over this particular problem by allocating > > > allocating two pools for the same range. > > > > No, it doesn't paper over anything, and isn't a hack. Zone dma now covers > > low 4G, just zone DMA32 is empty. Any allocation request with GFP_DMA will > > be satisfied, while request with GFP_DMA32 will fall back to zone DMA. > > This is breaking expectation of the DMA zone on x86. This is not > acceptable. You really want to do it other way around. Only have DMA32 > zone and cover the whole 32b phys address range. DMA zone would be empty > which is something we can have already. This would make the crap HW fail > on allocations. OK, I will consider it in the direction that only DMA32 zone cover the low 4G as you suggested. Do you have pointer about them? Really appreciate it. > > > See my summary about zone DMA/DMA32 on ARCHes. Currently only x86_64 > > always has this burdensome tiny DMA zone. Other ARCH has made adjustment > > to avoid that conditionally. The way I took in above patch is similar > > with arm64 handling. > > Yeah a historical baggage we have to live with or just stop pretending > we really do support that ISA HW. Or make that dynamic and only set up > DMA zone when there is a HW like that present. Ok, I see, you don't like the handling on arm64 and mips either. > > > The two pools for the same range has been there on arm64 and mips, we > > can easily fix it by introducing has_managed_dma32() and checking it > > before allocating atomic_pool_dma32 pool, just like I have done for > > atomic_pool_dma there. > > > > ============================= > > ARCH which has DMA32 > > ZONE_DMA ZONE_DMA32 > > arm64 0~X empty or X~4G (X is got from ACPI or DT. Otherwise it's 4G by default, DMA32 is empty) > > ia64 None 0~4G > > mips empty or 0~16M X~4G (zone DMA is empty on SGI_IP22 or SGI_IP28, otherwise 16M by default like i386) > > riscv None 0~4G > > x86_64 16M 16M~4G > > ============================= > > > > > > As for the only one DMA or DMA32 zone exist on x86_64 you suggested, I > > made below draft change which only creates zone DMA32 to cover the whole > > low 4G meomry, just like RISC-V and ia64 are doing. It works well on > > one intel machine, no other change is required. However, I have one > > concern, it possibly comes from my own misunderstanding, please help > > point out where I got it wrong. If there's only DMA32 zone, and > > CONFIG_ZONE_DMA is disabled, how does it handle GFP_DMA allocation > > request? > > the expected behavior should be to fail the allocation. Could you tell why we should fail the allocation? With my understanding, whether it is allocation request with GFP_DMA or GFP_DMA32, it's requesting memory for DMA buffer, or requesting memory under 32bit. If we only have zone DMA32 covering low 4G, and people requests meomry with GFP_DMA, we should try to satisfy it with memory from zone DMA32. Because people may not know the internal detail about memory from zone DMA or DMA32, and they just want to get a buffer for DMA transfering. With your saying, if there's only DMA32 existing and cover the whole low 4G, any allocation request with GFP_DMA need be failed, we need search and change all those places where GFP_DMA is set. > > > See gfp_zone(), it will return ZONE_NORMAL to user even though > > user expects to get memory for DMA handling? > > It's been quite some time since I've had gfp_zone cached in my brain. I > suspect it would require some special casing for GFP_DMA to always fail. > I would have to study the code again. We have discussed this at some > LSFMM few years back. Maybe you can find something at lwn. > > In any case. This is a larger change. Do you agree that the warning is > pointless and __GFP_NOWARN is a very simple way to deal with it until we > sort out situation of the DMA zone on x86 which is a long standing > problem? I understand the warning will be annoying and customer will complain about it. We can mute it for the time being, and may open it later when we get rid of the 16M DMA zone. So, yes, I agree that there's nothing we can do currently except of muting it. I can add my tag when you update and post.