On 02/22/22 at 09:12pm, Baoquan He wrote: > On 02/22/22 at 09:44am, Christoph Hellwig wrote: > > On Mon, Feb 21, 2022 at 02:57:34PM +0100, Heiko Carstens wrote: > > > > 1) Kmalloc(GFP_DMA) in s390 platform, under arch/s390 and drivers/s390; > > > > > > So, s390 partially requires GFP_DMA allocations for memory areas which > > > are required by the hardware to be below 2GB. There is not necessarily > > > a device associated when this is required. E.g. some legacy "diagnose" > > > calls require buffers to be below 2GB. > > > > > > How should something like this be handled? I'd guess that the > > > dma_alloc API is not the right thing to use in such cases. Of course > > > we could say, let's waste memory and use full pages instead, however > > > I'm not sure this is a good idea. > > > > Yeah, I don't think the DMA API is the right thing for that. This > > is one of the very rare cases where a raw allocation makes sense. > > > > That being said being able to drop kmalloc support for GFP_DMA would > > be really useful. How much memory would we waste if switching to the > > page allocator? > > > > > s390 drivers could probably converted to dma_alloc API, even though > > > that would cause quite some code churn. > > > > I think that would be a very good thing to have. > > > > > > For this first patch series, thanks to Hyeonggon for helping > > > > reviewing and great suggestions on patch improving. We will work > > > > together to continue the next steps of work. > > > > > > > > Any comment, thought, or suggestoin is welcome and appreciated, > > > > including but not limited to: > > > > 1) whether we should remove dma-kmalloc support in kernel(); > > > > > > The question is: what would this buy us? As stated above I'd assume > > > this comes with quite some code churn, so there should be a good > > > reason to do this. > > > > There is two steps here. One is to remove GFP_DMA support from > > kmalloc, which would help to cleanup the slab allocator(s) very nicely, > > as at that point it can stop to be zone aware entirely. > > > > The long term goal is to remove ZONE_DMA entirely at least for > > architectures that only use the small 16MB ISA-style one. It can > > then be replaced with for example a CMA area and fall into a movable > > zone. I'd have to prototype this first and see how it applies to the > > s390 case. It might not be worth it and maybe we should replace > > ZONE_DMA and ZONE_DMA32 with a ZONE_LIMITED for those use cases as > > the amount covered tends to not be totally out of line for what we > > built the zone infrastructure. > > > > > >From this cover letter I only get that there was a problem with kdump > > > on x86, and this has been fixed. So why this extra effort? > > > > > > > 3) Drop support for allocating DMA memory from slab allocator > > > > (as Christoph Hellwig said) and convert them to use DMA32 > > > > and see what happens > > > > > > Can you please clarify what "convert to DMA32" means? I would assume > > > this does _not_ mean that passing GFP_DMA32 to slab allocator would > > > work then? > > > > I'm really not sure what this means. > > Thanks a lot to Heiko for valuable input, it's very helpful. And thanks > a lot to Christoph for explaining. > > I guess this "convert to DMA32" is similar to "replace ZONE_DMA and > ZONE_DMA32 with a ZONE_LIMITED". And by the way, when I searched SLAB_CACHE_DMA32 which is another zone aware slab flag, I got that not all people likes to abuse kmalloc(GFP_DMA). There are two places where kmem_cache_create(SLAB_CACHE_DMA32) are called to create slab grabbing memory from zone DMA32. Obviously the code author really knows slab allocator. They use dma32 slab to get cache memory under 4G. drivers/firmware/google/gsmi.c : gsmi_init() drivers/iommu/io-pgtable-arm-v7s.c: arm_v7s_alloc_pgtable() > > When I use 'git grep "GFP_DMA/>"' to search all places specifying GFP_DMA, > I noticed the main usage of kmalloc(GFP_DMA) is to get memory under a > memory limitation, but not for DMA buffer allocation. Below is what I got > for earlier kdump issue explanation. It can help explain why kmalloc(GFP_DMA) > is useful on ARCHes w/o ZONE_DMA32, but doesn't make sense on x86_64 which > has both zone DMA and DMA32. The 16M ZONE_DMA is only for very rarely used > legacy ISA device, but most pci devices driver supporting 32bit addressing > likes to abuse kmalloc(GFP_DMA) to get DMA buffer from the zone DMA. > That obviously is unsafe and unreasonable. > > Like risc-V which doesn't have the burden of legacy ISA devices, it can > take only containing DMA32 zone way. ARM64 also adjusts to have only > arm64 if not on Raspberry Pi. Using kmalloc(GFP_DMA) makes them no > inconvenience. If finally having dma32-kmalloc, the name may need be > carefully considerred, it seems to be acceptable. We just need to pick > up those ISA device driver and handle their 24bit addressing DMA well. > > For this patchset, I only find out places in which GPF_DMA is > redundant and can be removed directly, and places where > kmalloc(GFP_DMA)|dma_map_ pair can be replaced with dma_alloc_xxxx() API > and the memory wasting is not so big. I have patches converting > kmalloc(GFP_DMA) to alloc_pages(GFP_DMA), but not easy to replace with > dma_alloc_xxx(), Hyeonggon suggested not adding them to this series. > I will continue investigating the left places, see whether or how we can > convert them. > > ============================= > ARCH which has DMA32 > ZONE_DMA ZONE_DMA32 > arm64 0~X X~4G (X is got from ACPI or DT. Otherwise it's 4G by default, DMA32 is empty) > ia64 None 0~4G > mips 0 or 0~16M X~4G (zone DMA is empty on SGI_IP22 or SGI_IP28, otherwise 16M by default like i386) > riscv None 0~4G > x86_64 16M 16M~4G > > > ============================= > ARCH which has no DMA32 > ZONE_DMA > alpha 0~16M or empty if IOMMU enabled > arm 0~X (X is reported by fdt, 4G by default) > m68k 0~total memory > microblaze 0~total low memory > powerpc 0~2G > s390 0~2G > sparc 0~ total low memory > i386 0~16M > > > > > > > > > btw. there are actually two kmalloc allocations which pass GFP_DMA32; > > > I guess this is broken(?): > > > > > > drivers/hid/intel-ish-hid/ishtp-fw-loader.c: dma_buf = kmalloc(payload_max_size, GFP_KERNEL | GFP_DMA32); > > > drivers/media/test-drivers/vivid/vivid-osd.c: dev->video_vbase = kzalloc(dev->video_buffer_size, GFP_KERNEL | GFP_DMA32); > > > > Yes, this is completely broken. > > >