On 07/27/2018 11:23 AM, Matthew Wilcox wrote: > On Fri, Jul 27, 2018 at 09:23:30AM -0400, Tony Battersby wrote: >> On 07/26/2018 08:07 PM, Matthew Wilcox wrote: >>> If you're up for more major surgery, then I think we can put all the >>> information currently stored in dma_page into struct page. Something >>> like this: >>> >>> +++ b/include/linux/mm_types.h >>> @@ -152,6 +152,12 @@ struct page { >>> unsigned long hmm_data; >>> unsigned long _zd_pad_1; /* uses mapping */ >>> }; >>> + struct { /* dma_pool pages */ >>> + struct list_head dma_list; >>> + unsigned short in_use; >>> + unsigned short offset; >>> + dma_addr_t dma; >>> + }; >>> >>> /** @rcu_head: You can use this to free a page by RCU. */ >>> struct rcu_head rcu_head; >>> >>> page_list -> dma_list >>> vaddr goes away (page_to_virt() exists) >>> dma -> dma >>> in_use and offset shrink from 4 bytes to 2. >>> >>> Some 32-bit systems have a 64-bit dma_addr_t, and on those systems, >>> this will be 8 + 2 + 2 + 8 = 20 bytes. On 64-bit systems, it'll be >>> 16 + 2 + 2 + 4 bytes of padding + 8 = 32 bytes (we have 40 available). >>> >>> >> offset at least needs more bits, since allocations can be multi-page. > Ah, rats. That means we have to use the mapcount union too: > Actually, on second thought, if I understand it correctly, a multi-page allocation will never be split up and returned as multiple sub-allocations, so the offset shouldn't be needed for that case. The offset should only be needed when splitting a PAGE_SIZE-allocation into smaller sub-allocations. The current code uses the offset unconditionally though, so it would need major changes to remove the dependence. So a 16-bit offset might work. As for sanity checking, I suppose having the dma address in the page could provide something for dma_pool_free() to check against (in fact it is already there under DMAPOOL_DEBUG). But the bigger problem is that my first patch adds another list_head to the dma_page for the avail_page_link to make allocations faster. I suppose we could make the lists singly-linked instead of doubly-linked to save space. Wouldn't using the mapcount union make it problematic for userspace to mmap() the returned DMA buffers? I am not sure if any drivers allow that to be done or not. I have heard of drivers in userspace, drivers with DMA ring buffers, etc. I don't want to audit the whole kernel tree to know if it would be safe. As you have seen, at least mpt3sas is doing unexpected things with dma pools. So maybe it could be done, but you are right, it would involve major surgery. My current in-development patch to implement your intial suggestion is pretty small and it works. So I'm not sure if I want to take it further or not. Lots of other things to do...