Re: [PATCH 2/3] dmapool: improve scalability of dma_pool_free

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 07/27/2018 11:23 AM, Matthew Wilcox wrote:
> On Fri, Jul 27, 2018 at 09:23:30AM -0400, Tony Battersby wrote:
>> On 07/26/2018 08:07 PM, Matthew Wilcox wrote:
>>> If you're up for more major surgery, then I think we can put all the
>>> information currently stored in dma_page into struct page.  Something
>>> like this:
>>>
>>> +++ b/include/linux/mm_types.h
>>> @@ -152,6 +152,12 @@ struct page {
>>>                         unsigned long hmm_data;
>>>                         unsigned long _zd_pad_1;        /* uses mapping */
>>>                 };
>>> +               struct {        /* dma_pool pages */
>>> +                       struct list_head dma_list;
>>> +                       unsigned short in_use;
>>> +                       unsigned short offset;
>>> +                       dma_addr_t dma;
>>> +               };
>>>  
>>>                 /** @rcu_head: You can use this to free a page by RCU. */
>>>                 struct rcu_head rcu_head;
>>>
>>> page_list -> dma_list
>>> vaddr goes away (page_to_virt() exists)
>>> dma -> dma
>>> in_use and offset shrink from 4 bytes to 2.
>>>
>>> Some 32-bit systems have a 64-bit dma_addr_t, and on those systems,
>>> this will be 8 + 2 + 2 + 8 = 20 bytes.  On 64-bit systems, it'll be
>>> 16 + 2 + 2 + 4 bytes of padding + 8 = 32 bytes (we have 40 available).
>>>
>>>
>> offset at least needs more bits, since allocations can be multi-page. 
> Ah, rats.  That means we have to use the mapcount union too:
>

Actually, on second thought, if I understand it correctly, a multi-page
allocation will never be split up and returned as multiple
sub-allocations, so the offset shouldn't be needed for that case.  The
offset should only be needed when splitting a PAGE_SIZE-allocation into
smaller sub-allocations.  The current code uses the offset
unconditionally though, so it would need major changes to remove the
dependence.  So a 16-bit offset might work.

As for sanity checking, I suppose having the dma address in the page
could provide something for dma_pool_free() to check against (in fact it
is already there under DMAPOOL_DEBUG).

But the bigger problem is that my first patch adds another list_head to
the dma_page for the avail_page_link to make allocations faster.  I
suppose we could make the lists singly-linked instead of doubly-linked
to save space.

Wouldn't using the mapcount union make it problematic for userspace to
mmap() the returned DMA buffers?  I am not sure if any drivers allow
that to be done or not.  I have heard of drivers in userspace, drivers
with DMA ring buffers, etc.  I don't want to audit the whole kernel tree
to know if it would be safe.  As you have seen, at least mpt3sas is
doing unexpected things with dma pools.

So maybe it could be done, but you are right, it would involve major
surgery.  My current in-development patch to implement your intial
suggestion is pretty small and it works.  So I'm not sure if I want to
take it further or not.  Lots of other things to do...




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux