On 07/26/2018 08:07 PM, Matthew Wilcox wrote:
> On Thu, Jul 26, 2018 at 04:06:05PM -0400, Tony Battersby wrote:
>> On 07/26/2018 03:42 PM, Matthew Wilcox wrote:
>>> On Thu, Jul 26, 2018 at 02:54:56PM -0400, Tony Battersby wrote:
>>>> dma_pool_free() scales poorly when the pool contains many pages because
>>>> pool_find_page() does a linear scan of all allocated pages.  Improve its
>>>> scalability by replacing the linear scan with a red-black tree lookup.
>>>> In big O notation, this improves the algorithm from O(n^2) to O(n * log n).
>>> This is a lot of code to get us to O(n * log(n)) when we can use less
>>> code to go to O(n).  dma_pool_free() is passed the virtual address.
>>> We can go from virtual address to struct page with virt_to_page().
>>> In struct page, we have 5 words available (20/40 bytes), and it's trivial
>>> to use one of them to point to the struct dma_page.
>>>
>> Thanks for the tip.  I will give that a try.
> If you're up for more major surgery, then I think we can put all the
> information currently stored in dma_page into struct page.  Something
> like this:
>
> +++ b/include/linux/mm_types.h
> @@ -152,6 +152,12 @@ struct page {
>                         unsigned long hmm_data;
>                         unsigned long _zd_pad_1;        /* uses mapping */
>                 };
> +               struct {        /* dma_pool pages */
> +                       struct list_head dma_list;
> +                       unsigned short in_use;
> +                       unsigned short offset;
> +                       dma_addr_t dma;
> +               };
>
>                 /** @rcu_head: You can use this to free a page by RCU. */
>                 struct rcu_head rcu_head;
>
> page_list -> dma_list
> vaddr goes away (page_to_virt() exists)
> dma -> dma
> in_use and offset shrink from 4 bytes to 2.
>
> Some 32-bit systems have a 64-bit dma_addr_t, and on those systems,
> this will be 8 + 2 + 2 + 8 = 20 bytes.  On 64-bit systems, it'll be
> 16 + 2 + 2 + 4 bytes of padding + 8 = 32 bytes (we have 40 available).
>

offset at least needs more bits, since allocations can be multi-page.  See
the following from mpt3sas:

cat /sys/devices/pci0000:80/0000:80:07.0/0000:85:00.0/pools
(manually cleaned up column alignment)

poolinfo - 0.1
reply_post_free_array pool       1      21      192      1
reply_free pool                  1       1    41728      1
reply pool                       1       1  1335296      1
sense pool                       1       1   970272      1
chain pool                  373959  386048      128  12064
reply_post_free pool            12      12   166528     12
                                            ^size^

In my initial implementation I also added a pointer from struct page to the
dma_pool so that dma_pool_free() could sanity-check that the page really
belongs to the pool, as in:

pg = virt_to_page(vaddr);
if (unlikely(pg->dma_pool != pool)) {
	handle error...
}
page = pg->dma_page;

The linked-list search previously implemented that check as a by-product,
and I didn't want to lose it.  It all seems to be working so far.
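To spell out what the free-path lookup becomes with those back-pointers,
here is a rough sketch (not the final patch; pg->dma_pool and pg->dma_page
are the new struct page fields from the snippet above, and the helper name
is made up):

	static struct dma_page *pool_page_from_vaddr(struct dma_pool *pool,
						     void *vaddr)
	{
		struct page *pg = virt_to_page(vaddr);

		/* Sanity check: the page must have come from this pool. */
		if (unlikely(pg->dma_pool != pool)) {
			dev_err(pool->dev, "%s %s, %p not from this pool\n",
				__func__, pool->name, vaddr);
			return NULL;
		}

		/* O(1): no list walk or tree lookup needed. */
		return pg->dma_page;
	}

dma_pool_free() would call something like that in place of pool_find_page()
and return early on NULL, the same way it already bails out today when it is
handed a bad dma address.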