On Fri, Apr 16, 2021 at 07:08:23PM +0200, Jesper Dangaard Brouer wrote:
> On Fri, 16 Apr 2021 16:27:55 +0100
> Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> > On Thu, Apr 15, 2021 at 08:08:32PM +0200, Jesper Dangaard Brouer wrote:
> > > See below patch. Where I swap32 the dma address to satisfy
> > > page->compound having bit zero cleared. (It is the simplest fix I could
> > > come up with).
> >
> > I think this is slightly simpler, and as a bonus code that assumes the
> > old layout won't compile.
>
> This is clever, I like it! When reading the code one just have to
> remember 'unsigned long' size difference between 64-bit vs 32-bit.
> And I assume compiler can optimize the sizeof check out then doable.

I checked before/after with the replacement patch that doesn't
have compiler warnings. On x86, there is zero codegen difference
(objdump -dr before/after matches exactly) for both x86-32 with 32-bit
dma_addr_t and x86-64. For x86-32 with 64-bit dma_addr_t, the compiler
makes some different inlining decisions in page_pool_empty_ring(),
page_pool_put_page() and page_pool_put_page_bulk(), but it's not clear
to me that they're wrong.

Function                                     old     new   delta
page_pool_empty_ring                         387     307     -80
page_pool_put_page                           604     516     -88
page_pool_put_page_bulk                      690     517    -173
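
For anyone following along without the patch in front of them, here is a
rough userspace sketch of the kind of sizeof check being discussed. It is
not the patch itself: demo_dma_addr_t, struct demo_page and both helpers
are hypothetical stand-ins I made up for illustration. The point is only
that the sizeof comparison is a compile-time constant, so when the address
type fits in one unsigned long the compiler can drop the second store and
load entirely.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit dma_addr_t (assumption, not kernel code). */
typedef uint64_t demo_dma_addr_t;

/* Hypothetical stand-in for the page fields that hold the DMA address. */
struct demo_page {
	unsigned long dma_addr[2];
};

/* Store the address; the upper half is only written when it does not fit. */
static inline void demo_set_dma_addr(struct demo_page *page,
				     demo_dma_addr_t addr)
{
	page->dma_addr[0] = (unsigned long)addr;
	/* Constant condition: dead code when unsigned long is 64 bits wide. */
	if (sizeof(demo_dma_addr_t) > sizeof(unsigned long))
		page->dma_addr[1] = (unsigned long)(addr >> 16 >> 16);
}

/* Reassemble the address from one or two unsigned long slots. */
static inline demo_dma_addr_t demo_get_dma_addr(const struct demo_page *page)
{
	demo_dma_addr_t ret = page->dma_addr[0];

	/* Two 16-bit shifts avoid a shift by the full type width. */
	if (sizeof(demo_dma_addr_t) > sizeof(unsigned long))
		ret |= (demo_dma_addr_t)page->dma_addr[1] << 16 << 16;
	return ret;
}
```

Building this once for a 32-bit target and once for a 64-bit target and
comparing the disassembly should show the branch present only in the
32-bit build, which is the same sort of before/after comparison as above.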