On Thu, Aug 13, 2020 at 12:03 PM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > On Wed, Aug 12, 2020 at 8:17 PM Hugh Dickins <hughd@xxxxxxxxxx> wrote: > > > > Since commit 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic") > > improved unlock_page(), it has become more noticeable how cow_user_page() > > in a kernel with CONFIG_DMA_API_DEBUG=y can create and suffer from heavy > > contention on DMA debug's radix_lock in debug_dma_assert_idle(). > > Ooh. > > Yeah, that's ridiculously expensive, and serializes things for no good reason. > > Your patch looks obviously correct to me (Christoph?), but it also > makes me go "why are we doing this in the first place"? > > Because it looks to me like > (a) the debug check is wrong > (b) this is left-over from early debugging > > In particular, I don't see why we couldn't do a COW on a page that is > under writeback at the same time. We're not changing the page that is > doing DMA. > > In fact, the whole "COW with DMA" makes me feel like the real bug may > have been due that whole "ambiguous COW" thing, which was fixed in > 17839856fd58 ("gup: document and work around "COW can break either > way" issue") > > That debug thing goes back almost 7 years, and I don't think it has > caught anything in those seven years, but I could be wrong. > > The commit that adds it does talk about a bug, but that code was > removed entirely eventually. And google shows no hits for > debug_dma_assert_idle() since - until your email. > > So my gut feel is that we should remove the check entirely, although > your patch does seem like a big improvement. > > Christoph? > > (And Dan too, of course, in case he happens to be relaxing in front of > the computer away from a newborn baby ;) > I can at least confirm that it has not caught anything in a long while except a false positive that needed a fix up. https://lore.kernel.org/lkml/CAPcyv4hy_nNe8G0o8sMrz9A8HcdRzAuKgXmvdjKusAAA3Fow4g@xxxxxxxxxxxxxx/ Part of me says it's not doing anything worthwhile upstream, but I wonder if it is keeping some people from submitting patches that play these page reference shenanigans? I know they're out there. The land of gup and truncate is where questionable kernel changes go to die. Outside of that, Hugh's patch looks like a definite improvement so I'd be inclined to run with that, but rip the whole facility out at the next sign of a false positive.