On Tue, 3 Jan 2023, Matthew Wilcox wrote:

> On Tue, Jan 03, 2023 at 11:42:11AM +0100, Vlastimil Babka wrote:
> > Separately we should also make the __dump_page() more resilient.
>
> Right.  It's not ideal when one of our best debugging tools obfuscates
> the problem we're trying to debug.  I've seen problems like this before,
> and the problem is that somebody calls dump_page() on a page that they
> don't own a refcount on.  That lets the page mutate under us in some
> fairly awkward ways (as you've seen here, it seems to be part of several
> different compound allocations at various points during the dump
> process).
>
> One possibility I thought about was taking our own refcount on the
> page at the start of dump_page().  That would kill off the possibility
> of ever passing in a const struct page, and it would confuse people.
> Also, what if somebody passes in a pointer to something that's not a
> struct page?  Then we've (tried to) modify memory that's not a refcount.
>
> I think the best we can do is to snapshot the struct page and the folio
> it appears to belong to at the start of dump_page().  It'll take a
> little care (for example, folio_pfn() must be passed the original
> folio, and not the snapshot), but I think it's doable.
>

By snapshot do you mean memcpy() of the metadata to the stack?  I assume
this still leaves the opportunity for the underlying mutation of the page
but makes the window narrower.
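
To make the question concrete, here is a minimal userspace sketch of the
memcpy()-to-stack reading of "snapshot".  Everything in it is hypothetical:
struct fake_page, fake_memmap, fake_page_to_pfn() and fake_dump_page() are
stand-ins invented for illustration and do not reflect the kernel's
struct page layout or the eventual dump_page() implementation.  It only
shows the shape of the idea: copy once, format from the copy, and derive
anything address-based from the original pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct page; NOT the kernel's layout. */
struct fake_page {
	unsigned long flags;
	int refcount;
	int mapcount;
};

/* Hypothetical stand-in for the memmap array that pfn lookups index into. */
static struct fake_page fake_memmap[8];

/*
 * Address-derived value: this must be given the original pointer.
 * Passing the address of the stack snapshot would yield garbage, which
 * is the same caveat as "folio_pfn() must be passed the original folio".
 */
static size_t fake_page_to_pfn(const struct fake_page *page)
{
	return (size_t)(page - fake_memmap);
}

/*
 * Copy the metadata to the stack once, then format every field from the
 * copy so the fields are at least read in one short burst.  The memcpy()
 * is not atomic, so a concurrent writer can still produce a torn copy;
 * the race window is narrowed, not closed.
 */
static int fake_dump_page(const struct fake_page *page, char *buf, size_t len)
{
	struct fake_page snap;

	assert(page != NULL);
	memcpy(&snap, page, sizeof(snap));

	return snprintf(buf, len, "pfn:%zu refcount:%d mapcount:%d flags:%#lx",
			fake_page_to_pfn(page), snap.refcount,
			snap.mapcount, snap.flags);
}
```

Because the page argument stays const, this keeps the property that taking
our own refcount would lose, and it never writes to memory that might not
be a struct page at all.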