On 01.05.20 00:06, Dave Hansen wrote: > I was also wondering if Claudio was right about the debug patch having > races. I went to go look how the s390 code avoids races when pages go > from accessible->inaccessible. > > Because, if if all of the traps are in place to transform pages from > inaccessible->accessible, the code *after* those traps is still > vulnerable. What *keeps* pages accessible? > > The race avoidance is this, basically: > > down_read(&gmap->mm->mmap_sem); > lock_page(page); > ptep = get_locked_pte(gmap->mm, uaddr, &ptelock); > ... >> expected = expected_page_refs(page); >> if (!page_ref_freeze(page, expected)) >> return -EBUSY; >> set_bit(PG_arch_1, &page->flags); >> rc = uv_call(0, (u64)uvcb); >> page_ref_unfreeze(page, expected); > > ... up_read(mmap_sem) / unlock_page() / unlock pte > > I'm assuming that after the uv_call(), the page is inaccessible and I/O > devices will go boom if they touch the page. > > The page_ref_freeze() ensures that references come between the > freeze/unfreeze are noticed, but it doesn't actually *stop* new ones for > users that hold references already. For the page cache, especially, > someone could do: > > page = find_get_page(); > arch_make_page_accessible(); > lock_page(); > ... make_secure_pte(); Not sure if I got your point here, but this make_secure_pte should bail out because we actually do check for a calculated refcount value and return -EBUSY. The find_get_page should have raised this refcount to a value that would go beyond the expected value, No? > unlock_page(); > get_page(); > // ^ OK because I have a ref > // do DMA on inaccessible page > > Because the make_secure_pte() code isn't looking for a *specific* > 'expected' value, it has no way of noticing that the extra ref snuck in > there. I think the expected calcution is actually doing that,giving back the minimum value when no one else has any references that are valid for I/O. But I might not have understood what you are trying to tell me? > > I _think_ expected actually needs to be checked for having a specific > (low) value so that if there's a *possibility* of a reference holder > acquiring additional references, the page is known to be off-limits. > mm/migrate.c has a few examples of this, but I'm not quite sure how > bulletproof they are. Some of it appears to just be optimizations. > > > b