On Tue, Feb 7, 2023 at 6:50 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Mon, Feb 06, 2023 at 03:52:19PM -0500, Peter Xu wrote:
> > On Mon, Feb 06, 2023 at 07:01:39PM +0000, Matthew Wilcox wrote:
> > > On Mon, Feb 06, 2023 at 08:28:56PM +0900, David Stevens wrote:
> > > > This change first makes sure that the intermediate page cache state
> > > > during collapse is not visible by moving when gaps are filled to after
> > > > the page cache lock is acquired for the final time. This is necessary
> > > > because the synchronization provided by locking hpage is insufficient
> > > > for functions which operate on the page cache without actually locking
> > > > individual pages to examine their content (e.g. shmem_mfill_atomic_pte).
> > >
> > > I've been a little scared of touching khugepaged because, well, look at
> > > that function. But if we are going to touch it, how about this patch
> > > first? It does _part_ of what you need by not filling in the holes,
> > > but obviously not the part that looks at uffd.
> > >
> > > It leaves the old pages in-place and frozen. I think this should be
> > > safe, but I haven't booted it (not entirely sure what test I'd run
> > > to prove that it's not broken)
> >
> > That logic existed since Kirill's original commit to add shmem thp support
> > on khugepaged, so Kirill should be the best to tell.. but so far it seems
> > reasonable to me to have that extra operation.
> >
> > The problem is khugepaged will release pgtable lock during collapsing, so
> > AFAICT there can be a race where some other thread tries to insert pages
> > into page cache in parallel with khugepaged right after khugepaged released
> > the page cache lock.
> >
> > For example, it seems to me new page cache can be inserted when khugepaged
> > is copying small page content to the new hpage.

This particular race can't happen with either patch, since the missing
page cache entries are filled when we create the multi-index entry for
hpage.

> Mmm, yes, we need to have _something_ in the page cache to block new
> pages from being added. It can be either the new or the old pages,
> but it can't be NULL. It could even be a RETRY entry, since that'll
> have the same effect as a frozen page.
>
> But both David's patch and mine are wrong. Not sure what to do for
> David's problem -- maybe it's OK to have the holes temporarily filled
> with frozen / RETRY entries until we get to the point where we check
> for an uffd marker?

My patch re-counts the holes after acquiring the page cache lock for
the final time, right before creating the final hpage multi-index
entry. Since we lock present pages while iterating over the target
range, they can't have been truncated before our re-validation of
nr_none. So if the number of missing pages is still equal to nr_none,
then we know that nothing has come along and filled in a missing page.
Compared to adding some sort of marker for missing pages, this does
add another failure path for collapse, but I don't think there is any
race.

-David
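
P.S. To make the nr_none argument concrete, here is a rough userspace
sketch of the re-validation step. It is not the patch itself:
toy_cache, count_holes and toy_finish_collapse are made-up names, a
plain array stands in for the page cache, HPAGE_NR is a toy size, and
the page cache lock is assumed to be held by the caller rather than
modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HPAGE_NR 8 /* toy range size, hypothetical; not the kernel's value */

/* Toy stand-in for the target range of the page cache; NULL is a hole. */
struct toy_cache {
	const void *slot[HPAGE_NR];
};

/* Count holes in the range. The caller is assumed to hold the cache lock. */
static size_t count_holes(const struct toy_cache *c)
{
	size_t n = 0;

	assert(c != NULL);
	for (size_t i = 0; i < HPAGE_NR; i++)
		if (c->slot[i] == NULL)
			n++;
	return n;
}

/*
 * Final step of the collapse, with the lock assumed held. Present pages
 * were locked during the first pass, so they cannot have gone away; the
 * hole count can only have dropped. If it still equals nr_none, nothing
 * filled a hole, and the whole range can be pointed at hpage in one go.
 * Otherwise the collapse fails and the range is left untouched.
 */
static bool toy_finish_collapse(struct toy_cache *c, size_t nr_none,
				const void *hpage)
{
	if (count_holes(c) != nr_none)
		return false; /* someone filled a hole: extra failure path */

	for (size_t i = 0; i < HPAGE_NR; i++)
		c->slot[i] = hpage;
	return true;
}
```

In this model, a caller records nr_none = count_holes() on the first
pass, drops the lock to copy, then retakes it and calls
toy_finish_collapse(). If another thread stored a page into a hole in
between, the counts differ and the call returns false.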