On 12 Mar 2024, at 4:05, Ryan Roberts wrote:

> On 12/03/2024 03:45, Matthew Wilcox wrote:
>> On Mon, Mar 11, 2024 at 03:58:48PM -0400, Zi Yan wrote:
>>> @@ -1168,6 +1172,17 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>>>  		folio_lock(src);
>>>  	}
>>>  	locked = true;
>>> +	if (folio_test_large_rmappable(src) &&
>
> I think you also need to check that the order > 1, now that we support order-1
> pagecache folios? _deferred_list only exists if order > 1.
>
>>> +		!list_empty(&src->_deferred_list)) {
>>> +		struct deferred_split *ds_queue = get_deferred_split_queue(src);
>>> +
>>> +		spin_lock(&ds_queue->split_queue_lock);
>>> +		ds_queue->split_queue_len--;
>>> +		list_del_init(&src->_deferred_list);
>>> +		spin_unlock(&ds_queue->split_queue_lock);
>>> +		old_page_state |= PAGE_WAS_ON_DEFERRED_LIST;
>>> +	}
>>
>> I have a few problems with this ...
>>
>> Trivial: your whitespace is utterly broken.  You can't use a single tab
>> for both indicating control flow change and for line-too-long.
>>
>> Slightly more important: You're checking list_empty outside the lock
>> (which is fine in order to avoid unnecessarily acquiring the lock),
>> but you need to re-check it inside the lock in case of a race.  And you
>> didn't mark it as data_race(), so KMSAN will whinge.
>
> I've seen data_race() used around list_empty() without the lock held
> inconsistently (see deferred_split_folio()). What are the rules? Given that we
> are not doing a memory access here, I don't really understand why it is needed?
> list_empty() uses READ_ONCE(), which I thought would be sufficient? (I've just
> added a similar lockless check in my swap-out series - will add data_race() if
> needed, but previously concluded it's not.)
>
>>
>> Much more important: You're doing this with a positive refcount, which
>> breaks the (undocumented) logic in deferred_split_scan() that a folio
>> with a positive refcount will not be removed from the list.
>>
>> Maximally important: We shouldn't be doing any of this!  This folio is
>> on the deferred split list.  We shouldn't be migrating it as a single
>> entity; we should be splitting it now that we're in a context where we
>> can do the right thing and split it.  Documentation/mm/transhuge.rst
>> is clear that we don't split it straight away due to locking context.
>> Splitting it on migration is clearly the right thing to do.
>>
>> If splitting fails, we should just fail the migration; splitting fails
>> due to excess references, and if the source folio has excess references,
>> then migration would fail too.
>
> This comment makes me wonder what we do in split_huge_page_to_list_to_order()
> if the target order is greater than 1 and the input folio is on the deferred
> split list. Looks like we currently just remove it from the deferred list. Is
> there a case for putting any output folios that are still partially mapped back
> on the deferred list, or splitting them to a lower order such that all output
> folios are fully mapped and all unmapped portions are freed?

I would probably let the caller of split_huge_page_to_list_to_order() decide
whether the output folios should be put back on the deferred list. The caller
should determine the right order to split to. Letting
split_huge_page_to_list_to_order() change new_order on its own would confuse
the caller and complicate the handling of the output folios.
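To make that concrete, a minimal sketch of what the caller-side handling could
look like (not from any posted patch). split_huge_page_to_list_to_order(),
folio_order(), folio_mapcount(), folio_nr_pages() and deferred_split_folio()
are existing helpers; folio_partially_mapped_hint() is a made-up placeholder
for whatever partial-mapping test the caller applies, and the overall flow is
illustrative only:

/*
 * Sketch only: keep the "put it back on the deferred split list" decision
 * with the caller rather than inside the split code.
 */

/*
 * Placeholder predicate -- not a real kernel helper; a crude stand-in for
 * whatever "still partially mapped" test the caller wants to use.
 */
static inline bool folio_partially_mapped_hint(struct folio *folio)
{
	return folio_mapcount(folio) &&
	       folio_mapcount(folio) < folio_nr_pages(folio);
}

static int split_and_maybe_requeue(struct folio *folio, unsigned int new_order)
{
	int ret;

	/* Caller holds the folio lock and a reference, as usual for a split. */
	ret = split_huge_page_to_list_to_order(&folio->page, NULL, new_order);
	if (ret)
		return ret;	/* e.g. excess references: let migration fail too */

	/*
	 * @folio is now the (still locked and referenced) output folio that
	 * contains the original head page.  The caller, not the split code,
	 * decides whether it still warrants deferred splitting.
	 */
	if (folio_order(folio) > 1 && folio_partially_mapped_hint(folio))
		deferred_split_folio(folio);

	return 0;
}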
--
Best Regards,
Yan, Zi