On Tue, Mar 12, 2024 at 11:51:13AM -0400, Zi Yan wrote: > On 12 Mar 2024, at 10:19, Matthew Wilcox wrote: > > > On Tue, Mar 12, 2024 at 10:13:16AM -0400, Zi Yan wrote: > >> On 11 Mar 2024, at 23:45, Matthew Wilcox wrote: > >>> Much more important: You're doing this with a positive refcount, which > >>> breaks the (undocumented) logic in deferred_split_scan() that a folio > >>> with a positive refcount will not be removed from the list. > >> > >> What is the issue here? I thought as long as the split_queue_lock is held, > >> it should be OK to manipulate the list. > > > > I just worked this out yesterday: > > https://lore.kernel.org/linux-mm/Ze9EFdFLXQEUVtKl@xxxxxxxxxxxxxxxxxxxx/ > > (the last chunk, starting with Ryan asking me "what about the first bug > > you found") > > Hmm, like you said a folio with a positive refcount will not be removed > from ds_queue->split_queue, it will have no chance going to the separate > list in deferred_list_scan() and list_del_init() will not corrupt > that list. You've misread it. Folios with a _zero_ refcount are not removed from the list in deferred_split_scan. Folios with a positive refcount are removed from the per-node or per-cgroup list _at which point there is an undocumented assumption_ that they will not be removed from the local list because they have a positive refcount. > So it should be safe. Or the issue is that before migration > adding a refcount, the folio is removed from ds_queue->split_queue > and put on the list in deferred_list_scan(), as a result, any manipulation > of folio->_deferred_list could corrupt the list. Basically, > !list_empty(&folio->_deferred_list) cannot tell if the folio is on > ds_queue->split_queue or another list. I am not sure about why "a positive > refcount" is related here. > > That makes me wonder whether ds_queue->split_queue_lock is also needed > for list_for_each_entry_safe() in deferred_split_scan(). Basically, > ds_queue->split_queue_lock protects folio->_deferred_list in addition to > ds_queue->split_queue. > > > > >>> Maximally important: Wer shouldn't be doing any of this! This folio is > >>> on the deferred split list. We shouldn't be migrating it as a single > >>> entity; we should be splitting it now that we're in a context where we > >>> can do the right thing and split it. Documentation/mm/transhuge.rst > >>> is clear that we don't split it straight away due to locking context. > >>> Splitting it on migration is clearly the right thing to do. > >>> > >>> If splitting fails, we should just fail the migration; splitting fails > >>> due to excess references, and if the source folio has excess references, > >>> then migration would fail too. > >> > >> You are suggesting: > >> 1. checking if the folio is on deferred split list or not > >> 2. if yes, split the folio > >> 3. if split fails, fail the migration as well. > >> > >> It sounds reasonable to me. The split folios should be migrated since > >> the before-split folio wants to be migrated. This split is not because > >> no new page cannot be allocated, thus the split folios should go > >> into ret_folios list instead of split_folios list. > > > > Yes, I'm happy for the split folios to be migrated. Bonus points if you > > want to figure out what order to split the folio to ;-) I don't think > > it's critical. > > > -- > Best Regards, > Yan, Zi