On 12 Mar 2024, at 12:38, Matthew Wilcox wrote: > On Tue, Mar 12, 2024 at 11:51:13AM -0400, Zi Yan wrote: >> On 12 Mar 2024, at 10:19, Matthew Wilcox wrote: >> >>> On Tue, Mar 12, 2024 at 10:13:16AM -0400, Zi Yan wrote: >>>> On 11 Mar 2024, at 23:45, Matthew Wilcox wrote: >>>>> Much more important: You're doing this with a positive refcount, which >>>>> breaks the (undocumented) logic in deferred_split_scan() that a folio >>>>> with a positive refcount will not be removed from the list. >>>> >>>> What is the issue here? I thought as long as the split_queue_lock is held, >>>> it should be OK to manipulate the list. >>> >>> I just worked this out yesterday: >>> https://lore.kernel.org/linux-mm/Ze9EFdFLXQEUVtKl@xxxxxxxxxxxxxxxxxxxx/ >>> (the last chunk, starting with Ryan asking me "what about the first bug >>> you found") >> >> Hmm, like you said a folio with a positive refcount will not be removed >> from ds_queue->split_queue, it will have no chance going to the separate >> list in deferred_list_scan() and list_del_init() will not corrupt >> that list. > > You've misread it. Folios with a _zero_ refcount are not removed from > the list in deferred_split_scan. Folios with a positive refcount are > removed from the per-node or per-cgroup list _at which point there is > an undocumented assumption_ that they will not be removed from the > local list because they have a positive refcount. But that sounds very subtle if not broken. As an outsider of deferred_split_scan(), only !list_empty(folio->_deferred_list) is checked. The condition can be true if the folio is on split_queue or local list of deferred_split_scan() with elevated refcount. In that case, the folio cannot be removed from the list (either split_queue or local list) even if split_queue_lock is held, since local list manipulation is not under split_queue_lock. This makes _deferred_list a one-way train to anyone except deferred_split_scan(), namely folios can only be added into _deferred_list until they are freed or split by deferred_split_scan(). Is that intended? If yes, maybe we should document it. If not, using split_queue_lock to protect local list, or more explicitly folio->_deferred_list might be better? >> So it should be safe. Or the issue is that before migration >> adding a refcount, the folio is removed from ds_queue->split_queue >> and put on the list in deferred_list_scan(), as a result, any manipulation >> of folio->_deferred_list could corrupt the list. Basically, >> !list_empty(&folio->_deferred_list) cannot tell if the folio is on >> ds_queue->split_queue or another list. I am not sure about why "a positive >> refcount" is related here. >> >> That makes me wonder whether ds_queue->split_queue_lock is also needed >> for list_for_each_entry_safe() in deferred_split_scan(). Basically, >> ds_queue->split_queue_lock protects folio->_deferred_list in addition to >> ds_queue->split_queue. >> >> >> >>>>> Maximally important: Wer shouldn't be doing any of this! This folio is >>>>> on the deferred split list. We shouldn't be migrating it as a single >>>>> entity; we should be splitting it now that we're in a context where we >>>>> can do the right thing and split it. Documentation/mm/transhuge.rst >>>>> is clear that we don't split it straight away due to locking context. >>>>> Splitting it on migration is clearly the right thing to do. >>>>> >>>>> If splitting fails, we should just fail the migration; splitting fails >>>>> due to excess references, and if the source folio has excess references, >>>>> then migration would fail too. >>>> >>>> You are suggesting: >>>> 1. checking if the folio is on deferred split list or not >>>> 2. if yes, split the folio >>>> 3. if split fails, fail the migration as well. >>>> >>>> It sounds reasonable to me. The split folios should be migrated since >>>> the before-split folio wants to be migrated. This split is not because >>>> no new page cannot be allocated, thus the split folios should go >>>> into ret_folios list instead of split_folios list. >>> >>> Yes, I'm happy for the split folios to be migrated. Bonus points if you >>> want to figure out what order to split the folio to ;-) I don't think >>> it's critical. >> >> >> -- >> Best Regards, >> Yan, Zi -- Best Regards, Yan, Zi
Attachment:
signature.asc
Description: OpenPGP digital signature