On 3 Jul 2024, at 12:21, David Hildenbrand wrote: > On 03.07.24 16:30, Zi Yan wrote: >> On 2 Jul 2024, at 3:40, Hugh Dickins wrote: >> >>> Even on 6.10-rc6, I've been seeing elusive "Bad page state"s (often on >>> flags when freeing, yet the flags shown are not bad: PG_locked had been >>> set and cleared??), and VM_BUG_ON_PAGE(page_ref_count(page) == 0)s from >>> deferred_split_scan()'s folio_put(), and a variety of other BUG and WARN >>> symptoms implying double free by deferred split and large folio migration. >>> >>> 6.7 commit 9bcef5973e31 ("mm: memcg: fix split queue list crash when large >>> folio migration") was right to fix the memcg-dependent locking broken in >>> 85ce2c517ade ("memcontrol: only transfer the memcg data for migration"), >>> but missed a subtlety of deferred_split_scan(): it moves folios to its own >>> local list to work on them without split_queue_lock, during which time >>> folio->_deferred_list is not empty, but even the "right" lock does nothing >>> to secure the folio and the list it is on. >>> >>> Fortunately, deferred_split_scan() is careful to use folio_try_get(): so >>> folio_migrate_mapping() can avoid the race by folio_undo_large_rmappable() >>> while the old folio's reference count is temporarily frozen to 0 - adding >>> such a freeze in the !mapping case too (originally, folio lock and >>> unmapping and no swap cache left an anon folio unreachable, so no freezing >>> was needed there: but the deferred split queue offers a way to reach it). >>> >>> Fixes: 9bcef5973e31 ("mm: memcg: fix split queue list crash when large folio migration") >>> Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx> >>> Cc: stable@xxxxxxxxxxxxxxx >>> --- >>> This patch against 6.10-rc6: Kefeng has commits in the mm-tree which >>> which will need adjustment to go over this, but we can both check the >>> result. I have wondered whether just reverting 85ce2c517ade and its >>> subsequent fixups would be better: but that would be a bigger job, >>> and probably not the right choice. >>> >>> mm/memcontrol.c | 11 ----------- >>> mm/migrate.c | 13 +++++++++++++ >>> 2 files changed, 13 insertions(+), 11 deletions(-) >>> >>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c >>> index 71fe2a95b8bd..8f2f1bb18c9c 100644 >>> --- a/mm/memcontrol.c >>> +++ b/mm/memcontrol.c >>> @@ -7823,17 +7823,6 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new) >>> >>> /* Transfer the charge and the css ref */ >>> commit_charge(new, memcg); >>> - /* >>> - * If the old folio is a large folio and is in the split queue, it needs >>> - * to be removed from the split queue now, in case getting an incorrect >>> - * split queue in destroy_large_folio() after the memcg of the old folio >>> - * is cleared. >>> - * >>> - * In addition, the old folio is about to be freed after migration, so >>> - * removing from the split queue a bit earlier seems reasonable. >>> - */ >>> - if (folio_test_large(old) && folio_test_large_rmappable(old)) >>> - folio_undo_large_rmappable(old); >>> old->memcg_data = 0; >>> } >>> >>> diff --git a/mm/migrate.c b/mm/migrate.c >>> index 20cb9f5f7446..a8c6f466e33a 100644 >>> --- a/mm/migrate.c >>> +++ b/mm/migrate.c >>> @@ -415,6 +415,15 @@ int folio_migrate_mapping(struct address_space *mapping, >>> if (folio_ref_count(folio) != expected_count) >>> return -EAGAIN; >>> >>> + /* Take off deferred split queue while frozen and memcg set */ >>> + if (folio_test_large(folio) && >>> + folio_test_large_rmappable(folio)) { >>> + if (!folio_ref_freeze(folio, expected_count)) >>> + return -EAGAIN; >>> + folio_undo_large_rmappable(folio); >>> + folio_ref_unfreeze(folio, expected_count); >>> + } >>> + >> >> I wonder if the patch below would make the code look better by using >> the same freeze/unfreeze pattern like file-backed path. After >> reading the emails between you and Baolin and checking the code, >> I think the patch looks good to me. Feel free to add >> Reviewed-by: Zi Yan <ziy@xxxxxxxxxx> >> >> BTW, this subtlety is very error prone, as Matthew, Ryan, and I all >> encountered errors because of this[1][2]. Matthew has a good summary >> of the subtlety: >> >> "the (undocumented) logic in deferred_split_scan() that a folio >> with a positive refcount will not be removed from the list." >> >> [1] https://lore.kernel.org/linux-mm/Ze9EFdFLXQEUVtKl@xxxxxxxxxxxxxxxxxxxx/ >> [2] https://lore.kernel.org/linux-mm/Ze_P6xagdTbcu1Kz@xxxxxxxxxxxxxxxxxxxx/ >> >> diff --git a/mm/migrate.c b/mm/migrate.c >> index a8c6f466e33a..afcc0653dcb7 100644 >> --- a/mm/migrate.c >> +++ b/mm/migrate.c >> @@ -412,17 +412,15 @@ int folio_migrate_mapping(struct address_space *mapping, >> >> if (!mapping) { >> /* Anonymous page without mapping */ >> - if (folio_ref_count(folio) != expected_count) >> + if (!folio_ref_freeze(folio, expected_count)) >> return -EAGAIN; >> >> /* Take off deferred split queue while frozen and memcg set */ >> if (folio_test_large(folio) && >> - folio_test_large_rmappable(folio)) { >> - if (!folio_ref_freeze(folio, expected_count)) >> - return -EAGAIN; >> + folio_test_large_rmappable(folio)) >> folio_undo_large_rmappable(folio); >> - folio_ref_unfreeze(folio, expected_count); >> - } >> + >> + folio_ref_unfreeze(folio, expected_count); >> > > The downside is freezing order-0, where we don't need to freeze, right? Right. I missed that part. Forget about my change above. Thanks. -- Best Regards, Yan, Zi
Attachment:
signature.asc
Description: OpenPGP digital signature