Re: 6.9/BUG: Bad page state in process kswapd0 pfn:d6e840

On 28.05.24 at 08:05, Mikhail Gavrilov wrote:
On Thu, May 23, 2024 at 12:05 PM Mikhail Gavrilov
<mikhail.v.gavrilov@xxxxxxxxx> wrote:

On Thu, May 9, 2024 at 10:50 PM David Hildenbrand <david@xxxxxxxxxx> wrote:

Do you have the other stack trace as well?

Maybe triggering memory reclaim (e.g., using "stress" or "memhog") could
trigger it; that might be reasonable to try. Once we have a reproducer
we could at least bisect.
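
For instance something like the following (untested; the sizes are
guesses, pick values that roughly exceed free memory so reclaim
actually kicks in):

# spawn allocate-and-touch workers to force reclaim
stress --vm 4 --vm-bytes 2G --timeout 300s
# or, with the memhog tool from the numactl package
memhog 8g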


The only known workload that causes this is updating a large
container. Unfortunately, not every container update reproduces the
problem.

Is it possible to add more debugging information to make it clearer
what's going on?

If we knew who originally allocated that problematic page, that might help. Maybe page_owner could give some hints?
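
With CONFIG_PAGE_OWNER=y and the kernel booted with the page_owner=on
parameter, the recorded allocation stacks can be dumped from debugfs
and then searched for the reported pfn. Roughly:

# after the "Bad page state" report fires:
cat /sys/kernel/debug/page_owner > page_owner_dump.txt

(page_owner_dump.txt is just an example file name.)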


BUG: Bad page state in process kcompactd0  pfn:605811
page: refcount:0 mapcount:0 mapping:0000000082d91e3e index:0x1045efc4f pfn:0x605811
aops:btree_aops ino:1
flags: 0x17ffffc600020c(referenced|uptodate|workingset|node=0|zone=2|lastcpupid=0x1fffff)
raw: 0017ffffc600020c dead000000000100 dead000000000122 ffff888159075220
raw: 00000001045efc4f 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: non-NULL mapping

This seems to be an order-0 page; otherwise we would also have a "head: ..." line in the report.

It's not an anon/ksm/non-lru migration folio, because we clear the page->mapping field for them manually on the page freeing path. Likely it's a pagecache folio.

So one option is that something fails to properly set folio->mapping to NULL. But then that problem should also show up without page migration? Hmm.
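
For context, the folio type is encoded in the low bits of ->mapping.
A minimal sketch of that encoding (values as in
include/linux/page-flags.h, quoted from memory):

/*
 * The low bits of folio->mapping encode the folio type; a pagecache
 * folio has none of them set.
 */
#define PAGE_MAPPING_ANON	0x1
#define PAGE_MAPPING_MOVABLE	0x2
#define PAGE_MAPPING_KSM	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)

static inline bool folio_mapping_flags(const struct folio *folio)
{
	return ((unsigned long)folio->mapping & PAGE_MAPPING_FLAGS) != 0;
}

So the freeing path clears ->mapping only for the flagged
(anon/ksm/movable) cases and reports "non-NULL mapping" for anything
else.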

Hardware name: ASUS System Product Name/ROG STRIX B650E-I GAMING WIFI,
BIOS 2611 04/07/2024
Call Trace:
  <TASK>
  dump_stack_lvl+0x84/0xd0
  bad_page.cold+0xbe/0xe0
  ? __pfx_bad_page+0x10/0x10
  ? page_bad_reason+0x9d/0x1f0
  free_unref_page+0x838/0x10e0
  __folio_put+0x1ba/0x2b0
  ? __pfx___folio_put+0x10/0x10
  ? __pfx___might_resched+0x10/0x10

I suspect we come via
	migrate_pages_batch()->migrate_folio_unmap()->migrate_folio_done().

Maybe this is the "Folio was freed from under us. So we are done." path
when "folio_ref_count(src) == 1".

Alternatively, we might come via
	migrate_pages_batch()->migrate_folio_move()->migrate_folio_done().

For ordinary migration, move_to_new_folio() will clear src->mapping if
the folio was migrated successfully. That's the very first thing that migrate_folio_move() does, so I doubt that is the problem.
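
From memory, the relevant tail of move_to_new_folio() looks roughly
like this (paraphrased, not the exact source):

/* paraphrased from move_to_new_folio() */
if (rc == MIGRATEPAGE_SUCCESS) {
	/*
	 * Pagecache folios: ->mapping must be cleared before src is
	 * freed. Anon folios must stay anon until actually freed.
	 */
	if (!folio_mapping_flags(src))
		src->mapping = NULL;

	if (likely(!folio_is_zone_device(dst)))
		flush_dcache_folio(dst);
}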

So I suspect we are in the migrate_folio_unmap() path. But for
a !anon folio, who would be freeing the folio concurrently (without clearing folio->mapping)? After all, we have to hold the folio lock while migrating.

In khugepaged's collapse_file() we manually set folio->mapping = NULL before dropping the reference, roughly following this pattern:
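
/* sketch of the collapse_file() teardown pattern, not the exact code */
folio->mapping = NULL;	/* detach, so free_pages_prepare() sees NULL */
folio_unlock(folio);
folio_put(folio);	/* may drop the final reference */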

Something to try, to see if the problem goes away, might be:

diff --git a/mm/migrate.c b/mm/migrate.c
index dd04f578c19c..45e92e14c904 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1124,6 +1124,13 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
                /* Folio was freed from under us. So we are done. */
                folio_clear_active(src);
                folio_clear_unevictable(src);
+               /*
+                * For anon and movable folios, free_pages_prepare() clears
+                * src->mapping, so don't reset it here: keeping the flag
+                * bits lets type checks such as PageAnon() keep working.
+                */
+               if (!folio_mapping_flags(src))
+                       src->mapping = NULL;
                /* free_pages_prepare() will clear PG_isolated. */
                list_del(&src->lru);
                migrate_folio_done(src, reason);

But it does feel weird: who freed the page concurrently and didn't clear folio->mapping ...

We don't hold the folio lock of src, though; we merely hold what looks like the only reference. So
another possibility might be folio refcount mis-counting: folio_ref_count() == 1 while other references still exist (e.g., from the pagecache).
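
If so, an (untested) check in that "freed from under us" branch might
catch the culprit with a full page dump; the message string is made up:

/* untested: flag a pagecache folio that was "freed" under migration */
if (src->mapping && !folio_mapping_flags(src))
	dump_page(&src->page, "pagecache folio freed during migration");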


  ? migrate_folio_done+0x1de/0x2b0
  migrate_pages_batch+0xe73/0x2880
  ? __pfx_compaction_alloc+0x10/0x10
  ? __pfx_compaction_free+0x10/0x10
  ? __pfx_migrate_pages_batch+0x10/0x10
  ? trace_irq_enable.constprop.0+0xce/0x110
  ? __pfx_remove_migration_pte+0x10/0x10
  ? rcu_is_watching+0x12/0xc0
  migrate_pages+0x194f/0x22f0
  ? __pfx_compaction_alloc+0x10/0x10
  ? __pfx_compaction_free+0x10/0x10
  ? __pfx_migrate_pages+0x10/0x10
  ? trace_irq_enable.constprop.0+0xce/0x110
  ? rcu_is_watching+0x12/0xc0
  ? isolate_migratepages_block+0x2b02/0x4560
  ? __pfx_isolate_migratepages_block+0x10/0x10
  ? __pfx___might_resched+0x10/0x10
  compact_zone+0x1a7c/0x3860
  ? rcu_is_watching+0x12/0xc0
  ? __pfx___free_object+0x10/0x10
  ? __pfx_compact_zone+0x10/0x10
  ? rcu_is_watching+0x12/0xc0
  ? lock_acquire+0x457/0x540
  ? kcompactd+0x2fa/0xc70
  ? rcu_is_watching+0x12/0xc0
  compact_node+0x144/0x240
  ? __pfx_compact_node+0x10/0x10
  ? rcu_is_watching+0x12/0xc0
  kcompactd+0x686/0xc70
  ? __pfx_kcompactd+0x10/0x10
  ? __pfx_autoremove_wake_function+0x10/0x10
  ? __kthread_parkme+0xb1/0x1d0
  ? __pfx_kcompactd+0x10/0x10
  ? __pfx_kcompactd+0x10/0x10
  kthread+0x2d2/0x3a0
  ? _raw_spin_unlock_irq+0x28/0x60
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x31/0x70
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
  </TASK>


--
Thanks,

David / dhildenb




