Re: [BUG ?] Offline Memory gets stuck in offline_pages()

On 01.07.24 03:25, Zhijian Li (Fujitsu) wrote:
Hi all


Overview:
While testing CXL memory hot-remove, we noticed that `daxctl offline-memory dax0.0`
sometimes gets stuck forever. `daxctl offline-memory dax0.0` writes "offline" to
/sys/devices/system/memory/memoryNNN/state.

Hi,

See

Documentation/admin-guide/mm/memory-hotplug.rst

"
Further, when running into out of memory situations while migrating pages, or when still encountering permanently unmovable pages within ZONE_MOVABLE (-> BUG), memory offlining will keep retrying until it eventually succeeds.

When offlining is triggered from user space, the offlining context can be terminated by sending a signal. A timeout based offlining can easily be implemented via::

	% timeout $TIMEOUT offline_block | failure_handling
"


Workaround:
When it happens, we can press Ctrl-C to abort it and then retry.
The CXL memory then goes offline successfully.

Where the kernel gets stuck:
After digging into the kernel, we found that when the issue occurs, the kernel
is stuck in the outer loop of offline_pages(). Below is a piece of the
highlighted offline_pages():

```
int __ref offline_pages()
{
    do { // outer loop
        pfn = start_pfn;
        do {
            ret = scan_movable_pages(pfn, end_pfn, &pfn);  // returns -ENOENT
            if (!ret)
                do_migrate_range(pfn, end_pfn);            // not reached
        } while (!ret);
        ret = test_pages_isolated(start_pfn, end_pfn, MEMORY_OFFLINE);
    } while (ret);                                         // ret is -EBUSY
}
```
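
To make the control flow concrete, here is a small user-space model of the two nested loops. It is only a sketch: the `model_*` functions and `struct model` are invented stand-ins, not kernel code, and the `max_passes` cap exists purely so the model terminates.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy state replacing the real page range; both fields are invented. */
struct model {
    bool has_movable_pages;   /* would the scan find anything to migrate? */
    bool range_isolated;      /* would the isolation test succeed? */
};

static int model_scan_movable_pages(const struct model *m)
{
    return m->has_movable_pages ? 0 : -ENOENT;
}

static int model_test_pages_isolated(const struct model *m)
{
    return m->range_isolated ? 0 : -EBUSY;
}

/*
 * Returns the number of outer-loop passes taken, or -1 if `max_passes`
 * was reached without the range ever testing as isolated.
 */
int model_offline_pages(struct model *m, int max_passes)
{
    int passes = 0, ret;

    do {
        if (passes == max_passes)
            return -1;        /* the loop in the excerpt has no such cap */
        passes++;

        do {
            ret = model_scan_movable_pages(m);
            if (!ret)
                m->has_movable_pages = false;  /* pretend migration worked */
        } while (!ret);

        ret = model_test_pages_isolated(m);
    } while (ret);

    return passes;
}
```

With nothing left to migrate and a range that never tests as isolated, the model only stops because of the artificial cap, which mirrors the reported behaviour: the scan returns -ENOENT, the isolation test returns -EBUSY, and nothing in between changes the state.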

In this case, we dumped the first page that cannot be isolated (see dump_page below); its
content does not change between iterations:
```
Jun 28 15:29:26 linux kernel: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7980dd
Jun 28 15:29:26 linux kernel: flags: 0x9fffffc0000000(node=2|zone=3|lastcpupid=0x1fffff)
Jun 28 15:29:26 linux kernel: raw: 009fffffc0000000 ffffdfbd9e603788 ffffd4f0ffd97ef0 0000000000000000
Jun 28 15:29:26 linux kernel: raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
Jun 28 15:29:26 linux kernel: page dumped because: trouble page...
```

Are you sure that's the problematic page?

refcount:0

Indicates that the page is free. But maybe it does not have PageBuddy() set.

It could also be that this is a "tail" page of a PageBuddy() page, and somehow we always end up on the tail in test_pages_isolated().

Which kernel + architecture are you testing with?

Every time the issue occurs, the content of the page structure is similar.

Questions:
Q1. Is this behavior expected? At least for an OS administrator, it should return
      promptly (success or failure) instead of hanging indefinitely.

It's expected that it might take a long time (possibly forever) in corner cases. See documentation.

But it's likely unexpected that we have some problematic page here.

Q2. Regarding the offline_pages() function, encountering such a page indeed causes
      an endless loop. Shouldn't another part of the kernel change the state
      of this page in a timely manner?

There are various things that can go wrong. One issue might be that we try migrating a page but continuously fail to allocate memory to be used as a migration target. It seems unlikely with the page you dumped above, though.

Is that CXL memory perhaps on a separate "fake" NUMA node, with your workload mbind()-ing itself to that NUMA node and possibly refusing to migrate somewhere else?


      When I use the workaround mentioned above (Ctrl-C and try offline again), I find
      that the page state changes (see dump_page below):
```
Jun 28 15:33:12 linux kernel: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7980dd
Jun 28 15:33:12 linux kernel: flags: 0x9fffffc0000000(node=2|zone=3|lastcpupid=0x1fffff)
Jun 28 15:33:12 linux kernel: raw: 009fffffc0000000 dead000000000100 dead000000000122 0000000000000000
Jun 28 15:33:12 linux kernel: raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
Jun 28 15:33:12 linux kernel: page dumped because: previous trouble page
```
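
One visible difference between the two dumps is in the second and third raw words. The sketch below checks whether such a pair matches the kernel's list poison values, which is what a `list_head` holds after `list_del()`. It assumes an x86_64-style poison delta of 0xdead000000000000 and that these two words are the `list_head` overlaying `lru`. The `ASSUMED_*` macros and `lru_looks_unlinked` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: x86_64 default poison delta; other configurations differ. */
#define ASSUMED_POISON_DELTA  0xdead000000000000ULL
#define ASSUMED_LIST_POISON1  (ASSUMED_POISON_DELTA + 0x100)
#define ASSUMED_LIST_POISON2  (ASSUMED_POISON_DELTA + 0x122)

/* True if (next, prev) looks like a list_head that went through list_del(). */
bool lru_looks_unlinked(uint64_t next, uint64_t prev)
{
    return next == ASSUMED_LIST_POISON1 && prev == ASSUMED_LIST_POISON2;
}
```

Under those assumptions, the pair from the first dump (ffffdfbd9e603788, ffffd4f0ffd97ef0) does not match, while the pair from the dump after the retry (dead000000000100, dead000000000122) does. That would suggest the page was still linked on some list while offlining was stuck. It does not say which list.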

What our test does:
We have a CXL memory device, configured as kmem and onlined into the MOVABLE
zone as NUMA node 2. We run two processes, consume-memory and offline-memory, in parallel;
see the pseudocode below:

```
main()
{
      if (fork() == 0)
          numactl -m 2 ./consume-memory
```

What exactly does "consume-memory" do? Does it involve hugetlb maybe?


--
Cheers,

David / dhildenb




