Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi David,

On 3/27/20 5:06 PM, David Hildenbrand wrote:
> On 27.03.20 17:56, James Morse wrote:
>> On 3/27/20 9:30 AM, David Hildenbrand wrote:
>>> On 26.03.20 19:07, James Morse wrote:
>>>> An image loaded for kexec is not stored in place, instead its segments
>>>> are scattered through memory, and are re-assembled when needed. In the
>>>> meantime, the target memory may have been removed.
>>>>
>>>> Because mm is not aware that this memory is still in use, it allows it
>>>> to be removed.
>>>>
>>>> Add a memory notifier to prevent the removal of memory regions that
>>>> overlap with a loaded kexec image segment. e.g., when triggered from the
>>>> Qemu console:
>>>> | kexec_core: memory region in use
>>>> | memory memory32: Offline failed.

>>>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>>>> index c19c0dad1ebe..ba1d91e868ca 100644
>>>> --- a/kernel/kexec_core.c
>>>> +++ b/kernel/kexec_core.c
>>
>>> E.g., in kernel/kexec_core.c:kimage_alloc_pages()
>>>
>>> "SetPageReserved(pages + i);"
>>>
>>> Pages that are reserved cannot get offlined. How are you able to trigger
>>> that before this patch? (where is the allocation path for kexec, which
>>> will not set the pages reserved?)
>>
>> This sets page reserved on the memory it gets back from
>> alloc_pages() in kimage_alloc_pages(). This is when you load the image[0].
>>
>> The problem I see is for the target or destination memory once you execute the
>> image. Once machine_kexec() runs, it tries to write to this, assuming it is
>> still present...

> Let's recap
> 
> 1. You load the image. You allocate memory for e.g., the kexec kernel.
> The pages will be marked PG_reserved, so they cannot be offlined.
> 
> 2. You do the kexec. The kexec kernel will only operate on a reserved
> memory region (reserved via e.g., kernel cmdline crashkernel=128M).

I think you are merging the kexec and kdump behaviours.
(Wrong terminology? The things behind 'kexec -l Image' and 'kexec -p Image')

For kdump, yes, the new kernel is loaded into the crashkernel reservation, and
confined to it.


For regular kexec, the new kernel can be loaded any where in memory. There might
be a difference with how this works on arm64....

The regular kexec kernel isn't stored in its final location when its loaded, its
relocated there when the image is executed. The target/destination memory may
have been removed in the meantime.

(an example recipe below should clarify this)


> Is it that in 2., the reserved memory region (for the crashkernel) could
> have been offlined in the meantime?

No, for kdump: the crashkernel reservation is PG_reserved, and its not something
mm knows how to move, so that region can't be taken offline.

(On arm64 we additionally prevent the boot-memory from being removed as it is
all described as present by UEFI. The crashkernel reservation would always be
from this type of memory)


This is about a regular kexec, any crashdump reservation is irrelevant.
This kexec kernel is temporarily stored out of line, then relocated when executed.

A recipe so that we're at least on the same terminal! This is on a TX2 running
arm64's for-next/core using Qemu-TCG to emulate x86. (Sorry for the bizarre
config, its because Qemu supports hotremove on x86, but not yet on arm64).


Insert the memory:
(qemu) object_add memory-backend-ram,id=mem1,size=1G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1

| root@vm:~# free -m
|               total        used        free      shared  ...
| Mem:            918          52         814           0  ...
| Swap:             0           0           0


Bring it online:
| root@vm:~# cd /sys/devices/system/memory/
| root@vm:/sys/devices/system/memory# for F in memory3*; do echo \
| online_movable > $F/state; done

| Built 1 zonelists, mobility grouping on.  Total pages: 251049
| Policy zone: DMA32

| -bash: echo: write error: Invalid argument
| root@vm:/sys/devices/system/memory# free -m
|               total        used        free      shared  ...
| Mem:           1942          53        1836           0  ...
| Swap:             0           0           0


Load kexec:
| root@vm:/sys/devices/system/memory# kexec -l /root/bzImage --reuse-cmdline

Press the Attention button to request removal:

(qemu) device_del dimm1

| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Built 1 zonelists, mobility grouping on.  Total pages: 233728
| Policy zone: DMA32

The memory is gone:
| root@vm:/sys/devices/system/memory# free -m
|               total        used        free      shared  ...
| Mem:            918          89         769           0  ...
| Swap:             0           0           0

Trigger kexec:
| root@vm:/sys/devices/system/memory# kexec -e

[...]

| sd 0:0:0:0: [sda] Synchronizing SCSI cache
| kexec_core: Starting new kernel

... and Qemu restarts the platform firmware instead of proceeding with kexec.
(I assume this is a triple fault)

You can use mem-min and mem-max to control where kexec's user space will place
the memory.


If you apply this patch, the above sequence will fail at the device remove step,
as the physical addresses match the loaded kexec image:

| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| kexec_core: Memory region in use
| kexec_core: Memory region in use
| memory memory39: Offline failed.
| Built 1 zonelists, mobility grouping on.  Total pages: 299212
| Policy zone: Normal

| root@vm:/sys/devices/system/memory# free -m
|               total        used        free      shared  ...
| Mem:           1942          90        1793           0    ...
| Swap:             0           0           0

I can't remove the DIMM, because we failed to offline it:

(qemu) object_del mem1
object 'mem1' is in use, can not be deleted

and I can trigger kexec and boot the new kernel.

kexec user-space here comes from debian bullseye. It picked the removable memory
all by itself without any additional arguments.


(a different issue that can be ignored for now: x86 additionally fails to reboot
if I remove memory, even if its not in use by the kexec image. This doesn't
cause qemu to reboot via firmware, I think it dies before the console. It
doesn't happen on arm64. I suspect the memory map is snapshotted and assumed to
still be correct when the image is executed.)


Thanks,

James




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux