Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image

David Hildenbrand <david@xxxxxxxxxx> · Mon, 30 Mar 2020 20:14:03 +0200

On 30.03.20 19:17, James Morse wrote:
> Hi David,
> 
> On 3/30/20 2:13 PM, David Hildenbrand wrote:
>>> Adding a sentence about the way kexec load works may help, the first paragraph
>>> would read:
>>>
>>> | Kexec allows user-space to specify the address that the kexec image should be
>>> | loaded to. Because this memory may be in use, an image loaded for kexec is not
>>> | stored in place, instead its segments are scattered through memory, and are
>>> | re-assembled when needed. In the meantime, the target memory may have been
>>> | removed.
>>>
>>> Do you think thats clearer?
>>
>> Yes, very much. Maybe add, that the target is described by user space
>> during kexec_load() and that user space - right now - parses /proc/iomem
>> to find applicable system memory.
> 
> (I don't think x86 parses /proc/iomem anymore). I'll repost this patch with that
> expanded commit message, once we've agreed this is the right thing to do!

Right, I can see kexec-tools parsing /sys/firmware/memmap first.
Unfortunately, all hotplugged memory (via add_memory()) is indicated
there as System RAM ... including memory added by virtio-mem.

I think we should adapt the type there as well. (in your patch #2)

	firmware_map_add_hotplug(start, start + size, "System RAM");

> 
> 
>>>> I wonder if we should instead make the "kexec -e" fail. It tries to
>>>> touch random system memory.
>>>
>>> Heh, isn't touching random system memory what kexec does?!
>>
>> Having a racy user interface that can trigger kernel crashes feels very
>> wrong. We should limit the impact.
> 
> 
>>> Its all described to user-space as 'System RAM'. Teaching it to probe
>>> /sys/devices/memory/... would require a user-space change.
>>
>> I think we should really rename hotplugged memory on all architectures.
>>
>> Especially also relevant for virtio-mem/hyper-v balloon, where some
>> pieces of (hotplugged )memory blocks are partially unavailable and
>> should not be touched - accessing them results in unpredictable behavior
>> (e.g., crashes or discarded writes).
> 
> I'll need to look into these. I'd assume for KVM that virtio-mem can be brought
> back when its accessed ... its just going to be slow.

Touching unplugged virtio-mem memory can result in unpredictable
behavior. Touching (some) unplugged Hyper-V memory will be handled
similarly AFAIK.

[...]

>> 1. It's racy. If memory is getting offlined/unplugged just while user
>> space is about to trigger the kexec_load(), you end up with the very
>> same triple-fault.
> 
> load? How is this different to user-space providing a bogus address?

I guess it's not different. It's just racy because user space with good
intend could crash the system :)

> 
> Sure, user-space may take a nap between parsing /proc/iomem and calling
> kexec_load(), but the kernel should reject these as they would never work.
> 
> (I can't see where sanity_check_segment_list() considers the platform's memory.
> If it doesn't, we should fix it)

Right, that's what I meant. I was not able to find any sanity checks.
Maybe they are in place but I was not able to spot them.

> 
> Once the image is loaded, and clashes with a request to remove the memory there
> are two choices: secretly unload the image, or prevent the memory being taken
> offline.

Exactly. Or make "kexec -e" fail.

> 
> 
>> 2. It's semantically wrong. kexec does not need online memory ("managed
>> by the buddy"), but still you disallow offlining memory.
> 
> It does need the memory if you want 'kexec -e' to succeed.
> If there were any sanity tests, they should have happened at load time.

Offlining != removing. That's the point I was trying to make. (and we
don't want to block removing of memory in the kernel any other way)

> 
> The memory is effectively in use by the loaded kexec image. User-space told the
> kernel to use this memory, you should not be able to then remove it, without
> unloading the kexec image first.

It's not in use before you do the "kexec -e" IMHO.

> Are you saying feeding bogus addresses to kexec_load() is _expected_ to blow up
> like this?

No, not at all. I think this should be fixed if this is possible.

> 
>> I would really much rather want to see user-space choosing boot memory
>> (e.g., renaming hotplugged memory on all architectures), and checking
>> during "kexec -e" if the selected memory is actually "there", before
>> trying to write to it.
> 
> How does 'kexec -e' know where the kexec kernel was loaded? You'd need to pass
> something between 'load' and 'exec'. How do you keep existing user-space working
> as much as possible?

If we use new types (e.g., "System RAM (hotplugged)"), looks like most
of kexec will continue working (memory will be treated like
RANGE_RESERVED or ignored).

I guess we would still have to teach kexec-tools the new types,
primarily to keep the crash memory ranges from getting detected
properly. (no idea how they are used, will have to take a closer look)

> 
> What do you do if the memory isn't there? User-space just called reboot(), it
> would be better to avoid getting into the situation where we have to fail that call.

In kernel_kexec() we already fail if there is no kernel image loaded, so
we can similarly simply fail if the kernel image cannot be moved to the
target memory IMHO.

-- 
Thanks,

David / dhildenb