On 10.11.21 08:22, Baoquan He wrote:
> On 11/08/21 at 06:31pm, Andrew Morton wrote:
>> From: David Hildenbrand <david@xxxxxxxxxx>
>> Subject: proc/vmcore: convert oldmem_pfn_is_ram callback to more generic vmcore callbacks
>>
>> Let's support multiple registered callbacks, making sure that registering
>> vmcore callbacks cannot fail. Make the callback return a bool instead of
>> an int, handling how to deal with errors internally. Drop unused
>> HAVE_OLDMEM_PFN_IS_RAM.
>>
>> We soon want to make use of this infrastructure from other drivers:
>> virtio-mem, registering one callback for each virtio-mem device, to
>> prevent reading unplugged virtio-mem memory.
>>
>> Handle it via a generic vmcore_cb structure, prepared for future
>> extensions: for example, once we support virtio-mem on s390x where the
>> vmcore is completely constructed in the second kernel, we want to detect
>> and add plugged virtio-mem memory ranges to the vmcore in order for them
>> to get dumped properly.
>>
>> Handle corner cases that are unexpected and shouldn't happen in sane
>> setups: registering a callback after the vmcore has already been opened
>> (warn only) and unregistering a callback after the vmcore has already been
>> opened (warn and essentially read only zeroes from that point on).
>                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> I am fine with the whole patch except of one concern. As above sentence
> underscored states, if a callback is unregistered when vmcore has been
> opened, it will read out zeros from that point on. And it's done by
> judging global variable 'vmcore_cb_unstable' in pfn_is_ram(). This will
> cause vmcore dumping in makedumpfile only being able to read out zero
> page since then, and may cost long extra time to finish.
>
> Please see remap_oldmem_pfn_checked(). In makedumpfile, we default to
> mmap 4M memory region at one time, then copy out. With this patch, and if
> vmcore_cb_unstable is true, kernel will mmap page by page. The extra
> time could be huge, e.g on machine with TBs memory, and we only get a
> useless vmcore because of loss of core data with high probability.

Thanks Baoquan for the quick review!

This code is really just there to handle the unlikely case of a driver
getting unbound from a device that still has a callback registered (e.g.,
a virtio-mem-pci device). Something like this will never happen in
practice in a *sane* environment.

The only way I know of to trigger it is user space manually unbinding the
driver from a virtio-mem-pci device -- which is possible, but especially
in a kdump environment there is no sane use case for doing so. In that
case, we'll pr_warn_once("Unexpected vmcore callback unregistration\n");
to let user space know that something weird/unsupported is going on.

Long story short: if user space does something nasty, I don't see a
problem in some action taking a little longer.

>
> I am thinking if we can simply panic in the case, since the left dumping
> are all zeroed, very likely the vmcore is unavailable any more.

IMHO panic() is a little bit too much. Instead of returning zeroes, we
could fail the read/mmap operation -- I considered that as an option when
I crafted/tested this patch. However, the approach here turned out to be
the easiest way to handle something that's really not supported/advised
and won't really happen in a sane environment.

--
Thanks,

David / dhildenb
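
[Editor's note: for readers following the thread, the behavior being
discussed roughly boils down to the check sketched below. This is a
simplified illustration, not the exact patch code; the vmcore_cb_list
name and the pfn_is_ram member of struct vmcore_cb are assumptions here,
while pfn_is_ram() and vmcore_cb_unstable are the identifiers mentioned
above. Locking and warning details are omitted.]

	/*
	 * Simplified sketch: decide whether a PFN of the old kernel's
	 * memory should be read/mapped, or replaced by zeroes.
	 */
	static bool pfn_is_ram(unsigned long pfn)
	{
		struct vmcore_cb *cb;

		/*
		 * A callback was unregistered while /proc/vmcore was
		 * already open: play it safe and report every PFN as
		 * "not RAM", so only zeroes are read from that point on.
		 */
		if (vmcore_cb_unstable)
			return false;

		/* Ask each registered callback (e.g., one per virtio-mem device). */
		list_for_each_entry(cb, &vmcore_cb_list, next)
			if (cb->pfn_is_ram && !cb->pfn_is_ram(cb, pfn))
				return false;

		return true;
	}

remap_oldmem_pfn_checked() consults this helper: ranges reported as RAM
can be remapped in one go, while any PFN reported as not RAM forces a
per-page fallback that maps the zero page instead -- which is why forcing
the result to false for every PFN turns makedumpfile's 4M mmap requests
into page-by-page work, as Baoquan points out above.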