On 11/08/21 at 06:31pm, Andrew Morton wrote:
> From: David Hildenbrand <david@xxxxxxxxxx>
> Subject: proc/vmcore: convert oldmem_pfn_is_ram callback to more generic vmcore callbacks
>
> Let's support multiple registered callbacks, making sure that registering
> vmcore callbacks cannot fail. Make the callback return a bool instead of
> an int, handling how to deal with errors internally. Drop unused
> HAVE_OLDMEM_PFN_IS_RAM.
>
> We soon want to make use of this infrastructure from other drivers:
> virtio-mem, registering one callback for each virtio-mem device, to
> prevent reading unplugged virtio-mem memory.
>
> Handle it via a generic vmcore_cb structure, prepared for future
> extensions: for example, once we support virtio-mem on s390x where the
> vmcore is completely constructed in the second kernel, we want to detect
> and add plugged virtio-mem memory ranges to the vmcore in order for them
> to get dumped properly.
>
> Handle corner cases that are unexpected and shouldn't happen in sane
> setups: registering a callback after the vmcore has already been opened
> (warn only) and unregistering a callback after the vmcore has already been
> opened (warn and essentially read only zeroes from that point on).
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I am fine with the whole patch except for one concern. As the underscored
sentence above states, if a callback is unregistered after the vmcore has
been opened, reads return zeroes from that point on. This is done by
checking the global variable 'vmcore_cb_unstable' in pfn_is_ram(). As a
result, a vmcore dump via makedumpfile can only read zero pages from then
on, and may take a long extra time to finish.

Please see remap_oldmem_pfn_checked(). By default, makedumpfile mmaps a 4M
memory region at a time, then copies it out. With this patch, if
vmcore_cb_unstable is true, the kernel will mmap page by page.
The extra time could be huge, e.g. on a machine with TBs of memory, and we
would very likely only get a useless vmcore because of the lost core data.
I am wondering whether we can simply panic in that case, since the
remaining dump is all zeroes and the vmcore is very likely unusable anyway.

......
>  static bool pfn_is_ram(unsigned long pfn)
>  {
> -	int (*fn)(unsigned long pfn);
> -	/* pfn is ram unless fn() checks pagetype */
> +	struct vmcore_cb *cb;
>  	bool ret = true;
>
> -	/*
> -	 * Ask hypervisor if the pfn is really ram.
> -	 * A ballooned page contains no data and reading from such a page
> -	 * will cause high load in the hypervisor.
> -	 */
> -	fn = oldmem_pfn_is_ram;
> -	if (fn)
> -		ret = !!fn(pfn);
> +	lockdep_assert_held_read(&vmcore_cb_rwsem);
> +	if (unlikely(vmcore_cb_unstable))
> +		return false;
> +
> +	list_for_each_entry(cb, &vmcore_cb_list, next) {
> +		if (unlikely(!cb->pfn_is_ram))
> +			continue;
> +		ret = cb->pfn_is_ram(cb, pfn);
> +		if (!ret)
> +			break;
> +	}
>
>  	return ret;
>  }
......