On Tue 24-07-18 14:17:12, David Hildenbrand wrote:
> On 24.07.2018 09:25, Michal Hocko wrote:
> > On Mon 23-07-18 19:20:43, David Hildenbrand wrote:
> >> On 23.07.2018 14:30, Michal Hocko wrote:
> >>> On Mon 23-07-18 13:45:18, Vlastimil Babka wrote:
> >>>> On 07/20/2018 02:34 PM, David Hildenbrand wrote:
> >>>>> Dumping tools (like makedumpfile) right now don't exclude reserved pages.
> >>>>> So reserved pages might be accessed by dump tools although nobody except
> >>>>> the owner should touch them.
> >>>>
> >>>> Are you sure about that? Or maybe I understand wrong. Maybe it changed
> >>>> recently, but IIRC pages that are backing memmap (struct pages) are also
> >>>> PG_reserved. And you definitely do want those in the dump.
> >>>
> >>> You are right. reserve_bootmem_region will make all early bootmem
> >>> allocations (including those backing memmaps) PageReserved. I have asked
> >>> several times but I haven't seen a satisfactory answer yet. Why do we
> >>> even care for kdump about those. If they are reserved then nobody should
> >>> really look at those specific struct pages and manipulate them. Kdump
> >>> tools are using a kernel interface to read the content. If the specific
> >>> content is backed by a non-existing memory then they should simply not
> >>> return anything.
> >>>
> >>
> >> "new kernel" provides an interface to read memory from "old kernel".
> >>
> >> The new kernel has no idea about
> >> - which memory was added/online in the old kernel
> >> - where struct pages of the old kernel are and what their content is
> >> - which memory is safe to touch and which not
> >>
> >> Dump tools figure all that out by interpreting the VMCORE. They e.g.
> >> identify "struct pages" and see if they should be dumped. The "new
> >> kernel" only allows to read that memory. It cannot hinder to crash the
> >> system (e.g. if a dump tool would try to read a hwpoison page).
> >>
> >> So how should the "new kernel" know if a page can be touched or not?
> >
> > I am sorry I am not familiar with kdump much. But from what I remember
> > it reads from /proc/vmcore and implementation of this interface should
> > simply return EINVAL or alike when you try to dump inaccessible memory
> > range.
>
> Oh, and BTW, while something like -EINVAL could work, we usually don't
> want to try to read certain pages at all (e.g. ballooned pages -
> accessing the page might work but involves quite some overhead in the
> hypervisor).
>
> So we should either handle this in dump tools (reserved + ...?) or while
> doing the read similar to XEN (is_ram_page()).

Yes, I think this is the proper way. Just test for PageOnline in
read_from_oldmem/copy_oldmem_page. Btw. we already have
pfn_to_online_page, which checks the per-section online/offline status.
This should be extendable to consider your new PageOffline state.

> I wonder if we could convert the early allocated memory (PG_reserved) at
> some point (buddy initialized) into ordinary "simply allocated" memory.

I do not think so. There is a good reason why we keep them reserved. There
are many pfn walkers that simply shouldn't touch those pages. Maybe we
can achieve a page reserve type for all usages, but that will be a larger
project, I am afraid.
-- 
Michal Hocko
SUSE Labs