On 26.07.2018 10:27, Michal Hocko wrote:
> On Wed 25-07-18 16:20:41, David Hildenbrand wrote:
>> On 25.07.2018 15:51, Michal Hocko wrote:
>>> On Tue 24-07-18 16:13:09, David Hildenbrand wrote:
>>> [...]
>>>> So I see right now:
>>>>
>>>> - Pg_reserved + e.g. new page type (or some other unique identifier in
>>>> combination with Pg_reserved)
>>>> -> Avoid reads of pages we know are offline
>>>> - extend is_ram_page()
>>>> -> Fake zero memory for pages we know are offline
>>>>
>>>> Or even both (avoid reading and don't crash the kernel if it is being done).
>>>
>>> I really fail to see how that can work without kernel being aware of
>>> PageOffline. What will/should happen if you run an old kdump tool on a
>>> kernel with this partially offline memory?
>>>
>>
>> New kernel with old dump tool:
>>
>> a) we have not fixed up is_ram_page()
>>
>> -> crash, as we access memory we shouldn't
>
> this is not acceptable, right? You do not want to crash your crash
> kernel ;)

Well, the same can happen today with PageHWPoison. The "new" kernel will
happily access such pages and crash as far as I understand (it has no
idea of the old struct pages). Of course, this is "less likely" than
what I describe.

>
>> b) we have fixed up is_ram_page()
>>
>> -> We have a callback to check for applicable memory in the hypervisor
>> whether the parts are accessible / online or not accessible / offline.
>> (e.g. via a device driver that controls a certain memory region)
>>
>> -> Don't read, but fake a page full of 0
>>
>>
>> So instead of the kernel being aware of it, it asks via is_ram_page()
>> the hypervisor.
>
> I am still confused why do we even care about hypervisor. What if
> somebody wants to have partial memory hotplug on native OS?

Good point I was ignoring so far (too much focusing on my use case I
assume).

So for these, we would have to catch illegal accesses and

a) report them (-EINVAL / -EIO) as you said
b) fake a zero page

I assume catching illegal accesses should be possible. Might require
some work across all architectures. Still, dump tools should in addition
not even try to read if possible.

>
>> I don't think a) is a problem. AFAICS, we have to update makedumpfile
>> for every new kernel. We can perform changes and update makedumpfile
>> to be compatible with new dump tools.
>
> Not really. You simply do not crash the kernel just because you are
> trying to dump the already crashed kernel.
>
>> E.g. remember SECTION_IS_ONLINE you introduced ? It broke dump
>> tools and required
>
> But has it crashed the kernel when reading the dump? If yes then the
> whole dumping is fragile as hell...

No, I think it simply didn't work. At least that's what I assume ;) I
was rather saying that dump tools may have to be fixed up to work with a
new kernel.

-- 
Thanks,

David / dhildenb
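
For illustration only, the two ways of handling a read of an offline page
discussed above (report an error vs. fake a zero page) could be sketched
in plain userspace C roughly as follows. This is not kernel code and not
a proposed patch: read_old_page(), page_online_fn, enum offline_policy,
example_page_online() and the old_mem[] array are all hypothetical names
made up for this sketch, and the "old memory" is just a static buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096
#define SKETCH_NR_PAGES  4

/* Hypothetical policy for pages that are known to be offline. */
enum offline_policy {
	OFFLINE_REPORT_ERROR,	/* a) fail the read with -EIO */
	OFFLINE_FAKE_ZERO,	/* b) hand back a page full of zeroes */
};

/*
 * Hypothetical callback. Whoever controls a memory region (assumed here:
 * some driver) answers whether a given pfn may be read.
 */
typedef bool (*page_online_fn)(unsigned long pfn);

/* Stand-in for the memory of the crashed kernel. */
static unsigned char old_mem[SKETCH_NR_PAGES][SKETCH_PAGE_SIZE];

/* Example callback for the sketch: pretend pfn 2 is offline. */
static bool example_page_online(unsigned long pfn)
{
	return pfn != 2;
}

/*
 * Copy one page of "old memory" into buf. Offline pages are never
 * touched; depending on policy the caller gets -EIO or a zeroed page.
 * Returns the number of bytes placed in buf, or a negative errno value.
 */
static long read_old_page(unsigned long pfn, unsigned char *buf,
			  page_online_fn is_online,
			  enum offline_policy policy)
{
	assert(buf != NULL && is_online != NULL);

	if (pfn >= SKETCH_NR_PAGES)
		return -EINVAL;

	if (!is_online(pfn)) {
		if (policy == OFFLINE_REPORT_ERROR)
			return -EIO;
		memset(buf, 0, SKETCH_PAGE_SIZE);
		return SKETCH_PAGE_SIZE;
	}

	memcpy(buf, old_mem[pfn], SKETCH_PAGE_SIZE);
	return SKETCH_PAGE_SIZE;
}
```

A dump tool that already knows which pages are offline would simply skip
them and never reach the offline branch; the check in read_old_page()
only stands in for the "don't crash if it is being done anyway" part.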