[PATCH v29 3/9] arm64: kdump: reserve memory for crash dump kernel

takahiro.akashi@xxxxxxxxxx (AKASHI Takahiro) · Mon, 23 Jan 2017 18:51:46 +0900

Mark,

On Thu, Jan 19, 2017 at 11:28:50AM +0000, Mark Rutland wrote:
> On Thu, Jan 19, 2017 at 06:49:42PM +0900, AKASHI Takahiro wrote:
> > On Tue, Jan 17, 2017 at 11:54:42AM +0000, Mark Rutland wrote:
> > > On Tue, Jan 17, 2017 at 05:20:44PM +0900, AKASHI Takahiro wrote:
> > > > On Fri, Jan 13, 2017 at 11:39:15AM +0000, Mark Rutland wrote:
> > > > > Great! I think it would be better to follow the approach of
> > > > > mark_rodata_ro(), rather than opening up set_memory_*(), but otherwise,
> > > > > it looks like it should work.
> > > > 
> > > > I'm not quite sure what the approach of mark_rodata_ro() means, but
> > > > I found that using create_mapping_late() may cause two problems:
> > > > 
> > > > 1) it fails when PTE_CONT bits mismatch between an old and new mmu entry.
> > > >    This can happen, say, if the memory range for crash dump kernel
> > > >    starts in the mid of _continuous_ pages.
> > > 
> > > That should only happen if we try to remap a segment different to what
> > > we originally mapped.
> > > 
> > > I was intending that we'd explicitly map the reserved region separately
> > > in the boot path, like we do for kernel segments in map_kernel(). We
> > > would allow sections and/or CONT entires. 
> > > 
> > > Then, in __map_memblock() we'd then skip that range as we do for the
> > > linear map alias of the kernel image.
> > > 
> > > That way, we can later use create_mapping_late for that same region, and
> > > it should handle sections and/or CONT entries in the exact same way as
> > > it does for the kernel image segments in mark_rodata_ro().
> > 
> > I see.
> > Which one do you prefer, yours above or my (second) solution?
> > Either way, they do almost the same thing in terms of mapping.
> 
> While both should work, I'd prefer to match the existing map_kernel()
> logic, (i.e. my suggestion above), for consistency.

OK

> > > I don't think we have much code useful for unmapping. We could re-use 
> > > create_mapping_late for this, passing a set of prot bits that means the
> > > entries are invalid (e.g. have a PAGE_KERNEL_INVALID).
> > 
> > Do you really think that we should totally invalidate mmu entries?
> > I guess that, given proper cache & TLB flush operations, RO attribute is
> > good enough for memory consistency, no?
> > (None accesses the region, as I said, except in the case of re-loading
> > crash dump kernel though.)
> 
> My worry is that the first kernel and kdump kernel may map (portions of)
> the region with potentially confliciting memory attributes. So it would
> be necessary to completely unmap the region.

I think that this can happen only if the second kernel boots up,
leaving non-crashed cpus still running for some reason.

> You raise a good point that this would also mean we need to perform some
> cache maintenance, which makes that a little more painful. We'd need a
> sequence like:
> 
> * Unmap the region
> * TLB invalidation
> * Remap the region with non-cacheable attributes
> * Cache maintenance
> * Unmap the region
> * TLB invalidation

I don't get why we need to remap the region and do cache
maintenance here. Please elaborate a bit more?
My current implementation of arch_kexec_protect_crashkres() is:

        kexec_segment_flush(kexec_crash_image);
        create_mapping_late(crashk_res.start, ..., __pgprot(0));
                                                or PAGE_KERNEL_INVALID
        flush_tlb_all();

kexec_segment_flush() will eventually do dcache-flush for all the modified
data in crash dump kernel memory.

> > > We'd have to perform the TLB invalidation ourselves, but that shouldn't
> > > be too painful.
> > 
> > Do we need to invalidate TLBs not only before but also after changing
> > permission attributes as make_rodata_ro() does?
> 
> I believe we'd only have to perform the TLB invalidation after the
> change of attributes.

OK

Thanks,
-Takahiro AKASHI

> Thanks,
> Mark.