Re: [PATH v4 1/2] arm64: fix kernel memory map handling for kaslr-enabled kernel

AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx> · Fri, 3 Jun 2016 11:53:18 +0900

On Thu, Jun 02, 2016 at 11:37:24AM -0400, Dave Anderson wrote:
> ----- Original Message -----
> > On Wed, Jun 01, 2016 at 11:41:04AM -0400, Dave Anderson wrote:
> > > 
> > > ----- Original Message -----
> > > 
> > > > > >  
> > > > > >  	case PRE_GDB:
> > > > > > +		/* This check is somewhat redundant */
> > > > > > +		if (kernel_symbol_exists("kimage_voffset"))
> > > > > > +			machdep->flags |= NEW_VMEMMAP;
> > > > > > +
> > > > > 
> > > > > Certainly not redundant on a live system without CONFIG_RANDOMIZE_BASE.
> > > > 
> > > > Yeah, but for crashdump case, it's redundant.
> > > 
> > > I understand, but if you're going to add a comment, it should clarify
> > > rather than mystify...   ;-)
> > 
> > OK, how about this:
> > "Check for NEW_VMEMMAP again for a live system"
> > 
> > Or just remove it?
> 
> Sure, that's better.
>  
> > > 
> > > > > >  		arm64_calc_VA_BITS();
> > > > > > -		machdep->machspec->page_offset = ARM64_PAGE_OFFSET;
> > > > > > +		ms = machdep->machspec;
> > > > > > +		ms->page_offset = ARM64_PAGE_OFFSET;
> > > > > > +		/* FIXME: idmap for NEW_VMEMMAP */
> > > > > 
> > > > > What's the FIXME here?
> > > > 
> > > > Identity-mapping no longer starts from PAGE_OFFSET if KASLR.
> > > > Since machdep->identity_map_base is not used anywhere in crash,
> > > > I wonder how I should handle it.
> > > > 
> > > > > >  		machdep->identity_map_base = ARM64_PAGE_OFFSET;
> > > 
> > > You're right, machdep->identity_map_base is (currently) only used in
> > > architecture-specific files, although arm64.c does not use it.
> > > But what you have done above is correct.
> > 
> > So should I remove the comment?
> 
> Right -- there's nothing to fix!
> 
> > > 
> > > > > > @@ -631,6 +718,19 @@ arm64_calc_phys_offset(void)
> > > > > >  	if (machdep->flags & PHYS_OFFSET) /* --machdep override */
> > > > > >  		return;
> > > > > >  
> > > > > > +	if (machdep->flags & NEW_VMEMMAP) {
> > > > > > +		struct syment *sp;
> > > > > > +		ulong value;
> > > > > > +
> > > > > > +		sp = kernel_symbol_search("memstart_addr");
> > > > > > +		if (sp && readmem(sp->value, KVADDR, (char *)&value,
> > > > > > +				sizeof(value), "memstart_addr",
> > > > > > +				QUIET|RETURN_ON_ERROR)) {
> > > > > > +			ms->phys_offset = value;
> > > > > > +			return;
> > > > > > +		}
> > > > > > +	}
> > > > > > +
> > > > > 
> > > > > As we've discussed before, I cannot accept the chicken-and-egg snippet
> > > > > above.
> > > > > 
> > > > > If machdep->machspec->kimage_voffset has not been determined, then the > > > > arm64_VTOP()
> > > > > call made by readmem() will utilize machdep->machspec->phys_offset -- > > > > which is what
> > > > > you're trying to initialize here.
> > > > 
> > > > Right, I was a bit confused because my patch basically had an assumption
> > > > that we have some sort of way to know kimage_voffset.
> > > > 
> > > > Meanwhile, if this hunk be moved after "if ACTIVE())" like,
> > > >     ms->phys_offset = 0;
> > > >     if (ACTIVE()) {
> > > >         read("/proc/iomem");
> > > >         ...
> > > >     } else {
> > > >         if (machdep->flags & NEW_VMEMMAP) {
> > > >             readmem("memstart_addr")
> > > >             ...
> > > >         } else if (DISKDUMP_DUMPFILE() ...) {
> > > >         } else if (KDUMP_DUMPFILE() ...) {
> > > >         }
> > > >     }
> > > > the crash command would still work for a live system if !KASLR.
> > > 
> > > Again, the readmem() is not necessary.
> > > 
> > > > 
> > > > > On my live system that has a phys_offset of 0x40000000000, the readmem()
> > > > > will fail quietly, but that's not a acceptable usage of the RETURN_ON_ERROR
> > > > > failure mode, because "memstart_addr" is a legitimate virtual address that
> > > > > should never fail.  Also, because the actual kernel phys_offset can be
> > > > > negative, it would seem to be entirely within the realm of possibility that
> > > > > the readmem() could mistakenly return successfully, but have read the wrong
> > > > > location.
> > > > > 
> > > > > So what it boils down to is that readmem() should NEVER be called until ALL of the
> > > > > pieces required by arm64_VTOP() have all been properly initialized, however
> > > > > that might be accomplished.  Not to mention that calling it this early sets  a
> > > > > dangerous precedent.
> > > > 
> > > > If you think so, we can add a specialized function like:
> > > >     arm64_readmem_kernel_symbol(char *, ...)
> > > 
> > > I don't think so -- the point is to avoid making a readmem() call this early.
> > 
> > I think that I fully understand your concerns here, but my point is:
> > - I give you a value of kimage_voffset in VMCOREINFO
> > - you know vmlinux and have vmcore
> > - you can and do determine kaslr_offset
> > - so why don't you identify PHYS_OFFSET by reading "memstart_addr"?
> 
> The VMCOREINFO is primarily there for the user-space kdump and makedumpfile facilities.
> Crash does check it occasionally, but can't depend upon it, because there are too many other
> dumpfile formats.

That is the reason that I insist on minimizing the number of VMCOREINFO
parameters. Adding redundant ones can cause confusion in the future.

> > 
> > Using readmem() or not is a matter of implementation.
> > I think that we can use READMEM(), instead of readmem(), in
> > arm64_readme_kernel_symbol(), which is solely used, at least at the moment,
> > in arm64_calc_phys_offset().
> 
> I'm sure I don't understand.  Calling READMEM() as opposed to readmem()
> would require the virtual-to-physical address translation to be done
> on the "memstart_addr" symbol.

Right, but as I said,
- we've already calculated kaslr_offset with derive_kaslr_offset() called by
  symtab_init(), which is prior to machdep_init(PRE_GDB).
- we can determine the *real(runtime)* virtual address of "memstart_addr"
  by
       <vaddr> = <vaddr of memstart_addr in vmlinux> + kaslr_offset
- then we will identify the physical address by
       <paddr> = <vaddr> - <kimage_voffset>
- so why cannot we call READMEM(pc->readmem) here(arm64_calc_phys_offset)
  at the end of machdep_init(PRE_GDB)?
  pc->readmem will be initialized definitely before any machdep_init(*).

See what I mean?
Logically, I don't see any breakage of existing APIs/assumptions.
(So I said, it is a matter of implementation.)

> Anyway, READMEM should be hidden and
> constrained to readmem() itself.  (It is used in one other odd-ball place
> prior to pc->readmem being initialized and a System.map file being used
> as the correct source of kernel symbol values.)
> 
> > Well, we may have a different story for makedumpfile as there seems to be
> > an assumption that we have only a vmcore file, neither vmlinux nor System.map.
> > So I suggested to Pratyush that he could post a patch later to add any
> > VMCOREINFO parameters that you wanted for makedumpfile.
> 
> Right, there are going to be kdump/makedumpfile requirements, one of them being
> the capability of setting the phys_base value in the kdump_sub_header.
>  
> > > > > And in the case of kdump's ELF vmcore and compressed vmcore formats, there
> > > > > is an existing API between kdump and the crash utility to pass back the
> > > > > phys_base.  In the kexec-tool's makedumpfile.c file, there is the
> > > > > get_phys_base_arm64() function that
> > > > 
> > > > I can't find makedumpfile.c in kexec-tools.
> > >  
> > > Hold on, clarification below...
> > > 
> > > > 
> > > > > currently calculates the offset by using the PT_LOAD segments, and presumably will
> > > > > have to be modified to use new VMCOREINFO data.  But regardless of how it's one,
> > > > > the architecture-neutral write_kdump_header() laster copies that offset value into
> > > > > the kdump_sub_header.phys_base field for the crash utility to access.  Trying to do
> > > > > a readmem() this early in time is essentially breaking that API.
> > > > 
> > > > You may already notice that "phys_offset" may have two different meanings:
> > > >   * the start address of System RAM
> > > >     (PHYS_OFFSET on v4.5 or early, and __PHYS_OFFSET on v4.6 or later)
> > > >   * the value used in phys_to_virt()/virt_to_phys() for any address
> > > >     in linear-mapping
> > > >     (PHYS_OFFSET on v4.5 or early, and PHYS_OFFSET on v4.6 or later)
> > > > 
> > > > I think that current crash utility needs only the latter.
> > > > Please correct me if I misunderstand something.
> > > 
> > > Right, the __PHYS_OFFSET value seems to be the kernel entry point address,
> > > but that is not used.  Crash needs the PHYS_OFFSET value to mimic the kernel's
> > > arm64 __virt_to_phys() function.
> > > 
> > > Sorry about my reference to "kexec-tool's makedumpfile.c file", as I was
> > > talking about the Red Hat kexec-tools package, which also contains
> > > the makedumpfile facility.
> > 
> > I see.
> > 
> > > Makedumpfile is upstream in a couple places, where the most recently
> > > released
> > > version (1.5.9) is here:
> > > 
> > >   https://sourceforge.net/projects/makedumpfile/files/makedumpfile/
> > > 
> > > Or the development repository is here:
> > > 
> > >   git://git.code.sf.net/p/makedumpfile/code
> > > 
> > > Regardless of which location, there are the generic get_phys_base() macros
> > > here in "makedumpfile.h" for those architectures that need it:
> > > 
> > >   makedumpfile.h 814 #define get_phys_base() get_phys_base_arm64()
> > >   makedumpfile.h 826 #define get_phys_base() get_phys_base_arm()
> > >   makedumpfile.h 837 #define get_phys_base() stub_true()
> > >   makedumpfile.h 850 #define get_phys_base() get_phys_base_x86_64()
> > >   makedumpfile.h 861 #define get_phys_base() stub_true()
> > >   makedumpfile.h 871 #define get_phys_base() stub_true()
> > >   makedumpfile.h 882 #define get_phys_base() stub_true()
> > >   makedumpfile.h 894 #define get_phys_base() get_phys_base_ia64()
> > > 
> > > And where the arm64-specific function is in "arch/arm64.c":
> > >   
> > >   int
> > >   get_phys_base_arm64(void)
> > >   {
> > >           unsigned long phys_base = ULONG_MAX;
> > >           unsigned long long phys_start;
> > >           int i;
> > >           /*
> > >            * We resolve phys_base from PT_LOAD segments. LMA contains
> > >            physical
> > >            * address of the segment, and we use the lowest start as
> > >            * phys_base.
> > >            */
> > >           for (i = 0; get_pt_load(i, &phys_start, NULL, NULL, NULL); i++) {
> > >                   if (phys_start < phys_base)
> > >                           phys_base = phys_start;
> > >           }
> > >   
> > >           if (phys_base == ULONG_MAX) {
> > >                   ERRMSG("Can't determine phys_base\n");
> > >                   return FALSE;
> > >           }
> > >   
> > >           info->phys_base = phys_base;
> > >   
> > >           DEBUG_MSG("phys_base    : %lx\n", phys_base);
> > >   
> > >           return TRUE;
> > >   }
> > >   
> > > Note that it currently looks for the lowest physical address that is
> > > referenced in any of the PT_LOAD segments of /proc/vmcore.  As you have
> > > discussed for Linux 4.6, perhaps it will be necessary that the value
> > > of PHYS_OFFSET ("memstart_addr") will have to be passed in a VMCOREINFO
> > > item.  It's not clear to me why PT_LOAD p_paddr values will no longer
> > > be usable, but that's up to the makedumpfile/kdump maintainers.  And
> > > certainly a convenient VMCOREINFO item would be ideal/preferable.
> > > 
> > > Anyway, later on, the info->phys_base value gets stored in the compressed
> > > kdump header in the write_kdump_header() function in "makedumpfile.c":
> > >   
> > >   int
> > >   write_kdump_header(void)
> > >   {
> > >           int ret = FALSE;
> > >           size_t size;
> > >           off_t offset_note, offset_vmcoreinfo;
> > >           unsigned long size_note, size_vmcoreinfo;
> > >           struct disk_dump_header *dh = info->dump_header;
> > >           struct kdump_sub_header kh;
> > >           char *buf = NULL;
> > >   
> > >   ... [ cut ] ...
> > >   
> > >           /*
> > >            * Write sub header
> > >            */
> > >           size = sizeof(struct kdump_sub_header);
> > >           memset(&kh, 0, size);
> > >           /* 64bit max_mapnr_64 */
> > >           kh.max_mapnr_64 = info->max_mapnr;
> > >           kh.phys_base  = info->phys_base;
> > >           kh.dump_level = info->dump_level;
> > >   ...
> > >   
> > > And then in the crash utility source code, it simply gets accessed
> > > like so:
> > >   
> > >   int
> > >   diskdump_phys_base(unsigned long *phys_base)
> > >   {
> > >           if (KDUMP_CMPRS_VALID()) {
> > >                   *phys_base = dd->sub_header_kdump->phys_base;
> > >                   return TRUE;
> > >           }
> > >   
> > >           return FALSE;
> > >   }
> > >   
> > > Now, for ELF vmcores, the crash utility calls arm64_kdump_phys_base()
> > > which calls arm_kdump_phys_base(), which simply mimics what the
> > > makedumpfile.c's get_phys_base_arm64() does.  And that function may
> > > need to be changed for Linux 4.6 in order to access the new VMCOREINFO
> > > variable.
> > 
> > It sounds to me that, once VOMCOREINFO("PHYS_OFFSET") is added for
> > makedumpfile, crash utility wants to use it, too. Right?
> 
> It won't have to -- it has always used diskdump_phys_base().  Now, for
> kdump ELF vmcores, yes, it could use pc->read_vmcoreinfo().
> 
> > 
> > If so, what about, say, VA_BITS which will presumably be needed as
> > VMCOREINFO later for makedumpfile but is calculated for now in
> > crash utility?
> 
> Makedumpfile may require VA_BITS, but it's pretty straight forward in crash
> because we have the symbol values from the vmlinux file.  That being
> the case, if it's available in VMCOREINFO, it could be used as an
> alternative for compressed kdumps or kdump ELF vmcores.

I know another example: VMCOREINFO("KERNELOFFSET"), which is actually
equal to "kaslr_offset".
It exists on x86, but is not utilized by crash util.

> > 
> > Please don't take me wrong. I, as an author of kdump for arm64, just want
> > to understand what parameters and why we want to have.
> 
> Right, although it's going to be kind of play-as-we-go.  The crash utility is
> going to require kdump-ELF and makedumpfile to offer the the things that 
> it needs, but kdump-ELF/makedumpfile may have its own requirements to generate
> the things that the crash utility needs.  As far as I can tell, at a minimum,
> VA_BITS, kimage_voffset and PHYS_BASE would be required.

I'm not quite sure, but do you, as a maintainer of crash, expect those
parameters to be also appended for crash?

Thanks,
-Takahiro AKASHI

> Thanks,
>   Dave
> 
> --
> Crash-utility mailing list
> Crash-utility@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/crash-utility

-- 
Thanks,
-Takahiro AKASHI

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility