On Wed, 2009-01-07 at 08:57 -0600, Milton Miller wrote:
> [removed Paul from cc and fixed Mohan's email]
>
> On Jan 6, 2009, at 5:44 PM, Michael Ellerman wrote:
>
> > On Fri, 2009-01-02 at 14:46 -0600, Milton Miller wrote:
> >> @@ -94,10 +95,35 @@ void __init reserve_crashkernel(void)
> >>                                  KDUMP_KERNELBASE);
> >>
> >>          crashk_res.start = KDUMP_KERNELBASE;
> >> +#else
> >> +        if (!crashk_res.start) {
> >> +                /*
> >> +                 * unspecified address, choose a region of specified size
> >> +                 * can overlap with initrd (ignoring corruption when retained)
> >> +                 * ppc64 requires kernel and some stacks to be in first segment
> >> +                 */
> >> +                crashk_res.start = KDUMP_KERNELBASE;
> >> +        }
> >> +
> >> +        crash_base = PAGE_ALIGN(crashk_res.start);
> >> +        if (crash_base != crashk_res.start) {
> >> +                printk("Crash kernel base must be aligned to 0x%lx\n",
> >> +                                PAGE_SIZE);
> >> +                crashk_res.start = crash_base;
> >> +        }
> >> +
> >>  #endif
> >>          crash_size = PAGE_ALIGN(crash_size);
> >>          crashk_res.end = crashk_res.start + crash_size - 1;
> >>
> >> +        /* The crash region must not overlap the current kernel */
> >> +        if (overlaps_crashkernel(__pa(_stext), _end - _stext)) {
> >> +                printk(KERN_WARNING
> >> +                        "Crash kernel can not overlap current kernel\n");
> >> +                crashk_res.start = crashk_res.end = 0;
> >> +                return;
> >> +        }
> >
> > I think we can be smarter here. Why don't we adjust the crash kernel
> > region so that it doesn't overlap the first kernel? ie. move it up a
> > bit.
>
> How much?  In addition to the size of the kernel, we have to allocate
> (1) the emergency stacks as we use them to bring up secondary cpus (2)
> the irq stacks in the first segment.  While the second could be met
> easier on systems with 1TB slbs we don't take advantage of that yet.

Hmm, we could try and work it out though. I guess we don't know how many
CPUs we have at that point, which makes it a little trickier.

So we have the emergency stack and the hard & soft irq stacks per cpu,
which is 48KB AFAICT.
So for a 256-way system that would be 12MB. I don't think I've seen an
RMO smaller than 128MB, though I notice our RPA note specifies 64M as
the minimum we'll accept. That would probably be a bit tight.

How about something like:

min_space = _end + 16MB (16 to be safe?)
if min_space < rmo_size / 2:
    min_space = rmo_size / 2

if crash_base < min_space:
    crash_base = min_space

> > There's also the issue of the RMO, I'm not sure what we should do
> > there, but I think the kernel needs some smarts otherwise users are
> > going to shoot themselves in the foot.
>
> I was looking at the code in kexec-tools for the rmo, and it seems
> extremely broken (ie it sets rmo_top on every memory block instead of
> the lowest; the clamp to 768M is the savior for systems with multiple
> blocks).

Oh surprise.

> Do we care about loading a kernel below a relocated kernel (between the
> interrupt vectors and the new kernel)?  I ignored that for now,
> arguing that we always run the first kernel at 0.

No I don't think so.

> > We could ignore the @x setting and split the RMO between both kernels
> > somewhat intelligently.
> >
> > What might work is multiple crash regions, that way we could have some
> > space in the RMO for the second kernel (say 32MB?), but the rest
> > outside - leaving some RMO for the first kernel. But I think that
> > would require some serious surgery.
> >
>
> Other archs have this, I guess because they read the memory out of
> /proc/iomem.  The trick is knowing what has to be put in real space
> and what can go above the rmo.  Also, we have those horrible hard-coded
> rmo to 768M max because some platform (one of the cell ones?) didn't
> make the device tree to show it.  Maybe we can track it down and add
> linux,usable-mem-ranges to fix it up?

Dunno about the cell, but some of the early blades did have crufty
firmware.

> Does the generic code support loading into the split regions, or is it
> just for giving the kernel room to run?

I don't think so.
I don't see any logic that deals with gaps in the crashk
region.

> So while all of these are nice, what do you think about merging this as
> an interim measure, especially for backporting to 2.6.28 stable (and any
> distro that wants to pick up relocatable kdump)?

I guess. I'd rather do something smarter, like I suggested above.

cheers

--
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
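
For reference, the min_space clamp suggested earlier in this mail, together
with the 48KB-per-CPU stack estimate, could be sketched in C roughly as
below. This is only an illustration under stated assumptions: stack_space()
and clamp_crash_base() are hypothetical names that do not appear in the
patch or the kernel, the 16MB margin and the rmo_size / 2 floor are the
guesses from the pseudocode, and the kernel end and RMO size are passed in
as plain integers instead of being taken from _end or the device tree.

```c
#include <stdint.h>

#define KB 1024ULL
#define MB (1024ULL * KB)

/*
 * Assumption from the thread: emergency stack plus hard and soft irq
 * stacks come to 48KB per CPU in total.
 */
#define PER_CPU_STACK_BYTES (48 * KB)

/* Hypothetical helper: first-segment stack space for ncpus CPUs. */
static uint64_t stack_space(unsigned int ncpus)
{
	return (uint64_t)ncpus * PER_CPU_STACK_BYTES;
}

/*
 * Hypothetical helper: move crash_base up so that it is at least
 * max(kernel_end + 16MB, rmo_size / 2). A base that is already
 * high enough is returned unchanged.
 */
static uint64_t clamp_crash_base(uint64_t crash_base, uint64_t kernel_end,
				 uint64_t rmo_size)
{
	uint64_t min_space = kernel_end + 16 * MB;	/* 16MB margin is a guess */

	if (min_space < rmo_size / 2)
		min_space = rmo_size / 2;

	if (crash_base < min_space)
		crash_base = min_space;

	return crash_base;
}
```

With these numbers, stack_space(256) comes to 12MB, and for a 128MB RMO
with the kernel ending at 20MB a requested base of 32MB would be moved up
to 64MB. On a 64MB RMO the same request would land at 36MB, which shows
how tight the minimum RMO case gets.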