On Mon, 2007-10-01 at 14:10 +0530, Vivek Goyal wrote:
> On Wed, Sep 26, 2007 at 03:34:10PM +0800, Huang, Ying wrote:
> > Hi,
> >
> > I have a proposal to do crashdump without reserving memory during system
> > boot. The method is as follows:
> >
> > 1. Do not reserve memory during system boot, that is
> > crashkernel=<XX>@<YY> is not used in kernel command line.
> >
> > 2. A new kexec flag named KEXEC_CRASH_BY_NORMAL is defined for
> > sys_kexec_load system call. When this flag is specified, the
> > sys_kexec_load works as normal kexec (not crash kexec), except the
> > destination image is kexec_crash_image instead of kexec_image.
> >
> > 3. In kexec-tools (/sbin/kexec), --mem-min=<addr1> and --mem-max=<addr2>
> > is used to specify the memory area used by crashdump kernel. That is,
> > the image, elf core header, available memory of crashdump kernel is
> > within <addr1> ~ <addr2>.
> >
>
> Probably this can be an optional thing. Anyway if destination pages are
> going to be backed up in source pages, a user does not have to specify
> --mem-min and --mem-max.
>

The --mem-min and --mem-max options specify the destination memory
range, so I think they are necessary. One source page corresponds to
one destination page (except when a source page happens to be allocated
at the same position as its destination page). The two options serve a
similar function to crashkernel=YM@XM in the kernel parameters.

> > 4. In kexec-tools, in addition to kernel image, elf core header, etc are
> > loaded, the available memory of crashdump kernel is loaded too. For
> > example, the segments for sys_kexec_load for crashdump kernel can be:
> >
> > --mem-min=0x100000
> > --mem-max=0xffffff
> >
> > No.  buf         bufsz    mem       memsz
> > 0    NULL        0        0x1000    0x9e000
> > 1    0x881fe88   0x289b   0x100000  0x3000
> > 2    NULL        0        0x103000  0xfd000
> > 3    0xb7bfa808  0xb7c00  0x200000  0xb8000
> > 4    NULL        0        0x2b8000  0xd39000
> > 5    0x8818d38   0x7120   0xff1000  0x9000
> > 6    NULL        0        0xffa000  0x1000
> > 7    0x8818268   0x400    0xffb000  0x4000
> > 8    NULL        0        0xfff000  0x1000
> >
>
> Maybe the user also needs to specify how much memory to allocate for second
> kernel execution.
>

The memory for second kernel execution is specified through --mem-min
and --mem-max.

> > 5. In relocate_kernel of Linux kernel, instead of copy the source page
> > to destination page, the contents of source page and the destination
> > page are swapped. (The destination page -> source page map is in
> > kexec_crash_image->head) The memory area used by crashdump kernel is
> > backed up to source page.
> >
> >
>
> Interesting. Just that it introduces more code in crash path.
>

The source/destination page swap code is very simple and is executed
after paging is turned off, so I don't think the added code is a big
problem.

> > In original crashdump implementation, the crashdump kernel run in
> > reserved memory area. The reserved memory pages are reserved memory
> > pages in primary (original) kernel.
> >
> > In this proposed implementation, the crashdump kernel run in specified
> > memory area, the contents of destination memory area is backed up before
> > crashdump kernel running. The backup pages are allocated memory pages in
> > primary (original) kernel.
> >
>
> How would you prepare ELF headers for backed up memory. ELF headers are
> created in user space and before sys_kexec_load is executed, kexec-tools
> need to know the address of physical memory where the actual data is. But
> in this scheme, source pages will be allocated only after sys_kexec_load
> has been called.
>
> These source page addresses will have to be exported to user space so
> that kexec tools can fill up ELF headers accordingly.
>

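One way around exporting the addresses is to keep a destination page ->
source page map and let the dump tool look pages up through it. A rough
sketch in C of that lookup follows. struct page_map_entry and
backup_pfn() are names made up for illustration; they are not existing
kexec-tools or makedumpfile interfaces, and the real map format is not
decided here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical map entry: after the swap, the original contents of
 * page frame dest_pfn live in page frame source_pfn. */
struct page_map_entry {
	uint64_t dest_pfn;
	uint64_t source_pfn;
};

/* Return the page frame a dump tool should read to get the original
 * (pre-crash) contents of pfn. Pages that were never a swap
 * destination are read in place. A linear scan keeps the sketch
 * short; a real tool would likely sort or hash the map. */
static uint64_t backup_pfn(const struct page_map_entry *map, size_t n,
			   uint64_t pfn)
{
	assert(map != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (map[i].dest_pfn == pfn)
			return map[i].source_pfn;
	}
	return pfn;
}
```

A dump tool would call backup_pfn() for every page it writes out and
read from the returned frame instead of the original one.
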
Now, the memory region used by the second kernel is excluded from the
ELF headers. The map of destination page -> source page can be passed
to the second kernel, so the contents of a destination page can be
restored from its source page by a user space tool (such as a modified
version of makedumpfile). It is much harder to embed the map of
destination page -> source page into the ELF headers.

> >
> > The pros and cons of proposed implementation:
> >
> > Pros:
> > - The memory used by crashdump kernel need not to be reserved during
> > boot time.
> > - The memory used by crashdump kernel can be specified during
> > sys_kexec_load
> > - The memory used by crashdump kernel can be freed after unloading.
> >
> > Cons:
> > - The memory used by crashdump kernel can be the DMA destination, their
> > contents may be ruined by devices during the boot of crashdump kernel.
> > (Is it possible to turn off DMA for some memory area other than
> > reserving it?)
>
> Potential corruption because of DMA was a big issue and that's why the
> exclusive reserved area and relocatable kernel came into the picture.
>
> Eric in the past had tried disabling DMA at PCI level, but I think it
> did not work for him.
>
> - There is no guarantee that one will get sufficient memory allocated
> when needed. so loading kdump kernel might fail.
>
> - More code in crash path and potentially reduces the reliability of
> the mechanism.

A possible solution for the DMA issue is as follows:

- Specify the memory region used by the second kernel on the kernel
  boot command line.
- Create a zone for this memory region. This zone cannot be used for
  DMA.
- Use this memory region for the second kernel.

> >
> >
> > In fact, almost all mechanism for this proposal has been implemented by
> > my previous patch: "kexec jump" in "kexec based hibernation".
> >
> >
> > Any comment is welcome.
> >
>
> Idea is interesting. But at the same time it reduces the reliability of
> kdump. I am especially concerned about DMA issue and more code in crash path.

It is less reliable than the original method. But I think if the DMA
issue can be solved, it may be acceptable.

> I will rather try to find out if I can create some mechanisms to do large
> contiguous memory area allocation from user space at run time instead of
> doing it at boot time.

Best Regards,
Huang Ying
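
For reference, the page swap from step 5 can be sketched in plain C as
below. This is only an illustration on ordinary buffers: the real code
would run in relocate_kernel with paging off and walk the list in
kexec_crash_image->head. swap_page() and SKETCH_PAGE_SIZE are
hypothetical names, and a 4 KiB page is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for this sketch */

/* Exchange the contents of a destination page and its source page.
 * Afterwards dest holds the crashdump kernel data that was staged in
 * source, and source holds the backup of what dest contained. */
static void swap_page(unsigned char *dest, unsigned char *source)
{
	assert(dest != NULL && source != NULL);

	/* The source page was allocated at the destination position:
	 * nothing to move and nothing to back up. */
	if (dest == source)
		return;

	for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
		unsigned char tmp = dest[i];

		dest[i] = source[i];
		source[i] = tmp;
	}
}
```

Compared with the plain copy done today, the only extra work per page
is writing the old destination contents back to the source page.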