Hello Vivek,

Ping!

Cheers,

Michael

On 16 January 2015 at 14:30, Michael Kerrisk (man-pages)
<mtk.manpages at gmail.com> wrote:
> Hello Vivek,
>
> Thanks for your comments! I've added some further text to
> the page based on those comments. See some follow-up
> questions below.
>
> On 01/12/2015 11:16 PM, Vivek Goyal wrote:
>> On Wed, Jan 07, 2015 at 10:17:56PM +0100, Michael Kerrisk (man-pages) wrote:
>>
>> [..]
>>>>> .BR KEXEC_ON_CRASH " (since Linux 2.6.13)"
>>>>> Execute the new kernel automatically on a system crash.
>>>>> .\" FIXME Explain in more detail how KEXEC_ON_CRASH is actually used
>>>
>>> I wasn't expecting that you would respond to the FIXMEs that were
>>> not labeled "kexec_file_load", but I was hoping you might ;-). Thanks!
>>> I have a few additional questions to your nice notes.
>>>
>>>> Upon boot first kernel reserves a chunk of contiguous memory (if
>>>> crashkernel=<> command line parameter is passed). This memory is
>>>> used to load the crash kernel (Kernel which will be booted into
>>>> if first kernel crashes).
>>>
>>
>> Hi Michael,
>>
>>> Can I just confirm: is it in all cases only possible to use kexec_load()
>>> and kexec_file_load() if the kernel was booted with the 'crashkernel'
>>> parameter set?
>>
>> As of now, only kexec_load() and kexec_file_load() system calls can
>> make use of memory reserved by crashkernel=<> kernel parameter. And
>> this is used only if we are trying to load a crash kernel (KEXEC_ON_CRASH
>> flag specified).
>
> Okay.
>
>>>> Location of this reserved memory is exported to user space through
>>>> /proc/iomem file.
>>>
>>> Is that export via an entry labeled "Crash kernel" in the
>>> /proc/iomem file?
>>
>> Yes.
>
> Okay -- thanks.
>
>>>> User space can parse it and prepare list of segments
>>>> specifying this reserved memory as destination.
>>>
>>> I'm not quite clear on "specifying this reserved memory as destination".
>>> Is that done by specifying the address in the kexec_segment.mem fields?
>>
>> You are absolutely right. User space can specify in kexec_segment.mem
>> field the memory location where it is expecting a particular segment to
>> be loaded by kernel.
>>
>>>
>>>> Once kernel sees the flag KEXEC_ON_CRASH, it makes sure that all the
>>>> segments are destined for reserved memory otherwise kernel load operation
>>>> fails.
>>>
>>> Could you point me to where this checking is done? Also, what is the
>>> error (errno) that occurs when the load operation fails? (I think the
>>> answers to these questions are "at the start of kimage_alloc_init()"
>>> and "EADDRNOTAVAIL", but I'd like to confirm.)
>>
>> This checking happens in sanity_check_segment_list() which is called
>> by kimage_alloc_init().
>>
>> And yes, error code returned is -EADDRNOTAVAIL.
>
> Thanks. I added EADDRNOTAVAIL to the ERRORS.
>
>>>> [..]
>>>>> struct kexec_segment {
>>>>>     void   *buf;        /* Buffer in user space */
>>>>>     size_t  bufsz;      /* Buffer length in user space */
>>>>>     void   *mem;        /* Physical address of kernel */
>>>>>     size_t  memsz;      /* Physical address length */
>>>>> };
>>>>> .fi
>>>>> .in
>>>>> .PP
>>>>> .\" FIXME Explain the details of how the kernel image defined by segments
>>>>> .\" is copied from the calling process into previously reserved memory.
>>>>
>>>> Kernel image defined by segments is copied into kernel either in regular
>>>> memory
>>>
>>> Could you clarify what you mean by "regular memory"?
>>
>> I meant memory which is not reserved memory.
>
> Okay.
>
>>>> or in reserved memory (if KEXEC_ON_CRASH is set). Kernel first
>>>> copies list of segments in kernel memory and then does various
>>>> sanity checks on the segments. If everything looks fine, kernel copies
>>>> segment data to kernel memory.
>>>>
>>>> In case of normal kexec, segment data is loaded in any available memory
>>>> and segment data is moved to final destination at the kexec reboot time.
>>>
>>> By "moved to final destination", do you mean "moved from user space to the
>>> final kernel-space destination"?
>>
>> No. Segment data moves from user space to kernel space once kexec_load()
>> call finishes successfully. But when user does reboot (kexec -e), at that
>> time kernel moves that segment data to its final location. Kernel could
>> not place the segment at its final location during kexec_load() time as
>> that memory is already in use by running kernel. But once we are about
>> to reboot to new kernel, we can overwrite the old kernel's memory.
>
> Got it.
>
>>>> In case of kexec on panic (KEXEC_ON_CRASH flag set), segment data is
>>>> directly loaded to reserved memory and after crash kexec simply jumps
>>>
>>> By "directly", I assume you mean "at the time of the kexec_load() call",
>>> right?
>>
>> Yes.
>
> Thanks.
>
> So, returning to the kexec_segment structure:
>
>     struct kexec_segment {
>         void   *buf;        /* Buffer in user space */
>         size_t  bufsz;      /* Buffer length in user space */
>         void   *mem;        /* Physical address of kernel */
>         size_t  memsz;      /* Physical address length */
>     };
>
> Are the following statements correct:
> * buf + bufsz identify a memory region in the caller's virtual
>   address space that is the source of the copy
> * mem + memsz specify the target memory region of the copy
> * mem is physical memory address, as seen from kernel space
> * the number of bytes copied from userspace is min(bufsz, memsz)
> * if bufsz > memsz, then excess bytes in the user-space buffer
>   are ignored.
> * if memsz > bufsz, then excess bytes in the target kernel buffer
>   are filled with zeros.
> ?
>
> Also, it seems to me that 'mem' need not be page aligned.
> Is that correct? Should the man page say something about that?
> (E.g., is it generally desirable that 'mem' should be page aligned?)
>
> Likewise, 'memsz' doesn't need to be a page multiple, IIUC.
> Should the man page say anything about this?
> For example, should it note that the initialized kernel segment will
> be of size:
>
>     (mem % PAGE_SIZE + memsz) rounded up to the next multiple of PAGE_SIZE
>
> And should it note that if 'mem' is not a multiple of the page size, then
> the initial bytes (mem % PAGE_SIZE) in the first page of the kernel segment
> will be zeros?
>
> (Hopefully I have read kimage_load_normal_segment() correctly.)
>
> And one further question. Other than the fact that they are used with
> different system calls, what is the difference between KEXEC_ON_CRASH
> and KEXEC_FILE_ON_CRASH?
>
> Thanks,
>
> Michael
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
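
The arithmetic behind the open questions in the quoted message can be
sketched in C. This only illustrates the statements as they are posed
there (the min(bufsz, memsz) copy length, the zero fill when
memsz > bufsz, and the page rounding of the segment); whether the kernel
behaves exactly this way is what the message asks, so none of it is
confirmed. The function names and the page_size parameter are
hypothetical and are not part of the kexec API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes taken from the user-space buffer,
 * assuming the copy length is min(bufsz, memsz). */
size_t seg_bytes_copied(size_t bufsz, size_t memsz)
{
    return bufsz < memsz ? bufsz : memsz;
}

/* Hypothetical helper: trailing bytes of the target region that would
 * be zero-filled, assuming the memsz > bufsz case pads with zeros. */
size_t seg_bytes_zeroed(size_t bufsz, size_t memsz)
{
    return memsz > bufsz ? memsz - bufsz : 0;
}

/* Hypothetical helper: size of the page-granular region covering the
 * segment, i.e. (mem % page_size + memsz) rounded up to the next
 * multiple of page_size. */
size_t seg_page_span(uintptr_t mem, size_t memsz, size_t page_size)
{
    assert(page_size > 0);
    size_t lead = mem % page_size;   /* offset of mem within its page */
    size_t total = lead + memsz;
    return (total + page_size - 1) / page_size * page_size;
}
```

With a 4096-byte page, mem = 0x1000100 and memsz = 0x1000 give a span of
8192 bytes under this reading, of which the first 0x100 bytes are the
leading padding the message asks about.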