On Fri, May 6, 2016 at 9:08 AM, Borislav Petkov <bp@xxxxxxxxx> wrote: > On Fri, May 06, 2016 at 12:46:05AM -0700, tip-bot for Yinghai Lu wrote: >> Commit-ID: 9dc1969c24eff8b7d7a9a565d1047b624921ba06 >> Gitweb: http://git.kernel.org/tip/9dc1969c24eff8b7d7a9a565d1047b624921ba06 >> Author: Yinghai Lu <yinghai@xxxxxxxxxx> >> AuthorDate: Thu, 5 May 2016 15:13:47 -0700 >> Committer: Ingo Molnar <mingo@xxxxxxxxxx> >> CommitDate: Fri, 6 May 2016 09:00:59 +0200 >> >> x86/KASLR: Consolidate mem_avoid[] entries >> >> The mem_avoid[] array is used to track positions that should be avoided (like >> the compressed kernel, decompression code, etc) when selecting a memory >> position for the randomly relocated kernel. Since ZO is now at the end of > > Can we stop with those ZO(MG) obfuscated abbreviations and call it > simply the "compressed kernel image"? I can expand them in the change logs, but it helps to keep reinforcing their names since all the variables are named using these. >> the decompression buffer and the decompression code (and its heap and >> stack) are at the front, we can safely consolidate the decompression entry, >> the heap entry, and the stack entry. The boot_params memory, however, could >> be elsewhere, so it should be explicitly included. >> >> Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx> >> Signed-off-by: Baoquan He <bhe@xxxxxxxxxx> >> [ Rwrote changelog, cleaned up code comments. ] >> Signed-off-by: Kees Cook <keescook@xxxxxxxxxxxx> >> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> >> Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx> >> Cc: Andy Lutomirski <luto@xxxxxxxxxx> >> Cc: Borislav Petkov <bp@xxxxxxxxx> >> Cc: Brian Gerst <brgerst@xxxxxxxxx> >> Cc: Dave Young <dyoung@xxxxxxxxxx> >> Cc: Denys Vlasenko <dvlasenk@xxxxxxxxxx> >> Cc: H. Peter Anvin <hpa@xxxxxxxxx> >> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> >> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> >> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> >> Cc: Vivek Goyal <vgoyal@xxxxxxxxxx> >> Cc: kernel-hardening@xxxxxxxxxxxxxxxxxx >> Cc: lasse.collin@xxxxxxxxxxx >> Link: http://lkml.kernel.org/r/1462486436-3707-3-git-send-email-keescook@xxxxxxxxxxxx >> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx> >> --- >> arch/x86/boot/compressed/kaslr.c | 77 +++++++++++++++++++++++++++++++--------- >> 1 file changed, 61 insertions(+), 16 deletions(-) >> >> diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c >> index 2072d82..6392f00 100644 >> --- a/arch/x86/boot/compressed/kaslr.c >> +++ b/arch/x86/boot/compressed/kaslr.c >> @@ -121,7 +121,7 @@ struct mem_vector { >> unsigned long size; >> }; >> >> -#define MEM_AVOID_MAX 5 >> +#define MEM_AVOID_MAX 4 >> static struct mem_vector mem_avoid[MEM_AVOID_MAX]; >> >> static bool mem_contains(struct mem_vector *region, struct mem_vector *item) >> @@ -146,22 +146,71 @@ static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two) >> return true; >> } >> >> +/* >> + * In theroy, KASLR can put the kernel anywhere in area of [16M, 64T). The > ^^^^^^ > Spellchecker please. Gah, thanks. No matter how many times I run through these, I keep adding more typos. :) > > in the range of [... > >> + * mem_avoid array is used to store the ranges that need to be avoided when >> + * KASLR searches for a an appropriate random address. We must avoid any > > s/a // > >> + * regions that are unsafe to overlap with during decompression, and other >> + * things like the initrd, cmdline and boot_params. >> + * >> + * How to calculate the unsafe areas is detailed here, and is informed by >> + * the decompression calculations in header.S, and the diagram in misc.c. >> + * >> + * The compressed vmlinux (ZO) plus relocs and the run space of ZO can't be >> + * overwritten by decompression output. >> + * >> + * ZO sits against the end of the decompression buffer, so we can calculate >> + * where text, data, bss, etc of ZO are positioned. >> + * >> + * The follow are already enforced by the code: > > " ... following conditions are ..." > >> + * - init_size >= kernel_total_size >> + * - input + input_len >= output + output_len >> + * - kernel_total_size could be >= or < output_len >> + * >> + * From this, we can make several observations, illustrated by a diagram: >> + * - init_size >= kernel_total_size > > You just said this above?! > >> + * - input + input_len > output + output_len >> + * - kernel_total_size >= output_len > > Ok, what is the point of the observations? To show that we're making the > conditions enforced by the code more concrete, stricter, ...? This was an earlier attempt by Baoquan to fully explain the reasoning in this code since I couldn't understand it. He added the specific conditions, observations, and added the diagram. The goal is to prove that the changes to mem_avoid are safe since mistakes here lead to really hard to debug bugs. >> + * >> + * 0 output input input+input_len output+init_size >> + * | | | | | >> + * | | | | | >> + * |-----|--------|--------|------------------|----|------------|----------| >> + * | | | >> + * | | | >> + * output+init_size-ZO_INIT_SIZE output+output_len output+kernel_total_size >> + * >> + * [output, output+init_size) is for the buffer for decompressing the >> + * compressed kernel (ZO). >> + * >> + * [output, output+kernel_total_size) is for the uncompressed kernel (VO) >> + * and its bss, brk, etc. >> + * [output, output+output_len) is VO plus relocs >> + * >> + * [output+init_size-ZO_INIT_SIZE, output+init_size) is the copied ZO. > > ..., i.e., ZO_INIT_SIZE. Well, no, these are ranges, so literally what it says. "output+init_size-ZO_INIT_SIZE" is the start of the compressed image (ZO). It's position is now found from the end of the buffer, which is output+init_size (VO's position plus VO's total run size) minus the total run size of ZO. > >> + * [input, input+input_len) is the copied compressed (VO (vmlinux after >> + * objcopy) plus relocs), not the ZO. >> + * >> + * [input+input_len, output+init_size) is [_text, _end) for ZO. That was the >> + * first range in mem_avoid, which included ZO's heap and stack. Also >> + * [input, input+input_size) need be put in mem_avoid array, but since it > > " ... needs to be in the mem_avoid ..." > >> + * is adjacent to the first entry, they can be merged. This is how we get >> + * the first entry in mem_avoid[]. >> + */ > > <--- my head smokes right about here. :-) Heh. Yeah, and this is LESS confusing than when the ZO wasn't aligned to the end of the buffer. A whole other set of conditions vanish now. I will try to further explain these. >> static void mem_avoid_init(unsigned long input, unsigned long input_size, >> - unsigned long output, unsigned long output_size) >> + unsigned long output) >> { >> + unsigned long init_size = boot_params->hdr.init_size; >> u64 initrd_start, initrd_size; >> u64 cmd_line, cmd_line_size; >> - unsigned long unsafe, unsafe_len; >> char *ptr; >> >> /* >> * Avoid the region that is unsafe to overlap during >> - * decompression (see calculations in ../header.S). >> + * decompression. >> */ >> - unsafe_len = (output_size >> 12) + 32768 + 18; >> - unsafe = (unsigned long)input + input_size - unsafe_len; >> - mem_avoid[0].start = unsafe; >> - mem_avoid[0].size = unsafe_len; >> + mem_avoid[0].start = input; >> + mem_avoid[0].size = (output + init_size) - input; > > There's a define above called MEM_AVOID_MAX but the array elements are > adressed with naked numbers here. Perhaps if we had defines for them > too, it'll make the code a bit more understandable: > > mem_avoid[MEM_AVOID_ZO_RANGE] ... > mem_avoid[MEM_AVOID_INITRD] ... > > and so on. Ah! Yes, excellent. I'll actually use an enum so I can get MEM_AVOID_MAX automatically. -Kees -- Kees Cook Chrome OS & Brillo Security -- To unsubscribe from this list: send the line "unsubscribe linux-tip-commits" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html
![]() |