[CC Kees and Linus - for your background, we are talking about failures
http://lkml.kernel.org/r/20180107090229.GB24862@xxxxxxxxxxxxxx
introduced by
http://lkml.kernel.org/r/20171213092550.2774-3-mhocko@xxxxxxxxxx
Debugging has shown that load_elf_binary tries to map elf segment over
an existing brk - see below.]

On Thu 01-02-18 08:43:34, Anshuman Khandual wrote:
[...]
> [ 9.295990] vma c000001fc8137c80 start 0000000010030000 end 0000000010040000
> next c000001fc81378c0 prev c000001fc8137680 mm c000001fc8108200
> prot 8000000000000104 anon_vma (null) vm_ops (null)
> pgoff 1003 file (null) private_data (null)
> flags: 0x100073(read|write|mayread|maywrite|mayexec|account)
> [ 9.296351] CPU: 47 PID: 7537 Comm: sed Not tainted 4.14.0-00006-g4bd92fe-dirty #162
> [ 9.296450] Call Trace:
> [ 9.296482] [c000001fc70db9b0] [c000000000b180e0] dump_stack+0xb0/0xf0 (unreliable)
> [ 9.296588] [c000001fc70db9f0] [c0000000002db0b8] do_brk_flags+0x2d8/0x440
> [ 9.296674] [c000001fc70dbac0] [c0000000002db4d0] vm_brk_flags+0x80/0x130
> [ 9.296751] [c000001fc70dbb20] [c0000000003d2998] set_brk+0x80/0xe8
> [ 9.296824] [c000001fc70dbb60] [c0000000003d2518] load_elf_binary+0x12f8/0x1580
> [ 9.296910] [c000001fc70dbc80] [c00000000035d9e0] search_binary_handler+0xd0/0x270
> [ 9.296999] [c000001fc70dbd10] [c00000000035f938] do_execveat_common.isra.31+0x658/0x890
> [ 9.297089] [c000001fc70dbdf0] [c00000000035ff80] SyS_execve+0x40/0x50
> [ 9.297162] [c000001fc70dbe30] [c00000000000b220] system_call+0x58/0x6c
>
> But coming back to when it failed with MAP_FIXED_NOREPLACE, looking into ELF
> section details (readelf -aW /usr/bin/sed), there was a PT_LOAD segment with
> p_memsz > p_filesz which might be causing set_brk() to be called.
>
>
>   Type   Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
>   ...
>   LOAD   0x020328 0x0000000010030328 0x0000000010030328 0x000384 0x0094a0 RW  0x10000
>
> which can be confirmed by just dumping elf_brk/elf_bss for this particular
> instance.
> (elf_brk > elf_bss)

Hmm, interesting. So the above is not a regular brk. The check was added
in 2001 by "v2.4.10.1 -> v2.4.10.2" but the changelog is not revealing
at all. Btw. my /bin/ls also has MemSiz > FileSiz

  LOAD   0x01ade0 0x000000000061ade0 0x000000000061ade0 0x00079c 0x001520 RW  0x200000
  113: 000000000061b57c 0 NOTYPE GLOBAL DEFAULT ABS __bss_start

and I do not see any problem. So this is more likely a problem of elf_brk
being placed at a wrong address. But I am desperately lost in this code
so I might be completely off.

> $dmesg | grep elf_brk
> [ 9.571192] elf_brk 10030328 elf_bss 10030000

Hmm, these are on the same page. Is this really expected?

> static int load_elf_binary(struct linux_binprm *bprm)
> ---------------------
>
> 	if (unlikely (elf_brk > elf_bss)) {
> 		unsigned long nbyte;
>
> 		/* There was a PT_LOAD segment with p_memsz > p_filesz
> 		   before this one. Map anonymous pages, if needed,
> 		   and clear the area. */
> 		retval = set_brk(elf_bss + load_bias,
> 				 elf_brk + load_bias,
> 				 bss_prot);
>
>
> ---------------------
> So is not there a chance that subsequent file mapping might be overlapping
> with these anon mappings ? I mean may be thats how ELF loading might be
> happening right now.

I will study the code more but it would be really great if somebody more
familiar with this area could help me out a bit. Why do we add this brk
at all, and why doesn't it matter that we map over it with a real file
mapping? As per the previous email
http://lkml.kernel.org/r/20180130094205.GS21609@xxxxxxxxxxxxxx
a new brk will be established later.
-- 
Michal Hocko
SUSE Labs
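
For anyone following along, here is a minimal userspace sketch of the
address arithmetic that seems to be behind the overlap. It is not kernel
code. It assumes 64K pages (suggested by the 0x10000 alignment of the
LOAD segment and by the size of the dumped vma), a load_bias of 0, and
that set_brk() rounds both ends up to a page boundary while the segment
mapping starts at the rounded-down p_vaddr, which is my reading of
binfmt_elf.c. All names below are made up for illustration.

```c
#include <stdbool.h>

/* Assumption: 64K pages, as suggested by the 0x10000 segment alignment. */
#define SKETCH_PAGE_SIZE 0x10000UL

/* Round an address down to a page boundary (the role of ELF_PAGESTART). */
static unsigned long page_start(unsigned long addr)
{
	return addr & ~(SKETCH_PAGE_SIZE - 1);
}

/* Round an address up to a page boundary (the role of ELF_PAGEALIGN). */
static unsigned long page_align(unsigned long addr)
{
	return (addr + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Two half-open ranges [a_start, a_end) and [b_start, b_end) overlap. */
static bool ranges_overlap(unsigned long a_start, unsigned long a_end,
			   unsigned long b_start, unsigned long b_end)
{
	return a_start < b_end && b_start < a_end;
}

/*
 * Hypothetical helper: would the anonymous area created for
 * [elf_bss, elf_brk) collide with the page-granular mapping of a
 * PT_LOAD segment described by p_vaddr and p_memsz?
 * load_bias is assumed to be 0 here.
 */
static bool brk_overlaps_segment(unsigned long elf_bss, unsigned long elf_brk,
				 unsigned long p_vaddr, unsigned long p_memsz)
{
	unsigned long brk_start = page_align(elf_bss);
	unsigned long brk_end = page_align(elf_brk);
	unsigned long seg_start = page_start(p_vaddr);
	unsigned long seg_end = page_align(p_vaddr + p_memsz);

	return ranges_overlap(brk_start, brk_end, seg_start, seg_end);
}
```

With the reported values (elf_bss 10030000, elf_brk 10030328, and the
LOAD segment at 0x10030328 with MemSiz 0x94a0) the anonymous area comes
out as 0x10030000-0x10040000, which matches the dumped vma, and
brk_overlaps_segment() returns true because the segment mapping starts
at 0x10030000 as well.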