On 2023-09-15 23:15:05+0100, Pedro Falcato wrote: > On Fri, Sep 15, 2023 at 4:54 AM Thomas Weißschuh <linux@xxxxxxxxxxxxxx> wrote: > > > > When allocating the pages for bss the start address needs to be rounded > > down instead of up. > > Otherwise the start of the bss segment may be unmapped. > > > > The was reported to happen on Aarch64: > > > > Memory allocated by set_brk(): > > Before: start=0x420000 end=0x420000 > > After: start=0x41f000 end=0x420000 > > > > The triggering binary looks like this: > > > > Elf file type is EXEC (Executable file) > > Entry point 0x400144 > > There are 4 program headers, starting at offset 64 > > > > Program Headers: > > Type Offset VirtAddr PhysAddr > > FileSiz MemSiz Flags Align > > LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000 > > 0x0000000000000178 0x0000000000000178 R E 0x10000 > > LOAD 0x000000000000ffe8 0x000000000041ffe8 0x000000000041ffe8 > > 0x0000000000000000 0x0000000000000008 RW 0x10000 > > NOTE 0x0000000000000120 0x0000000000400120 0x0000000000400120 > > 0x0000000000000024 0x0000000000000024 R 0x4 > > GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000 > > 0x0000000000000000 0x0000000000000000 RW 0x10 > > > > Section to Segment mapping: > > Segment Sections... > > 00 .note.gnu.build-id .text .eh_frame > > 01 .bss > > 02 .note.gnu.build-id > > 03 > > > > Reported-by: Sebastian Ott <sebott@xxxxxxxxxx> > > Closes: https://lore.kernel.org/lkml/5d49767a-fbdc-fbe7-5fb2-d99ece3168cb@xxxxxxxxxx/ > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > > Cc: stable@xxxxxxxxxxxxxxx > > Signed-off-by: Thomas Weißschuh <linux@xxxxxxxxxxxxxx> > > --- > > > > I'm not really familiar with the ELF loading process, so putting this > > out as RFC. > > > > A example binary compiled with aarch64-linux-gnu-gcc 13.2.0 is available > > at https://test.t-8ch.de/binfmt-bss-repro.bin > > --- > > fs/binfmt_elf.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c > > index 7b3d2d491407..4008a57d388b 100644 > > --- a/fs/binfmt_elf.c > > +++ b/fs/binfmt_elf.c > > @@ -112,7 +112,7 @@ static struct linux_binfmt elf_format = { > > > > static int set_brk(unsigned long start, unsigned long end, int prot) > > { > > - start = ELF_PAGEALIGN(start); > > + start = ELF_PAGESTART(start); > > end = ELF_PAGEALIGN(end); > > if (end > start) { > > /* > > I don't see how this change can be correct. set_brk takes the start of > .bss as the start, so doing ELF_PAGESTART(start) will give you what > may very well be another ELF segment. In the common case, you'd map an > anonymous page on top of someone's .data, which will misload the ELF. That does make sense, and indeed it breaks more complex binaries. > The current logic looks OK to me (gosh this code would ideally take a > good refactoring...). I still can't quite tell how padzero() (in the > original report) is -EFAULTing though. As a test I replaced the asm clear_user() in padzero() with the generic memset()-based implementation from include/asm-generic/uaccess.h. It does provide better diagnostics, see below. Who should have mapped this partial .bss page if there is no .data? Maybe the logic needs to be a bit more complex and check if this page has been already mapped for .data and in that case don't map it again. [ 5.620235] Run /init as init process [ 5.662763] CUSTOM DEBUG ELF_PAGEALIGN(start)=0x420000 ELF_PAGEALIGN(end)=0x420000 ELF_PAGESTART(0x41f000) [ 5.667176] Unable to handle kernel paging request at virtual address 000000000041ffe8 [ 5.668062] Mem abort info: [ 5.668429] ESR = 0x0000000096000045 [ 5.669400] EC = 0x25: DABT (current EL), IL = 32 bits [ 5.670119] SET = 0, FnV = 0 [ 5.670608] EA = 0, S1PTW = 0 [ 5.671172] FSC = 0x05: level 1 translation fault [ 5.672024] Data abort info: [ 5.673273] ISV = 0, ISS = 0x00000045, ISS2 = 0x00000000 [ 5.674169] CM = 0, WnR = 1, TnD = 0, TagAccess = 0 [ 5.674991] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 5.676871] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000043f20000 [ 5.677776] [000000000041ffe8] pgd=0800000043c62003, p4d=0800000043c62003, pud=0800000043c62003, pmd=0000000000000000 [ 5.681522] Internal error: Oops: 0000000096000045 [#1] PREEMPT SMP [ 5.682604] Modules linked in: [ 5.683576] CPU: 0 PID: 1 Comm: init Not tainted 6.6.0-rc1+ #241 00a261b9689606c4fc0c90eb29739c5b0eec7b82 [ 5.684706] Hardware name: linux,dummy-virt (DT) [ 5.685462] pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 5.686094] pc : __memset+0x50/0x188 [ 5.686572] lr : padzero+0x84/0xa0 [ 5.686956] sp : ffffffc08003bc70 [ 5.687307] x29: ffffffc08003bc70 x28: 0000000000000000 x27: ffffff80026afa00 [ 5.688091] x26: 000000000041fff0 x25: 000000000041ffe8 x24: 0000000000400144 [ 5.688698] x23: 0000000000000000 x22: 0000000000000000 x21: 0000000000000000 [ 5.689275] x20: 0000000000000fe8 x19: 000000000041ffe8 x18: ffffffffffffffff [ 5.689928] x17: ffffffdf46cd2984 x16: ffffffdf46cd2880 x15: 0720072007200720 [ 5.690597] x14: 0720072007200720 x13: 0720072007200720 x12: 0000000000000000 [ 5.691192] x11: 00000000ffffefff x10: 0000000000000000 x9 : ffffffdf46e1ba28 [ 5.691906] x8 : 000000000041ffe8 x7 : 0000000000000000 x6 : 0000000000057fa8 [ 5.692496] x5 : 0000000000000fff x4 : 0000000000000008 x3 : 0000000000000000 [ 5.693168] x2 : 0000000000000018 x1 : 0000000000000000 x0 : 000000000041ffe8 [ 5.693985] Call trace: [ 5.694318] __memset+0x50/0x188 [ 5.694708] load_elf_binary+0x630/0x15d0 [ 5.695132] bprm_execve+0x2bc/0x7c0 [ 5.695505] kernel_execve+0x144/0x1c8 [ 5.695882] run_init_process+0xf8/0x110 [ 5.696264] kernel_init+0x8c/0x200 [ 5.696624] ret_from_fork+0x10/0x20 [ 5.697216] Code: d65f03c0 cb0803e4 f2400c84 54000080 (a9001d07) [ 5.698936] ---[ end trace 0000000000000000 ]--- [ 5.701625] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b [ 5.702502] SMP: stopping secondary CPUs [ 5.703608] Kernel Offset: 0x1ec6a00000 from 0xffffffc080000000 [ 5.704119] PHYS_OFFSET: 0x40000000 [ 5.704491] CPU features: 0x0000000d,00020000,0000420b [ 5.705276] Memory Limit: none