On 24/05/2017 17:45, Robin Murphy wrote:

> On 24/05/17 16:26, Mason wrote:
>
>> Consider the following user-space code, split over two files
>> to defeat the optimizer.
>>
>> This test program maps a page of memory not managed by Linux,
>> and writes 4 words to misaligned addresses within that page.
>>
>> $ cat store.c
>> void store_at_addr_plus_0(void *addr, int val)
>> {
>> 	__builtin_memcpy(addr + 0, &val, sizeof val);
>> }
>> void store_at_addr_plus_1(void *addr, int val)
>> {
>> 	__builtin_memcpy(addr + 1, &val, sizeof val);
>> }
>>
>> $ cat testcase.c
>> #include <fcntl.h>
>> #include <sys/mman.h>
>> #include <stdio.h>
>> void store_at_addr_plus_0(void *addr, int val);
>> void store_at_addr_plus_1(void *addr, int val);
>> int main(void)
>> {
>> 	int fd = open("/dev/mem", O_RDWR | O_SYNC);
>> 	void *ptr = mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0xc0000000);
>> 	store_at_addr_plus_0(ptr + 0, fd); puts("X"); // store at ptr + 0 => OK
>> 	store_at_addr_plus_0(ptr + 1, fd); puts("X"); // store at ptr + 1 => OK
>> 	store_at_addr_plus_1(ptr + 3, fd); puts("X"); // store at ptr + 4 => OK
>> 	store_at_addr_plus_1(ptr + 0, fd); puts("X"); // store at ptr + 1 => ABORT
>> 	return 0;
>> }
>>
>> With optimizations turned off, the program works as expected.
>>
>> $ arm-linux-gnueabihf-gcc-6.3.1 -Wall -O0 testcase.c store.c -o misaligned_stores
>> $ ./misaligned_stores
>> X
>> X
>> X
>> X
>>
>> But if optimizations are enabled, the program aborts on the last store.
>>
>> $ arm-linux-gnueabihf-gcc-6.3.1 -Wall -O1 testcase.c store.c -o misaligned_stores
>> # ./misaligned_stores
>> X
>> X
>> X
>> Bus error
>> [ 8736.457254] Alignment trap: not handling instruction f8c01001 at [<000104aa>]
>                 ^^^
>
> Note where that message comes from: The alignment fault fixup code
> doesn't recognise this instruction encoding, so it doesn't get fixed up.
> It's that simple.

ARMv7 can handle misaligned accesses in hardware, right?
But Linux sets up the MMU mapping to fault for misaligned accesses
in "non-standard" areas, is that correct?

I will study arch/arm/mm/alignment.c

> Try "echo 5 > /proc/cpu/alignment" then run it again, and it should
> become clearer what the kernel's doing (or not) behind your back - see
> Documentation/arm/mem_alignment

# echo 5 > /proc/cpu/alignment
# ./misaligned_stores
X
Bus error
[ 241.813350] Alignment trap: misaligned_stor (1015) PC=0x000104b8 Instr=0x6001 Address=0xb6f16001 FSR 0x811

> The other thing to say, of course, is "don't make unaligned accesses to
> Strongly-Ordered memory in the first place".

How would you fix my test case?

Ard mentioned something similar on IRC:

> doesn't the issue go away when you stop using device attributes for the userland mapping?
> iiuc you are mapping memory from userland that is not mapped by the kernel, right?
> which is why it gets pgprot_noncached() attributes
> so if you do add this memory to memblock but with the MEMBLOCK_NOMAP attribute
> and use O_SYNC to open /dev/mem from userland
> you will get writecombine attributes instead
> it is perfectly legal for gcc to generate unaligned accesses to something that is presented
> to it as being memory so you should focus on getting the attributes correct on this region

I will study the different properties (cached vs noncached, write-combined).

>> [ 8736.464496] Unhandled fault: alignment exception (0x811) at 0xb6f4b001
>> [ 8736.471106] pgd = de2d4000
>> [ 8736.473839] [b6f4b001] *pgd=9f56b831, *pte=c0000743, *ppte=c0000c33
>>
>> (gdb) disassemble store_at_addr_plus_0
>>    0x000104a6 <+0>:     str     r1, [r0, #0]
>>    0x000104a8 <+2>:     bx      lr
>>
>> (gdb) disassemble store_at_addr_plus_1
>>    0x000104aa <+0>:     str.w   r1, [r0, #1]
>>    0x000104ae <+4>:     bx      lr
>>
>>
>> So the 4th store (a misaligned store) aborts.
>> But why doesn't the 2nd store abort as well?
>> It targets the *same* address.
>> They're using different versions of the str instruction.
>>
>> The compiler generates
>> 	str	r1, [r0]	@ unaligned
>> 	str	r1, [r0, #1]	@ unaligned
>>
>> According to objdump
>>
>> 00000000 <store_at_addr_plus_0>:
>>    0:   6001            str     r1, [r0, #0]
>>    2:   4770            bx      lr
>>
>> 00000004 <store_at_addr_plus_1>:
>>    4:   f8c0 1001       str.w   r1, [r0, #1]
>>    8:   4770            bx      lr
>>
>> Side issue, the T2 encoding for the STR instruction states
>> 1 1 1 1 1 0 0 0 0 1 0 0 Rn
>> which comes out as f840, not f8c0; I don't understand.

Ard said:

> btw the str.w encodings are listed as T3/T4 in my copy of the v8 ARM ARM

I'm on a Cortex A9, so ARMv7-A
But my copy of the ARM ARM is revB.
I found rev C.b but that doesn't explain f8c0 vs f840

Regards.
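
PS: for reference, a minimal sketch of how the two stores seen above
(6001 and f8c0 1001) decode. It assumes the STR (immediate, Thumb)
layouts used by ARM ARM editions that number the 32-bit forms T3 and T4:
T1 is "01100 imm5 Rn Rt", T3 is "1111 1000 1100 Rn | Rt imm12", and T4 is
"1111 1000 0100 Rn | Rt 1 P U W imm8". Under that assumption the f840
pattern is the imm8 form, while f8c0 is the imm12 form the assembler
picked for str.w. The names decode_str_imm, struct str_imm and
THUMB_STR_DEMO are made up for this sketch, and it does not check the
UNDEFINED/UNPREDICTABLE cases (such as Rn == 1111).

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper types for this sketch, not taken from the kernel. */
enum str_enc { STR_UNKNOWN = 0, STR_T1 = 1, STR_T3 = 3, STR_T4 = 4 };

struct str_imm {
	enum str_enc enc;	/* which encoding matched */
	unsigned rn, rt;	/* base and source registers */
	int32_t offset;		/* signed immediate offset in bytes */
};

/*
 * Decode a Thumb STR (immediate) from its first halfword hw1 and, for the
 * 32-bit forms, its second halfword hw2 (ignored for the 16-bit T1 form).
 */
struct str_imm decode_str_imm(uint16_t hw1, uint16_t hw2)
{
	struct str_imm d = { STR_UNKNOWN, 0, 0, 0 };

	if ((hw1 & 0xf800) == 0x6000) {
		/* T1: 01100 imm5 Rn Rt, offset = imm5 * 4 */
		d.enc = STR_T1;
		d.rn = (hw1 >> 3) & 7;
		d.rt = hw1 & 7;
		d.offset = ((hw1 >> 6) & 0x1f) << 2;
	} else if ((hw1 & 0xfff0) == 0xf8c0) {
		/* T3: 1111 1000 1100 Rn | Rt imm12, offset always added */
		d.enc = STR_T3;
		d.rn = hw1 & 0xf;
		d.rt = hw2 >> 12;
		d.offset = hw2 & 0xfff;
	} else if ((hw1 & 0xfff0) == 0xf840 && (hw2 & 0x0800)) {
		/* T4: 1111 1000 0100 Rn | Rt 1 P U W imm8 */
		d.enc = STR_T4;
		d.rn = hw1 & 0xf;
		d.rt = hw2 >> 12;
		d.offset = hw2 & 0xff;
		if (!(hw2 & 0x0200))	/* U == 0: offset is subtracted */
			d.offset = -d.offset;
	}
	return d;
}

#ifdef THUMB_STR_DEMO
int main(void)
{
	struct str_imm a = decode_str_imm(0x6001, 0);
	struct str_imm b = decode_str_imm(0xf8c0, 0x1001);

	printf("6001      -> T%d: str r%u, [r%u, #%d]\n",
	       a.enc, a.rt, a.rn, (int)a.offset);
	printf("f8c0 1001 -> T%d: str r%u, [r%u, #%d]\n",
	       b.enc, b.rt, b.rn, (int)b.offset);
	return 0;
}
#endif
```

Building with -DTHUMB_STR_DEMO gives a small standalone program; without
it the file only provides decode_str_imm() for use elsewhere.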