On Tuesday February 5, andre.s@xxxxxxxxx wrote:
> Feb 5 11:56:12 raid01 kernel: BUG: unable to handle kernel paging request at virtual address 001cd901

This looks like some sort of memory corruption.

> Feb 5 11:56:12 raid01 kernel: EIP is at md_do_sync+0x629/0xa32

This tells us what code is executing.

> Feb 5 11:56:12 raid01 kernel: Code: 54 24 48 0f 87 a4 01 00 00 72 0a 3b 44 24 44 0f 87 98 01 00 00 3b 7c 24 40 75 0a 3b 74 24 3c 0f 84 88 01 00 00 0b 85 30 01 00 00 <88> 08 0f 85 90 01 00 00 8b 85 30 01 00 00 a8 04 0f 85 82 01 00

This tells us what the actual bytes of code were.
If I feed this line (from "Code:" onwards) into "ksymoops" I get

   0:   54                      push   %esp
   1:   24 48                   and    $0x48,%al
   3:   0f 87 a4 01 00 00       ja     1ad <_EIP+0x1ad>
   9:   72 0a                   jb     15 <_EIP+0x15>
   b:   3b 44 24 44             cmp    0x44(%esp),%eax
   f:   0f 87 98 01 00 00       ja     1ad <_EIP+0x1ad>
  15:   3b 7c 24 40             cmp    0x40(%esp),%edi
  19:   75 0a                   jne    25 <_EIP+0x25>
  1b:   3b 74 24 3c             cmp    0x3c(%esp),%esi
  1f:   0f 84 88 01 00 00       je     1ad <_EIP+0x1ad>
  25:   0b 85 30 01 00 00       or     0x130(%ebp),%eax
Code;  00000000 Before first symbol
  2b:   88 08                   mov    %cl,(%eax)
  2d:   0f 85 90 01 00 00       jne    1c3 <_EIP+0x1c3>
  33:   8b 85 30 01 00 00       mov    0x130(%ebp),%eax
  39:   a8 04                   test   $0x4,%al
  3b:   0f                      .byte 0xf
  3c:   85                      .byte 0x85
  3d:   82                      (bad)
  3e:   01 00                   add    %eax,(%eax)

I removed the "Code;..." lines as they are just noise, except for the
one that points to the current instruction in the middle.
Note that it is dereferencing %eax, after just 'or'ing some value into
it, which is rather unusual.

Now get the "md-mod.ko" for the kernel you are running.
Run
   gdb md-mod.ko
and give the command
   disassemble md_do_sync
and look for code at offset 0x629, which is 1577 in decimal.
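
As a rough cross-check of those offsets, here is a minimal Python sketch
that pulls the bytes out of the "Code:" line, finds the byte marked with
<..>, and works out where the dump starts relative to md_do_sync.  It is
not part of ksymoops; the names parse_oops_code, OOPS_CODE, EIP_OFFSET
and dump_start are made up for illustration, and it assumes the marked
byte is the one at the reported EIP offset 0x629.

```python
import re


def parse_oops_code(line):
    """Return (code_bytes, fault_index) from an oops "Code:" line.

    fault_index is the position of the byte wrapped in <..>, i.e. the
    first byte of the instruction at EIP, or None if no byte is marked.
    """
    text = line.split("Code:", 1)[-1]
    code = []
    fault_index = None
    for token in text.split():
        marked = re.fullmatch(r"<([0-9a-fA-F]{2})>", token)
        if marked:
            fault_index = len(code)
            token = marked.group(1)
        code.append(int(token, 16))
    return bytes(code), fault_index


# The "Code:" line copied from the oops report.
OOPS_CODE = (
    "Code: 54 24 48 0f 87 a4 01 00 00 72 0a 3b 44 24 44 0f 87 98 01 00 00 "
    "3b 7c 24 40 75 0a 3b 74 24 3c 0f 84 88 01 00 00 0b 85 30 01 00 00 "
    "<88> 08 0f 85 90 01 00 00 8b 85 30 01 00 00 a8 04 0f 85 82 01 00"
)

# From "EIP is at md_do_sync+0x629/0xa32".
EIP_OFFSET = 0x629

code, fault_index = parse_oops_code(OOPS_CODE)

# Offset within md_do_sync of the first byte in the dump.
dump_start = EIP_OFFSET - fault_index

print("bytes in dump:        ", len(code))
print("index of marked byte: ", hex(fault_index))
print("EIP offset in decimal:", EIP_OFFSET)
print("dump starts at offset:", hex(dump_start), "=", dump_start)
```

The marked byte comes out at index 0x2b, which agrees with the "2b:"
line in the ksymoops listing.  The decimal values are the ones to look
for in the gdb output, since gdb prints offsets as <md_do_sync+NNNN> in
decimal.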

I found a similar kernel to what you are running, and the matching code
is

0x000055c0 <md_do_sync+1485>:   cmp    0x30(%esp),%eax
0x000055c4 <md_do_sync+1489>:   ja     0x5749 <md_do_sync+1878>
0x000055ca <md_do_sync+1495>:   cmp    0x2c(%esp),%edi
0x000055ce <md_do_sync+1499>:   jne    0x55da <md_do_sync+1511>
0x000055d0 <md_do_sync+1501>:   cmp    0x28(%esp),%esi
0x000055d4 <md_do_sync+1505>:   je     0x5749 <md_do_sync+1878>
0x000055da <md_do_sync+1511>:   mov    0x130(%ebp),%eax
0x000055e0 <md_do_sync+1517>:   test   $0x8,%al
0x000055e2 <md_do_sync+1519>:   jne    0x575f <md_do_sync+1900>
0x000055e8 <md_do_sync+1525>:   mov    0x130(%ebp),%eax
0x000055ee <md_do_sync+1531>:   test   $0x4,%al
0x000055f0 <md_do_sync+1533>:   jne    0x575f <md_do_sync+1900>
0x000055f6 <md_do_sync+1539>:   mov    0x38(%esp),%ecx
0x000055fa <md_do_sync+1543>:   mov    0x0,%eax

Note the sequence "cmp, ja, cmp, jne, cmp, je" where the "cmp" arguments
are consecutive 4-byte values on the stack (%esp).
In the code from your oops, the offsets are 0x44 0x40 0x3c.
In the kernel I found they are 0x30 0x2c 0x28.  The difference comes
from some subtle difference in the kernel build, possibly a different
compiler or something.

Anyway, your code crashed at

  25:   0b 85 30 01 00 00       or     0x130(%ebp),%eax
Code;  00000000 Before first symbol
  2b:   88 08                   mov    %cl,(%eax)

The matching code in the kernel I found is

0x000055da <md_do_sync+1511>:   mov    0x130(%ebp),%eax
0x000055e0 <md_do_sync+1517>:   test   $0x8,%al

Note that you have an 'or' followed by a 'mov' through %eax, where the
kernel I found has a 'mov' followed by a 'test'.

If we look at the actual bytes of code for those two instructions,
the code that crashed shows the bytes above:

   0b 85 30 01 00 00 88 08

If I get the same bytes with gdb:

(gdb) x/8b 0x000055da
0x55da <md_do_sync+1511>:       0x8b    0x85    0x30    0x01    0x00    0x00    0xa8    0x08
(gdb)

So what should be "8b" has become "0b", and what should be "a8" has
become "88".

If you look for the same data in your md-mod.ko, you might find
slightly different details but it is clear to me that the code in
memory is bad.
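
The byte-by-byte comparison can also be done mechanically.  The sketch
below takes the eight bytes from the oops and the eight bytes printed by
gdb (both copied from the text above), XORs them, and reports which
positions differ and how many bits changed at each.  The function name
diff_bytes and the variable names are only illustrative, and it rests on
the same assumption as the comparison above: that the similar kernel
really has the same instructions at this point as the one that crashed.

```python
def diff_bytes(expected, actual):
    """List the positions where two equal-length byte strings differ.

    Each entry is (index, expected_byte, actual_byte, xor), where the
    set bits of xor are the bits that differ at that position.
    """
    if len(expected) != len(actual):
        raise ValueError("byte strings must have the same length")
    return [
        (i, e, a, e ^ a)
        for i, (e, a) in enumerate(zip(expected, actual))
        if e != a
    ]


# Bytes at the crash site, taken from the oops "Code:" line.
crashed = bytes.fromhex("0b 85 30 01 00 00 88 08")

# Bytes at md_do_sync+1511 in the similar kernel, from gdb "x/8b".
expected = bytes.fromhex("8b 85 30 01 00 00 a8 08")

differences = diff_bytes(expected, crashed)

for index, want, got, xor in differences:
    flipped = bin(xor).count("1")
    print(
        f"byte {index}: expected {want:02x}, found {got:02x}, "
        f"xor {xor:02x} ({flipped} bit(s) differ)"
    )
```

For these two sequences only bytes 0 and 6 differ, and each differs in
a single bit (0x80 in the first, 0x20 in the second), with the bit
cleared in the crashed copy.  A couple of isolated cleared bits is
typically what flaky memory looks like, not a software bug.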

Possibly you have bad memory, or a bad CPU, or you are overclocking the
CPU, or it is getting hot, or something.

But you clearly have a hardware error.

NeilBrown