On Fri, Jul 18, 2008 at 1:58 PM, Vegard Nossum <vegard.nossum@xxxxxxxxx> wrote:
> On Fri, Jul 18, 2008 at 1:20 PM, Josef Bacik <jbacik@xxxxxxxxxx> wrote:
>>> You can see the full log at
>>> http://folk.uio.no/vegardno/linux/log-1216380709.txt which shows that
>>> it already survived a lot of failures, so I'm guessing your patch was
>>> correct and we just hit a different case. What do you think?
>>>
>>
>> Yeah you are right, its like a shitty game of wack-a-mole. Heres another patch,
>> same thing as last time, pull the other one out put this one on. Thanks,
>
> It seems to hold up -- no stacktraces, but lots of IO failures.
>
> I would leave it in testing for a bit more, but I've got to run; I'll
> give it another go when I get home.

Ok, we still got this:

BUG: unable to handle kernel NULL pointer dereference at 0000000c
IP: [<c025ea28>] journal_dirty_metadata+0xb8/0x1b0
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC

Pid: 4770, comm: rm Not tainted (2.6.26-03421-g253a722 #49)
EIP: 0060:[<c025ea28>] EFLAGS: 00210246 CPU: 1
EIP is at journal_dirty_metadata+0xb8/0x1b0
EAX: 00000000 EBX: f3d70c90 ECX: 00000001 EDX: f3e12000
ESI: 00000000 EDI: f21118f0 EBP: f3e13d94 ESP: f3e13d6c
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process rm (pid: 4770, ti=f3e12000 task=f62cdfa0 task.ti=f3e12000)
Stack: f3d70430 f578047c f578047c f3e13d94 c0222cdb f779c000 f6ff2e70 f21118f0
       f779c000 f21118f0 f3e13db4 c02345ef 0000001c 00001499 c0760bc4 f21118f0
       00000000 ef36d004 f3e13de4 c0228e6f 0000147e 0000001c ef36d004 ef36d400
Call Trace:
 [<c0222cdb>] ? ext3_free_blocks+0x6b/0xa0
 [<c02345ef>] ? __ext3_journal_dirty_metadata+0x1f/0x50
 [<c0228e6f>] ? ext3_free_data+0x9f/0x100
 [<c02290e3>] ? ext3_free_branches+0x213/0x220
 [<c0222cdb>] ? ext3_free_blocks+0x6b/0xa0
 [<c0228f7e>] ? ext3_free_branches+0xae/0x220
 [<c022967c>] ? ext3_truncate+0x58c/0x940
 [<c015ad96>] ? trace_hardirqs_on_caller+0x116/0x170
 [<c0260733>] ? journal_start+0xd3/0x110
 [<c0260710>] ? journal_start+0xb0/0x110
 [<c0229b07>] ? ext3_delete_inode+0xd7/0xe0
 [<c0229a30>] ? ext3_delete_inode+0x0/0xe0
 [<c01b9bc1>] ? generic_delete_inode+0x81/0x120
 [<c01b9d87>] ? generic_drop_inode+0x127/0x180
 [<c01b8c07>] ? iput+0x47/0x50
 [<c01af1dc>] ? do_unlinkat+0xec/0x170
 [<c01b187b>] ? vfs_readdir+0x6b/0xa0
 [<c01b1560>] ? filldir64+0x0/0xf0
 [<c0430a08>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c015ad96>] ? trace_hardirqs_on_caller+0x116/0x170
 [<c01af3a3>] ? sys_unlinkat+0x23/0x50
 [<c010407f>] ? sysenter_past_esp+0x78/0xc5
 =======================
Code: b8 01 00 00 00 e8 c9 3f ed ff 89 e0 25 00 e0 ff ff f6 40 08 08 74 05 e8 47 98 4e 00 83 c4 1c 31 c0 5b 5e 5f 5d c3 90 8d 74 26 00 <8b> 46 0c 85 c0 0f 84 8d 00 00 00 8b 45 f0 39 46 18 74 66 8d 47
EIP: [<c025ea28>] journal_dirty_metadata+0xb8/0x1b0 SS:ESP 0068:f3e13d6c
Kernel panic - not syncing: Fatal exception

It looks similar to one of the others we saw. Are you sure I should back
out all your previous patches? My stack looks like this:

Duane Griffin (1):
      ext3: validate directory entry

Josef Bacik (1):
      ext3 on latest -git: BUG: unable to handle kernel NULL pointer dereference

And I am using errors=continue.

I've now modified my scripts to also save the bad image, so I (or
whoever) can easily re-test a specific crash. For instance, this one can
be downloaded from http://folk.uio.no/vegardno/linux/ext3-crash-fs.bin.bz2
and mounted. Then run

  rm -rf mnt/*

and it should crash. The log is also available at
http://folk.uio.no/vegardno/linux/log-1216412153.txt

Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036