On Tue 25-06-13 19:06:53, Akira Fujita wrote:
> 
> 
> (2013/06/24 20:04), Jan Kara wrote:
> > On Tue 28-05-13 17:36:09, Akira Fujita wrote:
> >> Hello,
> >>
> >> (2013/05/27 17:33), Jan Kara wrote:
> >>> On Mon 27-05-13 12:16:01, Dmitry Monakhov wrote:
> >>>> On Mon, 27 May 2013 10:49:11 +0900, Akira Fujita <a-fujita@xxxxxxxxxxxxx> wrote:
> >>>>> Hi Jan,
> >>>>>
> >>>>> (2013/05/15 11:57), Akira Fujita wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> (2013/05/15 6:37), Jan Kara wrote:
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> On Mon 13-05-13 15:49:24, Akira Fujita wrote:
> >>>>>>>> I ran into a deadlock with xfstests generic/269 on linux-3.9.0.
> >>>>>>>> It seems to happen between jbd2_log_wait_commit, sleep_on_buffer
> >>>>>>>> and writeback_inodes (please see the ps log below).
> >>>>>>>> Once it occurs we cannot touch the FS anymore.
> >>>>>>>> In my case it takes 300 - 1000 trials to occur. Is this a known issue?
> >>>>>>>>
> >>>>>>>> The following kernels seem to have the same problem:
> >>>>>>>> - linux-3.5-rc5
> >>>>>>>> - linux-3.8.5
> >>>>>>>> - linux-3.9-rc7
> >>>>>>>> And now I'm trying it on linux-3.10-rc1.
> >>>>>>>>
> >>>>>>>> # ./check generic/269
> >>>>>>>> FSTYP         -- ext4
> >>>>>>>> PLATFORM      -- Linux/x86_64 mcds1 3.9.0
> >>>>>>>> MKFS_OPTIONS  -- /dev/sda12
> >>>>>>>> MOUNT_OPTIONS -- -o acl,user_xattr /dev/sda12 /mnt/mp2
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> # ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:16,comm
> >>>>>>>>   PID   TID CLS RTPRIO  NI PRI PSR %CPU STAT WCHAN            COMMAND
> >>>>>>>>     1     1 TS       -   0  19   0  0.0 Ss   poll_schedule_ti init
> >>>>>>>>     2     2 TS       -   0  19   0  0.0 S    kthreadd         kthreadd
> >>>>>>>>     3     3 TS       -   0  19   0  0.0 S    smpboot_thread_f ksoftirqd/0
> >>>>>>>> ...
> >>>>>>>>  2391  2391 TS       -   0  19   2  0.1 D    jbd2_log_wait_co flush-8:0
> >>>>>>>> ...
> >>>>>>>> 22647 22647 TS       -   0  19   3  0.0 S    worker_thread    kworker/3:1
> >>>>>>>> 22655 22655 TS       -   0  19   0  0.0 S    hrtimer_nanoslee sleep
> >>>>>>>> 22657 22657 TS       -   0  19   2  0.0 R+   -                ps
> >>>>>>>> 25330 25330 TS       -   0  19   0  0.0 S    worker_thread    kworker/0:0
> >>>>>>>> 28963 28963 TS       -   0  19   1  0.0 S+   wait             loop_xfstests.s
> >>>>>>>> 28964 28964 TS       -   0  19   1  0.0 S+   wait             check
> >>>>>>>> 29180 29180 TS       -   0  19   3  0.0 S    kjournald2       jbd2/sda11-8
> >>>>>>>> 29181 29181 TS       - -20  39   3  0.0 S<   rescuer_thread   ext4-dio-unwrit
> >>>>>>>> 29199 29199 TS       -   0  19   3  0.0 S+   wait             269
> >>>>>>>> 29391 29391 TS       -   0  19   0  0.6 D    sleep_on_buffer  jbd2/sda12-8
> >>>>>>>> 29392 29392 TS       - -20  39   3  0.0 S<   rescuer_thread   ext4-dio-unwrit
> >>>>>>>> 29394 29394 TS       -   0  19   0  0.0 S    wait             fsstress
> >>>>>>>> 29505 29505 TS       -   0  19   3  0.0 D    writeback_inodes fsstress
> >>>>>>>>
> >>>>>>>> # df -T /dev/sda11 /dev/sda12
> >>>>>>>> Filesystem Type 1K-blocks   Used Available Use% Mounted on
> >>>>>>>> /dev/sda11 ext4   9857264  22308   9327564   1% /mnt/mp1
> >>>>>>>> /dev/sda12 ext4    499656 499656         0 100% /mnt/mp2
> >>>>>>>   Thanks for the report. No, I don't think this problem has been reported
> >>>>>>> before. Seeing that sda12 is out of space and fsstress hangs in
> >>>>>>> writeback_inodes(), I suspect we have some deadlock in the ENOSPC recovery
> >>>>>>> path when we want to flush data to disk to reduce delalloc uncertainty.
> >>>>>>> Can you run 'echo w >/proc/sysrq-trigger' when the deadlock happens and
> >>>>>>> post your dmesg here? Thanks!
> >>>>>>>
> >>>>>>
> >>>>>> Thanks for the reply.
> >>>>>> I'll take that information when the deadlock happens again.
> >>>>>
> >>>>> Finally, I got the following messages on linux-3.9.0.
> >>>>> Could you take a look?
> >>>> Is it a real deadlock?
> >>>> umount waits for flush-8:0
> >>>> flush waits for jbd2
> >>>> jbd2 waits for a bh.
> >>>> All buffer-heads will be written sooner or later.
> >>>> It will probably require a significant amount of time, because the fs is
> >>>> too fragmented (that is what this test is all about), but it is not a
> >>>> deadlock.
> >>>   Yeah, that is my analysis as well. I was only thinking whether we
> >>> couldn't forget some buffer locked on some error recovery path (we are
> >>> getting ENOSPC) but it's not obvious where it would be. Akira-san, could
> >>> you also send us the disassembly of your jbd2_journal_commit_transaction()
> >>> function so that we can check where exactly offset 0xee0 is in the
> >>> function?
> >>>
> >>> 							Honza
> >>
> >> I attached the disassembled code of jbd2_journal_commit_transaction(),
> >> please check it.
> >   Thanks. I've looked into the disassembly now and we are waiting for the
> > write of the commit record to finish. We have submitted the buffer just
> > before, so it can hardly be some issue like a forgotten locked buffer.
> > Really strange. It looks like some problem in the block layer, the device
> > driver, or the HW itself... I'm thinking how to debug this further - it
> > would be good if we had blktrace data just before the moment when the hang
> > happens, but given how hard this is to reproduce the traces would be
> > extremely large...
> >
> > One idea - can you try whether the problem reproduces with the 'nobarrier'
> > mount option? It could be some problem with flush / FUA handling.
> >
> 
> OK, I'll try to reproduce this issue with 'nobarrier'
> (and take a blktrace log if I can).
  You can restart blktrace every 5 minutes or so, so that the trace files
don't get too big. If we have a trace for the minute before the machine
hangs, it should be enough for debugging.

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
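
For illustration, the blktrace rotation suggested above could be scripted
roughly as follows. This is only a sketch: the device (/dev/sda12) is the
one from the report, but the output directory, the choice to keep only the
two most recent chunks, and the 300-second window are assumptions for the
example, not details from the thread.

  # blktrace needs debugfs mounted (usually at /sys/kernel/debug)
  mount -t debugfs none /sys/kernel/debug 2>/dev/null

  mkdir -p /var/tmp/blktrace
  i=0
  while :; do
      # trace the test device in 5-minute (300 s) chunks, then exit and loop
      blktrace -d /dev/sda12 -D /var/tmp/blktrace -o trace_$i -w 300
      # keep only the two most recent chunks so the traces don't fill the disk
      rm -f /var/tmp/blktrace/trace_$((i-2)).blktrace.*
      i=$((i+1))
  done

When the hang happens, the surviving trace_* files can then be processed
with blkparse for the interval just before the stall.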