Re: [PATCH] writeback: Write dirty times for WB_SYNC_ALL writeback

Jan Kara <jack@xxxxxxx> · Wed, 7 Dec 2016 14:43:01 +0100

Hi Laurent!

On Mon 05-12-16 15:00:45, Laurent Dufour wrote:
> I'm sorry to say that but this bug is surfacing again.

Well, thanks for the report! ;)

> We got it using the latest Ubuntu 16.04 kernel, but I did some tests using
> a 4.8 kernel and I was able to get it again.
> It's not easy to recreate: we have to let the guest run for a while with
> several disks attached, while a database test program which exercises
> these disks is running.
> 
> Here is the stack I got:
> 
> [113031.075540] Unable to handle kernel paging request for data at address 0x00000000
> [113031.075614] Faulting instruction address: 0xc0000000003692e0
> 0:mon> t
> [c0000000fb65f900] c00000000036cb6c writeback_sb_inodes+0x30c/0x590
> [c0000000fb65fa10] c00000000036ced4 __writeback_inodes_wb+0xe4/0x150
> [c0000000fb65fa70] c00000000036d33c wb_writeback+0x30c/0x450
> [c0000000fb65fb40] c00000000036e198 wb_workfn+0x268/0x580
> [c0000000fb65fc50] c0000000000f3470 process_one_work+0x1e0/0x590
> [c0000000fb65fce0] c0000000000f38c8 worker_thread+0xa8/0x660
> [c0000000fb65fd80] c0000000000fc4b0 kthread+0x110/0x130
> [c0000000fb65fe30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c
> --- Exception: 0  at 0000000000000000
> 0:mon> e
> cpu 0x0: Vector: 300 (Data Access) at [c0000000fb65f620]
>     pc: c0000000003692e0: locked_inode_to_wb_and_lock_list+0x50/0x290
>     lr: c00000000036cb6c: writeback_sb_inodes+0x30c/0x590
>     sp: c0000000fb65f8a0
>    msr: 800000010280b033
>    dar: 0
>  dsisr: 40000000
>   current = 0xc0000001d69be400
>   paca    = 0xc000000003480000	 softe: 0	 irq_happened: 0x01
>     pid   = 18689, comm = kworker/u16:10
> Linux version 4.8.0 (laurent@lucky05) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #1 SMP Thu Dec 1 09:25:13 CST 2016
> 0:mon> r
> R00 = c00000000036cb6c   R16 = c0000001fc0312e8
> R01 = c0000000fb65f8a0   R17 = c0000001fc031260
> R02 = c000000001471600   R18 = c0000001fc031350
> R03 = c0000001fc031260   R19 = 0000000000000000
> R04 = c0000001d69beee0   R20 = 0000000000000000
> R05 = 0000000000000000   R21 = c0000000fb65c000
> R06 = 00000001fed50000   R22 = c000000014960b10
> R07 = 00029313df052efd   R23 = c000000014960af0
> R08 = 0000000000000000   R24 = 0000000000000000
> R09 = 0000000000000000   R25 = c0000001fc0312e8
> R10 = 0000000080000000   R26 = 0000000000000000
> R11 = 071c71c71c71c71c   R27 = 0000000000000000
> R12 = 0000000000000000   R28 = 0000000000000001
> R13 = c000000003480000   R29 = c0000001fc031260
> R14 = c0000000000fc3a8   R30 = c0000000fb65fba0
> R15 = 0000000000000000   R31 = 0000000000000000
> pc  = c0000000003692e0 locked_inode_to_wb_and_lock_list+0x50/0x290
> cfar= c0000000000a14d0 lparcfg_data+0xc10/0xda0
> lr  = c00000000036cb6c writeback_sb_inodes+0x30c/0x590
> msr = 800000010280b033   cr  = 24652882
> ctr = c000000000126e20   xer = 0000000020000000   trap =  300
> dar = 0000000000000000   dsisr = 40000000
> 
> The panic occurs on entry to locked_inode_to_wb_and_lock_list(): the
> inode->i_wb field is NULL. I'm almost sure that we didn't loop in
> locked_inode_to_wb_and_lock_list() because the LR register is still
> pointing to the caller.

So this looks like a different problem (although it manifests similarly).
The problem is that for block device inodes, inode_detach_wb() is called
from __blkdev_put() while the inode is still alive, and the flush worker
trips over the inode once inode->i_wb has been set to NULL. It requires
relatively precise timing: __blkdev_put() has to hit just as
writeback_sb_inodes() starts looking at the block device inode, before it
sets I_SYNC on the inode, but it seems the race window is there. Another
example of why the special block device lifetime rules cause subtle issues.
This needs careful thought about how to design the lifetime rules for a
block device so that we don't just keep stacking hacks on hacks... If
anyone has a clever idea, speak up. I'll be looking into this once I sort
out other outstanding issues, so probably next week...
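
To make the window concrete, here is a small user-space model of the
interleaving described above. It is only a sketch under assumptions: all the
model_* names, the struct layouts and the replay function are made up for
illustration and are not the kernel's code. It replays the two orderings
sequentially instead of using real threads, and it models only the window
before I_SYNC is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these types are the kernel's. */
struct model_wb {
	int id;
};

struct model_inode {
	struct model_wb *i_wb;	/* stands in for inode->i_wb */
	bool i_sync;		/* stands in for the I_SYNC state bit */
};

enum model_outcome {
	MODEL_OK,		/* flusher found a valid wb */
	MODEL_NULL_WB,		/* flusher would have dereferenced NULL */
};

/*
 * Stands in for inode_detach_wb() as called from __blkdev_put(): the wb
 * association is dropped although the inode itself stays alive.
 */
static void model_detach_wb(struct model_inode *inode)
{
	inode->i_wb = NULL;
}

/*
 * Stands in for the wb lookup done on behalf of writeback_sb_inodes().
 * Returns false where the real code would fault on a NULL i_wb.
 */
static bool model_lookup_wb(const struct model_inode *inode,
			    struct model_wb **out)
{
	if (inode->i_wb == NULL)
		return false;
	*out = inode->i_wb;
	return true;
}

/*
 * Replay one ordering. When detach_in_window is true, the detach lands
 * after the flusher has picked the inode but before it sets I_SYNC.
 */
static enum model_outcome model_replay(bool detach_in_window)
{
	struct model_wb wb = { .id = 1 };
	struct model_inode inode = { .i_wb = &wb, .i_sync = false };
	struct model_wb *found = NULL;

	/* The flusher has just started looking at the inode. */
	assert(!inode.i_sync);

	if (detach_in_window)
		model_detach_wb(&inode);

	if (!model_lookup_wb(&inode, &found))
		return MODEL_NULL_WB;

	inode.i_sync = true;	/* window closed in this model */
	return MODEL_OK;
}
```

In this model, model_replay(false) returns MODEL_OK and model_replay(true)
returns MODEL_NULL_WB; the latter is the ordering that would correspond to
the NULL inode->i_wb seen in the report.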

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR