[Bug 201331] deadlock (XFS?)

https://bugzilla.kernel.org/show_bug.cgi?id=201331

--- Comment #5 from Dave Chinner (david@xxxxxxxxxxxxx) ---
On Thu, Oct 04, 2018 at 11:25:49PM +0000, bugzilla-daemon@xxxxxxxxxxxxxxxxxxx
wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=201331

> 
> --- Comment #4 from edo (edo.rus@xxxxxxxxx) ---
> I tested with 4.17 and 4.18 prebuilt Debian kernels, behavior is the same:
> Sep 30 16:01:23 storage10x10n1 kernel: [23683.218388] INFO: task
> kworker/u24:0:21848 blocked for more than 120 seconds.

I think we need to rename XFS to "The Messenger: Please don't shoot
me"... :)

From the xfs_info:

sunit=4096   swidth=32768 blks
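
That's in filesystem block units - assuming the usual 4k block size
(the bsize line isn't in the quoted snippet), a quick Python sketch
of the conversion:

fs_block = 4096                          # bytes per fs block (assumed 4k)
sunit_blks, swidth_blks = 4096, 32768    # from the xfs_info above
print(sunit_blks * fs_block // 2**20)    # -> 16  (MB stripe unit)
print(swidth_blks * fs_block // 2**20)   # -> 128 (MB stripe width)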

Ok, that looks wrong - why do you have an MD RAID device with a
16MB stripe unit and a 128MB stripe width?

Yup:

md3 : active raid6 sda4[0] sdj4[9] sdg4[6] sdd4[3] sdi4[8] sdf4[5] sde4[4] sdh4[7] sdb4[2] sdc4[1]
      77555695616 blocks super 1.2 level 6, 16384k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
      bitmap: 9/73 pages [36KB], 65536KB chunk

You've configured your RAID6 device with a 16MB chunk size, which
gives the XFS su/sw noted above.
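
Spelling that out - roughly what mkfs.xfs will have derived from
that layout (a sketch only, again assuming 4k filesystem blocks):

chunk_bytes = 16384 * 1024                   # 16MB chunk from /proc/mdstat
data_disks  = 10 - 2                         # RAID6: two disks' worth of parity
fs_block    = 4096                           # assumed 4k filesystem blocks
print(chunk_bytes // fs_block)               # -> 4096  == sunit above
print(chunk_bytes * data_disks // fs_block)  # -> 32768 == swidth above

i.e. the filesystem geometry is just faithfully reflecting the MD
layout.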

Basically, you've RMW'd your RAID device to death because every
write is a sub-stripe write.
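
To put a rough number on it - this isn't modelling what MD actually
does internally, just how little of a stripe an ordinary writeback
IO covers (the IO sizes below are made up but typical):

chunk       = 16 * 2**20                 # 16MB chunk
data_disks  = 8                          # 10-disk RAID6: 8 data disks
full_stripe = chunk * data_disks         # 128MB of data per stripe

for io in (64 * 2**10, 1 * 2**20, 4 * 2**20):   # assumed IO sizes
    print(io >> 10, "KB covers %.3f%% of a stripe" % (io * 100.0 / full_stripe))

Anything that isn't a full, aligned 128MB stripe write has to read
back old data and/or parity before the new parity can be calculated,
and on this layout that's essentially every single write.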

>  Workqueue: writeback wb_workfn (flush-9:3)
>  Call Trace:
>   schedule+0x32/0x80
>  bitmap_startwrite+0x161/0x1e0 [md_mod]

MD blocks here when it has too many in-flight bitmap updates and so
waits for IO to complete before starting another. This isn't XFS
filesystem IO - this is internal MD RAID consistency information
that it needs to write for crash recovery purposes.

This will be a direct result of the raid device configuration....

>  add_stripe_bio+0x441/0x7d0 [raid456]
>  raid5_make_request+0x1ae/0xb10 [raid456]
>  md_handle_request+0x116/0x190 [md_mod]
>  md_make_request+0x65/0x160 [md_mod]
>  generic_make_request+0x1e7/0x410
>   submit_bio+0x6c/0x140
>  xfs_add_to_ioend+0x14c/0x280 [xfs]
>  xfs_do_writepage+0x2bb/0x680 [xfs]
>  write_cache_pages+0x1ed/0x430
>  xfs_vm_writepages+0x64/0xa0 [xfs]
>   do_writepages+0x1a/0x60
>  __writeback_single_inode+0x3d/0x320
>  writeback_sb_inodes+0x221/0x4b0
>  __writeback_inodes_wb+0x87/0xb0
>   wb_writeback+0x288/0x320
>   wb_workfn+0x37c/0x450

... and this is just the writeback path - your problem has nothing
to do with XFS...

Cheers,

Dave.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


