On Wednesday January 24, jpiszcz@xxxxxxxxxxxxxxx wrote:
> Here you go Neil:
>
> p34:~# echo 512 > /sys/block/md3/md/stripe_cache_size
> p34:~# echo 1024 > /sys/block/md3/md/stripe_cache_size
> p34:~# echo 2048 > /sys/block/md3/md/stripe_cache_size
> p34:~# echo 4096 > /sys/block/md3/md/stripe_cache_size
> p34:~# echo 8192 > /sys/block/md3/md/stripe_cache_size
> <...... FROZEN ........>
>
> I ran echo t > /proc/sysrq-trigger and then copied the relevant parts of
> kern.log and I am attaching them to this e-mail.
>
> Please confirm this is what you needed.

Perfect.  Thanks.

This bit:

Jan 24 18:22:21 p34 kernel: [273475.825645] bash D C7BEBAAC 0 16821 16820 (NOTLB)
Jan 24 18:22:21 p34 kernel: [273475.825653] c7bebac0 00000082 00000002 c7bebaac c7bebaa8 00000000 5b48e428 c6cdc560
Jan 24 18:22:21 p34 kernel: [273475.825665] c7bebad8 00010b03 00000011 00000009 cb093a53 0000f8b2 00017216 c6cdc66c
Jan 24 18:22:21 p34 kernel: [273475.825838] c1fe3280 00000001 c20c70c0 c3272058 f75c4a80 c7bebad8 c016a258 f7b12520
Jan 24 18:22:21 p34 kernel: [273475.825850] Call Trace:
Jan 24 18:22:21 p34 kernel: [273475.825853] [<c016a258>] dput+0x18/0x150
Jan 24 18:22:21 p34 kernel: [273475.825857] [<c0161f84>] __link_path_walk+0xb04/0xc90
Jan 24 18:22:21 p34 kernel: [273475.825862] [<c03600ad>] md_write_start+0x8d/0x120
Jan 24 18:22:21 p34 kernel: [273475.825867] [<c012eac0>] autoremove_wake_function+0x0/0x50
Jan 24 18:22:21 p34 kernel: [273475.825871] [<c03557a8>] make_request+0x38/0x560
Jan 24 18:22:21 p34 kernel: [273475.825876] [<c02409ce>] xfs_log_move_tail+0x3e/0x1b0
Jan 24 18:22:21 p34 kernel: [273475.825881] [<c023c9fa>] xfs_iomap+0x2ca/0x720
Jan 24 18:22:21 p34 kernel: [273475.825885] [<c026d77a>] generic_make_request+0xda/0x150
Jan 24 18:22:21 p34 kernel: [273475.825890] [<c026fe32>] submit_bio+0x72/0x110
Jan 24 18:22:21 p34 kernel: [273475.825895] [<c013da6b>] mempool_alloc+0x2b/0xf0
Jan 24 18:22:21 p34 kernel: [273475.825899] [<c034f1a0>] raid5_mergeable_bvec+0x0/0x90
Jan 24 18:22:21 p34 kernel: [273475.825904] [<c017c052>] __bio_add_page+0x102/0x190
Jan 24 18:22:21 p34 kernel: [273475.825909] [<c017c117>] bio_add_page+0x37/0x50
Jan 24 18:22:21 p34 kernel: [273475.826073] [<c025be8b>] xfs_submit_ioend_bio+0x1b/0x30
Jan 24 18:22:21 p34 kernel: [273475.826078] [<c025c10e>] xfs_page_state_convert+0x26e/0xff0
Jan 24 18:22:21 p34 kernel: [273475.826082] [<c0155509>] slab_destroy+0x59/0x90
Jan 24 18:22:21 p34 kernel: [273475.826088] [<c025d102>] xfs_vm_writepage+0x62/0x100
Jan 24 18:22:21 p34 kernel: [273475.826092] [<c014396d>] shrink_inactive_list+0x5dd/0x8a0
Jan 24 18:22:21 p34 kernel: [273475.826097] [<c0143cd1>] shrink_zone+0xa1/0x100
Jan 24 18:22:21 p34 kernel: [273475.826102] [<c01447e0>] try_to_free_pages+0x140/0x260
Jan 24 18:22:21 p34 kernel: [273475.826106] [<c013fb4f>] __alloc_pages+0x13f/0x2f0
Jan 24 18:22:21 p34 kernel: [273475.826111] [<c0350dd3>] grow_one_stripe+0x93/0x100
Jan 24 18:22:21 p34 kernel: [273475.826115] [<c0350ee6>] raid5_store_stripe_cache_size+0xa6/0xc0
Jan 24 18:22:21 p34 kernel: [273475.826120] [<c0361a83>] md_attr_store+0x73/0x90
Jan 24 18:22:21 p34 kernel: [273475.826125] [<c0192302>] sysfs_write_file+0xa2/0x100
Jan 24 18:22:21 p34 kernel: [273475.826129] [<c01595f6>] vfs_write+0xa6/0x160
Jan 24 18:22:21 p34 kernel: [273475.826134] [<c0192260>] sysfs_write_file+0x0/0x100
Jan 24 18:22:21 p34 kernel: [273475.826138] [<c0159d31>] sys_write+0x41/0x70
Jan 24 18:22:21 p34 kernel: [273475.826303] [<c0103138>] syscall_call+0x7/0xb
Jan 24 18:22:21 p34 kernel: [273475.826307] =======================

Tells me what is happening.

We try to allocate memory to increase the stripe cache (__alloc_pages),
which requires memory to be freed, so shrink_zone gets called, which
calls into the 'xfs' filesystem, which eventually tries to write to
the raid5 array.
The raid5 array is currently 'clean' so we need to mark the superblock
as dirty first (md_write_start), but that needs a lock that is being
held while we grow the stripe cache.  Deadlock.

So the patch I posted (changing GFP_KERNEL to GFP_NOIO) will avoid
this as it will then fail the allocation rather than initiate IO.
However it might be better if I can find a way to avoid the
deadlock....  I'll see what I can come up with.

Thanks,
NeilBrown
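
For illustration only, the deadlock and the effect of GFP_NOIO can be modelled in userspace. This is a rough sketch, not the md code: ToyArray, reconfig_lock, free_pages and the string return values are all made-up names, and the non-blocking lock attempt only stands in for a task that would really sleep forever in md_write_start.

```python
import threading

# Stand-ins for the kernel allocation flags discussed above.
GFP_KERNEL = "GFP_KERNEL"  # allocator may start I/O to reclaim memory
GFP_NOIO = "GFP_NOIO"      # allocator must not start I/O


class ToyArray:
    """Toy model of a clean raid5 array (hypothetical, not the md structures)."""

    def __init__(self, free_pages):
        self.free_pages = free_pages
        # Hypothetical name for the lock held while the stripe cache grows.
        self.reconfig_lock = threading.Lock()
        self.stripes = 0

    def write_start(self):
        # Models md_write_start: marking a clean array dirty needs the lock.
        # The kernel task would block here; the toy reports it instead.
        if not self.reconfig_lock.acquire(blocking=False):
            return "deadlock"
        self.reconfig_lock.release()
        return "ok"

    def alloc_page(self, gfp):
        if self.free_pages > 0:
            self.free_pages -= 1
            return "ok"
        if gfp == GFP_NOIO:
            # No I/O allowed, so the allocation simply fails.
            return "enomem"
        # GFP_KERNEL: reclaim writes dirty pages back through the
        # filesystem, which ends up writing to this same array.
        return self.write_start()

    def grow_stripe_cache(self, count, gfp):
        # The lock is held for the whole resize, as in the trace above.
        with self.reconfig_lock:
            for _ in range(count):
                result = self.alloc_page(gfp)
                if result != "ok":
                    return result
                self.stripes += 1
        return "ok"


if __name__ == "__main__":
    print(ToyArray(free_pages=2).grow_stripe_cache(4, GFP_KERNEL))
    print(ToyArray(free_pages=2).grow_stripe_cache(4, GFP_NOIO))
```

With enough free pages both flags behave the same. Once memory runs short, the GFP_KERNEL path re-enters the array while the lock is held, and the GFP_NOIO path gives up with an allocation failure, which is the trade-off described above.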