Re: Weird problem: mdadm blocks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Neil,

thanks for your answer. The only process in 'D' state is the
"ls -al /stor1" I wrote about.

Dmesg has the following complaints:
[40800.776543] INFO: task md127_raid5:18837 blocked for more than 120 seconds. [40800.776546] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [40800.776549] md127_raid5 D ffff880077c13780 0 18837 2 0x00000000 [40800.776554] ffff880054d1f1a0 0000000000000046 ffff88003efa3488 ffff88006b6717d0 [40800.776559] 0000000000013780 ffff880043f47fd8 ffff880043f47fd8 ffff880054d1f1a0 [40800.776564] 0000000000000246 ffffffff8134f209 ffff88003734a680 ffff88003734a400
[40800.776569] Call Trace:
[40800.776574]  [<ffffffff8134f209>] ? _raw_spin_lock_irqsave+0x9/0x25
[40800.776585]  [<ffffffffa01756f0>] ? md_super_wait+0x6a/0x80 [md_mod]
[40800.776590]  [<ffffffff8105fc83>] ? add_wait_queue+0x3c/0x3c
[40800.776600]  [<ffffffffa0175a88>] ? md_update_sb+0x382/0x474 [md_mod]
[40800.776606]  [<ffffffff8100d02f>] ? load_TLS+0x7/0xa
[40800.776611]  [<ffffffff8100d69f>] ? __switch_to+0x133/0x258
[40800.776621] [<ffffffffa01762f4>] ? md_check_recovery+0x218/0x514 [md_mod]
[40800.776629]  [<ffffffffa0f146fe>] ? raid5d+0x1c/0x483 [raid456]
[40800.776634]  [<ffffffff8134e35b>] ? schedule_timeout+0x2c/0xdb
[40800.776638]  [<ffffffff81070fc1>] ? arch_local_irq_save+0x11/0x17
[40800.776642]  [<ffffffff81070fc1>] ? arch_local_irq_save+0x11/0x17
[40800.776652]  [<ffffffffa0170256>] ? md_thread+0x114/0x132 [md_mod]
[40800.776657]  [<ffffffff8105fc83>] ? add_wait_queue+0x3c/0x3c
[40800.776666]  [<ffffffffa0170142>] ? md_rdev_init+0xea/0xea [md_mod]
[40800.776671]  [<ffffffff8105f631>] ? kthread+0x76/0x7e
[40800.776676]  [<ffffffff81356374>] ? kernel_thread_helper+0x4/0x10
[40800.776681]  [<ffffffff8105f5bb>] ? kthread_worker_fn+0x139/0x139
[40800.776686]  [<ffffffff81356370>] ? gs_change+0x13/0x13
[40800.776689] INFO: task md127_resync:18866 blocked for more than 120 seconds. [40800.776692] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [40800.776695] md127_resync D ffff8800727441c0 0 18866 2 0x00000000 [40800.776701] ffff8800727441c0 0000000000000046 0000000000000000 ffff88001b3848b0 [40800.776705] 0000000000013780 ffff8800716d5fd8 ffff8800716d5fd8 ffff8800727441c0 [40800.776710] 0000000000000000 ffffffff81070fc1 0000000000000046 ffff880054ed9570
[40800.776715] Call Trace:
[40800.776719]  [<ffffffff81070fc1>] ? arch_local_irq_save+0x11/0x17
[40800.776726] [<ffffffffa0f0f903>] ? get_active_stripe+0x24c/0x505 [raid456]
[40800.776730]  [<ffffffff8103f6c4>] ? try_to_wake_up+0x197/0x197
[40800.776737]  [<ffffffffa0f14dcf>] ? sync_request+0x26a/0x2de [raid456]
[40800.776748]  [<ffffffffa0173581>] ? md_do_sync+0x76b/0xb6f [md_mod]
[40800.776754]  [<ffffffff8105fc83>] ? add_wait_queue+0x3c/0x3c
[40800.776763]  [<ffffffffa0170256>] ? md_thread+0x114/0x132 [md_mod]
[40800.776773]  [<ffffffffa0170142>] ? md_rdev_init+0xea/0xea [md_mod]
[40800.776778]  [<ffffffff8105f631>] ? kthread+0x76/0x7e
[40800.776782]  [<ffffffff81356374>] ? kernel_thread_helper+0x4/0x10
[40800.776788]  [<ffffffff8105f5bb>] ? kthread_worker_fn+0x139/0x139
[40800.776792]  [<ffffffff81356370>] ? gs_change+0x13/0x13
[40800.776797] INFO: task xfsbufd/dm-0:20797 blocked for more than 120 seconds. [40800.776801] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [40800.776804] xfsbufd/dm-0 D ffff880077c13780 0 20797 2 0x00000000 [40800.776809] ffff8800757b3550 0000000000000046 ffff880071fedd40 ffff88006b6717d0 [40800.776814] 0000000000013780 ffff88001b12ffd8 ffff88001b12ffd8 ffff8800757b3550 [40800.776819] 0000000000000246 ffffffff8134f209 ffff88003734a680 ffff88003734a400
[40800.776824] Call Trace:
[40800.776828]  [<ffffffff8134f209>] ? _raw_spin_lock_irqsave+0x9/0x25
[40800.776838]  [<ffffffffa0174122>] ? md_write_start+0x133/0x149 [md_mod]
[40800.776844]  [<ffffffff8105fc83>] ? add_wait_queue+0x3c/0x3c
[40800.776850]  [<ffffffffa0f11722>] ? make_request+0x36/0x37a [raid456]
[40800.776860] [<ffffffffa0185873>] ? __split_and_process_bio+0x4f4/0x506 [dm_mod]
[40800.776866]  [<ffffffff8105fc83>] ? add_wait_queue+0x3c/0x3c
[40800.776875]  [<ffffffffa016fd47>] ? md_make_request+0xee/0x1db [md_mod]
[40800.776881]  [<ffffffff8119908a>] ? generic_make_request+0x90/0xcf
[40800.776885]  [<ffffffff8119919c>] ? submit_bio+0xd3/0xf1
[40800.776890]  [<ffffffff81120e40>] ? bio_alloc_bioset+0x43/0xb6
[40800.776910]  [<ffffffffa0f2be8a>] ? _xfs_buf_ioapply+0x17a/0x1bb [xfs]
[40800.776915]  [<ffffffff8103f6c4>] ? try_to_wake_up+0x197/0x197
[40800.776932]  [<ffffffffa0f2c73d>] ? xfs_bdstrat_cb+0x4d/0x51 [xfs]
[40800.776950]  [<ffffffffa0f2bf98>] ? xfs_buf_iorequest+0x62/0x7b [xfs]
[40800.776967]  [<ffffffffa0f2c73d>] ? xfs_bdstrat_cb+0x4d/0x51 [xfs]
[40800.776985]  [<ffffffffa0f2c823>] ? xfsbufd+0xe2/0x114 [xfs]
[40800.776989]  [<ffffffff8134de91>] ? __schedule+0x5f9/0x610
[40800.777007]  [<ffffffffa0f2c741>] ? xfs_bdstrat_cb+0x51/0x51 [xfs]
[40800.777012]  [<ffffffff8105f631>] ? kthread+0x76/0x7e
[40800.777017]  [<ffffffff81356374>] ? kernel_thread_helper+0x4/0x10
[40800.777023]  [<ffffffff8105f5bb>] ? kthread_worker_fn+0x139/0x139
[40800.777027]  [<ffffffff81356370>] ? gs_change+0x13/0x13
[40800.777030] INFO: task xfsaild/dm-0:20798 blocked for more than 120 seconds. [40800.777034] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [40800.777037] xfsaild/dm-0 D ffff88003754c300 0 20798 2 0x00000000 [40800.777042] ffff88003754c300 0000000000000046 0000000000000000 ffff88005d7c51a0 [40800.777047] 0000000000013780 ffff88001b01ffd8 ffff88001b01ffd8 ffff88003754c300 [40800.777052] ffff880071fede00 ffffffff81070fc1 0000000000000046 ffff88003734a400
[40800.777057] Call Trace:
[40800.777061]  [<ffffffff81070fc1>] ? arch_local_irq_save+0x11/0x17
[40800.777071]  [<ffffffffa0170466>] ? md_flush_request+0x96/0x111 [md_mod]
[40800.777076]  [<ffffffff8103f6c4>] ? try_to_wake_up+0x197/0x197
[40800.777082]  [<ffffffffa0f11711>] ? make_request+0x25/0x37a [raid456]
[40800.777091] [<ffffffffa0185873>] ? __split_and_process_bio+0x4f4/0x506 [dm_mod]
[40800.777096]  [<ffffffff8103720c>] ? test_tsk_need_resched+0xa/0x13
[40800.777101]  [<ffffffff8103afb6>] ? check_preempt_curr+0x52/0x5f
[40800.777106]  [<ffffffff8103b013>] ? ttwu_do_wakeup+0x50/0xc4
[40800.777116]  [<ffffffffa016fd47>] ? md_make_request+0xee/0x1db [md_mod]
[40800.777121]  [<ffffffff8119908a>] ? generic_make_request+0x90/0xcf
[40800.777126]  [<ffffffff8119919c>] ? submit_bio+0xd3/0xf1
[40800.777131]  [<ffffffff81120e66>] ? bio_alloc_bioset+0x69/0xb6
[40800.777149]  [<ffffffffa0f2be8a>] ? _xfs_buf_ioapply+0x17a/0x1bb [xfs]
[40800.777154]  [<ffffffff8103f6c4>] ? try_to_wake_up+0x197/0x197
[40800.777179]  [<ffffffffa0f6ba3a>] ? xlog_bdstrat+0x34/0x38 [xfs]
[40800.777196]  [<ffffffffa0f2bf98>] ? xfs_buf_iorequest+0x62/0x7b [xfs]
[40800.777221]  [<ffffffffa0f6ba3a>] ? xlog_bdstrat+0x34/0x38 [xfs]
[40800.777245]  [<ffffffffa0f6c7cc>] ? xlog_sync+0x1dd/0x2d4 [xfs]
[40800.777269]  [<ffffffffa0f70cd0>] ? xfs_ail_min_lsn+0xd/0x2b [xfs]
[40800.777294]  [<ffffffffa0f6db4b>] ? xlog_write+0x348/0x545 [xfs]
[40800.777316]  [<ffffffffa0f3cd86>] ? kmem_zone_zalloc+0x1b/0x2d [xfs]
[40800.777341]  [<ffffffffa0f6ed2b>] ? xlog_cil_push+0x1e5/0x2fb [xfs]
[40800.777366]  [<ffffffffa0f6f351>] ? xlog_cil_force_lsn+0x1d/0x86 [xfs]
[40800.777391]  [<ffffffffa0f6ded3>] ? _xfs_log_force+0x4e/0x1ae [xfs]
[40800.777416]  [<ffffffffa0f6e03e>] ? xfs_log_force+0xb/0x2c [xfs]
[40800.777440]  [<ffffffffa0f70eae>] ? xfsaild+0xf4/0x46b [xfs]
[40800.777465] [<ffffffffa0f70dba>] ? xfs_trans_ail_cursor_first+0x79/0x79 [xfs]
[40800.777470]  [<ffffffff8105f631>] ? kthread+0x76/0x7e
[40800.777475]  [<ffffffff81356374>] ? kernel_thread_helper+0x4/0x10
[40800.777481]  [<ffffffff8105f5bb>] ? kthread_worker_fn+0x139/0x139
[40800.777485]  [<ffffffff81356370>] ? gs_change+0x13/0x13
[40800.777489] INFO: task flush-253:0:20958 blocked for more than 120 seconds. [40800.777492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [40800.777495] flush-253:0 D ffff880077c13780 0 20958 2 0x00000000 [40800.777500] ffff88001b7ecf20 0000000000000046 0000000000011200 ffff8800727441c0 [40800.777505] 0000000000013780 ffff88001b0b3fd8 ffff88001b0b3fd8 ffff88001b7ecf20 [40800.777510] 0000000000000246 ffffffff8134f209 ffff88003734a680 ffff88003734a400
[40800.777515] Call Trace:
[40800.777519]  [<ffffffff8134f209>] ? _raw_spin_lock_irqsave+0x9/0x25
[40800.777531]  [<ffffffffa0174122>] ? md_write_start+0x133/0x149 [md_mod]
[40800.777536]  [<ffffffff8105fc83>] ? add_wait_queue+0x3c/0x3c
[40800.777542]  [<ffffffffa0f11722>] ? make_request+0x36/0x37a [raid456]
[40800.777552] [<ffffffffa0185873>] ? __split_and_process_bio+0x4f4/0x506 [dm_mod]
[40800.777562]  [<ffffffffa016fd47>] ? md_make_request+0xee/0x1db [md_mod]
[40800.777567]  [<ffffffff8119908a>] ? generic_make_request+0x90/0xcf
[40800.777572]  [<ffffffff8119919c>] ? submit_bio+0xd3/0xf1
[40800.777577]  [<ffffffff811171c5>] ? __mark_inode_dirty+0x58/0x17a
[40800.777595]  [<ffffffffa0f2a4aa>] ? xfs_submit_ioend+0x99/0xd9 [xfs]
[40800.777612]  [<ffffffffa0f2a852>] ? xfs_vm_writepage+0x368/0x3e1 [xfs]
[40800.777618]  [<ffffffff810bc31a>] ? __writepage+0xa/0x21
[40800.777622]  [<ffffffff810bc1a2>] ? write_cache_pages+0x1f8/0x2e9
[40800.777628]  [<ffffffff810bc310>] ? set_page_dirty_lock+0x2b/0x2b
[40800.777633]  [<ffffffff810bc2cd>] ? generic_writepages+0x3a/0x52
[40800.777639]  [<ffffffff811183f3>] ? writeback_single_inode+0x11d/0x2cc
[40800.777644]  [<ffffffff81118873>] ? writeback_sb_inodes+0x16b/0x204
[40800.777650]  [<ffffffff81118979>] ? __writeback_inodes_wb+0x6d/0xab
[40800.777655]  [<ffffffff81118adf>] ? wb_writeback+0x128/0x21f
[40800.777660]  [<ffffffff810bc628>] ? determine_dirtyable_memory+0x10/0x17
[40800.777665]  [<ffffffff81118fd9>] ? wb_do_writeback+0x189/0x1a8
[40800.777671]  [<ffffffff8111907d>] ? bdi_writeback_thread+0x85/0x1e6
[40800.777676]  [<ffffffff81118ff8>] ? wb_do_writeback+0x1a8/0x1a8
[40800.777681]  [<ffffffff8105f631>] ? kthread+0x76/0x7e
[40800.777686]  [<ffffffff81356374>] ? kernel_thread_helper+0x4/0x10
[40800.777691]  [<ffffffff8105f5bb>] ? kthread_worker_fn+0x139/0x139
[40800.777695]  [<ffffffff81356370>] ? gs_change+0x13/0x13
[40800.777703] INFO: task BackupPC_dump:8965 blocked for more than 120 seconds. [40800.777706] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [40800.777710] BackupPC_dump D ffff88005d7c51a0 0 8965 8846 0x00000000 [40800.777715] ffff88005d7c51a0 0000000000000086 ffff88005e635b40 ffff88006b2a82c0 [40800.777720] 0000000000013780 ffff8800596c9fd8 ffff8800596c9fd8 ffff88005d7c51a0 [40800.777725] 0000000000000000 0000000000000000 ffff880050f9bd40 7fffffffffffffff
[40800.777730] Call Trace:
[40800.777734]  [<ffffffff8134e35b>] ? schedule_timeout+0x2c/0xdb
[40800.777739]  [<ffffffff811aa999>] ? _atomic_dec_and_lock+0x1/0x48
[40800.777744]  [<ffffffff8134dfa1>] ? wait_for_common+0xa0/0x119
[40800.777748]  [<ffffffff8103f6c4>] ? try_to_wake_up+0x197/0x197
[40800.777766]  [<ffffffffa0f2c0fb>] ? xfs_buf_read+0x88/0xbe [xfs]
[40800.777791]  [<ffffffffa0f71a39>] ? xfs_trans_read_buf+0x4a/0x310 [xfs]
[40800.777808]  [<ffffffffa0f2bff5>] ? xfs_buf_iowait+0x44/0x81 [xfs]
[40800.777826]  [<ffffffffa0f2c0fb>] ? xfs_buf_read+0x88/0xbe [xfs]
[40800.777850]  [<ffffffffa0f71a39>] ? xfs_trans_read_buf+0x4a/0x310 [xfs]
[40800.777875]  [<ffffffffa0f5fb1c>] ? xfs_imap_to_bp+0x40/0x100 [xfs]
[40800.777899]  [<ffffffffa0f632cd>] ? xfs_iread+0x54/0x177 [xfs]
[40800.777917]  [<ffffffffa0f30743>] ? xfs_inode_alloc+0x73/0xe9 [xfs]
[40800.777936]  [<ffffffffa0f30ec2>] ? xfs_iget+0x37c/0x56c [xfs]
[40800.777958]  [<ffffffffa0f3b3b4>] ? xfs_lookup+0xa4/0xd3 [xfs]
[40800.777977]  [<ffffffffa0f33e5a>] ? xfs_vn_lookup+0x3f/0x7e [xfs]
[40800.777983]  [<ffffffff81102709>] ? d_alloc_and_lookup+0x3a/0x60
[40800.777988]  [<ffffffff811031ad>] ? walk_component+0x219/0x406
[40800.777993]  [<ffffffff811039e1>] ? link_path_walk+0x174/0x421
[40800.777998]  [<ffffffff81104018>] ? path_lookupat+0x53/0x2bd
[40800.778002]  [<ffffffff81036628>] ? should_resched+0x5/0x23
[40800.778006]  [<ffffffff81036628>] ? should_resched+0x5/0x23
[40800.778010]  [<ffffffff8134deec>] ? _cond_resched+0x7/0x1c
[40800.778014]  [<ffffffff8110429e>] ? do_path_lookup+0x1c/0x87
[40800.778019]  [<ffffffff81105d27>] ? user_path_at_empty+0x47/0x7b
[40800.778024]  [<ffffffff811b0604>] ? timerqueue_add+0x80/0xa0
[40800.778029]  [<ffffffff810380d3>] ? set_next_entity+0x32/0x55
[40800.778034]  [<ffffffff8100d751>] ? __switch_to+0x1e5/0x258
[40800.778039]  [<ffffffff810fdd7a>] ? vfs_fstatat+0x32/0x60
[40800.778043]  [<ffffffff810fdeb0>] ? sys_newstat+0x12/0x2b
[40800.778048]  [<ffffffff81354212>] ? system_call_fastpath+0x16/0x1b
[40800.778053] INFO: task kworker/0:0:12935 blocked for more than 120 seconds. [40800.778056] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [40800.778059] kworker/0:0 D ffff880077c13780 0 12935 2 0x00000000 [40800.778064] ffff88001b0c4f60 0000000000000046 ffffffff81788740 ffff88006b7e6340 [40800.778069] 0000000000013780 ffff880000cabfd8 ffff880000cabfd8 ffff88001b0c4f60 [40800.778074] 0000000000000246 ffffffff8134f209 ffff88003734a680 ffff88003734a400
[40800.778079] Call Trace:
[40800.778083]  [<ffffffff8134f209>] ? _raw_spin_lock_irqsave+0x9/0x25
[40800.778095]  [<ffffffffa0174122>] ? md_write_start+0x133/0x149 [md_mod]
[40800.778100]  [<ffffffff8105fc83>] ? add_wait_queue+0x3c/0x3c
[40800.778106]  [<ffffffffa0f11722>] ? make_request+0x36/0x37a [raid456]
[40800.778111]  [<ffffffff810ece31>] ? kmem_cache_alloc+0x86/0xea
[40800.778121]  [<ffffffffa016fd47>] ? md_make_request+0xee/0x1db [md_mod]
[40800.778126]  [<ffffffff8119908a>] ? generic_make_request+0x90/0xcf
[40800.778135] [<ffffffffa01855e4>] ? __split_and_process_bio+0x265/0x506 [dm_mod]
[40800.778140]  [<ffffffff8134f247>] ? _raw_spin_unlock_irqrestore+0xe/0xf
[40800.778144]  [<ffffffff8103f6b4>] ? try_to_wake_up+0x187/0x197
[40800.778154]  [<ffffffffa0185a6e>] ? dm_wq_work+0x8c/0xab [dm_mod]
[40800.778158]  [<ffffffff8105b529>] ? process_one_work+0x161/0x269
[40800.778163]  [<ffffffff8105c4f2>] ? worker_thread+0xc2/0x145
[40800.778167]  [<ffffffff8105c430>] ? manage_workers.isra.25+0x15b/0x15b
[40800.778172]  [<ffffffff8105f631>] ? kthread+0x76/0x7e
[40800.778177]  [<ffffffff81356374>] ? kernel_thread_helper+0x4/0x10
[40800.778182]  [<ffffffff8105f5bb>] ? kthread_worker_fn+0x139/0x139
[40800.778186]  [<ffffffff81356370>] ? gs_change+0x13/0x13
[40800.778190] INFO: task kworker/0:2:19557 blocked for more than 120 seconds. [40800.778193] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [40800.778196] kworker/0:2 D ffff880077c13780 0 19557 2 0x00000000 [40800.778202] ffff88006b7e6340 0000000000000046 0000000000000000 ffff880054d1f1a0 [40800.778206] 0000000000013780 ffff88000005ffd8 ffff88000005ffd8 ffff88006b7e6340 [40800.778211] ffff880071fede00 ffffffff81070fc1 0000000000000046 ffff88003734a400
[40800.778216] Call Trace:
[40800.778221]  [<ffffffff81070fc1>] ? arch_local_irq_save+0x11/0x17
[40800.778230]  [<ffffffffa0170466>] ? md_flush_request+0x96/0x111 [md_mod]
[40800.778235]  [<ffffffff8103f6c4>] ? try_to_wake_up+0x197/0x197
[40800.778241]  [<ffffffffa0f11711>] ? make_request+0x25/0x37a [raid456]
[40800.778250] [<ffffffffa0185873>] ? __split_and_process_bio+0x4f4/0x506 [dm_mod]
[40800.778255]  [<ffffffff8103720c>] ? test_tsk_need_resched+0xa/0x13
[40800.778260]  [<ffffffff8103afb6>] ? check_preempt_curr+0x52/0x5f
[40800.778264]  [<ffffffff8103b013>] ? ttwu_do_wakeup+0x50/0xc4
[40800.778274]  [<ffffffffa016fd47>] ? md_make_request+0xee/0x1db [md_mod]
[40800.778279]  [<ffffffff8119908a>] ? generic_make_request+0x90/0xcf
[40800.778284]  [<ffffffff8119919c>] ? submit_bio+0xd3/0xf1
[40800.778289]  [<ffffffff81120e66>] ? bio_alloc_bioset+0x69/0xb6
[40800.778308]  [<ffffffffa0f2be8a>] ? _xfs_buf_ioapply+0x17a/0x1bb [xfs]
[40800.778312]  [<ffffffff8103f6c4>] ? try_to_wake_up+0x197/0x197
[40800.778337]  [<ffffffffa0f6ba3a>] ? xlog_bdstrat+0x34/0x38 [xfs]
[40800.778354]  [<ffffffffa0f2bf98>] ? xfs_buf_iorequest+0x62/0x7b [xfs]
[40800.778379]  [<ffffffffa0f6ba3a>] ? xlog_bdstrat+0x34/0x38 [xfs]
[40800.778403]  [<ffffffffa0f6c7cc>] ? xlog_sync+0x1dd/0x2d4 [xfs]
[40800.778428]  [<ffffffffa0f70cd0>] ? xfs_ail_min_lsn+0xd/0x2b [xfs]
[40800.778452]  [<ffffffffa0f6db4b>] ? xlog_write+0x348/0x545 [xfs]
[40800.778477]  [<ffffffffa0f6ed2b>] ? xlog_cil_push+0x1e5/0x2fb [xfs]
[40800.778482]  [<ffffffff81070fc1>] ? arch_local_irq_save+0x11/0x17
[40800.778507]  [<ffffffffa0f6f351>] ? xlog_cil_force_lsn+0x1d/0x86 [xfs]
[40800.778531]  [<ffffffffa0f6e0c2>] ? _xfs_log_force_lsn+0x63/0x205 [xfs]
[40800.778556]  [<ffffffffa0f6b502>] ? xfs_trans_commit+0x10a/0x205 [xfs]
[40800.778577]  [<ffffffffa0f387d4>] ? xfs_sync_worker+0x3a/0x6a [xfs]
[40800.778581]  [<ffffffff8105b529>] ? process_one_work+0x161/0x269
[40800.778586]  [<ffffffff8105c4f2>] ? worker_thread+0xc2/0x145
[40800.778590]  [<ffffffff8105c430>] ? manage_workers.isra.25+0x15b/0x15b
[40800.778595]  [<ffffffff8105f631>] ? kthread+0x76/0x7e
[40800.778600]  [<ffffffff81356374>] ? kernel_thread_helper+0x4/0x10
[40800.778605]  [<ffffffff8105f5bb>] ? kthread_worker_fn+0x139/0x139
[40800.778609]  [<ffffffff81356370>] ? gs_change+0x13/0x13

Regards, Hans

Am 22.12.2013 12:19, schrieb NeilBrown:
On Sun, 22 Dec 2013 10:01:26 +0100 Hans Kraus <hans@xxxxxxxxxxxxxx> wrote:

Hi,

my backup system (running backuppc) has developed a weird problem:
calls from command line "mdadm --detail /dev/mdX" block, for every
existing raid on the system, and can be only terminated with ^C. This
is true even for the newest mdadm built from git.

"cat /proc/mdstat" blocks too. All mounted raids are working (at least
ls <mountpoint> is), exept for one, md127 (the storage of backuppc).
There ls is blocking and is not terminable by ^C.

The raid structure is the following:
md2, md3, m4	raid1 for swap, /boot, /
md30		raid0 for short term storage
md10, md11, md12, md13	raid0, built from 2x 2TB or 1TB + 3TB drives
md127		raid5 built from md10, md11, md12, md13

I recently (some 12 hours ago) added md13 again and the system was
rebuilding from a degraded state. The file system on md127 is xfs. All
the physical rives are OK, at least according to smartmontools.
Webmin 1.660 reports:
CPU load averages 	16.96 (1 min) 15.04 (5 mins) 12.67 (15 mins)
CPU usage 		0% user, 1% kernel, 99% IO, 0% idle

Is there any way to diagnose the problem further? I'm reluctant to
do a reboot.

Either some process has crashed leaving an 'oops' or 'bug' message in the
kernel logs, or some process is stuck in 'D' state in 'ps'.

So:
  1/ look through kernel logs since boot (e.g. output of 'dmesg', though that
    might not be complete) for anything  unusual - there should be a stack
    trace.
  2/ if there is a process in 'D' state, find how which and get a stack trace
     of it.  Possibly by
      echo w > /proc/sysrq-trigger
    or
      cat /proc/$PID/stack
    or event
      echo t > /proc/sysrq-trigger
    (though that might create lots of output that might be hard to capture).

NeilBrown


--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux