vgdisplay hang on iSCSI session

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi All,

We've got some "strange" issue on a Xen hypervisor with CentOS 6 and 4.9.63-29.el6.x86_6 kernel. The system has a local raid + is connected with 2 iscsi sessions to 3 disks with multipath (6 blockdevs in total).

We've noticed that vgdisplay was hanging, and the kernel was printing the following message: Feb 3 09:43:27 srv01 kernel: [3074775.157416] INFO: task vgdisplay:27430 blocked for more than 120 seconds. Feb 3 09:43:27 srv01 kernel: [3074775.157507] Not tainted 4.9.63-29.el6.x86_64 #1 Feb 3 09:43:27 srv01 kernel: [3074775.157541] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Feb 3 09:43:27 srv01 kernel: [3074775.157591] vgdisplay D 0 27430 27429 0x00000080 Feb 3 09:43:27 srv01 kernel: [3074775.157599] ffff880263017800 0000000000000000 ffff880244bdc540 ffff88026cd08380 Feb 3 09:43:27 srv01 kernel: [3074775.157613] ffff88027de59880 ffffc90063effba8 ffffffff818daab0 ffffc90063effbb8 Feb 3 09:43:27 srv01 kernel: [3074775.157616] ffffffff81114992 ffffc90063effba8 ffffffff818def5f ffffc90063effb78
Feb  3 09:43:27 srv01 kernel: [3074775.157620] Call Trace:
Feb 3 09:43:27 srv01 kernel: [3074775.157638] [<ffffffff818daab0>] ? __schedule+0x230/0x530 Feb 3 09:43:27 srv01 kernel: [3074775.157644] [<ffffffff81114992>] ? __call_rcu+0x132/0x220 Feb 3 09:43:27 srv01 kernel: [3074775.157648] [<ffffffff818def5f>] ? _raw_spin_lock_irqsave+0x1f/0x50 Feb 3 09:43:27 srv01 kernel: [3074775.157655] [<ffffffff818dae9a>] schedule+0x3a/0xa0 Feb 3 09:43:27 srv01 kernel: [3074775.157658] [<ffffffff81114aed>] ? call_rcu_sched+0x1d/0x20 Feb 3 09:43:27 srv01 kernel: [3074775.157668] [<ffffffff813fc89f>] blk_mq_freeze_queue_wait+0x6f/0xd0 Feb 3 09:43:27 srv01 kernel: [3074775.157676] [<ffffffff810ede80>] ? woken_wake_function+0x20/0x20 Feb 3 09:43:27 srv01 kernel: [3074775.157679] [<ffffffff818dec86>] ? _raw_spin_unlock_irqrestore+0x16/0x20 Feb 3 09:43:27 srv01 kernel: [3074775.157685] [<ffffffff8143b053>] ? percpu_ref_kill_and_confirm+0x63/0xd0 Feb 3 09:43:27 srv01 kernel: [3074775.157688] [<ffffffff813fe77c>] ? blk_mq_run_hw_queues+0x4c/0x90 Feb 3 09:43:27 srv01 kernel: [3074775.157691] [<ffffffff813fe82e>] blk_freeze_queue+0x1e/0x30 Feb 3 09:43:27 srv01 kernel: [3074775.157694] [<ffffffff813fe84e>] blk_mq_freeze_queue+0xe/0x10 Feb 3 09:43:27 srv01 kernel: [3074775.157700] [<ffffffff815fa62e>] loop_switch+0x1e/0xd0 Feb 3 09:43:27 srv01 kernel: [3074775.157703] [<ffffffff815fb47a>] lo_release+0x7a/0x80 Feb 3 09:43:27 srv01 kernel: [3074775.157712] [<ffffffff812a7927>] __blkdev_put+0x1a7/0x200 Feb 3 09:43:27 srv01 kernel: [3074775.157734] [<ffffffff8123ddd8>] ? cache_free_debugcheck+0x188/0x240 Feb 3 09:43:27 srv01 kernel: [3074775.157738] [<ffffffff812a79d6>] blkdev_put+0x56/0x140 Feb 3 09:43:27 srv01 kernel: [3074775.157742] [<ffffffff812a7ae4>] blkdev_close+0x24/0x30 Feb 3 09:43:27 srv01 kernel: [3074775.157750] [<ffffffff8126baf8>] __fput+0xc8/0x240 Feb 3 09:43:27 srv01 kernel: [3074775.157753] [<ffffffff8126bd1e>] ____fput+0xe/0x10 Feb 3 09:43:27 srv01 kernel: [3074775.157760] [<ffffffff810c5398>] task_work_run+0x68/0xa0 Feb 3 09:43:27 srv01 kernel: [3074775.157767] [<ffffffff81003546>] exit_to_usermode_loop+0xc6/0xd0 Feb 3 09:43:27 srv01 kernel: [3074775.157770] [<ffffffff81003f85>] do_syscall_64+0x185/0x240 Feb 3 09:43:27 srv01 kernel: [3074775.157774] [<ffffffff818df22b>] entry_SYSCALL64_slow_path+0x25/0x25


Now if I did an strace -f vgdisplay, it did hang on the following call:
stat("/dev/loop0", {st_mode=S_IFBLK|0660, st_rdev=makedev(7, 0), ...}) = 0
open("/dev/loop0", O_RDONLY|O_DIRECT|O_NOATIME

And did hang there forever.

multipath -l didn't show any issues on the paths!
But multipath -ll did hang on the following syscall:
close(8)                                = 0
close(9)                                = 0
close(10)                               = 0
fcntl(11, F_GETFL) = 0xc000 (flags O_RDONLY|O_DIRECT|O_LARGEFILE)
fcntl(11, F_SETFL, O_RDONLY|O_LARGEFILE) = 0
io_destroy(139682352197632

Also I tried to rescan the blockdevs via "echo 1 > /sys/block/sdx/device/rescan", and they all worked, except sdf ... (which is one of the iscsi blockdevs).
But the VM's using the storage, were still online! So I/O was not dead.

Ofcourse, sdf was the enabled (not active) device member of the multipath dev: size=5.0T features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=14 status=active
| `- 12:0:0:3 sdg 8:96 active ready running
`-+- policy='round-robin 0' prio=9 status=enabled
  `- 11:0:0:3 sdf 8:80 active ready running


Also I see 1 kworker was in D-state:
root 13820 0.0 0.0 0 0 ? D Feb03 0:03 \_ [kworker/1:2]

Next to the tons of vgdisplay calls:
root 27430 0.0 0.0 127528 5000 ? D Feb03 0:00 \_ vgdisplay -c --ignorelockingfailure root 28349 0.0 0.0 127528 5152 ? D Feb03 0:00 \_ vgdisplay -c --ignorelockingfailure root 28868 0.0 0.0 127528 5160 ? D Feb03 0:00 \_ vgdisplay -c --ignorelockingfailure root 29398 0.0 0.0 127528 4896 ? D Feb03 0:00 \_ vgdisplay -c --ignorelockingfailure root 29708 0.0 0.0 127528 4960 ? D Feb03 0:00 \_ vgdisplay -c --ignorelockingfailure root 30223 0.0 0.0 127528 5108 ? D Feb03 0:00 \_ vgdisplay -c --ignorelockingfailure root 30529 0.0 0.0 127528 4916 ? D Feb03 0:00 \_ vgdisplay -c --ignorelockingfailure root 30831 0.0 0.0 127528 4988 ? D Feb03 0:00 \_ vgdisplay -c --ignorelockingfailure root 31344 0.0 0.0 127528 4976 ? D Feb03 0:00 \_ vgdisplay -c --ignorelockingfailure


I already upgraded to 4.9.75-30.el6.x86_64, but its ofcourse essential to know what causes this issue.
It was fine for +1 month before this happend by the way!

Any idea's on this?

Thanks
Jean-Louis



[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux