Hello:
I had a customer who tried to do a pvmove, and it hung. We set up a test system to try to reproduce the problem and were able to.
A little history, and why I am asking the question on this list: the customer needed to move from an existing SAN to a new SAN and wanted as little downtime as possible for the Application. They zoned the new SAN for access by the system, added the new LUNs to the existing Volume Group, and then ran the pvmove commands. The move worked with no problem on one of the PVs, but on the second one all I/O from the Application hung, as did any command that accesses LVM information, such as vgdisplay.
On our test system we have only one SAN (EMC CX700). We put a number of LUNs in a Volume Group and allocated Logical Volumes for the Application. We then added more LUNs to the Volume Group to simulate a second SAN, and started the Application with a test program to generate I/O. pvmove ran with no problems on one PV, but on the second PV it hung, just as on the customer's system.
I am posting to this list because the same type of move was done earlier on the test system running PowerPath, without any problems. The OS is Red Hat EL 5.5, 32-bit. The same version of LVM was used in both tests. I can provide other details if needed.
Below is part of the messages file from when this happened.
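As a rough sketch of the sequence described above, the following Python only builds the command lines in order and executes nothing. The volume group name and device paths are made up for illustration, and pairing each old PV with exactly one new PV is an assumption; the exact commands the customer ran are not recorded here.

```python
def plan_san_migration(vg, old_pvs, new_pvs):
    """Return the LVM commands, in order, for moving a VG's data to new PVs.

    Nothing is executed; the caller decides how and when to run each command.
    """
    # Assumption for this sketch: one new PV per old PV.
    if len(old_pvs) != len(new_pvs):
        raise ValueError("this sketch pairs each old PV with exactly one new PV")
    # Step 1: add the new LUNs to the existing volume group.
    commands = [["vgextend", vg, *new_pvs]]
    # Step 2: move the extents off each old PV onto its paired new PV.
    for src, dst in zip(old_pvs, new_pvs):
        commands.append(["pvmove", src, dst])
    return commands


# Hypothetical names, for illustration only.
plan = plan_san_migration(
    "vg_app",
    ["/dev/mapper/old_lun0", "/dev/mapper/old_lun1"],
    ["/dev/mapper/new_lun0", "/dev/mapper/new_lun1"],
)
for cmd in plan:
    print(" ".join(cmd))
```

In both the customer's case and our test, it was the second pvmove in such a sequence that hung.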
Aug 13 14:18:14 mss121 multipathd: dm-25: add map (uevent)
Aug 13 14:18:14 mss121 multipathd: dm-19: add map (uevent)
Aug 13 14:18:14 mss121 multipathd: dm-22: add map (uevent)
Aug 13 14:19:53 mss121 multipathd: dm-25: add map (uevent)
Aug 13 14:19:53 mss121 multipathd: dm-19: add map (uevent)
Aug 13 14:19:53 mss121 multipathd: dm-22: add map (uevent)
Aug 13 14:21:26 mss121 multipathd: dm-25: add map (uevent)
Aug 13 14:21:26 mss121 multipathd: dm-19: add map (uevent)
Aug 13 14:21:26 mss121 multipathd: dm-22: add map (uevent)
Aug 13 14:21:26 mss121 multipathd: dm-25: remove map (uevent)
Aug 13 14:22:47 mss121 multipathd: dm-25: add map (uevent)
Aug 13 14:22:47 mss121 multipathd: dm-17: add map (uevent)
Aug 13 14:22:47 mss121 multipathd: dm-20: add map (uevent)
Aug 13 14:22:47 mss121 multipathd: dm-23: add map (uevent)
Aug 13 14:27:22 mss121 kernel: INFO: task mpdsk:22158 blocked for more than 120 seconds.
Aug 13 14:27:22 mss121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 13 14:27:22 mss121 kernel: mpdsk D 00000BD7 1784 22158 22151 22159 22157 (NOTLB)
Aug 13 14:27:22 mss121 kernel: f3025e04 00000082 bde60e11 00000bd7 f3025e50 c045d1a9 f3025e50 0000000a
Aug 13 14:27:22 mss121 kernel: f7c60000 bde67e2a 00000bd7 00007019 00000000 f7c6010c c8612700 f6ec4e40
Aug 13 14:27:22 mss121 kernel: 00000000 00000000 00000000 c12b8dc0 018dc6f2 c042cbd1 f6cb3f0c ffffffff
Aug 13 14:27:22 mss121 kernel: Call Trace:
Aug 13 14:27:22 mss121 kernel: [<c045d1a9>] __pagevec_release+0x15/0x1d
Aug 13 14:27:22 mss121 kernel: [<c042cbd1>] getnstimeofday+0x30/0xb6
Aug 13 14:27:22 mss121 kernel: [<c061c156>] io_schedule+0x36/0x59
Aug 13 14:27:22 mss121 kernel: [<c04569c0>] sync_page+0x38/0x3b
Aug 13 14:27:22 mss121 kernel: [<c061c32d>] __wait_on_bit+0x33/0x58
Aug 13 14:27:22 mss121 kernel: [<c0456988>] sync_page+0x0/0x3b
Aug 13 14:27:22 mss121 kernel: [<c0456a48>] wait_on_page_bit+0x5b/0x62
Aug 13 14:27:22 mss121 kernel: [<c043642c>] wake_bit_function+0x0/0x3c
Aug 13 14:27:22 mss121 kernel: [<c04573cf>] wait_on_page_writeback_range+0x4d/0xf1
Aug 13 14:27:22 mss121 kernel: [<c04934a0>] generic_osync_inode+0x93/0xbf
Aug 13 14:27:22 mss121 kernel: [<c0457618>] sync_page_range_nolock+0x68/0x93
Aug 13 14:27:22 mss121 kernel: [<c0458930>] generic_file_aio_write_nolock+0x71/0x83
Aug 13 14:27:22 mss121 kernel: [<c047b301>] blkdev_file_write+0x0/0x1e
Aug 13 14:27:22 mss121 kernel: [<c0458c8d>] generic_file_write_nolock+0x86/0x9a
Aug 13 14:27:22 mss121 kernel: [<c04566fe>] find_get_pages_tag+0x30/0x75
Aug 13 14:27:22 mss121 kernel: [<c0457428>] wait_on_page_writeback_range+0xa6/0xf1
Aug 13 14:27:22 mss121 kernel: [<c04363ff>] autoremove_wake_function+0x0/0x2d
Aug 13 14:27:22 mss121 kernel: [<c061c408>] mutex_lock+0xb/0x19
Aug 13 14:27:22 mss121 kernel: [<c0449c52>] audit_syscall_entry+0x15a/0x18c
Aug 13 14:27:22 mss121 kernel: [<c047b31b>] blkdev_file_write+0x1a/0x1e
Aug 13 14:27:22 mss121 kernel: [<c0474d53>] vfs_write+0xa1/0x143
Aug 13 14:27:22 mss121 kernel: [<c0475345>] sys_write+0x3c/0x63
Aug 13 14:27:22 mss121 kernel: [<c0404f17>] syscall_call+0x7/0xb
Aug 13 14:27:22 mss121 kernel: =======================
Aug 13 14:27:22 mss121 kernel: INFO: task mpdsk:22161 blocked for more than 120 seconds.
Aug 13 14:27:22 mss121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 13 14:27:22 mss121 kernel: mpdsk D 00000BD7 1884 22161 22151 22162 22160 (NOTLB)
Aug 13 14:27:22 mss121 kernel: f34e2e04 00000082 baebd585 00000bd7 f34e2e50 c045d1a9 f34e2e50 0000000a
Aug 13 14:27:22 mss121 kernel: f6eb1550 baec5e00 00000bd7 0000887b 00000000 f6eb165c c8612700 f723f040
Aug 13 14:27:22 mss121 kernel: 00000000 00000000 00000000 c12e1f80 018dc68e c042cbd1 f6cb3bdc ffffffff
Aug 13 14:27:22 mss121 kernel: Call Trace:
Aug 13 14:27:22 mss121 kernel: [<c045d1a9>] __pagevec_release+0x15/0x1d
Aug 13 14:27:22 mss121 kernel: [<c042cbd1>] getnstimeofday+0x30/0xb6
Aug 13 14:27:22 mss121 kernel: [<c061c156>] io_schedule+0x36/0x59
Aug 13 14:27:22 mss121 kernel: [<c04569c0>] sync_page+0x38/0x3b
Aug 13 14:27:22 mss121 kernel: [<c061c32d>] __wait_on_bit+0x33/0x58
Aug 13 14:27:22 mss121 kernel: [<c0456988>] sync_page+0x0/0x3b
Aug 13 14:27:22 mss121 kernel: [<c0456a48>] wait_on_page_bit+0x5b/0x62
Aug 13 14:27:22 mss121 kernel: [<c043642c>] wake_bit_function+0x0/0x3c
Aug 13 14:27:22 mss121 kernel: [<c04573cf>] wait_on_page_writeback_range+0x4d/0xf1
Aug 13 14:27:22 mss121 kernel: [<c04934a0>] generic_osync_inode+0x93/0xbf
Aug 13 14:27:22 mss121 kernel: [<c0457618>] sync_page_range_nolock+0x68/0x93
Aug 13 14:27:22 mss121 kernel: [<c0458930>] generic_file_aio_write_nolock+0x71/0x83
Aug 13 14:27:22 mss121 kernel: [<c047b301>] blkdev_file_write+0x0/0x1e
Aug 13 14:27:22 mss121 kernel: [<c0458c8d>] generic_file_write_nolock+0x86/0x9a
Aug 13 14:27:22 mss121 kernel: [<c04566fe>] find_get_pages_tag+0x30/0x75
Aug 13 14:27:22 mss121 kernel: [<c0457428>] wait_on_page_writeback_range+0xa6/0xf1
Aug 13 14:27:22 mss121 kernel: [<c04363ff>] autoremove_wake_function+0x0/0x2d
Aug 13 14:27:22 mss121 kernel: [<c061c408>] mutex_lock+0xb/0x19
Aug 13 14:27:22 mss121 kernel: [<c0449c52>] audit_syscall_entry+0x15a/0x18c
Aug 13 14:27:22 mss121 kernel: [<c047b31b>] blkdev_file_write+0x1a/0x1e
Aug 13 14:27:22 mss121 kernel: [<c0474d53>] vfs_write+0xa1/0x143
Aug 13 14:27:22 mss121 kernel: [<c0475345>] sys_write+0x3c/0x63
Aug 13 14:27:22 mss121 kernel: [<c0404f17>] syscall_call+0x7/0xb
Aug 13 14:27:22 mss121 kernel: =======================
Aug 13 14:29:22 mss121 kernel: INFO: task mpdsk:22158 blocked for more than 120 seconds.
Aug 13 14:29:22 mss121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 13 14:29:22 mss121 kernel: mpdsk D 00000BD7 1784 22158 22151 22159 22157 (NOTLB)
Aug 13 14:29:22 mss121 kernel: f3025e04 00000082 bde60e11 00000bd7 f3025e50 c045d1a9 f3025e50 0000000a
Aug 13 14:29:22 mss121 kernel: f7c60000 bde67e2a 00000bd7 00007019 00000000 f7c6010c c8612700 f6ec4e40
Aug 13 14:29:22 mss121 kernel: 00000000 00000000 00000000 c12b8dc0 018dc6f2 c042cbd1 f6cb3f0c ffffffff
Aug 13 14:29:22 mss121 kernel: Call Trace:
Aug 13 14:29:22 mss121 kernel: [<c045d1a9>] __pagevec_release+0x15/0x1d
Aug 13 14:29:22 mss121 kernel: [<c042cbd1>] getnstimeofday+0x30/0xb6
Aug 13 14:29:22 mss121 kernel: [<c061c156>] io_schedule+0x36/0x59
Aug 13 14:29:22 mss121 kernel: [<c04569c0>] sync_page+0x38/0x3b
Aug 13 14:29:22 mss121 kernel: [<c061c32d>] __wait_on_bit+0x33/0x58
Aug 13 14:29:22 mss121 kernel: [<c0456988>] sync_page+0x0/0x3b
Aug 13 14:29:22 mss121 kernel: [<c0456a48>] wait_on_page_bit+0x5b/0x62
Aug 13 14:29:22 mss121 kernel: [<c043642c>] wake_bit_function+0x0/0x3c
Aug 13 14:29:22 mss121 kernel: [<c04573cf>] wait_on_page_writeback_range+0x4d/0xf1
Aug 13 14:29:22 mss121 kernel: [<c04934a0>] generic_osync_inode+0x93/0xbf
Aug 13 14:29:22 mss121 kernel: [<c0457618>] sync_page_range_nolock+0x68/0x93
Aug 13 14:29:22 mss121 kernel: [<c0458930>] generic_file_aio_write_nolock+0x71/0x83
Aug 13 14:29:22 mss121 kernel: [<c047b301>] blkdev_file_write+0x0/0x1e
Aug 13 14:29:22 mss121 kernel: [<c0458c8d>] generic_file_write_nolock+0x86/0x9a
Aug 13 14:29:22 mss121 kernel: [<c04566fe>] find_get_pages_tag+0x30/0x75
Aug 13 14:29:22 mss121 kernel: [<c0457428>] wait_on_page_writeback_range+0xa6/0xf1
Aug 13 14:29:22 mss121 kernel: [<c04363ff>] autoremove_wake_function+0x0/0x2d
Aug 13 14:29:22 mss121 kernel: [<c061c408>] mutex_lock+0xb/0x19
Aug 13 14:29:22 mss121 kernel: [<c0449c52>] audit_syscall_entry+0x15a/0x18c
Aug 13 14:29:22 mss121 kernel: [<c047b31b>] blkdev_file_write+0x1a/0x1e
Aug 13 14:29:22 mss121 kernel: [<c0474d53>] vfs_write+0xa1/0x143
Aug 13 14:29:22 mss121 kernel: [<c0475345>] sys_write+0x3c/0x63
Aug 13 14:29:22 mss121 kernel: [<c0404f17>] syscall_call+0x7/0xb
Aug 13 14:29:22 mss121 kernel: =======================
Aug 13 14:29:22 mss121 kernel: INFO: task mpdsk:22161 blocked for more than 120 seconds.
The mpdsk processes above are part of the Application, which is a MUMPS database (not an RDB) that writes data blocks directly to a raw Logical Volume (no file system involved). It would have been doing writes during both pvmoves. I know pvmove is part of LVM2, but because the move worked with PowerPath and not with Multipath, with everything else the same, I am asking the question here.
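For anyone working through a longer copy of the messages file, here is a minimal Python sketch that lists which tasks the kernel reported as blocked and when. The regular expression is an assumption based only on the line format in the excerpt above, and the function name is made up for illustration.

```python
import re
from collections import defaultdict

# Matches the first line of a hung-task report as it appears in the excerpt, e.g.
# "Aug 13 14:27:22 mss121 kernel: INFO: task mpdsk:22158 blocked for more"
HUNG_TASK = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"INFO: task (?P<name>[^:\s]+):(?P<pid>\d+) blocked for more"
)


def summarize_hung_tasks(lines):
    """Map (task name, pid) to the list of timestamps at which it was reported."""
    reports = defaultdict(list)
    for line in lines:
        match = HUNG_TASK.match(line)
        if match:
            reports[(match["name"], int(match["pid"]))].append(match["stamp"])
    return dict(reports)


# A few lines taken from the excerpt above.
sample = [
    "Aug 13 14:22:47 mss121 multipathd: dm-23: add map (uevent)",
    "Aug 13 14:27:22 mss121 kernel: INFO: task mpdsk:22158 blocked for more than 120 seconds.",
    "Aug 13 14:27:22 mss121 kernel: INFO: task mpdsk:22161 blocked for more than 120 seconds.",
    "Aug 13 14:29:22 mss121 kernel: INFO: task mpdsk:22158 blocked for more than 120 seconds.",
]
summary = summarize_hung_tasks(sample)
for (name, pid), stamps in sorted(summary.items()):
    print(name, pid, stamps)
```

Because the pattern stops at "blocked for more", it also matches lines that a mail client has wrapped before "than 120 seconds."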
_____
Jack Allen
-- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel