pvmove hangs


Hello:

        I posted this to the dm-devel list yesterday afternoon but have not received any responses so far, so I thought I would ask the same questions here, since the command that hung is pvmove.

        I had a customer who tried to do a pvmove, and it hung. So we set up a test system to try to duplicate the problem, and we were able to.

        A little history, and why I am asking the question on this list. The customer needed to move from an existing SAN to a new SAN and wanted as little downtime as possible for the Application. So they zoned the new SAN for access by the system, added the new LUNs to the existing Volume Group, and then ran the pvmove commands. The pvmove worked with no problem on one of the PVs, but on the second one all I/O from the Application hung, as did any command that accesses the LVM information, such as vgdisplay.

        On our test system we have only one SAN (EMC CX700). We put X number of LUNs in a Volume Group and allocated Logical Volumes for the Application, then added some more LUNs to the Volume Group to simulate a second SAN. We started the Application with a test program to generate I/O. pvmove ran with no problems on one PV, but on the second PV it hung just like on the customer's system.
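
        The sequence described above (extend the Volume Group with the new LUNs, then pvmove each old PV onto a new one) can be sketched roughly as follows. This is only an illustration that builds the command lines without running them. The Volume Group name, the device paths, and the one-to-one pairing of old and new PVs are all assumptions, not the names used on the customer or test system, and any pvcreate or vgreduce steps are left out.

```python
def plan_migration(vg, moves):
    """Build the LVM command lines for a SAN-to-SAN move.

    vg    -- Volume Group name (hypothetical)
    moves -- list of (old_pv, new_pv) device path pairs (hypothetical)

    Nothing is executed; the caller decides what to do with the list.
    """
    commands = []
    # First add every new LUN to the existing Volume Group.
    for _old_pv, new_pv in moves:
        commands.append(["vgextend", vg, new_pv])
    # Then move the extents off each old PV onto its new counterpart.
    for old_pv, new_pv in moves:
        commands.append(["pvmove", old_pv, new_pv])
    return commands


if __name__ == "__main__":
    # All names below are made up for illustration.
    plan = plan_migration(
        "appvg",
        [
            ("/dev/mapper/old_lun1", "/dev/mapper/new_lun1"),
            ("/dev/mapper/old_lun2", "/dev/mapper/new_lun2"),
        ],
    )
    for argv in plan:
        print(" ".join(argv))
```

        In the case reported here, the step corresponding to the second pvmove is the one that hung.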

        The reason I am posting to this list is that the same type of move was done earlier on the test system running PowerPath and did not have any problems. The OS is Red Hat EL 5.5, 32-bit. The same version of LVM was used in both tests. I can provide other details if needed.

        Below is part of the messages file from when this happened.

Aug 13 14:18:14 mss121 multipathd: dm-25: add map (uevent)
Aug 13 14:18:14 mss121 multipathd: dm-19: add map (uevent)
Aug 13 14:18:14 mss121 multipathd: dm-22: add map (uevent)
Aug 13 14:19:53 mss121 multipathd: dm-25: add map (uevent)
Aug 13 14:19:53 mss121 multipathd: dm-19: add map (uevent)
Aug 13 14:19:53 mss121 multipathd: dm-22: add map (uevent)
Aug 13 14:21:26 mss121 multipathd: dm-25: add map (uevent)
Aug 13 14:21:26 mss121 multipathd: dm-19: add map (uevent)
Aug 13 14:21:26 mss121 multipathd: dm-22: add map (uevent)
Aug 13 14:21:26 mss121 multipathd: dm-25: remove map (uevent)
Aug 13 14:22:47 mss121 multipathd: dm-25: add map (uevent)
Aug 13 14:22:47 mss121 multipathd: dm-17: add map (uevent)
Aug 13 14:22:47 mss121 multipathd: dm-20: add map (uevent)
Aug 13 14:22:47 mss121 multipathd: dm-23: add map (uevent)

Aug 13 14:27:22 mss121 kernel: INFO: task mpdsk:22158 blocked for more than 120 seconds.
Aug 13 14:27:22 mss121 kernel: "echo 0> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 13 14:27:22 mss121 kernel: mpdsk         D 00000BD7  1784 22158 22151         22159 22157 (NOTLB)
Aug 13 14:27:22 mss121 kernel:        f3025e04 00000082 bde60e11 00000bd7 f3025e50 c045d1a9 f3025e50 0000000a
Aug 13 14:27:22 mss121 kernel:        f7c60000 bde67e2a 00000bd7 00007019 00000000 f7c6010c c8612700 f6ec4e40
Aug 13 14:27:22 mss121 kernel:        00000000 00000000 00000000 c12b8dc0 018dc6f2 c042cbd1 f6cb3f0c ffffffff
Aug 13 14:27:22 mss121 kernel: Call Trace:
Aug 13 14:27:22 mss121 kernel:  [<c045d1a9>] __pagevec_release+0x15/0x1d
Aug 13 14:27:22 mss121 kernel:  [<c042cbd1>] getnstimeofday+0x30/0xb6
Aug 13 14:27:22 mss121 kernel:  [<c061c156>] io_schedule+0x36/0x59
Aug 13 14:27:22 mss121 kernel:  [<c04569c0>] sync_page+0x38/0x3b
Aug 13 14:27:22 mss121 kernel:  [<c061c32d>] __wait_on_bit+0x33/0x58
Aug 13 14:27:22 mss121 kernel:  [<c0456988>] sync_page+0x0/0x3b
Aug 13 14:27:22 mss121 kernel:  [<c0456a48>] wait_on_page_bit+0x5b/0x62
Aug 13 14:27:22 mss121 kernel:  [<c043642c>] wake_bit_function+0x0/0x3c
Aug 13 14:27:22 mss121 kernel:  [<c04573cf>] wait_on_page_writeback_range+0x4d/0xf1
Aug 13 14:27:22 mss121 kernel:  [<c04934a0>] generic_osync_inode+0x93/0xbf
Aug 13 14:27:22 mss121 kernel:  [<c0457618>] sync_page_range_nolock+0x68/0x93
Aug 13 14:27:22 mss121 kernel:  [<c0458930>] generic_file_aio_write_nolock+0x71/0x83
Aug 13 14:27:22 mss121 kernel:  [<c047b301>] blkdev_file_write+0x0/0x1e
Aug 13 14:27:22 mss121 kernel:  [<c0458c8d>] generic_file_write_nolock+0x86/0x9a
Aug 13 14:27:22 mss121 kernel:  [<c04566fe>] find_get_pages_tag+0x30/0x75
Aug 13 14:27:22 mss121 kernel:  [<c0457428>] wait_on_page_writeback_range+0xa6/0xf1
Aug 13 14:27:22 mss121 kernel:  [<c04363ff>] autoremove_wake_function+0x0/0x2d
Aug 13 14:27:22 mss121 kernel:  [<c061c408>] mutex_lock+0xb/0x19
Aug 13 14:27:22 mss121 kernel:  [<c0449c52>] audit_syscall_entry+0x15a/0x18c
Aug 13 14:27:22 mss121 kernel:  [<c047b31b>] blkdev_file_write+0x1a/0x1e
Aug 13 14:27:22 mss121 kernel:  [<c0474d53>] vfs_write+0xa1/0x143
Aug 13 14:27:22 mss121 kernel:  [<c0475345>] sys_write+0x3c/0x63
Aug 13 14:27:22 mss121 kernel:  [<c0404f17>] syscall_call+0x7/0xb
Aug 13 14:27:22 mss121 kernel:  =======================

Aug 13 14:27:22 mss121 kernel: INFO: task mpdsk:22161 blocked for more than 120 seconds.
Aug 13 14:27:22 mss121 kernel: "echo 0> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 13 14:27:22 mss121 kernel: mpdsk         D 00000BD7  1884 22161 22151         22162 22160 (NOTLB)
Aug 13 14:27:22 mss121 kernel:        f34e2e04 00000082 baebd585 00000bd7 f34e2e50 c045d1a9 f34e2e50 0000000a
Aug 13 14:27:22 mss121 kernel:        f6eb1550 baec5e00 00000bd7 0000887b 00000000 f6eb165c c8612700 f723f040
Aug 13 14:27:22 mss121 kernel:        00000000 00000000 00000000 c12e1f80 018dc68e c042cbd1 f6cb3bdc ffffffff
Aug 13 14:27:22 mss121 kernel: Call Trace:
Aug 13 14:27:22 mss121 kernel:  [<c045d1a9>] __pagevec_release+0x15/0x1d
Aug 13 14:27:22 mss121 kernel:  [<c042cbd1>] getnstimeofday+0x30/0xb6
Aug 13 14:27:22 mss121 kernel:  [<c061c156>] io_schedule+0x36/0x59
Aug 13 14:27:22 mss121 kernel:  [<c04569c0>] sync_page+0x38/0x3b
Aug 13 14:27:22 mss121 kernel:  [<c061c32d>] __wait_on_bit+0x33/0x58
Aug 13 14:27:22 mss121 kernel:  [<c0456988>] sync_page+0x0/0x3b
Aug 13 14:27:22 mss121 kernel:  [<c0456a48>] wait_on_page_bit+0x5b/0x62
Aug 13 14:27:22 mss121 kernel:  [<c043642c>] wake_bit_function+0x0/0x3c
Aug 13 14:27:22 mss121 kernel:  [<c04573cf>] wait_on_page_writeback_range+0x4d/0xf1
Aug 13 14:27:22 mss121 kernel:  [<c04934a0>] generic_osync_inode+0x93/0xbf
Aug 13 14:27:22 mss121 kernel:  [<c0457618>] sync_page_range_nolock+0x68/0x93
Aug 13 14:27:22 mss121 kernel:  [<c0458930>] generic_file_aio_write_nolock+0x71/0x83
Aug 13 14:27:22 mss121 kernel:  [<c047b301>] blkdev_file_write+0x0/0x1e
Aug 13 14:27:22 mss121 kernel:  [<c0458c8d>] generic_file_write_nolock+0x86/0x9a
Aug 13 14:27:22 mss121 kernel:  [<c04566fe>] find_get_pages_tag+0x30/0x75
Aug 13 14:27:22 mss121 kernel:  [<c0457428>] wait_on_page_writeback_range+0xa6/0xf1
Aug 13 14:27:22 mss121 kernel:  [<c04363ff>] autoremove_wake_function+0x0/0x2d
Aug 13 14:27:22 mss121 kernel:  [<c061c408>] mutex_lock+0xb/0x19
Aug 13 14:27:22 mss121 kernel:  [<c0449c52>] audit_syscall_entry+0x15a/0x18c
Aug 13 14:27:22 mss121 kernel:  [<c047b31b>] blkdev_file_write+0x1a/0x1e
Aug 13 14:27:22 mss121 kernel:  [<c0474d53>] vfs_write+0xa1/0x143
Aug 13 14:27:22 mss121 kernel:  [<c0475345>] sys_write+0x3c/0x63
Aug 13 14:27:22 mss121 kernel:  [<c0404f17>] syscall_call+0x7/0xb
Aug 13 14:27:22 mss121 kernel:  =======================

Aug 13 14:29:22 mss121 kernel: INFO: task mpdsk:22158 blocked for more than 120 seconds.
Aug 13 14:29:22 mss121 kernel: "echo 0> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 13 14:29:22 mss121 kernel: mpdsk         D 00000BD7  1784 22158 22151         22159 22157 (NOTLB)
Aug 13 14:29:22 mss121 kernel:        f3025e04 00000082 bde60e11 00000bd7 f3025e50 c045d1a9 f3025e50 0000000a
Aug 13 14:29:22 mss121 kernel:        f7c60000 bde67e2a 00000bd7 00007019 00000000 f7c6010c c8612700 f6ec4e40
Aug 13 14:29:22 mss121 kernel:        00000000 00000000 00000000 c12b8dc0 018dc6f2 c042cbd1 f6cb3f0c ffffffff
Aug 13 14:29:22 mss121 kernel: Call Trace:
Aug 13 14:29:22 mss121 kernel:  [<c045d1a9>] __pagevec_release+0x15/0x1d
Aug 13 14:29:22 mss121 kernel:  [<c042cbd1>] getnstimeofday+0x30/0xb6
Aug 13 14:29:22 mss121 kernel:  [<c061c156>] io_schedule+0x36/0x59
Aug 13 14:29:22 mss121 kernel:  [<c04569c0>] sync_page+0x38/0x3b
Aug 13 14:29:22 mss121 kernel:  [<c061c32d>] __wait_on_bit+0x33/0x58
Aug 13 14:29:22 mss121 kernel:  [<c0456988>] sync_page+0x0/0x3b
Aug 13 14:29:22 mss121 kernel:  [<c0456a48>] wait_on_page_bit+0x5b/0x62
Aug 13 14:29:22 mss121 kernel:  [<c043642c>] wake_bit_function+0x0/0x3c
Aug 13 14:29:22 mss121 kernel:  [<c04573cf>] wait_on_page_writeback_range+0x4d/0xf1
Aug 13 14:29:22 mss121 kernel:  [<c04934a0>] generic_osync_inode+0x93/0xbf
Aug 13 14:29:22 mss121 kernel:  [<c0457618>] sync_page_range_nolock+0x68/0x93
Aug 13 14:29:22 mss121 kernel:  [<c0458930>] generic_file_aio_write_nolock+0x71/0x83
Aug 13 14:29:22 mss121 kernel:  [<c047b301>] blkdev_file_write+0x0/0x1e
Aug 13 14:29:22 mss121 kernel:  [<c0458c8d>] generic_file_write_nolock+0x86/0x9a
Aug 13 14:29:22 mss121 kernel:  [<c04566fe>] find_get_pages_tag+0x30/0x75
Aug 13 14:29:22 mss121 kernel:  [<c0457428>] wait_on_page_writeback_range+0xa6/0xf1
Aug 13 14:29:22 mss121 kernel:  [<c04363ff>] autoremove_wake_function+0x0/0x2d
Aug 13 14:29:22 mss121 kernel:  [<c061c408>] mutex_lock+0xb/0x19
Aug 13 14:29:22 mss121 kernel:  [<c0449c52>] audit_syscall_entry+0x15a/0x18c
Aug 13 14:29:22 mss121 kernel:  [<c047b31b>] blkdev_file_write+0x1a/0x1e
Aug 13 14:29:22 mss121 kernel:  [<c0474d53>] vfs_write+0xa1/0x143
Aug 13 14:29:22 mss121 kernel:  [<c0475345>] sys_write+0x3c/0x63
Aug 13 14:29:22 mss121 kernel:  [<c0404f17>] syscall_call+0x7/0xb
Aug 13 14:29:22 mss121 kernel:  =======================

Aug 13 14:29:22 mss121 kernel: INFO: task mpdsk:22161 blocked for more than 120 seconds.

        The mpdsk processes above are part of the Application, which is a MUMPS database (not an RDB) that writes data blocks to raw Logical Volumes (no file system involved). It would have been doing writes during both pvmoves. I know pvmove is part of LVM2, but I am asking the questions here because it worked with PowerPath and not with Multipath, with all other things being the same.
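
        For anyone wanting to check how many distinct processes were reported as hung, and how often, the kernel's "blocked for more than 120 seconds" lines can be tallied with a short script. The following is a minimal sketch, not something that was run on the systems above. The function name is made up, and the regular expression matches only the first part of each report, so it does not depend on how the message is wrapped.

```python
import re
from collections import Counter

# Matches the start of a kernel hung-task report, for example:
# "Aug 13 14:27:22 mss121 kernel: INFO: task mpdsk:22158 blocked for more"
HUNG_TASK = re.compile(r"kernel: INFO: task (\S+):(\d+) blocked for more")


def count_hung_tasks(lines):
    """Return a Counter mapping (command, pid) to the number of reports."""
    counts = Counter()
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    # The four report headers from the excerpt above.
    sample = [
        "Aug 13 14:27:22 mss121 kernel: INFO: task mpdsk:22158 blocked for more",
        "Aug 13 14:27:22 mss121 kernel: INFO: task mpdsk:22161 blocked for more",
        "Aug 13 14:29:22 mss121 kernel: INFO: task mpdsk:22158 blocked for more",
        "Aug 13 14:29:22 mss121 kernel: INFO: task mpdsk:22161 blocked for more",
    ]
    for (command, pid), reports in sorted(count_hung_tasks(sample).items()):
        print(command, pid, reports)
```

        On a live system the same function could be given the open messages file instead of the sample list. The path to that file is an assumption that depends on the syslog configuration.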

_____

Jack Allen

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
