On Wed, Nov 11 2009 at 8:20am -0500,
Mike Snitzer <snitzer@xxxxxxxxxx> wrote:

> On Tue, Nov 10 2009 at 8:16pm -0500,
> Alasdair G Kergon <agk@xxxxxxxxxx> wrote:
> 
> > Questions:
> > 
> > Do all the targets correctly flush or push back everything during a
> > suspend (including workqueues)?
> > 
> > Do all the targets correctly sync to disk all internal state that
> > needs to be preserved during a suspend?
> > 
> > In other words, in the case of an already-suspended target, the target
> > 'dtr' functions should only be freeing memory and other resources and
> > not causing I/O to any of the table's devices.
> > 
> > All targets are supposed to behave this way already, but please
> > would you check the targets with which you are familiar anyway?
> > 
> > Alasdair
> > 
> > 
> > From: Alasdair G Kergon <agk@xxxxxxxxxx>
> > 
> > When replacing a mapped device's table during a 'resume', delay the
> > destruction of the old table until the new one is successfully in place.
> > 
> > This will make it easier for a later patch to transfer internal state
> > information from the old table to the new one (something we do not currently
> > support) while giving us more options for reversion if a later part
> > of the operation fails.
> 
> I have confirmed that this patch allows handover to work within a single
> device.

Alasdair,

After further testing I've hit a lockdep trace.  My testing was with
handing over on the same device.  I had the snapshot (of an ext3 FS)
mounted and I was doing a sequential direct-io write to a file in the
FS.  While writing I triggered a handover with the following:

echo "0 50331648 snapshot 253:2 253:3 P 8" | dmsetup reload test-testlv_snap
dmsetup resume test-testlv_snap

With that, handover worked fine (with no I/O errors), but the following
lockdep trace resulted (some "snapshot_*" tracing was added for context):

snapshot_ctr
snapshot_ctr: found snap_src
snapshot_presuspend

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.32-rc6-snitm #8
-------------------------------------------------------
dmsetup/1827 is trying to acquire lock:
 (&md->suspend_lock){+.+...}, at: [<ffffffffa00678d8>] dm_swap_table+0x2d/0x249 [dm_mod]

but task is already holding lock:
 (&journal->j_barrier){+.+...}, at: [<ffffffff8119192d>] journal_lock_updates+0xe1/0xf0

which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:

-> #1 (&journal->j_barrier){+.+...}:
       [<ffffffff810857b3>] __lock_acquire+0xb6b/0xd13
       [<ffffffff81086396>] lock_release_non_nested+0x1dc/0x23b
       [<ffffffff8108656f>] lock_release+0x17a/0x1a5
       [<ffffffff8139214b>] __mutex_unlock_slowpath+0xce/0x132
       [<ffffffff813921bd>] mutex_unlock+0xe/0x10
       [<ffffffff81147329>] freeze_bdev+0x104/0x110
       [<ffffffffa0069038>] dm_suspend+0x119/0x2a1 [dm_mod]
       [<ffffffffa006db3a>] dev_suspend+0x11d/0x1de [dm_mod]
       [<ffffffffa006e30c>] ctl_ioctl+0x1c6/0x213 [dm_mod]
       [<ffffffffa006e36c>] dm_ctl_ioctl+0x13/0x17 [dm_mod]
       [<ffffffff8112a959>] vfs_ioctl+0x22/0x87
       [<ffffffff8112aec2>] do_vfs_ioctl+0x488/0x4ce
       [<ffffffff8112af5e>] sys_ioctl+0x56/0x79
       [<ffffffff8100bb82>] system_call_fastpath+0x16/0x1b

-> #0 (&md->suspend_lock){+.+...}:
       [<ffffffff8108565d>] __lock_acquire+0xa15/0xd13
       [<ffffffff81085a37>] lock_acquire+0xdc/0x102
       [<ffffffff81392372>] __mutex_lock_common+0x4b/0x37b
       [<ffffffff81392766>] mutex_lock_nested+0x3e/0x43
       [<ffffffffa00678d8>] dm_swap_table+0x2d/0x249 [dm_mod]
       [<ffffffffa006db45>] dev_suspend+0x128/0x1de [dm_mod]
       [<ffffffffa006e30c>] ctl_ioctl+0x1c6/0x213 [dm_mod]
       [<ffffffffa006e36c>] dm_ctl_ioctl+0x13/0x17 [dm_mod]
       [<ffffffff8112a959>] vfs_ioctl+0x22/0x87
       [<ffffffff8112aec2>] do_vfs_ioctl+0x488/0x4ce
       [<ffffffff8112af5e>] sys_ioctl+0x56/0x79
       [<ffffffff8100bb82>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

1 lock held by dmsetup/1827:
 #0:  (&journal->j_barrier){+.+...}, at: [<ffffffff8119192d>] journal_lock_updates+0xe1/0xf0

stack backtrace:
Pid: 1827, comm: dmsetup Not tainted 2.6.32-rc6-snitm #8
Call Trace:
 [<ffffffff81084825>] print_circular_bug+0xa8/0xb7
 [<ffffffff8108565d>] __lock_acquire+0xa15/0xd13
 [<ffffffff81085a37>] lock_acquire+0xdc/0x102
 [<ffffffffa00678d8>] ? dm_swap_table+0x2d/0x249 [dm_mod]
 [<ffffffffa00678d8>] ? dm_swap_table+0x2d/0x249 [dm_mod]
 [<ffffffffa006da1d>] ? dev_suspend+0x0/0x1de [dm_mod]
 [<ffffffff81392372>] __mutex_lock_common+0x4b/0x37b
 [<ffffffffa00678d8>] ? dm_swap_table+0x2d/0x249 [dm_mod]
 [<ffffffff81083933>] ? mark_lock+0x2d/0x22d
 [<ffffffff81083b85>] ? mark_held_locks+0x52/0x70
 [<ffffffff8139219d>] ? __mutex_unlock_slowpath+0x120/0x132
 [<ffffffffa006da1d>] ? dev_suspend+0x0/0x1de [dm_mod]
 [<ffffffff81392766>] mutex_lock_nested+0x3e/0x43
 [<ffffffffa00678d8>] dm_swap_table+0x2d/0x249 [dm_mod]
 [<ffffffff813921bd>] ? mutex_unlock+0xe/0x10
 [<ffffffffa00691ae>] ? dm_suspend+0x28f/0x2a1 [dm_mod]
 [<ffffffffa006da1d>] ? dev_suspend+0x0/0x1de [dm_mod]
 [<ffffffffa006db45>] dev_suspend+0x128/0x1de [dm_mod]
 [<ffffffffa006e30c>] ctl_ioctl+0x1c6/0x213 [dm_mod]
 [<ffffffff81077d7f>] ? cpu_clock+0x43/0x5e
 [<ffffffffa006e36c>] dm_ctl_ioctl+0x13/0x17 [dm_mod]
 [<ffffffff8112a959>] vfs_ioctl+0x22/0x87
 [<ffffffff81083e41>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff8112aec2>] do_vfs_ioctl+0x488/0x4ce
 [<ffffffff811f3e5a>] ? __up_read+0x76/0x7f
 [<ffffffff81076746>] ? up_read+0x2b/0x2f
 [<ffffffff8100c635>] ? retint_swapgs+0x13/0x1b
 [<ffffffff8112af5e>] sys_ioctl+0x56/0x79
 [<ffffffff8100bb82>] system_call_fastpath+0x16/0x1b

snapshot_preresume
snapshot_preresume: snap_src is_handover_source
snapshot_preresume: resuming handover-destination
snapshot_resume
snapshot_resume: handing over exceptions
snapshot_dtr
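
To spell out the inversion as I read the trace: the lockfs suspend records
suspend_lock -> j_barrier (dm_suspend holds md->suspend_lock while
freeze_bdev ends up in journal_lock_updates taking journal->j_barrier, and
j_barrier stays held while the FS is frozen), and the resume then does
dm_swap_table's mutex_lock(&md->suspend_lock) with j_barrier still held,
i.e. j_barrier -> suspend_lock.  A minimal userspace sketch of just that
ordering (pthread mutexes standing in for the real locks; the function
names only mirror the call chains above and are illustrative, not the
actual dm/jbd code):

/*
 * Sketch of the lock-ordering inversion lockdep is reporting.  Plain
 * pthread mutexes stand in for md->suspend_lock and journal->j_barrier.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t suspend_lock = PTHREAD_MUTEX_INITIALIZER; /* md->suspend_lock */
static pthread_mutex_t j_barrier    = PTHREAD_MUTEX_INITIALIZER; /* journal->j_barrier */

/*
 * Chain #1: suspend with lockfs.  dm_suspend() takes md->suspend_lock,
 * then freeze_bdev() -> journal_lock_updates() takes journal->j_barrier,
 * which stays held while the filesystem remains frozen.
 */
static void suspend_with_lockfs(void)
{
	pthread_mutex_lock(&suspend_lock);	/* dm_suspend() */
	pthread_mutex_lock(&j_barrier);		/* freeze_bdev() -> journal_lock_updates() */
	pthread_mutex_unlock(&suspend_lock);	/* dm_suspend() returns; FS still frozen */
	printf("suspended: j_barrier held, suspend_lock released\n");
}

/*
 * Chain #0: the subsequent resume ioctl.  With j_barrier still held,
 * dev_suspend() -> dm_swap_table() takes md->suspend_lock -- the opposite
 * order from chain #1, which is the circular dependency lockdep flags.
 */
static void resume_and_swap_table(void)
{
	pthread_mutex_lock(&suspend_lock);	/* dm_swap_table() */
	printf("table swapped while j_barrier is still held\n");
	pthread_mutex_unlock(&suspend_lock);
	pthread_mutex_unlock(&j_barrier);	/* FS thawed on resume */
}

int main(void)
{
	/*
	 * Single-threaded, so this completes; lockdep's point is that a
	 * second task in suspend_with_lockfs() at the same time could
	 * deadlock against resume_and_swap_table().
	 */
	suspend_with_lockfs();
	resume_and_swap_table();
	return 0;
}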