On Tue, Sep 08 2009 at 10:17am -0400,
Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:

> Hi
>
> > The DM snapshot-merge patches are here:
> > http://people.redhat.com/msnitzer/patches/snapshot-merge/kernel/2.6.31/
> >
> > The LVM2 snapshot-merge patches are here:
> > http://people.redhat.com/msnitzer/patches/snapshot-merge/lvm2/LVM2-2.02.52-cvs/
> >
> > I threw some real work at snapshot-merge by taking a snap _before_
> > formatting a 100G LV with ext3.  Then merged all the exceptions.  One
> > observation is that the merge process is _quite_ slow in comparison to
> > how long it took to format the LV (with associated snapshot exception
> > copy-out).  Will need to look at this further shortly... it's likely a
> > function of using minimal system resources during the merge via kcopyd;
> > whereas the ext3 format puts excessive pressure on the system's page
> > cache to queue work for mkfs's immediate writeback.
>
> I thought about this, see the comment:
> /* TODO: use larger I/O size once we verify that kcopyd handles it */
>
> There was some bug where kcopyd didn't handle larger I/O, but it is
> already fixed, so it is possible to extend it.
>
> s->store->type->prepare_merge returns the number of chunks that can be
> linearly copied starting from the returned chunk numbers backward.
> (But the caller is allowed to copy less, and the caller passes the
> number of copied chunks to s->store->type->commit_merge.)
>
> I.e. if the returned chunk numbers are old_chunk == 10 and
> new_chunk == 20 and the returned value is 3, then chunk 20 can be
> copied to 10, chunk 19 to 9 and 18 to 8.
>
> There is a variable, s->merge_write_interlock_n, that is now always
> one, but can hold a larger number --- the number of chunks being
> copied.
>
> So it can be trivially extended to copy more chunks at once.
>
> On the other hand, if the snapshot doesn't contain consecutive chunks
> (it was created as a result of random writes, not as a result of one
> big write), larger I/O can't be done and its merging will be slow by
> design.  It could be improved by spawning several concurrent kcopyd
> jobs, but I wouldn't do it because it would complicate the code too
> much and it would hurt interactive performance.  (In a desktop or
> server environment, the user typically cares more about interactive
> latency than about copy throughput.)

Mikulas,

I ran with your suggestion and it works quite well (results below).  It
proved a bit more involved to get working because I needed to:

1) Fix the fact that I was _always_ seeing a return from
   persistent_prepare_merge() that was equal to ps->current_committed
2) Fix areas that needed to properly use s->merge_write_interlock_n --
   the assumption of only 1 chunk per dm_kcopyd_copy() was somewhat
   widespread

The changes are detailed at the end of this mail.

I still have a "TODO" in the patch about whether snapshot_merge_process()
needs to check all origin chunks via __chunk_is_tracked() rather than
checking only the first chunk.  Your feedback here would be much
appreciated.  I'm not completely clear on the intent of the
__chunk_is_tracked() check there and what it now needs to do.

Here are performance results from some mkfs-based testing:

# lvcreate -n testlv -L 32G test
# lvcreate -n testlv_snap -s -L 7G test/testlv
# time mkfs.ext3 /dev/test/testlv
...
real    1m7.827s
user    0m0.116s
sys     0m11.017s

# lvs
  LV          VG   Attr   LSize  Origin Snap%  Move Log Copy%  Convert
  testlv      test owi-a- 32.00G
  testlv_snap test swi-a-  7.00G testlv   9.05

before:
-------
# time lvconvert -M test/testlv_snap
  Merging of volume testlv_snap started.
  ...
  Merge into logical volume testlv finished.
  Logical volume "snapshot1" successfully removed

real    22m33.100s
user    0m0.045s
sys     0m0.711s

after:
------
# time lvconvert -M test/testlv_snap
  Merging of volume testlv_snap started.
  testlv: Merged: 6.4%
  testlv: Merged: 3.5%
  testlv: Merged: 0.9%
  testlv: Merged: 0.0%
  Merge into logical volume testlv finished.
  Logical volume "snapshot1" successfully removed

real    1m0.881s   [NOTE: we could be ~15 sec faster than reported]
user    0m0.015s
sys     0m0.560s

So we're now seeing _very_ respectable snapshot-merge performance; but
please review the following patch in case I'm somehow cheating, thanks!

Mike

---
 drivers/md/dm-snap-persistent.c |    6 ++--
 drivers/md/dm-snap.c            |   71 +++++++++++++++++++++++---------------
 2 files changed, 46 insertions(+), 31 deletions(-)

diff --git a/drivers/md/dm-snap-persistent.c b/drivers/md/dm-snap-persistent.c
index 23c3a48..27405a0 100644
--- a/drivers/md/dm-snap-persistent.c
+++ b/drivers/md/dm-snap-persistent.c
@@ -723,9 +723,9 @@ static int persistent_prepare_merge(struct dm_exception_store *store,
 	for (i = 1; i < ps->current_committed; i++) {
 		read_exception(ps, ps->current_committed - 1 - i, &de);
-		if (de.old_chunk == *old_chunk - i &&
-		    de.new_chunk == *new_chunk - i)
-			continue;
+		if (de.old_chunk != *old_chunk - i ||
+		    de.new_chunk != *new_chunk - i)
+			break;
 	}
 
 	return i;
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index c7252b2..98bcbb6 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -653,12 +653,13 @@ static void merge_callback(int read_err, unsigned long write_err,
 
 static void snapshot_merge_process(struct dm_snapshot *s)
 {
-	int r;
+	int r, linear_chunks;
 	chunk_t old_chunk, new_chunk;
 	struct origin *o;
 	chunk_t min_chunksize;
 	int must_wait;
 	struct dm_io_region src, dest;
+	sector_t io_size;
 
 	BUG_ON(!s->merge_running);
 	if (s->merge_shutdown)
@@ -674,34 +675,37 @@ static void snapshot_merge_process(struct dm_snapshot *s)
 		DMERR("target store does not support merging");
 		goto shut;
 	}
-	r = s->store->type->prepare_merge(s->store, &old_chunk, &new_chunk);
-	if (r <= 0) {
-		if (r < 0)
+	linear_chunks = s->store->type->prepare_merge(s->store,
+						      &old_chunk, &new_chunk);
+	if (linear_chunks <= 0) {
+		if (linear_chunks < 0)
 			DMERR("Read error in exception store, "
 			      "shutting down merge");
 		goto shut;
 	}
-	/* TODO: use larger I/O size once we verify that kcopyd handles it */
-
+	/*
+	 * Use one (potentially large) I/O to copy all 'linear_chunks'
+	 * from the exception store to the origin
+	 */
+	io_size = linear_chunks * s->store->chunk_size;
 	dest.bdev = s->origin->bdev;
-	dest.sector = chunk_to_sector(s->store, old_chunk);
-	dest.count = min((sector_t)s->store->chunk_size,
-			 get_dev_size(dest.bdev) - dest.sector);
+	dest.sector = chunk_to_sector(s->store, old_chunk + 1 - linear_chunks);
+	dest.count = min(io_size, get_dev_size(dest.bdev) - dest.sector);
 
 	src.bdev = s->cow->bdev;
-	src.sector = chunk_to_sector(s->store, new_chunk);
+	src.sector = chunk_to_sector(s->store, new_chunk + 1 - linear_chunks);
 	src.count = dest.count;
 
 test_again:
-	/* Reallocate other snapshots */
+	/* Reallocate other snapshots; must account for all 'linear_chunks' */
 	down_read(&_origins_lock);
 	o = __lookup_origin(s->origin->bdev);
 	must_wait = 0;
 	min_chunksize = minimum_chunk_size(o);
 	if (min_chunksize) {
 		chunk_t n;
-		for (n = 0; n < s->store->chunk_size; n += min_chunksize) {
+		for (n = 0; n < io_size; n += min_chunksize) {
 			r = __origin_write(&o->snapshots,
 					   dest.sector + n, NULL);
 			if (r == DM_MAPIO_SUBMITTED)
@@ -715,10 +719,12 @@ test_again:
 	}
 
 	down_write(&s->lock);
-	s->merge_write_interlock = old_chunk;
-	s->merge_write_interlock_n = 1;
+	s->merge_write_interlock = old_chunk + 1 - linear_chunks;
+	s->merge_write_interlock_n = linear_chunks;
 	up_write(&s->lock);
 
+	/* TODO: rather than only checking if the first chunk has a conflicting
+	   read, do all 'linear_chunks' need to be checked? */
 	while (__chunk_is_tracked(s, old_chunk))
 		msleep(1);
@@ -745,7 +751,7 @@ static inline void release_write_interlock(struct dm_snapshot *s, int err)
 static void merge_callback(int read_err, unsigned long write_err, void *context)
 {
-	int r;
+	int r, i;
 	struct dm_snapshot *s = context;
 	struct dm_snap_exception *e;
@@ -757,25 +763,34 @@ static void merge_callback(int read_err, unsigned long write_err, void *context)
 		goto shut;
 	}
 
-	r = s->store->type->commit_merge(s->store, 1);
+	r = s->store->type->commit_merge(s->store, s->merge_write_interlock_n);
 	if (r < 0) {
 		DMERR("Write error in exception store, shutting down merge");
 		goto shut;
 	}
 
 	down_write(&s->lock);
-	e = lookup_exception(&s->complete, s->merge_write_interlock);
-	if (!e) {
-		DMERR("exception for block %llu is on disk but not in memory",
-		      (unsigned long long)s->merge_write_interlock);
-		up_write(&s->lock);
-		goto shut;
-	}
-	if (dm_consecutive_chunk_count(e)) {
-		dm_consecutive_chunk_count_dec(e);
-	} else {
-		remove_exception(e);
-		free_exception(e);
+	/*
+	 * Must process chunks (and associated exceptions) in reverse
+	 * so that dm_consecutive_chunk_count_dec() accounting works
+	 */
+	for (i = s->merge_write_interlock_n - 1; i >= 0; i--) {
+		e = lookup_exception(&s->complete,
+				     s->merge_write_interlock + i);
+		if (!e) {
+			DMERR("exception for block %llu is on "
+			      "disk but not in memory",
+			      (unsigned long long)
+			      s->merge_write_interlock + i);
+			up_write(&s->lock);
+			goto shut;
+		}
+		if (dm_consecutive_chunk_count(e)) {
+			dm_consecutive_chunk_count_dec(e);
+		} else {
+			remove_exception(e);
+			free_exception(e);
+		}
 	}
 
 	release_write_interlock(s, 0);

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel