On Thu, Oct 03 2019 at 4:06P -0400,
Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:

> 
> 
> On Wed, 2 Oct 2019, Nikos Tsironis wrote:
> 
> > Hi Mikulas,
> > 
> > I agree that it's better to avoid holding any locks while waiting for
> > some pending kcopyd jobs to finish, but please see the comments below.
> > 
> > On 10/2/19 1:15 PM, Mikulas Patocka wrote:
> > > +
> > > +static bool wait_for_in_progress(struct dm_snapshot *s, bool unlock_origins)
> > > +{
> > > +	if (unlikely(s->in_progress > cow_threshold)) {
> > > +		spin_lock(&s->in_progress_wait.lock);
> > > +		if (likely(s->in_progress > cow_threshold)) {
> > > +			DECLARE_WAITQUEUE(wait, current);
> > > +			__add_wait_queue(&s->in_progress_wait, &wait);
> > > +			__set_current_state(TASK_UNINTERRUPTIBLE);
> > > +			spin_unlock(&s->in_progress_wait.lock);
> > > +			if (unlock_origins)
> > > +				up_read(&_origins_lock);
> > > +			io_schedule();
> > > +			remove_wait_queue(&s->in_progress_wait, &wait);
> > > +			return false;
> > > +		}
> > > +		spin_unlock(&s->in_progress_wait.lock);
> > > +	}
> > > +	return true;
> > > }
> > 
> > wait_for_in_progress() doesn't take into account which chunk is written
> > and whether it has already been reallocated or it is currently
> > reallocating.
> > 
> > This means, if I am not missing something, that both origin_map() and
> > snapshot_map() might unnecessarily throttle writes that don't need any
> > COW to take place.
> 
> I know about it, but I think it's not a serious problem - if there are
> 2048 outstanding I/Os, the system is already heavily congested. It
> doesn't matter if you allow a few more writes or not.
> 
> Mikulas
> 
> > For example, if we have some writes coming in that trigger COW and
> > cause the COW limit to be reached, and then some more writes come in
> > for chunks that have already been reallocated (and before any kcopyd
> > job finishes and decrements s->in_progress), the latter writes would
> > be delayed without a reason, as they don't require any COW to be
> > performed.
> > 
> > It seems strange that the COW throttling mechanism might also throttle
> > writes that don't require any COW.
> > 
> > Moreover, if we have reached the COW limit and we get a write for a
> > chunk that is currently reallocating, we will block the thread, when
> > we could just add the bio to the origin_bios list of the pending
> > exception and move on.
> > 
> > wait_for_in_progress() could check if the exception is already
> > reallocated or is being reallocated, but the extra locking in the
> > critical path might have an adverse effect on performance, especially
> > in multi-threaded workloads. Maybe some benchmarks would help clarify
> > that.
> > 
> > As a final note, in case the devices are slow, there might be many
> > writers waiting in s->in_progress_wait. When a kcopyd job finishes,
> > all of them will wake up, and in some cases we might end up issuing
> > more COW jobs than the cow_count limit, as the accounting for new COW
> > jobs doesn't happen atomically with the check for the cow_count limit
> > in wait_for_in_progress().
> > 
> > That said, I don't think having a few more COW jobs in flight than the
> > defined limit will cause any issues.

Nikos, I looked at your concern even before Mikulas replied and found it
to be valid. But in the end I struggled to imagine how imposing extra
throttling once above the throttle threshold would significantly impact
performance. So when I saw Mikulas' reply, he definitely reinforced my
thinking. But please feel free to explore whether further refinement is
needed.
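For anyone reading along without the full patch handy, the increment side
of this accounting isn't quoted above. A minimal sketch of it, assuming a
helper (call it account_start_copy(), by analogy with the
account_end_copy() in the diff below) called wherever a kcopyd COW job is
dispatched; the name and call sites are my guess, not quoted from the
patch:

static void account_start_copy(struct dm_snapshot *s)
{
	/* count one more COW job in flight; shares the waitqueue's lock */
	spin_lock(&s->in_progress_wait.lock);
	s->in_progress++;
	spin_unlock(&s->in_progress_wait.lock);
}

The point is that s->in_progress and the waitqueue share
in_progress_wait.lock, so the only unlocked access is the first, racy
check in wait_for_in_progress(); and because that check and the later
increment aren't one atomic step, a few more COW jobs than cow_threshold
can end up in flight, as Nikos noted.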
I think your concern about the extra locking in the hot path (to check
whether a chunk already triggered an exception) was a great observation,
but if such a check is done I'm hopeful it won't be _that_ costly,
because we'll have already reached the cow threshold and would already be
taking the lock (as the 2nd phase of the double-checked locking).

Anyway, I folded these small tweaks into Mikulas' 2nd patch:

diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 560b8cb38026..4fb1a40e68a0 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1525,7 +1525,8 @@ static void account_end_copy(struct dm_snapshot *s)
 	spin_lock(&s->in_progress_wait.lock);
 	BUG_ON(!s->in_progress);
 	s->in_progress--;
-	if (likely(s->in_progress <= cow_threshold) && unlikely(waitqueue_active(&s->in_progress_wait)))
+	if (likely(s->in_progress <= cow_threshold) &&
+	    unlikely(waitqueue_active(&s->in_progress_wait)))
 		wake_up_locked(&s->in_progress_wait);
 	spin_unlock(&s->in_progress_wait.lock);
 }
@@ -1535,6 +1536,13 @@ static bool wait_for_in_progress(struct dm_snapshot *s, bool unlock_origins)
 	if (unlikely(s->in_progress > cow_threshold)) {
 		spin_lock(&s->in_progress_wait.lock);
 		if (likely(s->in_progress > cow_threshold)) {
+			/*
+			 * NOTE: this throttle doesn't account for whether
+			 * the caller is servicing an IO that will trigger a COW
+			 * so excess throttling may result for chunks not required
+			 * to be COW'd.  But if cow_threshold was reached, extra
+			 * throttling is unlikely to negatively impact performance.
+			 */
 			DECLARE_WAITQUEUE(wait, current);
 			__add_wait_queue(&s->in_progress_wait, &wait);
 			__set_current_state(TASK_UNINTERRUPTIBLE);
@@ -1955,7 +1963,8 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
 		return DM_MAPIO_KILL;
 
 	if (bio_data_dir(bio) == WRITE) {
-		while (unlikely(!wait_for_in_progress(s, false))) ;
+		while (unlikely(!wait_for_in_progress(s, false)))
+			; /* wait_for_in_progress() has slept */
 	}
 
 	down_read(&s->lock);
@@ -2538,8 +2547,8 @@ static int do_origin(struct dm_dev *origin, struct bio *bio, bool limit)
 	down_read(&_origins_lock);
 	o = __lookup_origin(origin->bdev);
 	if (o) {
-		struct dm_snapshot *s;
 		if (limit) {
+			struct dm_snapshot *s;
 			list_for_each_entry(s, &o->snapshots, list)
 				if (unlikely(!wait_for_in_progress(s, true)))
 					goto again;

and I've pushed the changes to linux-next via linux-dm.git's for-next
(with tweaked commit headers).

But if you or Mikulas find something that would warrant destaging these
changes, I'd welcome that feedback.

Thanks,
Mike

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel