Re: Block device throttling [Re: Distributed storage.]

Daniel Phillips <phillips@xxxxxxxxx> · Mon, 13 Aug 2007 06:04:06 -0700

On Monday 13 August 2007 05:18, Evgeniy Polyakov wrote:
> > Say you have a device mapper device with some physical device
> > sitting underneath, the classic use case for this throttle code. 
> > Say 8,000 threads each submit an IO in parallel.  The device mapper
> > mapping function will be called 8,000 times with associated
> > resource allocations, regardless of any throttling on the physical
> > device queue.
>
> Each thread will sleep in generic_make_request(), if limit is
> specified correctly, then allocated number of bios will be enough to
> have a progress.

The problem is, the sleep does not occur before the virtual device 
mapping function is called.  Let's consider two devices, a physical 
device named pdev and a virtual device sitting on top of it called 
vdev.   vdev's throttle limit is just one element, but we will see that 
in spite of this, two bios can be handled by the vdev's mapping method 
before any IO completes, which violates the throttling rules. According 
to your patch it works like this:

                     Thread 1                                Thread  2

	<no wait because vdev->bio_queued is zero>

	vdev->q->bio_queued++

	<enter devmapper map method>

	blk_set_bdev(bio, pdev)
	     vdev->bio_queued--

					<no wait because vdev->bio_queued is zero>

					vdev->q->bio_queued++

					<enter devmapper map method>

					whoops!  Our virtual device mapping
					function has now allocated resources
					for two in-flight bios in spite of having its
					throttle limit set to 1.

Perhaps you never worried about the resources that the device mapper 
mapping function allocates to handle each bio and so did not consider 
this hole significant.  These resources can be significant, as is the 
case with ddsnap.  It is essential to close that window through with 
the virtual device's queue limit may be violated.  Not doing so will 
allow deadlock.

Regards,

Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html