Re: [PATCH, RFC] scsi: use host wide tags by default

On 04/17/2015 04:20 PM, James Bottomley wrote:
On Fri, 2015-04-17 at 16:07 -0600, Jens Axboe wrote:
On 04/17/2015 03:57 PM, James Bottomley wrote:
On Fri, 2015-04-17 at 15:47 -0600, Jens Axboe wrote:
On 04/17/2015 03:46 PM, James Bottomley wrote:
On Fri, 2015-04-17 at 15:44 -0600, Jens Axboe wrote:
On 04/17/2015 03:42 PM, James Bottomley wrote:
@@ -662,32 +662,14 @@ void scsi_finish_command(struct scsi_cmnd *cmd)
      */
     int scsi_change_queue_depth(struct scsi_device *sdev, int depth)
     {
-	unsigned long flags;
-
-	if (depth <= 0)
-		goto out;
-
-	spin_lock_irqsave(sdev->request_queue->queue_lock, flags);
+	if (depth > 0) {
+		unsigned long flags;

-	/*
-	 * Check to see if the queue is managed by the block layer.
-	 * If it is, and we fail to adjust the depth, exit.
-	 *
-	 * Do not resize the tag map if it is a host wide share bqt,
-	 * because the size should be the hosts's can_queue. If there
-	 * is more IO than the LLD's can_queue (so there are not enuogh
-	 * tags) request_fn's host queue ready check will handle it.
-	 */
-	if (!shost_use_blk_mq(sdev->host) && !sdev->host->bqt) {
-		if (blk_queue_tagged(sdev->request_queue) &&
-		    blk_queue_resize_tags(sdev->request_queue, depth) != 0)
-			goto out_unlock;
+		spin_lock_irqsave(sdev->request_queue->queue_lock, flags);
+		sdev->queue_depth = depth;
+		spin_unlock_irqrestore(sdev->request_queue->queue_lock, flags);

This lock/unlock is a nasty global sync point which can be eliminated:
we can rely on the architectural atomicity of 32 bit writes (might need
to make sdev->queue_depth a u32 because I seem to remember 16 bit writes
had to be done as two byte stores on some architectures).
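
A minimal sketch of the lockless store being suggested, assuming the
kernel's WRITE_ONCE()/READ_ONCE() helpers from <linux/compiler.h> to
stop the compiler tearing or caching the access; an illustration only,
not the patch under discussion:

	/*
	 * Lockless variant: rely on the architectural atomicity of an
	 * aligned 32-bit store (hence the suggestion above to make
	 * queue_depth a u32).  WRITE_ONCE() keeps the compiler from
	 * splitting the store; readers pair with READ_ONCE() to avoid
	 * torn or stale cached loads.
	 */
	int scsi_change_queue_depth(struct scsi_device *sdev, int depth)
	{
		if (depth > 0)
			WRITE_ONCE(sdev->queue_depth, depth);
		return sdev->queue_depth;
	}

	/* a reader elsewhere: */
	int depth = READ_ONCE(sdev->queue_depth);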

It's not in a hot path (by any stretch), so doesn't really matter...

Sure, but it's good practice not to do this, otherwise the pattern
lock/u32 store/unlock gets duplicated into hot paths by people who are
confused about whether locking is required.

It's a lot saner default to lock/unlock and have people copy that, than
have them misguidedly think that no locking is required for whatever
reason.

Moving to lockless coding is important for the small packet performance
we're all chasing.  I'd rather train people to think about the problem
than blindly introduce unnecessary locking and then have someone else
remove it in the name of performance improvement.  If they get it wrong
the other way (no locking where it was needed) our code review process
should spot that.

We're chasing cycles for the hot path, not for the init path. I'd much
rather keep it simple where we can, and keep the much harder problems
for the cases that really matter. Locking and ordering is _hard_, most
people get it wrong, most of the time. And spotting missing locking at
review time is a much harder problem. I would generally recommend people
get it right _first_, then later work on optimizing the crap out of it.
That's much easier to do with a stable base anyway.

OK, so I think we can agree to differ.  You're saying care only where it
matters because that's where you should concentrate and I'm saying care
everywhere because that disciplines you to be correct where it matters.

I'm saying you should only do it where it matters, because odds are you are going to get it wrong. And if you get it wrong where it matters, we'll eventually find out, because things won't work. If you get it wrong in other places, that bug can linger forever. Or only hit exotic setups/architectures, making it a much harder problem.

I'm all for having nice design patterns that force people into the right mentality, but there's a line in the sand where that stops making sense.

In this case, it is a problem because in theory the language ('C') makes
no such atomicity guarantees (which is why most people think you need a
lock here).  The atomicity guarantees are extrapolated from the platform
it's running on.

The write itself might be atomic, but you still need to
guarantee visibility.

The function barrier guarantees mean it's visible by the time the
function returns.  However, I wouldn't object to a wmb here if you think
it's necessary ... it certainly serves as a marker for "something clever
is going on".

The sequence point means it's not reordered across it; it does not give
you any guarantees on visibility. And we're getting into semantics of C
here, but I believe that for even that to be valid, you'd need to make
->queue_depth volatile. And honestly, I'd hate to rely on that. Which
means you need proper barriers.
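
For illustration, the "proper barriers" alternative here could be the
kernel's acquire/release pair rather than a lock or a full barrier; a
sketch, not code from this thread:

	/* writer: publish the new depth with release semantics,
	 * ordering all prior stores before this one */
	smp_store_release(&sdev->queue_depth, depth);

	/* reader: pick it up with acquire semantics, ordering this
	 * load before the reader's later memory accesses */
	int depth = smp_load_acquire(&sdev->queue_depth);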

Actually, no, not at all.  Volatile is a compiler optimisation
primitive.  It means the compiler may not keep any assignment to this
location internally.  Visibility of stores depends on two types of
barrier:  One is influenced by the ability of the compiler to reorder
operations, which it may do up to a compiler barrier.  The other is
the ability of the architecture to reorder the execution pipelines,
and so execute out of order the instructions the compiler created,
which it may do up to a barrier sync instruction.  wmb is a
heavyweight barrier instruction that would make sure all stores
before this become visible to everything in the system.  In this case
it's not necessary because a function return
is also a compile and execution barrier, so as long as we don't care
about visibility until the scsi_change_queue_depth() function returns
(which I think we don't), then no explicit barrier is required (and
certainly no volatile on the stored location).

There's a good treatise on this in Documentation/memory-barriers.txt but
I do find it over-didactic for the simple issues.
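
Roughly, the two reordering sources described above map onto two
different kernel primitives; an illustration, not from the thread:

	barrier();	/* compiler barrier only: no instruction is
			 * emitted, but the compiler may not reorder
			 * memory accesses across it */
	wmb();		/* hardware write barrier (and also a compiler
			 * barrier): additionally orders the stores
			 * the CPU itself issues */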

wmb() (or smp_wmb()) is a store ordering barrier; it'll do nothing for visibility. So if we want to order multiple stores against each other, then that'd be appropriate. You'd need a read memory barrier to order the load against the store. Adding that before reading ->queue_depth would be horrible. So then you'd need a full barrier, at which point you may as well keep the lock, if your point is about writing the most optimal code so that people will be forced to copy that everywhere.
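
The pairing being described is the classic flag/payload pattern from
Documentation/memory-barriers.txt; schematically, with hypothetical
data and ready variables:

	/* CPU 0: writer */
	data = 42;
	smp_wmb();		/* order the data store before the flag store */
	WRITE_ONCE(ready, 1);

	/* CPU 1: reader */
	while (!READ_ONCE(ready))
		cpu_relax();
	smp_rmb();		/* order the flag load before the data load */
	do_something(data);	/* hypothetical consumer */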

So your claim is that a function call (or sequence point) is a full memory barrier. That is not correct, or I missed that in the C spec. If that's the case, what if the function is inlined?
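
To make the inlining point concrete: in a sketch like the one below
(set_depth() is a hypothetical helper, not a real kernel function),
once the compiler inlines the call no call boundary survives, and
nothing stops it reordering the plain stores against each other:

	/* hypothetical helper, for illustration only */
	static inline void set_depth(struct scsi_device *sdev, int depth)
	{
		sdev->queue_depth = depth;	/* no barrier implied */
	}

	void example(struct scsi_device *sdev, int *flag)
	{
		set_depth(sdev, 64);
		*flag = 1;	/* after inlining, the compiler is free
				 * to reorder this store with the one
				 * above */
	}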

--
Jens Axboe
