Re: [PATCH V3 15/25] smartpqi: fix host qdepth limit

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 19.01.21 15:12, Martin Wilck wrote:
On Tue, 2021-01-19 at 10:33 +0000, John Garry wrote:

Am 10.12.20 um 21:35 schrieb Don Brace:
From: Mahesh Rajashekhara <mahesh.rajashekhara@xxxxxxxxxxxxx>

* Correct scsi-mid-layer sending more requests than
     exposed host Q depth causing firmware ASSERT issue.
     * Add host Qdepth counter.

This supposedly fixes the regression between Linux 5.4 and 5.9,
which
we reported in [1].

       kernel: smartpqi 0000:89:00.0: controller is offline:
status code
0x6100c
       kernel: smartpqi 0000:89:00.0: controller offline

Thank you for looking into this issue and fixing it. We are going
to
test this.

For easily finding these things in the git history or the WWW, it
would be great if these log messages could be included (in the
future).
DON> Thanks for your suggestion. Well add them in the next time.

Also, that means, that the regression is still present in Linux
5.10,
released yesterday, and this commit does not apply to these
versions.

DON> They have started 5.10-RC7 now. So possibly 5.11 or 5.12
depending when all of the patches are applied. The patch in
question
is among 28 other patches.

Mahesh, do you have any idea, what commit caused the regression
and
why the issue started to show up?
DON> The smartpqi driver sets two scsi_host_template member
fields:
.can_queue and .nr_hw_queues. But we have not yet converted to
host_tagset. So the queue_depth becomes nr_hw_queues * can_queue,
which is more than the hw can support. That can be verified by
looking
at scsi_host.h.
          /*
           * In scsi-mq mode, the number of hardware queues
supported by
the LLD.
           *
           * Note: it is assumed that each hardware queue has a
queue
depth of
           * can_queue. In other words, the total queue depth per
host
           * is nr_hw_queues * can_queue. However, for when
host_tagset
is set,
           * the total queue depth is can_queue.
           */

So, until we make this change, the queue_depth change prevents
the
above issue from happening.

can_queue and nr_hw_queues have been set like this as long as the
driver existed. Why did Paul observe a regression with 5.9?

And why can't you simply set can_queue to (ctrl_info-
scsi_ml_can_queue / nr_hw_queues)?

Don: I did this in an internal patch, but this patch seemed to work
the best for our driver. HBA performance remained steady when
running benchmarks.

That was a stupid suggestion on my part. Sorry.

I guess that this is a fallout from commit 6eb045e092ef ("scsi:
   core: avoid host-wide host_busy counter for scsi_mq"). But that
commit
is correct.

It would be good if someone (Paul?) could verify whether that commit
actually caused the regression they saw.

We can reliably trigger the issue with a certain load pattern on a certain hardware.

I've compiled 6eb045e092ef  and got (as with other affected kernels) "controller is offline: status code 0x6100c" after 15 minutes of the test load.
I've compiled 6eb045e092ef^ and the load is running for 3 1/2 hours now.

So you hit it.

Looking at that 6eb045e092ef, I notice this hunk:

- busy = atomic_inc_return(&shost->host_busy) - 1;
         if (atomic_read(&shost->host_blocked) > 0) {
-               if (busy)
+               if (scsi_host_busy(shost) > 0)
                         goto starved;

Before 6eb045e092ef, the busy count was incremented with membarrier
before looking at "host_blocked". The new code does this instead:

@ -1403,6 +1400,8 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
                 spin_unlock_irq(shost->host_lock);
         }
+ __set_bit(SCMD_STATE_INFLIGHT, &cmd->state);
+

but it happens *after* the "host_blocked" check. Could that perhaps
have caused the regression?

I'm not into this and can't comment on that. But if you need me to test any patch for verification, I'll certainly can do that.

Best
  Donald



Thanks
Martin




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux