-----Original Message-----
From: Donald Buczek [mailto:buczek@xxxxxxxxxxxxx]
Subject: Re: [PATCH V3 15/25] smartpqi: fix host qdepth limit

>
> It would be good if someone (Paul?) could verify whether that commit
> actually caused the regression they saw.

We can reliably trigger the issue with a certain load pattern on certain hardware.

I've compiled 6eb045e092ef and got (as with other affected kernels) "controller is offline: status code 0x6100c" after 15 minutes of the test load.

I've compiled 6eb045e092ef^, and the load has now been running for 3 1/2 hours.

So you hit it.

Don: good news, I was starting my own testing. Thanks for your help.

> Looking at that 6eb045e092ef, I notice this hunk:
>
> -	busy = atomic_inc_return(&shost->host_busy) - 1;
> 	if (atomic_read(&shost->host_blocked) > 0) {
> -		if (busy)
> +		if (scsi_host_busy(shost) > 0)
> 			goto starved;
>
> Before 6eb045e092ef, the busy count was incremented with a memory barrier
> before looking at "host_blocked". The new code does this instead:
>
> @@ -1403,6 +1400,8 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
> 		spin_unlock_irq(shost->host_lock);
> 	}
>
> +	__set_bit(SCMD_STATE_INFLIGHT, &cmd->state);
> +
>
> but it happens *after* the "host_blocked" check. Could that perhaps
> have caused the regression?

I'm not familiar enough with this code to comment on that. But if you need me to test any patch for verification, I can certainly do that.

Best
  Donald

>
>
> Thanks
> Martin
>
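To make the ordering question concrete, here is a minimal C11 sketch that replays one interleaving of two submitters reaching the "host_blocked" check at the same time. It is a simplified, single-threaded model, not the real scsi_host_queue_ready(): the old path is modelled as a fetch-and-increment on a shared counter (which is what atomic_inc_return(&shost->host_busy) - 1 computes), and the new path as "sample the in-flight count, then mark this command in flight", with scsi_host_busy() reduced to a plain counter. All function names (old_style_others_busy, new_style_sample, new_style_mark, replay_old_ordering, replay_new_ordering, check_replays) are hypothetical. The sketch only shows that the check-then-mark order lets both submitters see zero busy commands; it does not show that this is what took the controller offline.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical, simplified model; not kernel code. */

/* Old ordering: increment first, and learn how many commands were already
 * in flight. atomic_fetch_add returns the value *before* the increment,
 * i.e. the same number as atomic_inc_return(...) - 1. */
static int old_style_others_busy(atomic_int *host_busy)
{
	return atomic_fetch_add(host_busy, 1);
}

/* New ordering, split in two steps so an interleaving can be replayed:
 * first sample the in-flight count (stand-in for scsi_host_busy())... */
static int new_style_sample(atomic_int *inflight)
{
	return atomic_load(inflight);
}

/* ...and only later mark this command in flight (stand-in for
 * __set_bit(SCMD_STATE_INFLIGHT, &cmd->state)). */
static void new_style_mark(atomic_int *inflight)
{
	(void)atomic_fetch_add(inflight, 1);
}

/* Two submitters A and B race while the host is blocked. Returns how many
 * of them see "no other command busy" and would skip the starved path. */
static int replay_old_ordering(void)
{
	atomic_int host_busy = 0;
	int busy_a = old_style_others_busy(&host_busy); /* A sees 0 */
	int busy_b = old_style_others_busy(&host_busy); /* B sees 1 */

	return (busy_a == 0) + (busy_b == 0);
}

static int replay_new_ordering(void)
{
	atomic_int inflight = 0;
	int busy_a = new_style_sample(&inflight); /* A checks first...        */
	int busy_b = new_style_sample(&inflight); /* ...B checks before A marks */

	new_style_mark(&inflight); /* A marks itself in flight */
	new_style_mark(&inflight); /* B marks itself in flight */
	return (busy_a == 0) + (busy_b == 0);
}

/* Self-check of the two replays. */
static void check_replays(void)
{
	assert(replay_old_ordering() == 1);
	assert(replay_new_ordering() == 2);
}
```

With the fetch-and-increment ordering, exactly one of the two submitters sees no other busy command; with the check-then-mark ordering, both can, so both may get past the busy test while the host is still blocked.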