On Tue, 2020-12-15 at 20:23 +0000, Don.Brace@xxxxxxxxxxxxx wrote:
> Please see answers below. Hope this helps.
>
> -----Original Message-----
> From: Paul Menzel [mailto:pmenzel@xxxxxxxxxxxxx]
> Sent: Monday, December 14, 2020 11:54 AM
> To: Don Brace - C33706 <Don.Brace@xxxxxxxxxxxxx>;
>     Kevin Barnett - C33748 <Kevin.Barnett@xxxxxxxxxxxxx>;
>     Scott Teel - C33730 <Scott.Teel@xxxxxxxxxxxxx>;
>     Justin Lindley - C33718 <Justin.Lindley@xxxxxxxxxxxxx>;
>     Scott Benesh - C33703 <Scott.Benesh@xxxxxxxxxxxxx>;
>     Gerry Morong - C33720 <Gerry.Morong@xxxxxxxxxxxxx>;
>     Mahesh Rajashekhara - I30583 <Mahesh.Rajashekhara@xxxxxxxxxxxxx>;
>     hch@xxxxxxxxxxxxx; joseph.szczypek@xxxxxxx; POSWALD@xxxxxxxx;
>     James E. J. Bottomley <jejb@xxxxxxxxxxxxx>;
>     Martin K. Petersen <martin.petersen@xxxxxxxxxx>
> Cc: linux-scsi@xxxxxxxxxxxxxxx; it+linux-scsi@xxxxxxxxxxxxx;
>     Donald Buczek <buczek@xxxxxxxxxxxxx>;
>     Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>
> Subject: Re: [PATCH V3 15/25] smartpqi: fix host qdepth limit
>
> Dear Don, dear Mahesh,
>
>
> On 10.12.20 at 21:35, Don Brace wrote:
> > From: Mahesh Rajashekhara <mahesh.rajashekhara@xxxxxxxxxxxxx>
> >
> > * Correct scsi-mid-layer sending more requests than
> >   exposed host Q depth causing firmware ASSERT issue.
> > * Add host Qdepth counter.
>
> This supposedly fixes the regression between Linux 5.4 and 5.9, which
> we reported in [1].
>
>     kernel: smartpqi 0000:89:00.0: controller is offline: status code 0x6100c
>     kernel: smartpqi 0000:89:00.0: controller offline
>
> Thank you for looking into this issue and fixing it. We are going to
> test this.
>
> To make these things easy to find in the git history or on the WWW, it
> would be great if these log messages could be included (in the
> future).
> DON> Thanks for your suggestion. We'll add them next time.
>
> Also, that means that the regression is still present in Linux 5.10,
> released yesterday, and this commit does not apply to these versions.
>
> DON> They have started 5.10-RC7 now. So possibly 5.11 or 5.12,
> depending on when all of the patches are applied. The patch in question
> is among 28 other patches.
>
> Mahesh, do you have any idea what commit caused the regression and
> why the issue started to show up?
> DON> The smartpqi driver sets two scsi_host_template member fields:
> .can_queue and .nr_hw_queues. But we have not yet converted to
> host_tagset. So the queue_depth becomes nr_hw_queues * can_queue,
> which is more than the hw can support. That can be verified by
> looking at scsi_host.h.
>         /*
>          * In scsi-mq mode, the number of hardware queues supported by the LLD.
>          *
>          * Note: it is assumed that each hardware queue has a queue depth of
>          * can_queue. In other words, the total queue depth per host
>          * is nr_hw_queues * can_queue. However, for when host_tagset is set,
>          * the total queue depth is can_queue.
>          */
>
> So, until we make this change, the queue_depth change prevents the
> above issue from happening.

can_queue and nr_hw_queues have been set like this for as long as the
driver has existed. Why did Paul observe a regression with 5.9?

And why can't you simply set can_queue to
(ctrl_info->scsi_ml_can_queue / nr_hw_queues)?

Regards,
Martin
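
For reference, the arithmetic under discussion can be sketched in
stand-alone C. This is not driver code: struct host_limits,
total_host_depth() and split_can_queue() are hypothetical names used
only for illustration, and the depth rule is simply the one stated in
the scsi_host.h comment quoted above.

```c
/*
 * Illustration only. struct host_limits and both helpers are hypothetical;
 * they mirror the rule in the scsi_host.h comment, not smartpqi itself.
 */
struct host_limits {
	unsigned int can_queue;    /* depth the mid-layer assumes per hw queue */
	unsigned int nr_hw_queues; /* number of hardware queues */
	int host_tagset;           /* non-zero: one tag space shared by all queues */
};

/* Total requests the mid-layer may have outstanding for the whole host. */
static unsigned int total_host_depth(const struct host_limits *h)
{
	if (h->host_tagset)
		return h->can_queue;
	return h->nr_hw_queues * h->can_queue;
}

/*
 * Per-queue can_queue that keeps the host total at or below ctrl_limit.
 * Integer division rounds down, so the total may end up slightly under
 * the limit when ctrl_limit is not a multiple of nr_hw_queues.
 */
static unsigned int split_can_queue(unsigned int ctrl_limit,
				    unsigned int nr_hw_queues)
{
	return nr_hw_queues ? ctrl_limit / nr_hw_queues : ctrl_limit;
}
```

With a made-up controller limit of 1024 and 8 hardware queues,
total_host_depth() gives 8192 when can_queue is 1024 and host_tagset is
0. It gives 1024 when can_queue is split_can_queue(1024, 8) = 128, and
also when can_queue stays at 1024 with host_tagset set.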