On Mon, 2017-01-16 at 10:22 +0100, Ingo Molnar wrote:
> * James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Sun, 2017-01-15 at 10:19 +0100, Ingo Molnar wrote:
> > > So there's a new mpt3sas SCSI driver boot regression, introduced
> > > in this merge window, which made one of my servers unbootable.
> >
> > We're not reverting a fix that would cause regressions for others.
>
> You really need to reconsider that stance ...
>
> > However, the fix was manifestly wrong, so does this fix of the fix
> > work for you:
> >
> > http://marc.info/?l=linux-scsi&m=148329237807604
> >
> > It's been languishing a bit because no-one seemed to care enough to
> > test or review it.  If you can add a tested by, that will give the
> > two we need to push it.
>
> I have tested your other patch that you pointed to:
>
>   http://marc.info/?l=linux-scsi&m=148449968522828
>
> Which patch fixes the bug too (I removed my revert first) - so you
> can add my:
>
>   Reported-by: Ingo Molnar <mingo@xxxxxxxxxx>
>   Tested-by: Ingo Molnar <mingo@xxxxxxxxxx>

Thanks ... just checking you tested the second version with the
concurrency part?

> BTW., is it wise to work around the out of spec firmware in the
> mpt3sas code and leave the overly optimistic assumptions in the SCSI
> code intact? The problem is that other SCSI hardware could be
> affected as well - and especially enterprise class server hardware
> has long testing and thus regression latencies (as my example
> proves).

Realistically, there is no other card.  Every other SAS implementation
uses the in-kernel SAT, which does the right thing.  We've suggested on
a few occasions that the mpt SAS might like to use it as well, given we
keep tripping on SAT problems in their firmware.

> Wouldn't it be more robust to only submit one pass-through command at
> a time from the SCSI layer, and maybe opt-in hardware that is known
> to implement the SAT standard fully?

Unfortunately it's a lot more complex: the standard gives a queueing
mechanism for SAT pass through, so mostly you *can* have multiple
commands outstanding, so it looks like we shouldn't globally restrict
that.  However, it seems the mpt3 firmware is using a queue single
command model *and* not doing the right thing with return codes hence
the failure.  Since the failure mode is mpt3 specific, I think the best
place for the fix is in their code.  We can revisit this decision if
something else comes along that also has this problem (UAS springs to
mind).

James

> (But I'm just kibitzing here really.)
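
For readers following the thread, the driver-local approach argued for
above (let only one ATA pass-through command be outstanding per device,
and leave all other commands alone) can be illustrated roughly as
follows.  This is a minimal userspace sketch under stated assumptions,
not the mpt3sas patch linked in the thread: `demo_device`,
`demo_queue()`, `demo_complete()` and the `DEMO_*` status codes are
hypothetical names invented for illustration.  Only the two opcode
values, for ATA PASS-THROUGH (12) and (16), are taken from the SAT
standard.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* SCSI opcodes for ATA PASS-THROUGH (12) and ATA PASS-THROUGH (16). */
#define ATA_12 0xa1
#define ATA_16 0x85

/* Hypothetical per-device state; NOT the real mpt3sas structure. */
struct demo_device {
	atomic_bool ata_pending;	/* true while one pass-through is in flight */
};

/* Hypothetical status codes standing in for "accepted" and "retry later". */
enum demo_status { DEMO_QUEUED = 0, DEMO_BUSY = 1 };

static bool is_ata_passthrough(uint8_t opcode)
{
	return opcode == ATA_12 || opcode == ATA_16;
}

static void demo_device_init(struct demo_device *dev)
{
	atomic_init(&dev->ata_pending, false);
}

/*
 * Accept a command unless it is an ATA pass-through and another
 * pass-through is already outstanding on this device.  The atomic
 * exchange makes the check-and-claim a single step, so two concurrent
 * submitters cannot both see "not pending".
 */
static enum demo_status demo_queue(struct demo_device *dev, uint8_t opcode)
{
	if (is_ata_passthrough(opcode) &&
	    atomic_exchange(&dev->ata_pending, true))
		return DEMO_BUSY;	/* caller is expected to retry later */
	return DEMO_QUEUED;
}

/* Called when a previously queued command finishes; releases the gate. */
static void demo_complete(struct demo_device *dev, uint8_t opcode)
{
	if (is_ata_passthrough(opcode)) {
		bool was_pending = atomic_exchange(&dev->ata_pending, false);

		/* Completing a pass-through that was never queued is a bug. */
		assert(was_pending);
		(void)was_pending;
	}
}
```

The gate is per device and applies only to the two pass-through
opcodes, so ordinary reads and writes keep their normal queue depth.
That mirrors the reasoning above: the restriction lives with the
hardware that needs it rather than being imposed globally.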