Re: [problem] mpt2sas load fails with LSISAS2008

On Wed, Feb 11, 2015 at 10:11 AM, Paul Johnson <pjay@xxxxxxxxxxx> wrote:
> On 02/10/2015 08:49 AM, Bjorn Helgaas wrote:
>>
>> We need to work out what's going wrong here before we rush into a
>> band-aid.
>>
>> What changed between v3.4 and v3.4.1 that exposed this problem?  "git
>> log --oneline v3.4..v3.4.1" doesn't show any likely culprits.  Paul,
>> are those the versions you tested?  Your dmesg logs at
>> https://bugzilla.kernel.org/show_bug.cgi?id=92351 show
>> "3.4.0-030400-generic" and "3.4.1-030401-generic" but I don't know
>> whether those are precisely v3.4 and v3.4.1.
>>
>> I assume this system works fine with Windows, and I doubt Windows has
>> a hack like "never move LSI devices."  So it would be useful to know
>> if we're doing something stupid in Linux that makes us trip over this.
>> Paul, if you happen to have Windows on this machine as well, a
>> complete AIDA64 report (free trial version at http://www.aida64.com)
>> would show what Windows did.
>>
>> The resource allocation we're doing is related to SR-IOV, and
>> unfortunately we don't print enough information in dmesg to figure
>> everything out.  Paul, can you attach the complete "lspci -vv" output
>> to the bugzilla?
>>
>> Bjorn
>>
> The system I have had this problem on is in production, though it should be
> replaced by a real server. Because it is in use, I have used a separate boot
> disk to test kernels. I also have limited access to take the machine down.
> The system runs Ubuntu Server, though I have used an Ubuntu desktop to test
> kernels. There is no Windows installation on the machine. Just guessing,
> but LSI likely provides the Windows driver, and that driver may well
> work around a problem that looks specific to the firmware/BIOS
> version on this card.

That might be possible.  The issue seems to be related to changing BAR
addresses, and I expect that would be outside the scope of what the
driver can influence.  So I don't know whether Windows has a mechanism
for that or not.

> Someone found another of these cards here, so I tried it last night in an
> unused machine. It worked on the Ubuntu 3.13 kernel without realloc. The
> card that has been the problem has these firmware versions:
> [    9.004647] mpt2sas0: LSISAS2008: FWVersion(17.00.01.00), ChipRevision(0x03), BiosVersion(07.33.00.00)
>
> and the card that works has a newer version:
> [   15.725011] mpt2sas0: LSISAS2008: FWVersion(18.00.00.00), ChipRevision(0x03), BiosVersion(07.35.00.00)

Without seeing the dmesg log, I can't tell whether this card works
because (1) the LSI firmware is fixed or (2) the kernel didn't try to
change the BARs.

And I still don't have any clue about what changed between v3.4 and
v3.4.1 and triggered the problem.

Applying a fix without figuring out the real root cause of the problem
is voodoo programming, and I don't like to do that.

> Now, the cards are in very different machines so the difference could be due
> to the machines and not the firmware, but I would tend to go with the
> firmware difference. LSI firmware is now beyond both these firmware
> versions, but if I can find a copy of the older firmware, I'll try it on the
> card with the newer firmware.

We could tell from the dmesg log whether Linux changed the BARs.  I
wouldn't bother trying different LSI firmware versions until you
confirm that we changed the BARs.
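
For anyone who wants to check a dmesg log for this mechanically, here is a
rough sketch. It assumes the PCI core logs the BAR value it finds at
enumeration as "reg 0x14: [mem ...]" (older kernels print "reg 14:") and logs
a BAR it assigns as "BAR 1: assigned [mem ...]". Those message shapes vary
between kernel versions, so treat the patterns as assumptions to adjust. The
function name find_bar_changes and the sample log lines are made up for
illustration; they are not taken from the bugzilla attachment.

```python
import re

# Assumed message shapes; they differ between kernel versions:
#   pci 0000:01:00.0: reg 0x14: [mem 0xf7a40000-0xf7a43fff 64bit]       (found at enumeration)
#   pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe0003fff 64bit] (written by the kernel)
DEV = r"\w+ ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
RANGE = r"\[(?:io|mem)\s+(0x[0-9a-f]+)-(0x[0-9a-f]+)"
REG_RE = re.compile(DEV + r"reg (?:0x)?([0-9a-f]+): " + RANGE)
ASSIGN_RE = re.compile(DEV + r"BAR (\d+): assigned " + RANGE)


def find_bar_changes(dmesg_text):
    """Return (device, bar, old_range, new_range) for each BAR the kernel moved."""
    initial = {}  # (device, bar index) -> (start, end) seen at enumeration
    changes = []
    for line in dmesg_text.splitlines():
        m = REG_RE.search(line)
        if m:
            offset = int(m.group(2), 16)
            # Standard BARs live at config offsets 0x10..0x24, four bytes apart.
            # ROM and SR-IOV BARs use other offsets and are skipped here.
            if 0x10 <= offset <= 0x24:
                bar = (offset - 0x10) // 4
                initial[(m.group(1), bar)] = (int(m.group(3), 16), int(m.group(4), 16))
            continue
        m = ASSIGN_RE.search(line)
        if m:
            key = (m.group(1), int(m.group(2)))
            new = (int(m.group(3), 16), int(m.group(4), 16))
            old = initial.get(key)
            if old is not None and old != new:
                changes.append((key[0], key[1], old, new))
    return changes


# Made-up sample lines, only to show the expected input shape.
SAMPLE = """\
[    0.600000] pci 0000:01:00.0: reg 0x10: [io  0xe000-0xe0ff]
[    0.600010] pci 0000:01:00.0: reg 0x14: [mem 0xf7a40000-0xf7a43fff 64bit]
[    0.700000] pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe0003fff 64bit]
"""

if __name__ == "__main__":
    for dev, bar, old, new in find_bar_changes(SAMPLE):
        print(f"{dev} BAR {bar}: {old[0]:#x}-{old[1]:#x} -> {new[0]:#x}-{new[1]:#x}")
```

An empty result only means no reassignment matched these patterns. It does not
prove the BARs were left alone, so the complete dmesg log is still the thing to
attach.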

Bjorn