Re: [PATCH 00/20] libsas and pm8001 fixes


On 2/11/22 22:54, John Garry wrote:
> Hi Damien,
> 
>>>
>>>>> Sometimes I get TMF timeouts, which is a bad situation. I guess it's a
>>>>> subtle driver bug, but where ....?
>>>> Which command is failing? Always the same one? Can you try adding scsi
>>>> trace to see the commands?
>>>
>>> This is the same issue I have had since day #1.
>>>
>>> Generally mount/unmount or even fdisk -l fails after booting into
>>> miniramfs. I wouldn't ever try to boot a distro.
>>
>> busybox ?
>>
> 
> Yes
> 
>>>
>>>>
>>>> If you are "lucky", it is always the same type of command like for the
>>>> NCQ NON DATA in my case.
>>>
>>> I'm just trying SAS disks to start with, so it's a SCSI READ command.
>>> SATA/STP support on SAS HBAs is generally never as robust (HW and LLD
>>> bugs are common, as this series shows), so I start with something
>>> more basic. However, SATA/STP also has this issue.
>>>
>>> The command is sent successfully but just never completes. Then
>>> sometimes the TMFs for error handling time out and sometimes succeed. I
>>> don't have much to go on....
>>
>> No SAS bus analyzer lying in a corner of the office ? :)
>> That could help...
> 
> None unfortunately
> 
>>
>> I will go to the office Monday. So I will get a chance to add SAS drives
>> to my setup to see what I get. I have only tested with SATA until now.
>> My controller is not the same chip as yours though.
> 
> jfyi, Ajish (cc'ed) from Microchip says that they see the same issue on 
> their arm64 system. Unfortunately fixing it is not a priority for them. 
> So it is something which arm64 is exposing.
> 
> And I tried an old kernel - like 4.10 - on the same board and the pm8001 
> driver was working somewhat reliably (no hangs). It just looks like a 
> driver issue.
> 
> I'll have a look at the driver code again if I get a chance. It might be 
> a DMA issue.

There is one more thing that I find strange in the driver and that may
cause problems: tag 0 is a perfectly valid tag value that can be
returned by pm8001_tag_alloc() since find_first_zero_bit() will return 0
if the first bit is 0. And yet, there are many places in the driver that
treat !tag (that is, tag == 0) as an error. Extremely weird, if not
outright broken...

I patched that and tested, and everything seems OK... Could it be that
you are not seeing some completions because of that?

I added the patch to my v3. Will send Monday.



-- 
Damien Le Moal
Western Digital Research


