On 2/11/22 22:54, John Garry wrote:
> Hi Damien,
>
>>>
>>>>> Sometimes I get TMF timeouts, which is a bad situation. I guess it's a
>>>>> subtle driver bug, but where ....?
>>>> What is the command failing ? Always the same ? Can you try adding scsi
>>>> trace to see the commands ?
>>>
>>> This is the same issue I have had since day #1.
>>>
>>> Generally mount/unmount or even fdisk -l fails after booting into
>>> miniramfs. I wouldn't ever try to boot a distro.
>>
>> busybox ?
>>
>
> Yes
>
>>>
>>>>
>>>> If you are "lucky", it is always the same type of command like for the
>>>> NCQ NON DATA in my case.
>>>
>>> I'm just trying SAS disks to start with - so it's an SCSI READ command.
>>> SATA/STP support is generally never as robust for SAS HBAs (HW and LLD
>>> bugs are common - as this series is evidence) so I start on something
>>> more basic - however SATA/STP also has this issue.
>>>
>>> The command is sent successfully but just never completes. Then
>>> sometimes the TMFs for error handling timeout and sometimes succeed. I
>>> don't have much to do on....
>>
>> No SAS bus analyzer lying in a corner of the office ? :)
>> That could help...
>
> None unfortunately
>
>>
>> I will go to the office Monday. So I will get a chance to add SAS drives
>> to my setup to see what I get. I have only tested with SATA until now.
>> My controller is not the same chip as yours though.
>
> jfyi, Ajish, cc'ed, from microchip says that they see the same issue on
> their arm64 system. Unfortunately fixing it is not a priority for them.
> So it is something which arm64 is exposing.
>
> And I tried an old kernel - like 4.10 - on the same board and the pm8001
> driver was working somewhat reliably (no hangs). It just looks like a
> driver issue.
>
> I'll have a look at the driver code again if I get a chance. It might be
> a DMA issue.

There is one more thing that I find strange in the driver and that may
cause problems: tag 0 is a perfectly valid tag value that can be returned
by pm8001_tag_alloc() since find_first_zero_bit() will return 0 if the
first bit is 0. And yet, there are many places in the driver that treat
!tag as an error. Extremely weird, if not outright broken...

I patched that and tested and everything seems OK... Could it be that you
are not seeing some completions because of that ?

I added the patch to my v3. Will send Monday.

-- 
Damien Le Moal
Western Digital Research
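
To illustrate the tag 0 problem described above, here is a minimal
userspace sketch. It is not the pm8001 code and not the patch mentioned
in this mail: the pool size NR_TAGS, struct tag_pool, first_zero_bit()
(a stand-in for the kernel's find_first_zero_bit()), tag_alloc(),
tag_free() and the two tag_ok_*() helpers are all hypothetical names
made up for the example. It only shows why a "!tag" check rejects a tag
that the allocator legitimately handed out.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_TAGS 32u /* hypothetical pool size, for illustration only */

/* One bit per tag: 1 = in use, 0 = free. */
struct tag_pool {
	unsigned long bitmap;
};

/*
 * Userspace stand-in for find_first_zero_bit(): return the index of the
 * first clear bit, or nbits if every bit is set. Index 0 is a normal,
 * valid result whenever bit 0 is clear.
 */
static unsigned int first_zero_bit(unsigned long map, unsigned int nbits)
{
	for (unsigned int i = 0; i < nbits; i++)
		if (!(map & (1UL << i)))
			return i;
	return nbits;
}

/*
 * Allocate a tag. Success or failure is reported through the return
 * value and the tag through *tag_out, so tag 0 cannot be mistaken for
 * an error code.
 */
static int tag_alloc(struct tag_pool *pool, unsigned int *tag_out)
{
	unsigned int tag = first_zero_bit(pool->bitmap, NR_TAGS);

	if (tag >= NR_TAGS)
		return -1; /* pool exhausted */
	pool->bitmap |= 1UL << tag;
	*tag_out = tag;
	return 0;
}

/* Release a previously allocated tag. */
static void tag_free(struct tag_pool *pool, unsigned int tag)
{
	assert(tag < NR_TAGS);
	pool->bitmap &= ~(1UL << tag);
}

/* Broken validity check: equivalent to treating "!tag" as an error. */
static bool tag_ok_broken(unsigned int tag)
{
	return tag != 0;
}

/* Range-based validity check: accepts every tag the pool can return. */
static bool tag_ok_fixed(unsigned int tag)
{
	return tag < NR_TAGS;
}
```

With an empty pool, the first tag_alloc() call hands out tag 0. A caller
using tag_ok_broken() would then discard a command or completion that is
in fact valid, while tag_ok_fixed() accepts it.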