On 2/11/22 22:08, John Garry wrote:
> On 11/02/2022 12:37, Damien Le Moal wrote:
>
> Hi Damien,
>
>>> Sometimes I get TMF timeouts, which is a bad situation. I guess it's a
>>> subtle driver bug, but where....?
>> What is the command failing? Always the same? Can you try adding scsi
>> trace to see the commands?
>
> This is the same issue I have had since day #1.
>
> Generally mount/unmount or even fdisk -l fails after booting into
> miniramfs. I wouldn't ever try to boot a distro.

busybox?

>
>>
>> If you are "lucky", it is always the same type of command, like the
>> NCQ NON DATA in my case.
>
> I'm just trying SAS disks to start with, so it's a SCSI READ command.
> SATA/STP support is generally never as robust for SAS HBAs (HW and LLD
> bugs are common, as this series is evidence of), so I start on something
> more basic. However, SATA/STP also has this issue.
>
> The command is sent successfully but just never completes. Then
> sometimes the TMFs for error handling time out and sometimes succeed. I
> don't have much to go on....

No SAS bus analyzer lying in a corner of the office? :)
That could help...

I will go to the office on Monday, so I will get a chance to add SAS drives
to my setup to see what I get. I have only tested with SATA until now. My
controller is not the same chip as yours though.

>
>> Though on mount, I would only expect a lot of
>> read commands and not much else.
>
> Yes, and it is commonly the first SCSI read command which times out. It
> reliably breaks quite early. So I think we can rule out issues like
> memory barriers/timing.
>
>> There may be some writes and a flush
>> too, so there will be "data" commands and "non data" commands. It may be
>> an issue with non-data commands too?
>>
>
> Not sure on that.

I guess it isn't. Anything special with the drives you are using? Have you
tried other drives to see if you get lucky?

>
> Thanks,
> John

--
Damien Le Moal
Western Digital Research