On 9/14/23 09:29, John David Anglin wrote:
>>> dave@atlas:~/linux/linux$ git diff drivers/scsi/scsi.c
>>> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
>>> index d0911bc28663..dc3a283ebd75 100644
>>> --- a/drivers/scsi/scsi.c
>>> +++ b/drivers/scsi/scsi.c
>>> @@ -578,6 +578,8 @@ static bool scsi_cdl_check_cmd(struct scsi_device *sdev, u8 opcode, u16 sa,
>>>  	int ret;
>>>  	u8 cdlp;
>>>
>>> +	return false;
>>> +
>>>  	/* Check operation code */
>>>  	ret = scsi_report_opcode(sdev, buf, SCSI_CDL_CHECK_BUF_LEN, opcode, sa);
>>>  	if (ret <= 0)
>> It is weird that this solves anything... the MAINTENANCE_IN command issued by
>> scsi_report_opcode() ends up being emulated in libata with
>> ata_scsiop_maint_in(). There are no actual commands issued to the drive, so
>> nothing that could actually fail/cause issues. By the time this is issued, the
>> ATA drive is also fully probed...
>>
>> Or is the drive connected to the Broadcom HBA you have ? In that case, libata is
>> not used and the HBA FW SAT (scsi-ata-translation) is likely to blame.
> /boot, / and swap partitions reside on a ST373207LW drive connected to a Broadcom HBA. A
> ST4000VN008-2DR1 drive is connected to the Silicon Image, Inc. SiI 3124 PCI-X Serial
> ATA Controller. It mounts on /home. There's also a cdrom connected to the Silicon
> Image, Inc. PCI0680 Ultra ATA-133 Host Controller and another ST4000VN008-2DR1 drive
> connected to a Broadcom HBA. There are two Broadcom HBAs.
>
> I think the issue is with the root ST373207LW drive. The console output indicates that the
> ROOT drive doesn't exist when the boot fails.
>
> Your change only appeared to affect actual SCSI drives. That's why I tried disabling CDL.

OK. I can see from the dmesg snippets you sent that the drives on the ATA ports
seem OK. A quick search tells me that the ST373207LW drive is an Ultra320 SCSI
drive, not ATA. So the MAINTENANCE_IN command issued by scsi_report_opcode()
will go to the drive as-is.
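For reference, here is a sketch of the two REPORT SUPPORTED OPERATION CODES forms being discussed, built as raw 12-byte CDBs following the SPC layout: MAINTENANCE IN (0xA3) with service action 0x0C, reporting options in byte 2, requested operation code in byte 3, requested service action in bytes 4-5 and allocation length in bytes 6-9. It assumes the 010b reporting option for the service action form; the kernel's scsi_report_opcode() may use a different value, and all sketch_* names are hypothetical, not kernel API.

```c
#include <stdint.h>
#include <string.h>

/* SPC values for REPORT SUPPORTED OPERATION CODES (hypothetical names). */
#define SKETCH_MAINTENANCE_IN              0xa3
#define SKETCH_MI_REPORT_SUPPORTED_OPCODES 0x0c

/* Reporting options (CDB byte 2, bits 2:0). Assumption: 010b is used for
 * the service action form; SPC-4 also defines 011b. */
#define SKETCH_RO_ONE_CMD    0x1 /* one command format: opcode only */
#define SKETCH_RO_ONE_CMD_SA 0x2 /* one command format with service action */

/* Build a 12-byte RSOC CDB asking about a single command.
 * sa == 0 selects the plain one command format. */
static void sketch_build_rsoc_cdb(uint8_t cdb[12], uint8_t opcode,
				  uint16_t sa, uint32_t alloc_len)
{
	memset(cdb, 0, 12);
	cdb[0] = SKETCH_MAINTENANCE_IN;
	cdb[1] = SKETCH_MI_REPORT_SUPPORTED_OPCODES;
	cdb[3] = opcode;
	if (sa) {
		/* Newer format: the drive must also interpret bytes 4-5. */
		cdb[2] = SKETCH_RO_ONE_CMD_SA;
		cdb[4] = (uint8_t)(sa >> 8);
		cdb[5] = (uint8_t)(sa & 0xff);
	} else {
		/* Simplest format, long issued to all devices. */
		cdb[2] = SKETCH_RO_ONE_CMD;
	}
	/* Allocation length, big-endian, bytes 6..9. */
	cdb[6] = (uint8_t)(alloc_len >> 24);
	cdb[7] = (uint8_t)(alloc_len >> 16);
	cdb[8] = (uint8_t)(alloc_len >> 8);
	cdb[9] = (uint8_t)alloc_len;
}
```

For example, READ(16) (0x88) would be probed with sa = 0, while READ(32) (0x7f with service action 0x0009) needs the service action form, which is the variant an older drive might reject.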
This command has been issued to devices for a long time, and given that your
system was working, the drive is probably fine with it in its simplest form
(one command format). The CDL changes, however, added probing of command
support using the service action field (one command format with service
action). What may be happening is that the drive does not like, or does not
support, that format and chokes on it. Let me check the specs to see which SCSI
level supports this format. What is certain is that Ultra320 SCSI disks will
definitely *not* support CDL, so we could exit early in scsi_cdl_check_cmd()
and return false for drives that report an old SCSI level. Let me send
something along these lines.

>>
>> Could you send a full dmesg output for a clean boot and for a failed one so that
>> I can compare ?
> I'll try to get this together tomorrow.
>
> Dave

-- 
Damien Le Moal
Western Digital Research
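A minimal sketch of the early exit proposed above, written as standalone C rather than against the real struct scsi_device. The level numbering (assumed to mirror include/scsi/scsi.h, where scsi_level is roughly the INQUIRY VERSION field plus one), the SPC-5 cutoff and all sketch_* names are assumptions for illustration; the actual threshold depends on the spec check mentioned above.

```c
#include <stdbool.h>

/* Hypothetical mirror of the kernel's scsi_level numbering (assumption). */
enum sketch_scsi_level {
	SKETCH_SCSI_2 = 3,
	SKETCH_SCSI_3 = 4,
	SKETCH_SPC_2  = 5,
	SKETCH_SPC_3  = 6,
	SKETCH_SPC_4  = 7,
	SKETCH_SPC_5  = 8,
};

/* Hypothetical stand-in for struct scsi_device. */
struct sketch_sdev {
	int scsi_level;
};

/* Assumed cutoff: treat anything older than SPC-5 as CDL-incapable. */
#define SKETCH_CDL_MIN_LEVEL SKETCH_SPC_5

/* Return false before issuing any REPORT SUPPORTED OPERATION CODES probe
 * when the device reports an old SCSI level; otherwise run the probe. */
static bool sketch_cdl_check_cmd(const struct sketch_sdev *sdev,
				 bool (*probe)(const struct sketch_sdev *))
{
	if (sdev->scsi_level < SKETCH_CDL_MIN_LEVEL)
		return false;	/* old drive: never send the probe command */
	return probe(sdev);
}
```

The point of checking the level first is that an old drive never sees the service action form of the probe at all, so it cannot choke on it.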