Dear Linux folks,
On 01.05.19 11:34, Paul Menzel wrote:
On an MSI B350M MORTAR (MS-7A37) with all firmwares, and the AHCI SSD
Crucial MX500, *sometimes* boot and resume are delayed by four to five
seconds, and the log contains the message *failed to set xfermode*.
[ 0.474150] ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 0)
[ 0.474267] microcode: Microcode Update Driver: v2.2.
[ 0.474294] AVX2 version of gcm_enc/dec engaged.
[ 0.474295] AES CTR mode by8 optimization enabled
[ 0.474304] ata9.00: supports DRM functions and may not be fully accessible
[ 0.474368] ata9.00: ATA-10: CT1000MX500SSD4, M3CR020, max UDMA/133
[ 0.474370] ata9.00: 1953525168 sectors, multi 1: LBA48 NCQ (depth 32), AA
[ 0.474569] ata9.00: READ LOG DMA EXT failed, trying PIO
[ 0.474571] ata9.00: failed to get Identify Device Data, Emask 0x40
[ 0.474572] ata9.00: ATA Identify Device Log not supported
[ 0.474573] ata9.00: Security Log not supported
[ 0.474576] ata9.00: failed to set xfermode (err_mask=0x40)
[ 0.493193] sched_clock: Marking stable (504159603, -10991650)->(615393192, -122225239)
[ 0.493768] registered taskstats version 1
[ 0.493770] Loading compiled-in X.509 certificates
[ 0.497894] Loaded X.509 cert 'Build time autogenerated kernel key: 4682756457c0b7b167f0b726943cf53ade93bd78'
[ 0.497984] zswap: loaded using pool lzo/zbud
[ 0.498237] kmemleak: Kernel memory leak detector initialized
[ 0.498243] kmemleak: Automatic memory scanning thread started
[ 0.506469] Key type big_key registered
[ 0.508864] Magic number: 11:580:189
[ 0.508914] acpi device:48: hash matches
[ 0.508922] acpi device:1b: hash matches
[ 0.509017] rtc_cmos 00:02: setting system clock to 2019-03-29T17:09:41 UTC (1553879381)
[ 0.509485] After kernel_init_freeable
[ 1.364358] tsc: Refined TSC clocksource calibration: 3499.982 MHz
[ 1.364370] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x32734013f67, max_idle_ns: 440795323134 ns
[ 1.364419] clocksource: Switched to clocksource tsc
[ 5.593141] ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 0)
[ 5.593270] ata9.00: supports DRM functions and may not be fully accessible
[ 5.594006] ata9.00: supports DRM functions and may not be fully accessible
[ 5.594600] ata9.00: configured for UDMA/133
[ 5.595893] scsi 8:0:0:0: Direct-Access ATA CT1000MX500SSD4 020 PQ: 0 ANSI: 5
[ 5.596611] sd 8:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[ 5.596613] sd 8:0:0:0: [sda] 4096-byte physical blocks
[ 5.596626] sd 8:0:0:0: [sda] Write Protect is off
[ 5.596629] sd 8:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 5.596650] sd 8:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 5.596837] sd 8:0:0:0: Attached scsi generic sg0 type 0
[ 5.597329] sda: sda1 sda2 sda3
[ 5.599079] sd 8:0:0:0: [sda] Attached SCSI disk
This happens with all Linux kernel versions – tested since 4.14.
Searching for this error on the Web, several people seem to be affected
with different boards and drives [1].
In a lot of those reports, I also see the message *Security Log not
supported*.
Do you have an idea, why this is only happening sometimes? Is it likely
a drive firmware issue? As it affects several vendors, can Linux work
around this problem (if it can be pinpointed)?
The first error seems to be `READ LOG DMA EXT failed, trying PIO`. The
code looks like below [2].
retry:
ata_tf_init(dev, &tf);
if (dev->dma_mode && ata_id_has_read_log_dma_ext(dev->id) &&
!(dev->horkage & ATA_HORKAGE_NO_DMA_LOG)) {
tf.command = ATA_CMD_READ_LOG_DMA_EXT;
tf.protocol = ATA_PROT_DMA;
dma = true;
} else {
tf.command = ATA_CMD_READ_LOG_EXT;
tf.protocol = ATA_PROT_PIO;
dma = false;
}
tf.lbal = log;
tf.lbam = page;
tf.nsect = sectors;
tf.hob_nsect = sectors >> 8;
tf.flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_LBA48 | ATA_TFLAG_DEVICE;
err_mask = ata_exec_internal(dev, &tf, NULL, DMA_FROM_DEVICE,
buf, sectors * ATA_SECT_SIZE, 0);
if (err_mask && dma) {
dev->horkage |= ATA_HORKAGE_NO_DMA_LOG;
ata_dev_warn(dev, "READ LOG DMA EXT failed, trying PIO\n");
goto retry;
}
DPRINTK("EXIT, err_mask=%x\n", err_mask);
return err_mask;
So when trying PIO, that also seems to fail, and the error mask 0x40 is
propagated up.
enum ata_completion_errors {
AC_ERR_DEV = (1 << 0), /* device reported error */
AC_ERR_HSM = (1 << 1), /* host state machine violation */
AC_ERR_TIMEOUT = (1 << 2), /* timeout */
AC_ERR_MEDIA = (1 << 3), /* media error */
AC_ERR_ATA_BUS = (1 << 4), /* ATA bus error */
AC_ERR_HOST_BUS = (1 << 5), /* host bus error */
AC_ERR_SYSTEM = (1 << 6), /* system error */
AC_ERR_INVALID = (1 << 7), /* invalid argument */
AC_ERR_OTHER = (1 << 8), /* unknown */
AC_ERR_NODEV_HINT = (1 << 9), /* polling device detection hint */
AC_ERR_NCQ = (1 << 10), /* marker for offending NCQ qc */
};
So 0x40 is AC_ERR_HOST_BUS, right? Any idea, what might cause this?
Would it be helpful to enable the debug messages?
#undef ATA_DEBUG /* debugging output */
Kind regards,
Paul
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=195895[2]:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/ata/libata-core.c?id=1daa0449d287a109b93c4516914eddeff4baff65#n2075
[3]:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/libata.h?id=1daa0449d287a109b93c4516914eddeff4baff65#n508