2018-03-13 2:52 GMT+01:00 David C. Rankin <drankinatty@xxxxxxxxxxxxxxxxxx>:
> All,
>
> I have experienced 2 hardlocks on two separate multi-CPU servers in the past
> 24 hours after updating to 4.15.8. I do not know where the issue lies. One
> lockup occurred during a simple rsync from another box on the LAN.
>
> Both boxes have exhibited strange behavior regarding the linux-raid array
> (all disks are fine), but I receive spurious errors like (Out of IOMMU space),
> Huh? If this were related to one box, then I would suspect I had a flaky disk
> or cable, but the same errors on both boxes -- something is fishy. Both are
> Archlinux servers, but SuperMicro boards with either 2 or 4 quad-core Opteron
> processors. The Out of IOMMU space shows up in the journal as:
>
> Mar 12 19:45:20 valhalla su[869]: pam_unix(su:session): session opened for
> user root by david(uid=1000)
> Mar 12 19:45:57 valhalla kernel: sata_nv 0000:00:05.0: PCI-DMA: Out of IOMMU
> space for 65536 bytes
> Mar 12 19:45:57 valhalla kernel: ata3: EH in SWNCQ mode,QC:qc_active 0x4
> sactive 0x4
> Mar 12 19:45:57 valhalla kernel: ata3: SWNCQ:qc_active 0x0 defer_bits 0x0
> last_issue_tag 0x1 dhfis 0x0 dmafis 0x0 sdbfis 0x0
> Mar 12 19:45:57 valhalla kernel: ata3: ATA_REG 0x40 ERR_REG 0x0
> Mar 12 19:45:57 valhalla kernel: ata3: tag : dhfis dmafis sdbfis sactive
> Mar 12 19:45:57 valhalla kernel: ata3.00: exception Emask 0x0 SAct 0x4 SErr
> 0x0 action 0x6
> Mar 12 19:45:57 valhalla kernel: ata3.00: failed command: WRITE FPDMA QUEUED
> Mar 12 19:45:57 valhalla kernel: ata3.00: cmd
> 61/00:10:00:d0:e4/0a:00:0f:00:00/40 tag 2 ncq dma 1310720 ou res
> 40/00:20:00:ea:e3/00:00:0f:00:00/40 Emask 0x40 (internal error)
> Mar 12 19:45:57 valhalla kernel: ata3.00: status: { DRDY }
> Mar 12 19:45:57 valhalla kernel: ata3: hard resetting link
> Mar 12 19:45:57 valhalla kernel: ata3: nv: skipping hardreset on occupied port
> Mar 12 19:45:58 valhalla kernel: ata3: SATA link up 1.5 Gbps (SStatus 113
> SControl 300)
> Mar 12 19:45:58 valhalla kernel: ata3.00: configured for UDMA/133
> Mar 12 19:45:58 valhalla kernel: ata3: EH complete
> Mar 12 19:46:09 valhalla kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU
> space for 65536 bytes
> Mar 12 19:46:09 valhalla kernel: ata5: EH in SWNCQ mode,QC:qc_active 0x4
> sactive 0x4
> Mar 12 19:46:09 valhalla kernel: ata5: SWNCQ:qc_active 0x0 defer_bits 0x0
> last_issue_tag 0x1 dhfis 0x0 dmafis 0x0 sdbfis 0x0
> Mar 12 19:46:09 valhalla kernel: ata5: ATA_REG 0x40 ERR_REG 0x0
> Mar 12 19:46:09 valhalla kernel: ata5: tag : dhfis dmafis sdbfis sactive
> Mar 12 19:46:09 valhalla kernel: ata5.00: exception Emask 0x0 SAct 0x4 SErr
> 0x0 action 0x6
> Mar 12 19:46:09 valhalla kernel: ata5.00: failed command: WRITE FPDMA QUEUED
> Mar 12 19:46:09 valhalla kernel: ata5.00: cmd
> 61/00:10:00:c0:f8/0a:00:0f:00:00/40 tag 2 ncq dma 1310720 ou res
> 40/00:20:00:da:f7/00:00:0f:00:00/40 Emask 0x40 (internal error)
> Mar 12 19:46:09 valhalla kernel: ata5.00: status: { DRDY }
> Mar 12 19:46:09 valhalla kernel: ata5: hard resetting link
> Mar 12 19:46:09 valhalla kernel: ata5: nv: skipping hardreset on occupied port
> Mar 12 19:46:10 valhalla kernel: ata5: SATA link up 1.5 Gbps (SStatus 113
> SControl 300)
> Mar 12 19:46:10 valhalla kernel: ata5.00: configured for UDMA/133
> Mar 12 19:46:10 valhalla kernel: ata5: EH complete
>
> I'm just taking a shot in the dark here and don't know whether the basis is
> an mdadm issue, or something new in the latest kernel. Any deciphering guidance
> would be appreciated.
>
> --
> David C. Rankin, J.D.,P.E.

Hi David,

This looks more like a libata issue, so +cc linux-ide.

Do you know which kernel version works for you, and have you tried the
latest upstream kernel? That would make it easier to narrow this down.

Regards,
Jack