Downgrade back to 4.14.

You are lucky if your machines get all the way up before you hit the
bug. Mine only makes it through systemd 1 out of 3 times, and when it
does come up, it locks up on one of the devices within an hour or less.

The gist of the bug is that the lockup occurs whenever error handling
has to be used. I have a couple of disks that act up sometimes, and
when that happens those devices lock up.

https://bugzilla.kernel.org/show_bug.cgi?id=198861
https://bugzilla.redhat.com/show_bug.cgi?id=1552124

On Tue, Mar 13, 2018 at 4:09 AM, 王金浦 <jinpuwang@xxxxxxxxx> wrote:
> 2018-03-13 2:52 GMT+01:00 David C. Rankin <drankinatty@xxxxxxxxxxxxxxxxxx>:
>> All,
>>
>> I have experienced 2 hardlocks on two separate multi-cpu servers in the past
>> 24 hours after updating to 4.15.8. I do not know where the issue lies. One
>> lockup occurred during a simple rsync from another box on the lan.
>>
>> Both boxes have exhibited strange behavior regarding the linux-raid array
>> (all disks are fine), but I receive spurious errors like (Out of IOMMU space),
>> Huh? If this were related to one box, then I would suspect I had a flaky disk
>> or cable, but same errors on both boxes -- something is fishy. Both are
>> Archlinux servers, but SuperMicro boards with either 2 or 4 quad-core Opteron
>> processors.
>> The Out of IOMMU space shows up in the journal as:
>>
>> Mar 12 19:45:20 valhalla su[869]: pam_unix(su:session): session opened for user root by david(uid=1000)
>> Mar 12 19:45:57 valhalla kernel: sata_nv 0000:00:05.0: PCI-DMA: Out of IOMMU space for 65536 bytes
>> Mar 12 19:45:57 valhalla kernel: ata3: EH in SWNCQ mode,QC:qc_active 0x4 sactive 0x4
>> Mar 12 19:45:57 valhalla kernel: ata3: SWNCQ:qc_active 0x0 defer_bits 0x0 last_issue_tag 0x1 dhfis 0x0 dmafis 0x0 sdbfis 0x0
>> Mar 12 19:45:57 valhalla kernel: ata3: ATA_REG 0x40 ERR_REG 0x0
>> Mar 12 19:45:57 valhalla kernel: ata3: tag : dhfis dmafis sdbfis sactive
>> Mar 12 19:45:57 valhalla kernel: ata3.00: exception Emask 0x0 SAct 0x4 SErr 0x0 action 0x6
>> Mar 12 19:45:57 valhalla kernel: ata3.00: failed command: WRITE FPDMA QUEUED
>> Mar 12 19:45:57 valhalla kernel: ata3.00: cmd 61/00:10:00:d0:e4/0a:00:0f:00:00/40 tag 2 ncq dma 1310720 ou
>>                                           res 40/00:20:00:ea:e3/00:00:0f:00:00/40 Emask 0x40 (internal error)
>> Mar 12 19:45:57 valhalla kernel: ata3.00: status: { DRDY }
>> Mar 12 19:45:57 valhalla kernel: ata3: hard resetting link
>> Mar 12 19:45:57 valhalla kernel: ata3: nv: skipping hardreset on occupied port
>> Mar 12 19:45:58 valhalla kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> Mar 12 19:45:58 valhalla kernel: ata3.00: configured for UDMA/133
>> Mar 12 19:45:58 valhalla kernel: ata3: EH complete
>> Mar 12 19:46:09 valhalla kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes
>> Mar 12 19:46:09 valhalla kernel: ata5: EH in SWNCQ mode,QC:qc_active 0x4 sactive 0x4
>> Mar 12 19:46:09 valhalla kernel: ata5: SWNCQ:qc_active 0x0 defer_bits 0x0 last_issue_tag 0x1 dhfis 0x0 dmafis 0x0 sdbfis 0x0
>> Mar 12 19:46:09 valhalla kernel: ata5: ATA_REG 0x40 ERR_REG 0x0
>> Mar 12 19:46:09 valhalla kernel: ata5: tag : dhfis dmafis sdbfis sactive
>> Mar 12 19:46:09 valhalla kernel: ata5.00: exception Emask 0x0 SAct 0x4 SErr 0x0 action 0x6
>> Mar 12 19:46:09 valhalla kernel: ata5.00: failed command: WRITE FPDMA QUEUED
>> Mar 12 19:46:09 valhalla kernel: ata5.00: cmd 61/00:10:00:c0:f8/0a:00:0f:00:00/40 tag 2 ncq dma 1310720 ou
>>                                           res 40/00:20:00:da:f7/00:00:0f:00:00/40 Emask 0x40 (internal error)
>> Mar 12 19:46:09 valhalla kernel: ata5.00: status: { DRDY }
>> Mar 12 19:46:09 valhalla kernel: ata5: hard resetting link
>> Mar 12 19:46:09 valhalla kernel: ata5: nv: skipping hardreset on occupied port
>> Mar 12 19:46:10 valhalla kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> Mar 12 19:46:10 valhalla kernel: ata5.00: configured for UDMA/133
>> Mar 12 19:46:10 valhalla kernel: ata5: EH complete
>>
>> I'm just taking a shot in the dark here and don't know whether the basis is
>> a mdadm issue, or something new in the latest kernel. Any deciphering guidance
>> would be appreciated.
>>
>> --
>> David C. Rankin, J.D.,P.E.
> Hi, David
>
> Looks more like libata related, +cc to linux-ide.
> Do you know which kernel works for you, have you tried latest
> upstream? so it's easier to narrow it down.
>
> Regards,
> Jack
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html