Re: 4.15.8 Kernel - Strange linux-raid behavior, not sure where to send it.

2018-03-13 15:35 GMT+01:00 Roger Heflin <rogerheflin@xxxxxxxxx>:
> Downgrade back to 4.14.  You are lucky if your machine gets up before
> you hit it.  Mine only makes it through systemd 1 out of 3 times, and
> when it does get up, it then locks up on one of the devices within an
> hour or less.
>
> Basically, the bug causes a lockup whenever error handling has to be
> used.  I have a couple of disks that act up sometimes, and when that
> happens it locks up those devices.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=198861
> https://bugzilla.redhat.com/show_bug.cgi?id=1552124
>
Thanks, Roger. It looks like Greg has already queued the fix 3be8828fc507
in his stable-queue for 4.14.27 and 4.15.10.

So we can expect the fix soon.

Cheers,
Jack

>
> On Tue, Mar 13, 2018 at 4:09 AM, 王金浦 <jinpuwang@xxxxxxxxx> wrote:
>> 2018-03-13 2:52 GMT+01:00 David C. Rankin <drankinatty@xxxxxxxxxxxxxxxxxx>:
>>> All,
>>>
>>>   I have experienced two hard lockups on two separate multi-CPU servers in
>>> the past 24 hours, since updating to 4.15.8. I do not know where the issue
>>> lies. One lockup occurred during a simple rsync from another box on the LAN.
>>>
>>>   Both boxes have exhibited strange behavior with the linux-raid array
>>> (all disks are fine), yet I receive spurious errors such as "Out of IOMMU
>>> space". Huh? If this happened on only one box, I would suspect a flaky
>>> disk or cable, but the same errors appear on both boxes, so something is
>>> fishy. Both are Arch Linux servers on SuperMicro boards with either 2 or 4
>>> quad-core Opteron processors. The "Out of IOMMU space" error shows up in
>>> the journal as:
>>>
>>> Mar 12 19:45:20 valhalla su[869]: pam_unix(su:session): session opened for user root by david(uid=1000)
>>> Mar 12 19:45:57 valhalla kernel: sata_nv 0000:00:05.0: PCI-DMA: Out of IOMMU space for 65536 bytes
>>> Mar 12 19:45:57 valhalla kernel: ata3: EH in SWNCQ mode,QC:qc_active 0x4 sactive 0x4
>>> Mar 12 19:45:57 valhalla kernel: ata3: SWNCQ:qc_active 0x0 defer_bits 0x0 last_issue_tag 0x1 dhfis 0x0 dmafis 0x0 sdbfis 0x0
>>> Mar 12 19:45:57 valhalla kernel: ata3: ATA_REG 0x40 ERR_REG 0x0
>>> Mar 12 19:45:57 valhalla kernel: ata3: tag : dhfis dmafis sdbfis sactive
>>> Mar 12 19:45:57 valhalla kernel: ata3.00: exception Emask 0x0 SAct 0x4 SErr 0x0 action 0x6
>>> Mar 12 19:45:57 valhalla kernel: ata3.00: failed command: WRITE FPDMA QUEUED
>>> Mar 12 19:45:57 valhalla kernel: ata3.00: cmd 61/00:10:00:d0:e4/0a:00:0f:00:00/40 tag 2 ncq dma 1310720 ou res 40/00:20:00:ea:e3/00:00:0f:00:00/40 Emask 0x40 (internal error)
>>> Mar 12 19:45:57 valhalla kernel: ata3.00: status: { DRDY }
>>> Mar 12 19:45:57 valhalla kernel: ata3: hard resetting link
>>> Mar 12 19:45:57 valhalla kernel: ata3: nv: skipping hardreset on occupied port
>>> Mar 12 19:45:58 valhalla kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>> Mar 12 19:45:58 valhalla kernel: ata3.00: configured for UDMA/133
>>> Mar 12 19:45:58 valhalla kernel: ata3: EH complete
>>> Mar 12 19:46:09 valhalla kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes
>>> Mar 12 19:46:09 valhalla kernel: ata5: EH in SWNCQ mode,QC:qc_active 0x4 sactive 0x4
>>> Mar 12 19:46:09 valhalla kernel: ata5: SWNCQ:qc_active 0x0 defer_bits 0x0 last_issue_tag 0x1 dhfis 0x0 dmafis 0x0 sdbfis 0x0
>>> Mar 12 19:46:09 valhalla kernel: ata5: ATA_REG 0x40 ERR_REG 0x0
>>> Mar 12 19:46:09 valhalla kernel: ata5: tag : dhfis dmafis sdbfis sactive
>>> Mar 12 19:46:09 valhalla kernel: ata5.00: exception Emask 0x0 SAct 0x4 SErr 0x0 action 0x6
>>> Mar 12 19:46:09 valhalla kernel: ata5.00: failed command: WRITE FPDMA QUEUED
>>> Mar 12 19:46:09 valhalla kernel: ata5.00: cmd 61/00:10:00:c0:f8/0a:00:0f:00:00/40 tag 2 ncq dma 1310720 ou res 40/00:20:00:da:f7/00:00:0f:00:00/40 Emask 0x40 (internal error)
>>> Mar 12 19:46:09 valhalla kernel: ata5.00: status: { DRDY }
>>> Mar 12 19:46:09 valhalla kernel: ata5: hard resetting link
>>> Mar 12 19:46:09 valhalla kernel: ata5: nv: skipping hardreset on occupied port
>>> Mar 12 19:46:10 valhalla kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>> Mar 12 19:46:10 valhalla kernel: ata5.00: configured for UDMA/133
>>> Mar 12 19:46:10 valhalla kernel: ata5: EH complete
>>>
>>>   I'm just taking a shot in the dark here and don't know whether the cause
>>> is an mdadm issue or something new in the latest kernel. Any guidance on
>>> deciphering this would be appreciated.
>>>
>>> --
>>> David C. Rankin, J.D.,P.E.
>> Hi, David
>>
>> This looks more libata-related; adding linux-ide to Cc.
>> Do you know which kernel works for you? Have you tried the latest
>> upstream? That would make it easier to narrow down.
>>
>> Regards,
>> Jack
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html