Re: sata_mv issues with more than 1 disk in FC5 2.6.16-1.2096

Roger Heflin wrote:
Hello,

I know that the FC5 2.6.16-1.2096 kernel is based on one of the 2.6.16
stable releases, but I am not sure exactly which one.

I have several machines with this controller. The ones with a single
disk all work correctly using the Marvell controller.

An identical MB/controller/chassis fails when there are 2 disks
using software mirroring.  We have tested with 2 separate chassis,
both exhibit the same failure.

Moving the 2 disks to a different built-in sata controller
(sata_nv) results in the disks and the mirror working
correctly.

The errors that it returns are:
ata4: status=0xd0 { Busy }
ata2: status=0xd0 { Busy }
The machine is terribly slow while these errors are occurring.
The errors appear to occur while the disks are being mounted.
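
For reference, the status value in these messages can be decoded against the standard ATA status register bit layout. The following is a minimal sketch, assuming the classic ATA bit names; the helper name decode_ata_status is made up for illustration and is not part of libata or sata_mv:

```python
# Bit masks of the 8-bit ATA status register (classic ATA naming).
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # seek complete
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data
    (0x02, "IDX"),   # index
    (0x01, "ERR"),   # error
]

def decode_ata_status(status):
    """Return the names of the bits set in an 8-bit ATA status value.

    Hypothetical helper for illustration only.
    """
    return [name for mask, name in ATA_STATUS_BITS if status & mask]

print(decode_ata_status(0xD0))  # ['BSY', 'DRDY', 'DSC']
```

While BSY is set the remaining bits are not considered valid, which is presumably why the kernel message reports only { Busy }.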

We have tested the disks on a couple of different combinations of
ports, and this does not seem to change anything.

The single-disk machines do not fail the way the 2-disk machines do.
They do log the same error every 30 minutes or so (probably triggered
by smartd), but it does not appear to cause any problems:
Apr 19 16:26:19 lab229 kernel: ata1: status=0xd0 { Busy }
Apr 19 16:26:19 lab229 kernel: ATA: abnormal status 0xD0 on
    port 0xFFFFC2001012211C

Any thoughts?



A follow-up: it passes the test with nosmp set, and it fails every time
without nosmp.

With acpi_irq_nobalance and noirqbalance set, and the irqbalance service
stopped or never started, it works better: it takes longer to lock
up, and after a lockup it hangs for 30-60 seconds, then comes back and
works for around 30-60 seconds before hanging again. Eventually it
appears to lock up completely and stop working. The first hang comes
after around 30 seconds, and it stops working completely after several
minutes, vs. 1-2 seconds with irq balancing enabled. The output of
/proc/interrupts does confirm that the interrupts are not being balanced.
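
As an illustration of that check, the per-CPU counters in /proc/interrupts can be pulled out with a few lines of Python. This is only a sketch: the function name irq_counts and the sample text are hypothetical and are not output from the affected machine.

```python
def irq_counts(interrupts_text, name):
    """Return {irq: [count_cpu0, count_cpu1, ...]} for lines mentioning name."""
    lines = interrupts_text.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or name not in line:
            continue
        irq = fields[0].rstrip(":")
        counts = fields[1:1 + ncpus]
        # Skip summary rows that do not carry one counter per CPU.
        if len(counts) == ncpus and all(c.isdigit() for c in counts):
            result[irq] = [int(c) for c in counts]
    return result

# Hypothetical sample in the /proc/interrupts layout.
SAMPLE = """\
           CPU0       CPU1
  0:     123456          0    IO-APIC-edge  timer
 16:      98765          0   IO-APIC-level  sata_mv
 17:       4321       4000   IO-APIC-level  sata_nv
"""

print(irq_counts(SAMPLE, "sata_mv"))
```

On a live system the text would come from open("/proc/interrupts").read(). A counter that grows on only one CPU suggests the interrupt is not being balanced.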


To make the failure happen it appears to require several conditions:
SMP, and 2 or more disks doing heavy I/O.

Having the IRQs balanced appears to make the problem happen much faster,
but not balancing them does not appear to stop the problem.

I have not tested the nosmp option long enough to conclude that the
failure is very unlikely to occur in that case.

With "noapic" things seem to be stable and seem to work with both disks
doing I/O, and this is with the IRQs still being balanced.

When it works, the speeds are good in all cases: 65MiB/second for the first
disk, and 130MiB/second when the second disk is added, all tested with dd.
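
A rough way to generate the same kind of load without dd is to read several paths sequentially at the same time and time each one. The sketch below is an assumption-laden stand-in, not the test actually used here: all function names are hypothetical, and reads go through the page cache, so repeated runs on the same data may report inflated speeds.

```python
import threading
import time

def read_throughput(path, chunk_size=1024 * 1024, max_bytes=None):
    """Sequentially read path; return (bytes_read, seconds).

    Stops at end of file, or once at least max_bytes have been read.
    """
    total = 0
    start = time.monotonic()
    with open(path, "rb", buffering=0) as f:
        while max_bytes is None or total < max_bytes:
            data = f.read(chunk_size)
            if not data:
                break
            total += len(data)
    return total, time.monotonic() - start

def read_concurrently(paths, **kwargs):
    """Read all paths at the same time, one thread per path."""
    results = {}

    def worker(p):
        results[p] = read_throughput(p, **kwargs)

    threads = [threading.Thread(target=worker, args=(p,)) for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

def mib_per_second(nbytes, seconds):
    """Convert a byte count and duration to MiB/second."""
    return nbytes / (1024 * 1024) / seconds if seconds > 0 else float("inf")

# Example call (device names are placeholders, requires read permission):
# read_concurrently(["/dev/sda", "/dev/sdb"], max_bytes=1 << 30)
```

Each result can be passed to mib_per_second to compare against the per-disk figure above.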

What does noapic fixing the problem say about the cause
of the underlying issue?

                                     Roger


