Re: Implementing low level timeouts within MD

On Fri, 2007-11-02 at 03:41 -0500, Alberto Alonso wrote:
> On Thu, 2007-11-01 at 15:16 -0400, Doug Ledford wrote:
> > Not in the older kernel versions you were running, no.
> 
> These "old versions" (specially the RHEL) are supposed to be
> the official versions supported by Redhat and the hardware 
> vendors, as they were very specific as to what versions of 
> Linux were supported.

The key word here being "supported".  That means if you run across a
problem, we fix it.  It doesn't mean there will never be any problems.

>  Of all people, I would think you would
> appreciate that. Sorry if I sound frustrated and upset, but 
> it is clearly a result of what "supported and tested" really 
> means in this case.

I'm sorry, but given the "specially the RHEL" case you cited, it is
clear I can't help you.  No one can.  You were running first gen
software on first gen hardware.  You show me *any* software company
whose first gen software never has to be updated to fix bugs, and I'll
show you a software company that went out of business the day after
they released their software.

Our RHEL3 update kernels contained *significant* updates to the SATA
stack after our GA release, replete with hardware driver updates and bug
fixes.  I don't know *when* that RHEL3 system failed, but I would
venture a guess that it wasn't prior to RHEL3 Update 1.  So, I'm
guessing you didn't take advantage of those bug fixes.  And I would
hardly call once a quarter "continuously updating" your kernel.  In any
case, given your insistence on running first gen software on first gen
hardware and not taking advantage of the support we *did* provide to
protect you against that failure, I say again that I can't help you.

>  I don't want to go into a discussion of
> commercial distros, which are "supported", as this is neither the
> time nor the place, but I don't want to open the door to the
> excuse of "it's an old kernel"; it wasn't when it got installed.

I *really* can't help you.

> Outside of the rejected suggestion, I just want to figure out 
> when software raid works and when it doesn't. With SATA, my 
> experience is that it doesn't. So far I've only received one 
> response stating success (they were using the 3ware and Areca 
> product lines).

No, your experience, as you listed it, is that
SATA/usb-storage/Serverworks PATA failed you.  The software raid never
failed to perform as designed.

However, one of the things you are doing here is drawing sweeping
generalizations that are totally invalid.  You are saying your
experience is that SATA doesn't work, but you aren't qualifying it with
the key factor: SATA doesn't work in what kernel version?  It is
pointless to try and establish whether or not something like SATA works
in a global, all kernel inclusive fashion because the answer to the
question varies depending on the kernel version.  And the same is true
of pretty much every driver you can name.  This is why commercial
companies don't just certify hardware, but the software version that
actually works as opposed to all versions.  In truth, you have *no idea*
if SATA works today, because you haven't tried.  As David pointed out,
there was a significant overhaul of the SATA error recovery that took
place *after* the kernel versions that failed you which totally
invalidates your experiences and requires retesting of the later
software to see if it performs differently.

> Anyway, this thread just posed the question, and as Neil pointed
> out, it isn't feasible or worthwhile to implement timeouts within the md
> code. I think most of the points/discussions raised beyond that
> original question really belong to the thread "Software RAID when 
> it works and when it doesn't" 
> 
> I do appreciate all comments and suggestions and I hope to keep
> them coming. I would hope however to hear more about success
> stories with specific hardware details. It would be helpful
> to have a list of tested configurations that are known to work.

I've had *lots* of success with software RAID as I've been running it
for years.  I've had old PATA drives fail, SCSI drives fail, FC drives
fail, and I've had SATA drives that got kicked from the array due to
read errors but not out and out drive failures.  But I keep at least
reasonably up to date with my kernels.

-- 
Doug Ledford <dledford@xxxxxxxxxx>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband
