Hmm, nice. Keld (or anyone), do you know someone with a little time (the
total is, I think, only about two hours) who could try developing a
modification to the raid1 read_balance function?

What modification? Today read_balance picks the mirror with the smallest
head distance (current_head - next_head). The idea is to multiply that
distance by a tunable at /sys/block/md0/distance_rate and add
read_size * byte_rate (with byte_rate at /sys/block/md0/byte_read_rate).
With this, the algorithm minimizes estimated time rather than raw
distance, and we get better read balancing (for SSDs). A rough sketch of
the idea is at the end of this message.

As a second step we could account for the device queue's
time-to-completion (I think about a day of work would get it working
with all the device schedulers), but that's not for now.

2011/2/3 Keld Jørn Simonsen <keld@xxxxxxxxxx>:
> On Thu, Feb 03, 2011 at 12:35:52PM -0200, Roberto Spadim wrote:
>> =] I think we can end the discussion and conclude that the context
>> (test vs. production) decides whether we can gamble on probability.
>> What is "luck"? In production, luck means a bad disk; in production
>> we don't allow failed disks: we use SMART to predict failures, and
>> when a disk fails we replace several disks to prevent the next one
>> from failing.
>>
>> Could we update our RAID wiki with some information from this
>> discussion?
>
> I would like to, but it is a bit complicated.
> Anyway, I think there is already something about it on the wiki.
> And then, for one of the most important RAID types in Linux MD,
> namely raid10, I am not sure what to write. It could behave like
> raid1+0 or like raid0+1, and as far as I know it is raid0+1 for
> f2 :-( but I don't know about n2 and o2.
>
> The German Wikipedia article on RAID,
> http://de.wikipedia.org/wiki/RAID, has a lot of information on
> probability, but it is wrong in a number of places. I have tried to
> correct it, but the German version is moderated, and the moderators
> don't know what they are writing about.
>
> Best regards
> Keld
>
>> 2011/2/3 Drew <drew.kay@xxxxxxxxx>:
>> >> For a test setup, raid1 with raid0 on top has a better
>> >> probability of not stopping than raid10, but it's only a
>> >> probability... don't count on luck. And since it's just for
>> >> testing, not production, it doesn't matter anyway...
>> >>
>> >> What would I implement for production? Any of them: if a disk
>> >> fails, the whole array should be replaced (or, if money is
>> >> short, replace the disks with the least life left).
>> >
>> > A lot of this discussion about failure rates and probabilities is
>> > academic. There are assumptions about each disk having its own
>> > independent failure probability, which, if it cannot be
>> > predicted, must be assumed to be 50%. At the end of the day I
>> > agree that when the first disk fails the RAID is degraded and one
>> > *must* take steps to remedy that. This discussion is more about
>> > why RAID 10 (1+0) is better than 0+1.
>> >
>> > On our production systems we work with our vendor to ensure the
>> > individual drives we get aren't from the same batch/production
>> > run, thereby mitigating some issues around flaws in specific
>> > batches. We keep spare drives on hand for all three RAID arrays,
>> > so as to minimize the time we're operating in a degraded state.
>> > All data on RAID arrays is backed up nightly to storage which is
>> > then mirrored off-site.
>> >
>> > At the end of the day, our decision about which RAID type
>> > (10/5/6) to use was based more on a balance between performance,
>> > safety, and capacity than on specific failure criteria. RAID 10
>> > backs the iSCSI LUN that our VMware cluster uses for the
>> > individual OSes, and the data partition for the accounting
>> > database server.
>> > RAID 5 backs the partitions where we store user data, and RAID 6
>> > backs the NASes we use for our backup system.
>> >
>> > RAID 10 was chosen for performance reasons. It doesn't have to
>> > calculate parity on every write, so for the OS and database,
>> > which do a lot of small reads and writes, it's faster. For the
>> > user disks we went with RAID 5 because we get more space in the
>> > array at a small performance penalty, which is fine because the
>> > users have to access the file server over the LAN, and the
>> > bottleneck is the pipe between the switch and the VM, not between
>> > the iSCSI SAN and the server. For backups we went with RAID 6
>> > because the performance and storage penalties for the array were
>> > outweighed by the need for maximum safety.
>> >
>> > --
>> > Drew
>> >
>> > "Nothing in life is to be feared. It is only to be understood."
>> > --Marie Curie
>>
>> --
>> Roberto Spadim
>> Spadim Technology / SPAEmpresarial

--
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
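PS: here is a rough user-space sketch of the cost idea, so it can be
compiled and played with. It is only a model, not a patch: the real
read_balance() lives in drivers/md/raid1.c, and the distance_rate /
byte_read_rate names are only my proposal, they don't exist in mainline.
In the model I keep the two rates per mirror (my assumption, so a mixed
SSD+HDD array can be expressed); the mail above puts them under
/sys/block/md0/ at the array level.

/*
 * Toy model of a cost-based read_balance for raid1: pick the mirror
 * with the lowest estimated service time instead of the smallest
 * head distance.
 */
#include <stdio.h>
#include <stdlib.h>

struct mirror {
	long long head;       /* sector the disk head is assumed to sit on */
	long distance_rate;   /* estimated cost per sector of seek distance */
	long byte_read_rate;  /* estimated cost per sector transferred */
	int faulty;
};

/* Estimated time: seek component plus transfer component.
 * distance_rate = 0 is the SSD case (seeks are free);
 * byte_read_rate = 0 degenerates to the current shortest-distance
 * policy. */
static long long cost(const struct mirror *m, long long sector,
		      long sectors)
{
	return llabs(m->head - sector) * m->distance_rate
	     + (long long)sectors * m->byte_read_rate;
}

/* Pick the usable mirror with the lowest estimated time. */
static int read_balance_model(const struct mirror *m, int n,
			      long long sector, long sectors)
{
	int best = -1;
	long long best_cost = 0;

	for (int i = 0; i < n; i++) {
		if (m[i].faulty)
			continue;
		long long c = cost(&m[i], sector, sectors);
		if (best < 0 || c < best_cost) {
			best = i;
			best_cost = c;
		}
	}
	return best; /* index of the chosen mirror, or -1 if none usable */
}

int main(void)
{
	struct mirror m[2] = {
		{ .head = 0,      .distance_rate = 10, .byte_read_rate = 1 }, /* HDD-like */
		{ .head = 900000, .distance_rate = 0,  .byte_read_rate = 2 }, /* SSD-like */
	};

	/* A distant small read: the SSD wins despite its head position,
	 * because only its transfer term counts. */
	printf("chose mirror %d\n", read_balance_model(m, 2, 800000, 8));
	return 0;
}

Compile with gcc -std=c99 and run. The two knobs let the same code
cover both worlds: zero distance_rate for SSDs, zero byte_read_rate to
get today's nearest-head behaviour back.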
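PS2: on the 1+0 versus 0+1 point that Drew and Keld discussed, here is
a tiny brute-force check over the six possible two-disk failures of a
four-disk array. It only assumes independent failures and the usual
layout (disks 0,1 form one pair/half, disks 2,3 the other); nothing
md-specific.

/* Which two-disk failures does each layout survive? */
#include <stdio.h>

int main(void)
{
	int total = 0, survive_10 = 0, survive_01 = 0;

	for (int a = 0; a < 4; a++) {
		for (int b = a + 1; b < 4; b++) {
			total++;
			/* 1+0 (stripe of mirrors): dies only if both
			 * disks of the same mirror fail. */
			if (a / 2 != b / 2)
				survive_10++;
			/* 0+1 (mirror of stripes): one failure kills a
			 * whole half, so it dies whenever the two
			 * failures land in different halves. */
			if (a / 2 == b / 2)
				survive_01++;
		}
	}
	printf("two-disk failures: %d\n", total);     /* 6 */
	printf("raid1+0 survives: %d\n", survive_10); /* 4 */
	printf("raid0+1 survives: %d\n", survive_01); /* 2 */
	return 0;
}

So after a second failure, 1+0 survives 4 of 6 cases and 0+1 only 2 of
6, which is the usual argument for preferring 1+0.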