Re: RAID 10 far and offset on-disk layouts

On Mon, 13 Jan 2014 09:52:50 +0100 Gionatan Danti <g.danti@xxxxxxxxxx> wrote:

> Hi Neil,
> let me recap from a previous message:
> 
>  >FAR LAYOUT
>  >md(4) states:
>  >"The first copy of all data blocks will be striped across the early part
>  >of all drives in RAID0 fashion, and then the next copy of all blocks
>  >will be striped across a later section of all drives, always ensuring
>  >that all copies of any given block are on different drives"
>  >
>  >The "on different drives" part makes me wonder _how_ chunks are
>  >distributed. On a 4-disk array, I can imagine a few different schemas:
>  >
>  >1)	A1 A2 A3 A4
>  >	.. .. .. ..
>  >	A4 A1 A2 A3
>  >
>  >2)	A1 A2 A3 A4
>  >	.. .. .. ..
>  >	A2 A1 A4 A3
>  >
>  >The first schema is the one depicted by the SuSe documentation [1],
>  >while the second is the one described by Wikipedia [2].
>  >
>  >Question 1: as the two schemas have different reliability
>  >characteristics, which one is really used?
> 
> SuSe entry: 
> https://www.suse.com/documentation/sles11/stor_admin/data/raidmdadmr10cpx.html#b7cynnk
> 
> Wikipedia entry: 
> http://en.wikipedia.org/wiki/Linux_MD_RAID_10#LINUX-MD-RAID-10 (see how 
> far layout is depicted)
> 
> Keld kindly told me that the SuSe page is simply out of date, as it
> depicts a situation that changed with newer kernels. So my two questions:

I cannot see an important difference between the two pages you reference.
Both appear to be correct.

> 1) From which kernel version on is the layout the one depicted by Wikipedia?

These are both valid for any kernel since 2.6.18 with mdadm 2.5 or later.

> 2) Is it possible, using mdadm, to check which "far" layout is in use?

I think I know what you are talking about now.  The md driver in the kernel
supports two sorts of 'far' or 'offset' layouts for arrays where the number
of devices is not an integer multiple of the number of copies.
This has been supported in Linux since v3.9, but is not yet supported by
mdadm.
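
To make the reliability difference between the two quoted far schemas
concrete, here is a minimal Python sketch. It models only the 4-disk
diagrams 1) and 2) quoted above, not the md driver's actual placement code.
The function names are made up for illustration, and disks are numbered
from 0.

```python
from itertools import combinations


def far_copies_rotated(n_disks):
    """Schema 1: the second copy of the chunk on disk d sits on disk d+1 (wrapping)."""
    return {d: {d, (d + 1) % n_disks} for d in range(n_disks)}


def far_copies_paired(n_disks):
    """Schema 2: disks are grouped in pairs and the copies are swapped within a pair."""
    if n_disks % 2:
        raise ValueError("this sketch only models an even number of disks")
    return {d: {d, d ^ 1} for d in range(n_disks)}


def surviving_pairs(copies, n_disks):
    """Return every two-disk failure that still leaves one copy of each chunk."""
    survivable = []
    for failed in combinations(range(n_disks), 2):
        lost = set(failed)
        # The array is lost if some chunk has all of its copies on failed disks.
        if all(not holders <= lost for holders in copies.values()):
            survivable.append(failed)
    return survivable


if __name__ == "__main__":
    print("schema 1:", surviving_pairs(far_copies_rotated(4), 4))
    print("schema 2:", surviving_pairs(far_copies_paired(4), 4))
```

In this 4-disk model, schema 1 survives only the two failures of
non-adjacent disks, while schema 2 survives four of the six possible
two-disk failures.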

> 
>  From what I can see, "mdadm --detail /dev/mdWHATEVER | grep Layout"
> tells me whether the array uses the far, near or offset layout, but not
> the physical on-disk organization of chunks (e.g. far "type" 1 or 2).

This is because mdadm does not yet create or report on the new type.
When it does, the above command will be the correct command to find out which
layout is in use (but I don't yet know what the output will say exactly).
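
For scripting that check today, a small Python sketch along these lines
could pull the copy counts out of the Layout line. The sample text is an
assumed, abbreviated example of `mdadm --detail` output, not captured from
a real array, and parse_layout is a hypothetical helper name. Since the
output for the new far type is not yet known, the sketch does not try to
recognise it.

```python
import re


def parse_layout(detail_text):
    """Return e.g. {'far': 2} from the 'Layout' line, or None if there is none."""
    for line in detail_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Layout":
            pairs = re.findall(r"(near|far|offset)=(\d+)", value)
            return {name: int(count) for name, count in pairs}
    return None


# Assumed sample; the real output contains many more fields.
SAMPLE = """\
     Raid Level : raid10
   Raid Devices : 4
         Layout : far=2
     Chunk Size : 512K
"""

if __name__ == "__main__":
    print(parse_layout(SAMPLE))
```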

NeilBrown


> 
> Anyway, the thread started because I wonder why the OFFSET layout couples
> each disk to two other disks. Let me quote again:
> 
>  >OFFSET LAYOUT
>  >md(4) states:
>  >"When 'offset' replicas are chosen, the multiple copies of a given chunk
>  >are laid out on consecutive drives and at consecutive offsets.
>  >Effectively each stripe is duplicated and the copies are offset by one
>  >device."
>  >
>  >This means a schema like this:
>  >	
>  >3)	A1 A2 A3 A4
>  >	A4 A1 A2 A3
>  >	.. .. .. ..
>  >
>  >However, this is susceptible to the failure of any two consecutive
>  >disks. A schema like
>  >
>  >4)	A1 A2 A3 A4
>  >	A2 A1 A4 A3
>  >
>  >would not suffer from this problem (eg: disk 2 & 3 can fail and the
>  >array is still working).
>  >
>  >Question 2: apart from simplicity, why does the offset layout use
>  >schema n.3? Am I missing something?
> 
> Full thread link: http://marc.info/?t=138815504400002&r=1&w=2
> 
> Excuse me for the long email, I am simply trying to learn something :)
> Thank you very much.
> 
> On 01/13/2014 12:20 AM, NeilBrown wrote:
> > On Thu, 09 Jan 2014 09:03:37 +0100 Gionatan Danti <g.danti@xxxxxxxxxx> wrote:
> >
> >>>>
> >>>> Interesting. Two questions:
> >>>> 1) From which kernel version on is the layout the one depicted by Wikipedia?
> >
> > Exactly what depiction in wikipedia are you referring to?  A link to the
> > image might help.
> >
> >>>> 2) Is it possible, using mdadm, to check which "far" layout is in use?
> >
> > mdadm --detail /dev/mdWHATEVER | grep Layout
> >
> >
> >>>
> >>> I cannot answer that. Neil Brown should know.
> >>>
> >>> Best regards
> >>> Keld
> >>>
> >>
> >> Hi all,
> >> anyone with an update on these two questions?
> >>
> >> I was thinking of using the kernel block trace facility to track disk
> >> accesses and infer the on-disk data structure, but I haven't tried it yet.
> >>
> >> On the other hand, I carefully looked at the mdadm output, without
> >> finding anything related to physical block placement.
> >
> > Look for "Layout".
> >
> > NeilBrown
> >
> >
> >>
> >> Any new advice in that regard?
> >> Thanks.
> >>
> >
> 
