Re: crush choose firstn vs. indep

On 14/01/2014 07:49, ZHOU Yuan wrote:
> Hi Loic, thanks for the education!
> 
> I’m also trying to understand the new ‘indep’ mode. Is it designed only for Ceph erasure-coded (EC) pools? Since all copies of the data in a 3-copy system are equivalent, it seems the new algorithm should work there as well?
> 

In the best case, using indep instead of firstn on replicated pools won't make a difference. However, if the crush mapper cannot find the required number of items, firstn will return (for instance) [1,2,4] instead of [1,2,3,4], and the replicated pool code will handle this gracefully. With indep the result will be [1,2,CRUSH_ITEM_NONE,4], which will probably trigger an assert somewhere.

Here is an example from the test suite, which runs when you make check:
https://github.com/ceph/ceph/blob/master/src/test/cli/crushtool/bad-mappings.t
where 2147483647 == CRUSH_ITEM_NONE
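
To make the failure case concrete, here is a toy sketch in Python. It is not the CRUSH algorithm: the real mapper hashes and retries, while this only models what each mode returns when one position cannot be filled. The names toy_firstn and toy_indep, and the idea of a fixed candidate list, are made up for illustration.

```python
# Value of CRUSH_ITEM_NONE as it appears in bad-mappings.t
CRUSH_ITEM_NONE = 2147483647


def toy_firstn(candidates, available, n):
    """Toy model of firstn: keep the first n available items.

    Unavailable items are skipped, so later items shift left and the
    result may be shorter than n.
    """
    picked = [osd for osd in candidates if osd in available]
    return picked[:n]


def toy_indep(candidates, available, n):
    """Toy model of indep: every position is filled on its own.

    A position that cannot be filled gets CRUSH_ITEM_NONE, and the
    other positions are left where they were.
    """
    return [
        osd if osd in available else CRUSH_ITEM_NONE
        for osd in candidates[:n]
    ]


if __name__ == "__main__":
    candidates = [1, 2, 3, 4]   # hypothetical mapping while all OSDs are up
    available = {1, 2, 4}       # OSD 3 is gone and nothing can replace it
    print("firstn:", toy_firstn(candidates, available, 4))
    print("indep: ", toy_indep(candidates, available, 4))
```

In this toy model firstn returns a short list in which OSD 4 moves up one position, while indep keeps OSD 4 where it was and marks the hole with CRUSH_ITEM_NONE.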

I don't know of any other reason that would prevent the use of indep for replicated pools.

Cheers

> 
> Sincerely, Yuan
> 
> 
> On Mon, Jan 13, 2014 at 7:37 AM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
> 
> 
> 
>     On 12/01/2014 15:55, Dietmar Maurer wrote:
>     > From the docs:
>     >
>     >
>     >
>     > step [choose|chooseleaf] [firstn|indep] <N> <bucket-type>
>     >
>     >
>     >
>     > What exactly is the difference between ‘firstn’ and ‘indep’?
>     >
>     Hi,
> 
>     For Ceph releases up to Emperor[1], firstn is used and I'm not aware of a use case requiring indep. As part of the effort to implement erasure coded pools, firstn[2] and indep[3] were split into two functions. The firstn method is best suited for replicated pools. The indep method tries to minimize position changes when an OSD becomes unavailable. For instance, if indep finds
> 
>       [1,2,3,4]
> 
>     and after a while 3 becomes unavailable, it is very likely to replace it with
> 
>       [1,2,5,4]
> 
>     It matters to erasure coded pools because
> 
>       [4,5,2,1]
> 
>     (i.e. the same OSDs but in different positions) implies more I/O. Another difference is that when a mapping fails (i.e. the required number of OSDs cannot be found), firstn returns a short list (for instance [1,2,3] when 4 are required) and indep returns a list with a placeholder at the missing position (for instance [1,2,CRUSH_ITEM_NONE,4]).
> 
>     Cheers
> 
>     [1] implementation in releases up to Emperor https://github.com/ceph/ceph/blob/v0.72/src/crush/mapper.c#L295
>     [2] firstn https://github.com/ceph/ceph/blob/v0.74/src/crush/mapper.c#L295
>     [3] indep https://github.com/ceph/ceph/blob/v0.74/src/crush/mapper.c#L459
> 
>     --
>     Loïc Dachary, Artisan Logiciel Libre
> 
> 
>     _______________________________________________
>     ceph-users mailing list
>     ceph-users@xxxxxxxxxxxxxx
>     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 

-- 
Loïc Dachary, Artisan Logiciel Libre


