Re: Primary OSD not as first shard in PG - LRC experiment

Thanks Sage, I'll look into that!

On Sat, Dec 23, 2017 at 9:09 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Sat, 23 Dec 2017, Oleg Kolosov wrote:
>> Hi
>> When Ceph selects an OSD to act as primary, it picks the first shard in
>> the PG (shard[0]). When working with the LRC plugin, this constraint
>> greatly diminishes LRC's advantage. The Ceph LRC plugin allows grouping
>> OSDs so that recovery occurs among them (local groups).
>> However, since all recovery has to go through the primary OSD, the
>> recovery isn't really local. It is local only when the primary OSD is in
>> the same local group as the failed OSD.
>>
>> I'm running LRC experiments on Ceph and measuring recovery by crashing a
>> specific OSD. To bypass this and make LRC truly effective in my
>> experiments, I was wondering whether it is possible to manipulate the
>> choice of the primary OSD.
>>
>> For example, if I have a bucket which contains OSDs 0-10 and I always
>> kill one of them, is it possible to force the primary to also be one of
>> OSDs 0-10?
>
> Yes.  This has been a longstanding todo item for LRC but we haven't gotten
> around to doing it.  I think what's needed here is a change to the
> choose_acting logic in PG.cc that allows the EC plugin to weigh in on
> which primary it prefers.  Some care will be needed to make sure this
> choice is reevaluated at the appropriate times (e.g., when backfill
> completes).  (Or possibly it won't matter since generally speaking which
> shard is primary doesn't matter after recovery.)
>
> We'd also need to consult the plugin only after all OSDs in the shard have
> a feature bit indicating they do the same, or else you can get into a loop
> where two OSDs keep giving primary back to each other.
>
> I think the place to start is to add a method to the EC interface that
> allows the plugin to suggest a primary (or not, if it has no opinion), and
> to implement one for LRC that does the right thing.  (Some unit tests here
> would be good!)  The change to the choose_acting code would come
> next--that'll be trickier to get right but we can help!
>
> Thanks-
> sage
>
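A minimal C++ sketch of the shape of the change Sage describes, under loudly stated assumptions: `ErasureCodePrimaryHint`, `suggest_primary_shard`, `LrcPrimaryHint`, `choose_primary_shard`, and `kNoOsd` are hypothetical names, not Ceph's actual API; the acting set is modeled as a vector of OSD ids indexed by shard; local groups are plain lists of shard indices; and the feature bit is a per-shard boolean. The LRC hint picks the lowest-indexed surviving shard in a degraded local group, and the caller consults the plugin only when every OSD in the acting set advertises the feature, mirroring the point about avoiding a loop where two OSDs keep handing primary back to each other.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Sketch-only marker for "no OSD currently serves this shard" (hypothetical).
constexpr int kNoOsd = -1;

// Hypothetical plugin hook (not an existing Ceph API): suggest which shard
// should act as primary, or return std::nullopt for "no opinion".
struct ErasureCodePrimaryHint {
  virtual ~ErasureCodePrimaryHint() = default;
  // acting[i] is the OSD id serving shard i, or kNoOsd if the shard is missing.
  virtual std::optional<int> suggest_primary_shard(
      const std::vector<int>& acting) const = 0;
};

// Hypothetical LRC implementation. Local groups are lists of shard indices;
// if a group has a missing shard, prefer a surviving shard from that same
// group so the recovery reads can stay inside the group.
class LrcPrimaryHint : public ErasureCodePrimaryHint {
 public:
  explicit LrcPrimaryHint(std::vector<std::vector<int>> groups)
      : groups_(std::move(groups)) {}

  std::optional<int> suggest_primary_shard(
      const std::vector<int>& acting) const override {
    for (const auto& group : groups_) {
      bool degraded = false;
      for (int shard : group) {
        assert(shard >= 0 && static_cast<std::size_t>(shard) < acting.size());
        if (acting[shard] == kNoOsd) degraded = true;
      }
      if (!degraded) continue;
      // Lowest-indexed survivor: deterministic, so every OSD evaluating the
      // same acting set reaches the same answer.
      for (int shard : group)
        if (acting[shard] != kNoOsd) return shard;
    }
    return std::nullopt;  // no degraded group with a survivor: no opinion
  }

 private:
  std::vector<std::vector<int>> groups_;
};

// Hypothetical choose_acting-side helper: consult the plugin only if every
// present OSD advertises the (hypothetical) feature bit; otherwise fall back
// to the first present shard, a simplified stand-in for the shard[0] default.
int choose_primary_shard(const std::vector<int>& acting,
                         const std::vector<bool>& has_feature,
                         const ErasureCodePrimaryHint& plugin) {
  bool all_support = true;
  for (std::size_t i = 0; i < acting.size(); ++i)
    if (acting[i] != kNoOsd && !has_feature[i]) all_support = false;
  if (all_support) {
    if (std::optional<int> hint = plugin.suggest_primary_shard(acting))
      return *hint;
  }
  for (std::size_t i = 0; i < acting.size(); ++i)
    if (acting[i] != kNoOsd) return static_cast<int>(i);
  return -1;  // no shard is available at all
}
```

For example, with local groups {0,1,2,3} and {4,5,6,7} and shard 5 missing, the hint selects shard 4; if any present OSD lacks the feature bit, the sketch falls back to shard 0 instead.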