Re: Continuous OSD crash with kv backend (firefly)

Andrey Korolyov <andrey@xxxxxxx> · Sun, 26 Oct 2014 13:46:23 +0400

On Sun, Oct 26, 2014 at 7:40 AM, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:
> On Sun, Oct 26, 2014 at 3:12 AM, Andrey Korolyov <andrey@xxxxxxx> wrote:
>> Thanks Haomai. It turns out that recovery on master is too buggy right
>> now (recovery speed degrades over time, a non-kv OSD drops out of the
>> cluster for no reason, the misplaced object calculation is wrong, and so
>> on), so I am sticking to giant with rocksdb for now. So far no major
>> problems have shown up.
>
> Hmm, do you mean kvstore has a problem with OSD recovery? I'd like to
> know the steps that reproduce this situation. Could you
> give more detail?
>
>
>
> --
> Best Regards,
>
> Wheat

I'm not sure kv triggered any of those; it's just a side effect
of deploying the master branch (and the OSDs which showed problems were
not only in the kv subset). It looks like both giant and master expose
some problem with pg recalculation under tight-IO conditions for the
MONs (each MON shares a disk with one of the OSDs, and post-peering
recalculation may take several minutes when kv-based OSDs are involved;
recalculation from active+remapped -> active+degraded(+...) also takes
tens of minutes; the same 'non-optimal' setup worked well before, with
all recalculations completing in a matter of tens of seconds, so I will
investigate this a bit later). Giant crashed on non-KV daemons during
nightly recovery, so there is more critical stuff to fix right now,
because kv so far has not exposed any crashes by itself.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com