I tried pg query, but it doesn't return; it hangs forever. As I understand it, when a PG is stale there is no OSD left to answer the query. Am I right?

I applied the tunables in two steps, but didn't wait for all the data to be moved before doing the second step. I rolled back to intermediate tunables by undefining the optimization below:

chooseleaf_descend_once: whether a recursive chooseleaf attempt will retry, or only try once and allow the original placement to retry. Legacy default is 0, optimal value is 1.

Doing so, the stale PGs immediately disappeared. Since I rolled back, I can't give you the output of ceph -s.

I believe part of the issue is related to under-dimensioned hardware: the OSDs are being killed by the watchdog and my memory is swapping. But even so, I didn't expect to lose the data mapping.
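For reference, reverting a single tunable by hand goes roughly like this, by decompiling, editing, and re-injecting the CRUSH map (the file names below are just examples, not my actual paths):

    # dump the current CRUSH map and decompile it to text
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # in crushmap.txt, set the tunable back to its legacy value
    # ("tunable chooseleaf_descend_once 0") or remove the line entirely,
    # then recompile the map and inject it back into the cluster
    crushtool -c crushmap.txt -o crushmap-rollback.bin
    ceph osd setcrushmap -i crushmap-rollback.bin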
Regards.

On 31-08-2015 05:48, Gregory Farnum wrote:
On Sat, Aug 29, 2015 at 11:50 AM, Gerd Jakobovitsch <gerd@xxxxxxxxxxxxx> wrote:

Dear all,

During a cluster reconfiguration (change of CRUSH tunables from legacy to TUNABLES2) with large data movement, several OSDs got overloaded and had to be restarted; when the OSDs stabilized, I was left with a number of PGs marked stale, even though all the OSDs where this data used to be located show up again.

When I look at the OSDs' current directory for the last placement, there is still some data, but it never shows up again. Is there any way to force these OSDs to resume being used?

This sounds very strange. Can you provide the output of "ceph -s" and run pg query against one of the stuck PGs?
-Greg
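(For anyone following the thread: the diagnostics Greg asks for can be gathered roughly as follows; the PG id below is only a placeholder for one of the stale PGs listed by ceph health detail:

    ceph -s                     # overall cluster status
    ceph health detail          # lists stuck/stale PGs by id
    ceph pg dump_stuck stale
    ceph pg 3.7f query          # replace 3.7f with an actual stale PG id
)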