I tried pg query, but it doesn't return; it hangs forever. As I understand it, when a PG is stale there is no OSD left to answer the query. Am I right?

I applied the tunables in two steps, but didn't wait for all the data to be moved before doing the second step. I rolled back to intermediate tunables by undefining the optimization below:

chooseleaf_descend_once: whether a recursive chooseleaf attempt will retry, or only try once and allow the original placement to retry. Legacy default is 0, optimal value is 1.

Doing so, the stale PGs immediately disappeared. Since I rolled back, I can't give you the output of ceph -s.

I believe part of the issue is related to under-dimensioned hardware: the OSDs are being killed by the watchdog and my memory is swapping. But even so, I didn't expect to lose the data mapping.
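For reference, reverting a single tunable by hand goes roughly like this, by decompiling, editing, and re-injecting the CRUSH map (the file names below are just examples, not my actual paths):

    # dump the current CRUSH map and decompile it to text
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # in crushmap.txt, set the tunable back to its legacy value
    # ("tunable chooseleaf_descend_once 0") or remove the line entirely,
    # then recompile the map and inject it back into the cluster
    crushtool -c crushmap.txt -o crushmap-rollback.bin
    ceph osd setcrushmap -i crushmap-rollback.bin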
Regards.

On 31-08-2015 05:48, Gregory Farnum wrote:
On Sat, Aug 29, 2015 at 11:50 AM, Gerd Jakobovitsch <gerd@xxxxxxxxxxxxx> wrote:

Dear all,

During a cluster reconfiguration (change of CRUSH tunables from legacy to TUNABLES2) with large data movement, several OSDs got overloaded and had to be restarted; when the OSDs stabilized, I was left with a number of PGs marked stale, even though all the OSDs where this data used to be located show up again.

When I look at the OSDs' current directory for the last placement, there is still some data, but it never shows up again. Is there any way to force these OSDs to resume being used?

This sounds very strange. Can you provide the output of "ceph -s" and run pg query against one of the stuck PGs?
-Greg
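(For anyone following the thread: the diagnostics Greg asks for can be gathered roughly as follows; the PG id below is only a placeholder for one of the stale PGs listed by ceph health detail:

    ceph -s                     # overall cluster status
    ceph health detail          # lists stuck/stale PGs by id
    ceph pg dump_stuck stale
    ceph pg 3.7f query          # replace 3.7f with an actual stale PG id
)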