Dear Greg,
No, it's a very old cluster (in continuous operation since 2013,
with multiple extensions). It's a production cluster with
about 300 TB of valuable data on it.
We recently upgraded to Luminous and added more OSDs (a month
ago or so), but everything has seemed OK since then. We haven't had
any disk failures, but we did have trouble with the MDS daemons
in the last few days, so there were a few reboots.
Is it somehow possible to find this "lost" PG again? Since
it's in the metadata pool, large parts of our CephFS directory
tree are currently unavailable. I turned the MDS daemons off
for now ...
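For what it's worth, one thing I thought about trying (just a
sketch; <N> and the data path are placeholders, and I understand
the OSD has to be stopped while the tool runs) is to check each
OSD directly for a leftover copy of the PG:

  systemctl stop ceph-osd@<N>
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<N> \
      --op list-pgs | grep '^1\.XXX$'
  systemctl start ceph-osd@<N>

But I'm not sure how safe that is on a production cluster, so
I'd rather ask first.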
Cheers
Oliver
On 14.06.2018 19:59, Gregory Farnum wrote:
Is this a new cluster? Or did the CRUSH map change somehow recently? One
way this might happen is if CRUSH just failed entirely to map a PG,
although I think that if the PG exists anywhere, it should still be getting
reported as inactive.
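You could check whether CRUSH can map that pg at all by exporting
the current osdmap and testing the mapping offline, something along
these lines (the pg id is just the one from your "ceph pg map"
output, and the file path is arbitrary):

  ceph osd getmap -o /tmp/osdmap
  osdmaptool /tmp/osdmap --test-map-pg 1.721

If that also comes back with an empty up/acting set, it would point
at a CRUSH mapping problem rather than at lost data.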
On Thu, Jun 14, 2018 at 8:40 AM Oliver Schulz
<oliver.schulz@xxxxxxxxxxxxxx> wrote:
Dear all,
I have a serious problem with our Ceph cluster: one of our PGs somehow
ended up in this state (as reported by "ceph health detail"):
pg 1.XXX is stuck inactive for ..., current state unknown,
last acting []
Also, "ceph pg map 1.xxx" reports:
osdmap e525812 pg 1.721 (1.721) -> up [] acting []
I can't use "ceph pg 1.XXX query", it just hangs with no output.
All OSDs are up and in, I have MON quorum, and all other PGs seem to be
fine.
How can I diagnose/fix this? Unfortunately, the PG in question is part
of the CephFS metadata pool ...
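In case it helps with the diagnosis, I can also post output from
checks like these (just a sketch of what I'd run; happy to provide
anything else):

  ceph -s
  ceph osd tree
  ceph pg dump_stuck inactive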
Any help would be very, very much appreciated!
Cheers,
Oliver
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com