Hello,
I'll start by explaining what I have done. I was adding some new storage
in an attempt to set up a cache pool according to
https://docs.ceph.com/en/latest/dev/cache-pool/ by doing the following.
1. I upgraded all servers in the cluster to Ceph 15.2.14, which put the
system into recovery for out-of-sync data.
2. I added 2 SSDs as OSDs to the cluster, which immediately caused Ceph
to start rebalancing onto the SSDs.
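Roughly, each OSD was created on its host with something like the
following (reproducing from memory; the device path is just a placeholder):
# ceph-volume lvm create --data /dev/sdX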
3. I added 2 new CRUSH rules, one mapping to SSD storage and one to HDD
storage, roughly as follows.
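The commands were along these lines (the rule names here are
placeholders for the ones I actually used):
# ceph osd crush rule create-replicated replicated_hdd default host hdd
# ceph osd crush rule create-replicated replicated_ssd default host ssd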
4. I assigned my existing VM pool to the HDD storage ruleset.
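That was something like (pool and rule names are placeholders):
# ceph osd pool set <vm-pool> crush_rule replicated_hdd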
5. I added a new pool for the cache tier.
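Roughly (pool name and PG count are placeholders):
# ceph osd pool create <cache-pool> 64
The intention was to then follow the ceph osd tier add / cache-mode /
set-overlay steps from the doc above.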
6. As creation of the new pool seemed to be stuck, and I got an alert
about PG 9.0 being in an unknown state and the main storage pool had
become inaccessible, I decided to reboot my servers in case it was a
small issue that could be resolved by a reboot. After the reboot, more
PGs have gone into the unknown state.
7. I reviewed
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/
and checked ceph pg dump_stuck as follows:
# ceph pg dump_stuck
ok
PG_STAT STATE UP UP_PRIMARY ACTING ACTING_PRIMARY
7.1e unknown [] -1 [] -1
7.1f unknown [] -1 [] -1
7.1c unknown [] -1 [] -1
7.1d unknown [] -1 [] -1
7.12 unknown [] -1 [] -1
7.13 unknown [] -1 [] -1
7.10 unknown [] -1 [] -1
7.11 unknown [] -1 [] -1
7.16 unknown [] -1 [] -1
7.15 unknown [] -1 [] -1
7.a unknown [] -1 [] -1
7.b unknown [] -1 [] -1
7.8 unknown [] -1 [] -1
7.9 unknown [] -1 [] -1
7.4 unknown [] -1 [] -1
7.19 unknown [] -1 [] -1
7.3 unknown [] -1 [] -1
7.e unknown [] -1 [] -1
7.f unknown [] -1 [] -1
7.c unknown [] -1 [] -1
7.d unknown [] -1 [] -1
7.0 unknown [] -1 [] -1
7.1 unknown [] -1 [] -1
7.1a unknown [] -1 [] -1
7.7 unknown [] -1 [] -1
7.18 unknown [] -1 [] -1
7.5 unknown [] -1 [] -1
8. I tried to use the pg query command as follows:
# ceph pg 7.0 query
Couldn't parse JSON : Expecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
File "/usr/bin/ceph", line 1285, in <module>
retval = main()
File "/usr/bin/ceph", line 1204, in main
sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli')
File "/usr/lib/python3.9/site-packages/ceph_argparse.py", line 836, in
parse_json_funcsigs
raise e
File "/usr/lib/python3.9/site-packages/ceph_argparse.py", line 833, in
parse_json_funcsigs
overall = json.loads(s)
File "/usr/lib/python3.9/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.9/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.9/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
9. Given these odd errors, I did a lot more research and found I could
try ceph osd force-create-pg, which I have done, but after about an hour
of waiting there is no change in the state. If I check, I get the following:
# ceph osd force-create-pg 7.0 --yes-i-really-mean-it
pg 7.0 already creating
Any help in bringing the cluster back into a healthy state would be
appreciated.
Thanks,
James