We have a new Ceph Nautilus cluster (installed as Nautilus from scratch, not upgraded):
# ceph versions
{
    "mon": {
        "ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)": 169
    },
    "mds": {},
    "overall": {
        "ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)": 175
    }
}
We only have CephFS on it, with the two required triple-replicated
pools. After creating the pools we increased their PG counts (roughly
as sketched after the listing below), but did not turn autoscaling on:
# ceph osd pool ls detail
pool 1 'cephfs_data' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 8192 pgp_num 8192 autoscale_mode warn last_change 2017 lfor 0/0/886 flags hashpspool stripe_width 0 application cephfs
pool 2 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 1024 pgp_num 1024 autoscale_mode warn last_change 1995 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
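For reference, bumping pg_num on an existing pool without the
autoscaler looks roughly like this (a sketch of the kind of commands
we ran, not the exact invocations; on Nautilus pgp_num normally
follows pg_num on its own, but it can also be set explicitly):

# ceph osd pool set cephfs_data pg_num 8192
# ceph osd pool set cephfs_data pgp_num 8192
# ceph osd pool set cephfs_metadata pg_num 1024
# ceph osd pool set cephfs_metadata pgp_num 1024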
After a few days of running, we started seeing inconsistent placement
groups:
# ceph pg dump | grep incons
dumped all
1.bb4  21 0 0 0 0 67108864 0 0 3059 3059 active+clean+inconsistent 2019-10-20 11:13:43.270022 5830'5655 5831:18901 [346,426,373] 346 [346,426,373] 346 5830'5655 2019-10-20 11:13:43.269992 1763'5424 2019-10-18 08:14:37.582180 0
1.795  29 0 0 0 0 96468992 0 0 3081 3081 active+clean+inconsistent 2019-10-20 17:06:45.876483 5830'5472 5831:17921 [468,384,403] 468 [468,384,403] 468 5830'5472 2019-10-20 17:06:45.876455 1763'5235 2019-10-18 08:16:07.166754 0
1.1fa  18 0 0 0 0 33554432 0 0 3065 3065 active+clean+inconsistent 2019-10-20 15:35:29.755622 5830'5268 5831:17139 [337,401,455] 337 [337,401,455] 337 5830'5268 2019-10-20 15:35:29.755588 1763'5084 2019-10-18 08:17:17.962888 0
1.579  26 0 0 0 0 75497472 0 0 3068 3068 active+clean+inconsistent 2019-10-20 21:45:42.914200 5830'5218 5831:15405 [477,364,332] 477 [477,364,332] 477 5830'5218 2019-10-20 21:45:42.914173 5830'5218 2019-10-19 12:13:53.259686 0
1.11c5 21 0 0 0 0 71303168 0 0 3010 3010 active+clean+inconsistent 2019-10-20 23:31:36.183053 5831'5183 5831:16214 [458,370,416] 458 [458,370,416] 458 5831'5183 2019-10-20 23:31:36.183030 5831'5183 2019-10-19 16:35:17.195721 0
1.128d 17 0 0 0 0 46137344 0 0 3073 3073 active+clean+inconsistent 2019-10-20 19:14:55.459236 5830'5368 5831:17584 [441,422,377] 441 [441,422,377] 441 5830'5368 2019-10-20 19:14:55.459209 1763'5110 2019-10-18 08:12:51.062548 0
1.19ef 16 0 0 0 0 41943040 0 0 3076 3076 active+clean+inconsistent 2019-10-20 23:33:02.020050 5830'5502 5831:18244 [323,431,439] 323 [323,431,439] 323 5830'5502 2019-10-20 23:33:02.020025 1763'5220 2019-10-18 08:12:51.117020 0
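In case it helps, the same inconsistencies can also be listed per pool
and per object with something like the following (sketch; the
per-object output is only available while the scrub error information
from the last scrub is still around):

# ceph health detail | grep inconsistent
# rados list-inconsistent-pg cephfs_data
# rados list-inconsistent-obj 1.bb4 --format=json-pretty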
The logs look like this (PG 1.bb4, for example):
2019-10-20 11:13:43.261 7fffd3633700  0 log_channel(cluster) log [DBG] : 1.bb4 scrub starts
2019-10-20 11:13:43.265 7fffd3633700 -1 log_channel(cluster) log [ERR] : 1.bb4 scrub : stat mismatch, got 21/21 objects, 0/0 clones, 21/21 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 88080384/67108864 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
2019-10-20 11:13:43.265 7fffd3633700 -1 log_channel(cluster) log [ERR] : 1.bb4 scrub 1 errors
Running a pg repair on the affected PG seems to fix the issue:
2019-10-21 09:17:50.125 7fffd3633700  0 log_channel(cluster) log [DBG] : 1.bb4 repair starts
2019-10-21 09:17:50.653 7fffd3633700 -1 log_channel(cluster) log [ERR] : 1.bb4 repair : stat mismatch, got 21/21 objects, 0/0 clones, 21/21 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 88080384/67108864 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
2019-10-21 09:17:50.653 7fffd3633700 -1 log_channel(cluster) log [ERR] : 1.bb4 repair 1 errors, 1 fixed
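For reference, the repairs are triggered per PG, roughly like this
(sketch; the one-liner to repair everything currently flagged as
inconsistent assumes jq is available):

# ceph pg repair 1.bb4
# for pg in $(rados list-inconsistent-pg cephfs_data | jq -r '.[]'); do ceph pg repair "$pg"; done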
Is this a known issue with Nautilus? We have other Luminous and Mimic
clusters where I haven't seen this come up.
Thanks,
Andras