This was a bit weird, but it is now working. Writing this up for future reference in case someone faces the same issue.

This cluster was upgraded from jewel to luminous following the recommended process. When the upgrade finished I just set require_osd to luminous. However, I hadn't restarted the daemons since then, and simply restarting all the OSDs made the problem go away.

How to check whether that was the case? After the restart, the OSDs have a "class" associated with them.

On Wed, Jan 10, 2018 at 7:16 PM, Luis Periquito <periquito@xxxxxxxxx> wrote:
> Hi,
>
> I'm running a cluster with 12.2.1 and adding more OSDs to it.
> Everything is running version 12.2.1 and require_osd is set to
> luminous.
>
> One of the pools is replicated with size 2 min_size 1, and is
> seemingly blocking IO while recovering. I have no slow requests, and
> the output of "ceph osd perf" looks brilliant (all numbers are lower
> than 10).
>
> Clients are RBD (OpenStack VMs in KVM), mostly using 10.2.7. I've
> marked those OSDs as out and the RBD clients just came back to life.
> I did have some objects degraded:
>
> 2018-01-10 18:23:52.081957 mon.mon0 mon.0 x.x.x.x:6789/0 410414 : cluster [WRN] Health check update: 9926354/49526500 objects misplaced (20.043%) (OBJECT_MISPLACED)
> 2018-01-10 18:23:52.081969 mon.mon0 mon.0 x.x.x.x:6789/0 410415 : cluster [WRN] Health check update: Degraded data redundancy: 5027/49526500 objects degraded (0.010%), 1761 pgs unclean, 27 pgs degraded (PG_DEGRADED)
>
> Any thoughts as to what might be happening? I've run such operations
> many times...
>
> Thanks for all the help, as I'm grasping to figure out what's
> happening...
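
For anyone wanting to script the "do the OSDs have a class?" check, here is a rough sketch rather than a tested tool. It assumes the JSON printed by "ceph osd tree -f json" has a top-level "nodes" list whose OSD entries carry "type", "name" and, once a class is assigned, "device_class". Those field names are an assumption on my part, so verify them against your own cluster's output first. The function name and the sample data are made up for illustration.

```python
import json
from typing import List


def osds_without_class(osd_tree_json: str) -> List[str]:
    """Return the names of OSDs that have no device class.

    Expects the text of `ceph osd tree -f json`. The field names used
    here ("nodes", "type", "name", "device_class") are assumptions
    about that output, not something taken from the original post.
    """
    tree = json.loads(osd_tree_json)
    missing = []
    for node in tree.get("nodes", []):
        if node.get("type") != "osd":
            continue  # skip root, host and other CRUSH buckets
        if not node.get("device_class"):
            missing.append(node.get("name", "osd.%s" % node.get("id")))
    return missing


# Hand-written sample in the assumed shape, NOT real cluster output.
SAMPLE = json.dumps({
    "nodes": [
        {"id": -1, "name": "default", "type": "root"},
        {"id": -2, "name": "host-a", "type": "host"},
        {"id": 0, "name": "osd.0", "type": "osd", "device_class": "hdd"},
        {"id": 1, "name": "osd.1", "type": "osd"},
    ]
})

print(osds_without_class(SAMPLE))
```

To use it on a live cluster, save the output of "ceph osd tree -f json" and pass its contents to the function. Any OSD it lists has no class yet, which in the situation described above would suggest that daemon has not been restarted since the upgrade.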