Hi,

It looks like one of your OSDs has been marked out. Make sure it is back in, so that 'ceph -s' reports '67 osds: 67 up, 67 in' rather than '67 osds: 67 up, 66 in'. You can quickly check which one is out with the 'ceph osd tree' command.

JC

> On Apr 12, 2016, at 11:21, Joao Eduardo Luis <joao@xxxxxxx> wrote:
>
> On 04/12/2016 07:16 PM, Eric Hall wrote:
>> Removed mon on mon1, added mon on mon1 via ceph-deploy. mons now have
>> quorum.
>>
>> I am left with:
>>     cluster 5ee52b50-838e-44c4-be3c-fc596dc46f4e
>>      health HEALTH_WARN 1086 pgs peering; 1086 pgs stuck inactive;
>>             1086 pgs stuck unclean; pool vms has too few pgs
>>      monmap e5: 3 mons at
>>             {cephsecurestore1=172.16.250.7:6789/0,cephsecurestore2=172.16.250.8:6789/0,cephsecurestore3=172.16.250.9:6789/0},
>>             election epoch 28, quorum 0,1,2
>>             cephsecurestore1,cephsecurestore2,cephsecurestore3
>>      mdsmap e2: 0/0/1 up
>>      osdmap e38769: 67 osds: 67 up, 66 in
>>       pgmap v33886066: 7688 pgs, 24 pools, 4326 GB data, 892 kobjects
>>             11620 GB used, 8873 GB / 20493 GB avail
>>                    3 active+clean+scrubbing+deep
>>                 1086 peering
>>                 6599 active+clean
>>
>> All OSDs are up/in as reported, but I see no recovery I/O for the PGs
>> that are inactive/peering/unclean.
>
> Someone else will probably be able to chime in with more authority than
> me, but I would first try to restart the OSDs to which those stuck PGs
> are being mapped.
>
> -Joao
>
>> Thanks,
>> --
>> Eric
>>
>> On 4/12/16 1:14 PM, Joao Eduardo Luis wrote:
>>> On 04/12/2016 06:38 PM, Eric Hall wrote:
>>>> Ok, mon2 and mon3 are happy together, but mon1 dies with:
>>>>   mon/MonitorDBStore.h: 287: FAILED assert(0 == "failed to write to db")
>>>>
>>>> I take this to mean mon1's store.db is corrupt, as I see no
>>>> permission issues.
>>>>
>>>> So... remove mon1 and add a mon?
>>>>
>>>> Nothing special to worry about re-adding a mon on mon1, other than
>>>> rm/mv the current store.db path, correct?
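For anyone following along, the checks JC and Joao suggest above can be sketched with the standard ceph CLI; the OSD id below is a placeholder to replace with whatever 'ceph osd tree' actually reports:

```shell
# Show the CRUSH tree; an OSD that is up but "out" appears with reweight 0.
ceph osd tree

# Mark the affected OSD back in (replace 66 with the id from the output above).
ceph osd in 66

# List the PGs stuck inactive/peering together with the OSDs they map to,
# which tells you which OSD daemons to restart.
ceph pg dump_stuck inactive
```

These commands need a reachable cluster and an admin keyring, so they are a sketch of the approach rather than a tested procedure.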
>>>
>>> You'll actually need to recreate the mon with 'ceph-mon --mkfs' for
>>> that to work, and that will likely require you to rm/mv the mon data
>>> directory.
>>>
>>> You *could* copy the mon dir from one of the other monitors and use
>>> that instead. But given you have a functioning quorum, I don't think
>>> there's any reason to resort to that.
>>>
>>> Follow the docs on removing monitors [1] and recreate the monitor
>>> from scratch, adding it to the cluster. It will sync up from scratch
>>> from the other monitors. That'll make them happy.
>>>
>>> -Joao
>>>
>>> [1] http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors
>>>
>>>> Thanks again,
>>>> --
>>>> Eric
>>>>
>>>> On 4/12/16 11:18 AM, Joao Eduardo Luis wrote:
>>>>> On 04/12/2016 05:06 PM, Joao Eduardo Luis wrote:
>>>>>> On 04/12/2016 04:27 PM, Eric Hall wrote:
>>>>>>> On 4/12/16 9:53 AM, Joao Eduardo Luis wrote:
>>>>>>>
>>>>>>>> So this looks like the monitors didn't remove version 1, but
>>>>>>>> this may just be a red herring.
>>>>>>>>
>>>>>>>> What matters, really, is the values in 'first_committed' and
>>>>>>>> 'last_committed'. If either first_committed or last_committed
>>>>>>>> happens to be '1', then there may be a bug somewhere in the
>>>>>>>> code, but I doubt that. This seems just an artefact.
>>>>>>>>
>>>>>>>> So, it would be nice if you could provide the value of both
>>>>>>>> 'osdmap:first_committed' and 'osdmap:last_committed'.
>>>>>>>
>>>>>>> mon1:
>>>>>>> (osdmap, last_committed)
>>>>>>> 0000 : 01 00 00 00 00 00 00 00 : ........
>>>>>>> (osdmap, first_committed) does not exist
>>>>>>>
>>>>>>> mon2:
>>>>>>> (osdmap, last_committed)
>>>>>>> 0000 : 01 00 00 00 00 00 00 00 : ........
>>>>>>> (osdmap, first_committed) does not exist
>>>>>>>
>>>>>>> mon3:
>>>>>>> (osdmap, last_committed)
>>>>>>> 0000 : 01 00 00 00 00 00 00 00 : ........
>>>>>>> (osdmap, first_committed)
>>>>>>> 0000 : b8 94 00 00 00 00 00 00
>>>>>>
>>>>>> Wow!
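A side note for readers: the store values dumped above are 64-bit little-endian integers, so they can be decoded with nothing more than shell printf (the hex below is taken directly from the dumps in the thread):

```shell
# last_committed on all three mons: 01 00 00 00 00 00 00 00
# Little-endian, so this is simply 1.
printf 'last_committed  = %d\n' 0x01

# first_committed on mon3: b8 94 00 00 00 00 00 00
# Reversing the byte order gives 0x94b8 = 38072.
printf 'first_committed = %d\n' 0x94b8
```

So mon3 records an oldest committed osdmap epoch of 38072 (close to the e38769 in the status output earlier in the thread) while last_committed reads 1, which is exactly the kind of inconsistency an assertion would trip on.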
>>>>>> This is unexpected, but fits the assertion just fine.
>>>>>>
>>>>>> The solution, I think, will be rewriting first_committed and
>>>>>> last_committed on all monitors - except on mon1.
>>>>>
>>>>> Let me clarify this a bit: the easy way out for mon1 would be to
>>>>> fix the other two monitors and recreate mon1.
>>>>>
>>>>> If you prefer to also fix mon1, you can simply follow the same
>>>>> steps in the previous email for all the monitors, but make sure
>>>>> osdmap:full_latest on mon1 reflects the last available full_XXXX
>>>>> version in its store.
>>>>>
>>>>> -Joao
>>>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
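For anyone landing on this thread with the same problem: the rewrite of osdmap:first_committed / last_committed that Joao describes can be attempted with ceph-kvstore-tool against a *stopped* monitor. This is a hedged sketch, not the exact procedure from the (unquoted) previous email; the store path, backend argument, and epoch values are examples inferred from this thread, and the tool's syntax varies by Ceph release:

```shell
# Stop the monitor first; never edit a live mon store.
service ceph stop mon.cephsecurestore2

# 'set <prefix> <key> ver <N>' writes N as a little-endian u64, matching
# the encoding seen in the dumps above. Newer releases take a backend
# argument ('leveldb' here); older ones take only the store path.
STORE=/var/lib/ceph/mon/ceph-cephsecurestore2/store.db
ceph-kvstore-tool leveldb "$STORE" set osdmap first_committed ver 38072
ceph-kvstore-tool leveldb "$STORE" set osdmap last_committed ver 38769

service ceph start mon.cephsecurestore2
```

The epochs here (38072 and 38769) are illustrative only, taken from mon3's first_committed and the osdmap epoch in the status output; the real values must be read from the store itself before writing anything back.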