Hmm, can you post the monitor logs as well for that period? It looks
like the osd is requesting a map change and not getting it.
-Sam

On Fri, Apr 6, 2012 at 10:59 AM, Damien Churchill <damoxc@xxxxxxxxx> wrote:
> Oops, it was set to 600; chmod'd to 644, so it should be good now.
>
> On 6 April 2012 18:56, Samuel Just <sam.just@xxxxxxxxxxxxx> wrote:
>> Error 403.
>> -Sam
>>
>> On Fri, Apr 6, 2012 at 10:53 AM, Damien Churchill <damoxc@xxxxxxxxx> wrote:
>>> Okay, uploaded it to http://damoxc.net/ceph/osd.0.log.gz. I restarted
>>> the osd with debug osd = 20 and let it run for 5 minutes or so.
>>>
>>> On 6 April 2012 18:30, Samuel Just <sam.just@xxxxxxxxxxxxx> wrote:
>>>> Hmm, osd.0 should be enough. I need osd debugging at around 20 from
>>>> when the osd started. Restarting the osd with debugging at 20 would
>>>> also work fine.
>>>> -Sam
>>>>
>>>> On Fri, Apr 6, 2012 at 9:55 AM, Damien Churchill <damoxc@xxxxxxxxx> wrote:
>>>>> I've got that directory on 3 of the osds: 0, 3 and 4. Do you want the
>>>>> logs from all 3 of them?
>>>>>
>>>>> On 6 April 2012 17:50, Samuel Just <sam.just@xxxxxxxxxxxxx> wrote:
>>>>>> Is there a 0.138_head directory under current/ on any of your osds?
>>>>>> If so, can you post the log from that osd? I could also use the osd.0
>>>>>> log.
>>>>>> -Sam
>>>>>>
>>>>>> On Wed, Apr 4, 2012 at 2:44 PM, Damien Churchill <damoxc@xxxxxxxxx> wrote:
>>>>>>> I've uploaded them to:
>>>>>>>
>>>>>>> http://damoxc.net/ceph/osdmap
>>>>>>> http://damoxc.net/ceph/pg_dump
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On 4 April 2012 21:51, Samuel Just <sam.just@xxxxxxxxxxxxx> wrote:
>>>>>>>> Can you post a copy of your osd map and the output of 'ceph pg dump'?
>>>>>>>> You can get the osdmap via 'ceph osd getmap -o <filename>'.
>>>>>>>> -Sam
>>>>>>>>
>>>>>>>> On Wed, Apr 4, 2012 at 1:12 AM, Damien Churchill <damoxc@xxxxxxxxx> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm having some trouble getting some pgs to stop being inactive. The
>>>>>>>>> cluster is running 0.44.1 and the kernel version is 3.2.x.
>>>>>>>>>
>>>>>>>>> ceph -s reports:
>>>>>>>>> 2012-04-04 09:08:57.816029    pg v188540: 990 pgs: 223 inactive, 767
>>>>>>>>> active+clean; 205 GB data, 1013 GB used, 8204 GB / 9315 GB avail
>>>>>>>>> 2012-04-04 09:08:57.817970   mds e2198: 1/1/1 up {0=node24=up:active},
>>>>>>>>> 4 up:standby
>>>>>>>>> 2012-04-04 09:08:57.818024   osd e5910: 5 osds: 5 up, 5 in
>>>>>>>>> 2012-04-04 09:08:57.818201   log 2012-04-04 09:04:03.838358 osd.3
>>>>>>>>> 172.22.10.24:6801/30000 159 : [INF] 0.13d scrub ok
>>>>>>>>> 2012-04-04 09:08:57.818280   mon e7: 3 mons at
>>>>>>>>> {node21=172.22.10.21:6789/0,node22=172.22.10.22:6789/0,node23=172.22.10.23:6789/0}
>>>>>>>>>
>>>>>>>>> ceph health says:
>>>>>>>>> 2012-04-04 09:09:01.651053 mon <- [health]
>>>>>>>>> 2012-04-04 09:09:01.666585 mon.1 -> 'HEALTH_WARN 223 pgs stuck
>>>>>>>>> inactive; 223 pgs stuck unclean' (0)
>>>>>>>>>
>>>>>>>>> I was wondering if anyone has any suggestions about how to resolve
>>>>>>>>> this, or things to look for. I've tried restarting the ceph daemons on
>>>>>>>>> the various nodes a few times, to no avail. I don't think there is
>>>>>>>>> anything wrong with any of the nodes either.
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>> Damien
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
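
A rough sketch of the commands referenced in this thread, for anyone hitting
the same issue. The file paths, the [osd.0] config section, and the
sysvinit-style restart are illustrative assumptions, not taken from the
poster's actual setup:

    # Raise OSD logging by adding to ceph.conf on the node running osd.0,
    # then restart that daemon so the log covers OSD startup:
    #   [osd.0]
    #       debug osd = 20
    /etc/init.d/ceph restart osd.0       # or: service ceph restart osd.0

    # Capture the current osdmap and a pg dump to share, as requested above:
    ceph osd getmap -o /tmp/osdmap
    ceph pg dump > /tmp/pg_dump

    # If serving the files over HTTP, make them world-readable
    # (the 403 earlier in the thread came from mode 600):
    chmod 644 osd.0.log.gz /tmp/osdmap /tmp/pg_dump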