On Monday, January 7, 2013 at 9:25 AM, Denis Fondras wrote:
> Hello all,
>
> > I'm using Ceph 0.55.1 on a Debian Wheezy (1 mon, 1 mds and 3 osd over
> > btrfs) and every once in a while, an OSD process crashes (almost never
> > the same osd crashes).
> > This time I had 2 osd crash in a row and so I only had one replicate. I
> > could bring the 2 crashed osd up and it started to recover.
> > Unfortunately, the "source" osd crashed while recovering and now I have
> > some lost PGs.
> >
> > If I happen to bring the primary OSD up again, can I expect the lost PGs
> > to be recovered too?
> >
>
> Ok, so it seems I can't bring my primary OSD back to life :-(
>
> ---8<---------------
>    health HEALTH_WARN 72 pgs incomplete; 72 pgs stuck inactive; 72 pgs
> stuck unclean
>    monmap e1: 1 mons at {a=192.168.0.132:6789/0}, election epoch 1, quorum 0 a
>    osdmap e1130: 3 osds: 2 up, 2 in
>    pgmap v1567492: 624 pgs: 552 active+clean, 72 incomplete; 1633 GB
> data, 4766 GB used, 3297 GB / 8383 GB avail
>    mdsmap e127: 1/1/1 up {0=a=up:active}
>
> 2013-01-07 18:11:10.852673 mon.0 [INF] pgmap v1567492: 624 pgs: 552
> active+clean, 72 incomplete; 1633 GB data, 4766 GB used, 3297 GB / 8383
> GB avail
> ---8<---------------
>
> When I run "rbd list", I can see all my images.
> When I run "rbd map", I can only map a few of them, and when I mount the
> devices, none will mount (the mount process hangs and I cannot even ^C
> the process).
>
> Is there something I can try?

What's wrong with your primary OSD? In general they shouldn't be crashing
that frequently, and if you've got a new bug we'd like to diagnose and fix
it. If that can't be done (or it's a hardware failure or something), you can
mark the OSD lost, but that might lose data, and then you will be sad.
-Greg
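
For reference, the usual sequence for digging into stuck PGs and, as a last
resort, writing off a dead OSD looks roughly like the following. The osd and
pg ids below are placeholders rather than values taken from Denis's cluster,
and "ceph osd lost" is the irreversible, possibly data-losing step Greg
mentions:

    # see which OSDs are down/out and which PGs are stuck
    ceph osd tree
    ceph health detail
    ceph pg dump_stuck inactive

    # ask one of the incomplete PGs what it is waiting for
    # (2.1f is an example pg id)
    ceph pg 2.1f query

    # last resort, only after giving up on the failed OSD entirely
    # (2 is an example osd id); this is the step that can lose data
    ceph osd lost 2 --yes-i-really-mean-it

Querying an incomplete PG typically shows which down OSDs it still wants to
probe, which is the quickest way to confirm whether the missing data really
only lived on the dead OSD before declaring it lost.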