> Ver. 0.56.6
> Hmm, the osd has not died; 1 or more pgs are stuck peering on it.

Can you get a pgid from 'ceph health detail' and then do 'ceph pg <pgid>
query' and attach that output?

Thanks!
sage

>
> Regards
> Dominik
>
> On Jun 28, 2013 11:28 PM, "Sage Weil" <sage@xxxxxxxxxxx> wrote:
> On Sat, 29 Jun 2013, Andrey Korolyov wrote:
> > There is almost the same problem with the 0.61 cluster, at least with
> > the same symptoms. It can be reproduced quite easily - remove an osd
> > and then mark it as out, and with quite high probability one of its
> > neighbors will be stuck at the end of the peering process with a couple
> > of peering pgs that have their primary copy on it. Such an osd process
> > seems to be stuck in some kind of lock, eating exactly 100% of one core.
>
> Which version?
> Can you attach with gdb and get a backtrace to see what it is chewing on?
>
> Thanks!
> sage
>
> >
> > On Thu, Jun 13, 2013 at 8:42 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> > > On Thu, Jun 13, 2013 at 6:33 AM, Sławomir Skowron <szibis@xxxxxxxxx> wrote:
> > >> Hi, sorry for the late response.
> > >>
> > >> https://docs.google.com/file/d/0B9xDdJXMieKEdHFRYnBfT3lCYm8/view
> > >>
> > >> Logs in attachment, and on google drive, from today.
> > >>
> > >> https://docs.google.com/file/d/0B9xDdJXMieKEQzVNVHJ1RXFXZlU/view
> > >>
> > >> We had this problem again today, and the new logs are on google
> > >> drive with today's date.
> > >>
> > >> Strangely, the problematic osd.71 has about 10-15% more space used
> > >> than the other osds in the cluster.
> > >>
> > >> Today, within one hour, osd.71 failed 3 times according to the mon
> > >> log, and after the third recovery it got stuck, and many 500 errors
> > >> appeared in the http layer on top of rgw. When it is stuck,
> > >> restarting osd.71, osd.23, and osd.108, all from the stuck pg,
> > >> helps, but I even ran a repair on this osd, just in case.
> > >>
> > >> I have a theory that this pg holds the rgw index of objects, or
> > >> that one of the osds in this pg has some problem with its local
> > >> filesystem or the drive below it (the raid controller reports
> > >> nothing about that), but I do not see any problem in the system.
> > >>
> > >> How can we find which pg/osd holds the index of objects of an rgw
> > >> bucket?
> > >
> > > You can find the location of any named object by grabbing the OSD map
> > > from the cluster and using the osdmaptool: "osdmaptool <mapfile>
> > > --test-map-object <objname> --pool <poolid>".
> > >
> > > You're not providing any context for your issue though, so we really
> > > can't help. What symptoms are you observing?
> > > -Greg
> > > Software Engineer #42 @ http://inktank.com | http://ceph.com
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@xxxxxxxxxxxxxx
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
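
For reference, the diagnostic steps suggested in this thread can be collected into a small helper. This is only a sketch: it assembles the command lines and prints them, it does not run anything against a cluster. The 'ceph osd getmap -o <mapfile>' step is an assumption about how to grab the OSD map (the thread only says to grab it), and the map file path, object name, pool id, and pgid in the usage section are hypothetical placeholders.

```python
import shlex


def locate_object_commands(mapfile, objname, poolid):
    """Commands to find which pg/osds a named object maps to.

    Step 1 (assumed, not stated in the thread): dump the OSD map to a file.
    Step 2 (from the thread): ask osdmaptool where the object maps.
    """
    return [
        ["ceph", "osd", "getmap", "-o", mapfile],
        ["osdmaptool", mapfile, "--test-map-object", objname,
         "--pool", str(poolid)],
    ]


def pg_query_commands(pgid):
    """Commands from the thread for inspecting a pg that is stuck peering."""
    return [
        ["ceph", "health", "detail"],   # lists the affected pgids
        ["ceph", "pg", pgid, "query"],  # detailed state of one pg
    ]


if __name__ == "__main__":
    # Hypothetical placeholder values; substitute real ones from the cluster.
    for cmd in locate_object_commands("/tmp/osdmap", "example-object", 3):
        print(shlex.join(cmd))
    for cmd in pg_query_commands("<pgid>"):
        print(shlex.join(cmd))
```

The printed lines can be reviewed and pasted into a shell on a node with admin access; shlex.join quotes the "<pgid>" placeholder so that it is not interpreted as a redirection if pasted by accident.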