I have only 'ceph health detail' from the previous crash:

ceph health detail
HEALTH_WARN 6 pgs peering; 9 pgs stuck unclean
pg 3.c62 is stuck unclean for 583.220063, current state active, last acting [57,23,51]
pg 4.269 is stuck unclean for 4842.519837, current state peering, last acting [23,57,106]
pg 3.26a is stuck unclean for 764.413502, current state peering, last acting [23,57,106]
pg 3.556 is stuck unclean for 888.097879, current state peering, last acting [108,57,14]
pg 4.555 is stuck unclean for 4842.518997, current state peering, last acting [108,57,14]
pg 3.e59 is stuck unclean for 1036.717811, current state active, last acting [57,8,108]
pg 3.78c is stuck unclean for 508.459454, current state peering, last acting [23,47,57]
pg 4.54c is stuck unclean for 4842.365307, current state active, last acting [57,108,23]
pg 3.ef0 is stuck unclean for 827.882363, current state active, last acting [57,23,117]
pg 3.78c is peering, acting [23,47,57]
pg 3.556 is peering, acting [108,57,14]
pg 4.555 is peering, acting [108,57,14]
pg 3.54d is peering, acting [57,108,23]
pg 4.269 is peering, acting [23,57,106]
pg 3.26a is peering, acting [23,57,106]

'ceph pg .. query' output, taken now for the pg:
https://www.dropbox.com/s/xhdga2qvgygecav/query_pgid.txt.tar.gz

--
Regards
Dominik

2013/6/29 Sage Weil <sage@xxxxxxxxxxx>:
>> Ver. 0.56.6
>> Hmm, the osd did not die; one or more pgs are stuck peering on it.
>
> Can you get a pgid from 'ceph health detail' and then do 'ceph pg <pgid>
> query' and attach that output?
>
> Thanks!
> sage
>
>> Regards
>> Dominik
>>
>> On Jun 28, 2013 11:28 PM, "Sage Weil" <sage@xxxxxxxxxxx> wrote:
>> On Sat, 29 Jun 2013, Andrey Korolyov wrote:
>> > There is almost the same problem with the 0.61 cluster, at least with
>> > the same symptoms. It can be reproduced quite easily: remove an osd,
>> > then mark it out, and with quite high probability one of its
>> > neighbors will get stuck at the end of the peering process, with a
>> > couple of peering pgs whose primary copy is on it. That osd process
>> > seems to be stuck in some kind of lock, eating exactly 100% of one
>> > core.
>>
>> Which version?
>> Can you attach with gdb and get a backtrace to see what it is chewing on?
>>
>> Thanks!
>> sage
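[For reference, a minimal sketch of the backtrace capture Sage asks for
above. It assumes the spinning ceph-osd process has already been
identified by PID (<pid> is a placeholder), that ceph debug symbols are
installed, and the output file name is arbitrary:

    # dump backtraces of every thread in the stuck ceph-osd, then detach
    gdb --batch -p <pid> -ex 'thread apply all bt' > ceph-osd-backtrace.txt 2>&1

The resulting text file can then be attached to the thread so it is
visible which lock or loop the osd is spinning in.]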
>> > On Thu, Jun 13, 2013 at 8:42 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> > > On Thu, Jun 13, 2013 at 6:33 AM, Sławomir Skowron <szibis@xxxxxxxxx> wrote:
>> > >> Hi, sorry for the late response.
>> > >>
>> > >> https://docs.google.com/file/d/0B9xDdJXMieKEdHFRYnBfT3lCYm8/view
>> > >>
>> > >> Logs are in the attachment, and on google drive, from today:
>> > >>
>> > >> https://docs.google.com/file/d/0B9xDdJXMieKEQzVNVHJ1RXFXZlU/view
>> > >>
>> > >> We had this problem again today, and the new logs on google drive
>> > >> carry today's date.
>> > >>
>> > >> What is strange is that the problematic osd.71 has about 10-15%
>> > >> more space used than the other osds in the cluster.
>> > >>
>> > >> Today, within one hour, osd.71 failed 3 times in the mon log, and
>> > >> after the third failure recovery got stuck and many 500 errors
>> > >> appeared in the http layer on top of rgw. When it is stuck,
>> > >> restarting osd.71, osd.23, and osd.108 (all from the stuck pg)
>> > >> helps, but I even ran a repair on this osd, just in case.
>> > >>
>> > >> My theory is that the rgw object index lives on this pg, or that
>> > >> one of the osds in this pg has a problem with its local filesystem
>> > >> or the drive below it (the raid controller reports nothing), but I
>> > >> do not see any problem in the system.
>> > >>
>> > >> How can we find out in which pg/osd the index of objects for an
>> > >> rgw bucket lives?
>> > >
>> > > You can find the location of any named object by grabbing the OSD map
>> > > from the cluster and using the osdmaptool: "osdmaptool <mapfile>
>> > > --test-map-object <objname> --pool <poolid>".
>> > >
>> > > You're not providing any context for your issue though, so we really
>> > > can't help. What symptoms are you observing?
>> > > -Greg
>> > > Software Engineer #42 @ http://inktank.com | http://ceph.com
>>

--
Regards
Dominik
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
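[For reference, a minimal sketch of the osdmaptool lookup Greg describes
above. It assumes the OSD map is first saved to a local file (the
/tmp/osdmap path is arbitrary) and that the name of the bucket index
object and the numeric id of its pool are already known; <objname> and
<poolid> are placeholders:

    # save the current OSD map, then ask which pg a given object maps to
    ceph osd getmap -o /tmp/osdmap
    osdmaptool /tmp/osdmap --test-map-object <objname> --pool <poolid>

The second command prints the pg the object hashes to and its acting set
of osds, which can then be matched against the stuck pgs shown by
'ceph health detail'.]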