Hi,

We took osd.71 out and now the problem is on osd.57.

Something curious: op_rw on osd.57 is much higher than on the other osds. See here:
https://www.dropbox.com/s/o5q0xi9wbvpwyiz/op_rw_osd57.PNG

In the data directory of this osd I found:
> data/osd.57/current# du -sh omap/
> 2.3G    omap/

Is it normal for op_rw to be that much higher on one osd? Maybe some config is set wrong (logs going to this osd or something like that).

Today we had another crash (4 times). Logs with debug (level 10) are here:
https://www.dropbox.com/s/vxvh8084b8ty19u/osd.57_20130628_13xx.log.tar.gz

With debugging on it is higher, but normally the osd.57 process consumes about ~7% CPU and iostat on the disk shows max 4% util.

Prod ceph.conf debug options:

[global]
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_mds = 0/0
debug_mds_balancer = 0/0
debug_mds_locker = 0/0
debug_mds_log = 0/0
debug_mds_log_expire = 0/0
debug_mds_migrator = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcacher = 0/0
debug_client = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_mon = 0/0
debug_monc = 0/0
debug_paxos = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_hadoop = 0/0
debug_asok = 0/0
debug_throttle = 0/0

[osd]
debug osd = 1
debug filestore = 1 ; local object storage

--
Regards
Dominik

2013/6/13 Gregory Farnum <greg@xxxxxxxxxxx>:
> On Thu, Jun 13, 2013 at 6:33 AM, Sławomir Skowron <szibis@xxxxxxxxx> wrote:
>> Hi, sorry for the late response.
>>
>> https://docs.google.com/file/d/0B9xDdJXMieKEdHFRYnBfT3lCYm8/view
>>
>> Logs are in the attachment and on Google Drive, from today:
>>
>> https://docs.google.com/file/d/0B9xDdJXMieKEQzVNVHJ1RXFXZlU/view
>>
>> We had this problem again today, and the new logs on Google Drive are
>> from today's date.
>>
>> What is strange is that the problematic osd.71 has about 10-15% more
>> space used than the other osds in the cluster.
>>
>> Today, within one hour, osd.71 failed 3 times in the mon log, and after
>> the third failure recovery got stuck and many 500 errors appeared in the
>> http layer on top of rgw. When it gets stuck, restarting osd.71, osd.23,
>> and osd.108, all from the stuck pg, helps, but I ran a repair on this osd
>> as well, just in case.
>>
>> My theory is that the rgw index of objects is on this pg, or that one of
>> the osds in this pg has a problem with the local filesystem or the drive
>> below it (the raid controller reports nothing), but I do not see any
>> problem in the system.
>>
>> How can we find in which pg/osd the index of objects of an rgw bucket lives?
>
> You can find the location of any named object by grabbing the OSD map
> from the cluster and using the osdmaptool: "osdmaptool <mapfile>
> --test-map-object <objname> --pool <poolid>".
>
> You're not providing any context for your issue though, so we really
> can't help. What symptoms are you observing?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
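
As a concrete illustration of Greg's osdmaptool suggestion above, here is a minimal sketch of mapping an rgw bucket index object to its pg and osds. The ".dir.<bucket-marker>" object name is an assumption based on how rgw names bucket index objects, and <bucket-name>, <bucket-marker> and <poolid> are placeholders to fill in for the cluster in question:

  # Dump the current osd map to a local file.
  ceph osd getmap -o /tmp/osdmap

  # Find the bucket marker/id of the suspect bucket (placeholder bucket name).
  radosgw-admin bucket stats --bucket=<bucket-name>

  # Map the (assumed) bucket index object name to a pg and its acting osds.
  osdmaptool /tmp/osdmap --test-map-object .dir.<bucket-marker> --pool <poolid>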
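
For reference, a minimal sketch of one way to read the per-osd op counters (such as op_rw mentioned above) and raise the debug level at runtime. The admin socket path below is the Ceph default and the level values are assumptions, not necessarily what was used here:

  # Read the per-osd perf counters (including op_rw) over the admin socket;
  # the socket path is the default and may differ per setup.
  ceph --admin-daemon /var/run/ceph/ceph-osd.57.asok perf dump

  # Temporarily raise osd and filestore debugging to level 10 without a restart,
  # then drop it back to the production values afterwards.
  ceph tell osd.57 injectargs '--debug-osd 10 --debug-filestore 10'
  ceph tell osd.57 injectargs '--debug-osd 1 --debug-filestore 1'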