2012/3/5 Sławomir Skowron <szibis@xxxxxxxxx>:
> On 5 mar 2012, at 19:59, Yehuda Sadeh Weinraub
> <yehuda.sadeh@xxxxxxxxxxxxx> wrote:
>
>> On Mon, Mar 5, 2012 at 2:23 AM, Sławomir Skowron
>> <slawomir.skowron@xxxxxxxxx> wrote:
>>> 2012/3/1 Sławomir Skowron <slawomir.skowron@xxxxxxxxx>:
>>>> 2012/2/29 Yehuda Sadeh Weinraub <yehuda.sadeh@xxxxxxxxxxxxx>:
>>>>> On Wed, Feb 29, 2012 at 5:06 AM, Sławomir Skowron
>>>>> <slawomir.skowron@xxxxxxxxx> wrote:
>>>>>>
>>>>>> Ok, it's intentional.
>>>>>>
>>>>>> We check the meta info about files, then check the md5 of the file
>>>>>> content. In parallel, we update objects that have changed, archive
>>>>>> those objects under another key, and finally delete objects that
>>>>>> have expired.
>>>>>>
>>>>>> This happens over and over, because this site changes many times.
>>>>>>
>>>>>> Right now I don't have any idea how to work around this problem
>>>>>> without shutting down this app :(
>>>>>
>>>>> I looked at your osd log again, and there are other things that don't
>>>>> look right. I'll also need you to turn on 'debug osd = 20' and 'debug
>>>>> filestore = 20'.
>>>>
>>>> osd.24: almost 10 minutes of log at the debug levels above, in the
>>>> attachment.
>>>>
>>>>> Other than that, I just pushed a workaround that might improve things.
>>>>> It's on the wip-rgw-atomic-no-retry branch on github (based on
>>>>> 0.42.2), so you might want to give it a spin and let us know whether
>>>>> it actually improved things.
>>>>
>>>> Ok, I will try it and let you know soon.
>>>
>>> Unfortunately, no improvement after upgrading to this version.
>>>
>> It looks like an issue with updating the bucket index, but I'm having
>> trouble confirming it, as the log provided (of osd.24) doesn't contain
>> any relevant operations. If you could provide a log from the relevant
>> osd it would be very helpful.
>>
>> You can find the relevant osd by looking at an operation that took too
>> long, and looking for a request like the following:
>>
>> 2012-02-28 20:20:10.944859 7fb1affb7700 -- 10.177.64.6:0/1020439 -->
>> 10.177.64.4:6839/7954 -- osd_op(client.65007.0:587 .dir.3 [call
>> rgw.bucket_prepare_op] 7.ccb26a35) v4 -- ?+0 0xf25270 con 0xbcd1c0
>>
>> It would be easiest to look for the reply to that request, as it will
>> contain the osd id (search for a line that contains osd_op_reply and
>> the client.65007.0:587 request id).
>>
>> In the meantime, I created issue #2139 for a probable culprit. Having
>> the relevant logs will allow us to verify whether you're hitting that
>> or another issue.
>>
>> Thanks,
>> Yehuda
>
> Ok; because of the time difference I will try to find this in the
> morning at work. If the verbosity of the existing logs is insufficient,
> I will start all OSDs in debug mode, as you wrote earlier, and then
> reproduce the problem.
> I will try to send the logs as soon as possible.
>
> Regards
> Slawomir Skowron

All logs from osd.24, osd.62, and osd.36, with debug osd = 20 and
debug filestore = 20, from 2012-03-06 10:25 onward:

http://217.144.195.170/ceph/osd.24.log.tar.gz (348MB) - machine 1 - rack 1
http://217.144.195.170/ceph/osd.36.log.tar.gz (26MB) - machine 2 - rack 2
http://217.144.195.170/ceph/osd.62.log.tar.gz (23MB) - machine 3 - rack 3

--
-----
Pozdrawiam

Sławek "sZiBis" Skowron
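
For illustration only, a minimal sketch of the kind of update loop
described at the top of the thread (check metadata, compare md5, rewrite
changed objects, copy the old version to an archive key, delete expired
ones). The host, credentials, bucket name, archive prefix, and the use of
boto 2.x against the S3-compatible radosgw endpoint are assumptions for
the sketch, not details taken from the thread:

    #!/usr/bin/env python
    # Hypothetical sketch of the mirror-update loop described above.
    # Assumes a radosgw S3 endpoint reachable with boto 2.x; host, keys,
    # bucket, and archive prefix below are placeholders.
    import hashlib
    import time

    from boto.s3.connection import S3Connection, OrdinaryCallingFormat

    conn = S3Connection(aws_access_key_id='ACCESS_KEY',
                        aws_secret_access_key='SECRET_KEY',
                        host='rgw.example.com',
                        is_secure=False,
                        calling_format=OrdinaryCallingFormat())
    bucket = conn.get_bucket('mirror')

    def refresh_object(name, new_data, expired):
        """Update one object, archiving its old content, or delete it if expired."""
        if expired:
            bucket.delete_key(name)
            return
        key = bucket.get_key(name)                     # meta info (HEAD)
        new_md5 = hashlib.md5(new_data).hexdigest()
        if key is not None and key.etag.strip('"') == new_md5:
            return                                     # unchanged, nothing to do
        if key is not None:
            # archive the previous version under another key
            archive_name = 'archive/%s.%d' % (name, int(time.time()))
            bucket.copy_key(archive_name, bucket.name, name)
        dst = bucket.new_key(name)
        dst.set_contents_from_string(new_data)         # rewrite the changed object

Every delete, copy, and rewrite here ends up updating the bucket index,
which is why a frequently-changing site produces the pattern discussed
above.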
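
The 'debug osd = 20' and 'debug filestore = 20' settings requested above
would normally be placed in the [osd] section of ceph.conf and picked up
on the next osd restart; the section placement shown here is the usual
convention rather than something spelled out in the thread:

    [osd]
        debug osd = 20
        debug filestore = 20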
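
And a small, hypothetical helper for the "find the relevant osd" step
Yehuda describes: scan a client-side (radosgw) log for the osd_op_reply
matching a given request id and report which osd.N appears on that line.
The log path, request id, and exact reply-line format are assumptions;
real messenger log lines may differ slightly:

    #!/usr/bin/env python
    # Hypothetical sketch: locate the OSD that answered a given request id,
    # following the osd_op_reply hint above.  Log path and request id are
    # placeholders; adjust the matching if your log format differs.
    import re
    import sys

    def find_osd_for_request(log_path, request_id):
        tid = request_id.rsplit(':', 1)[-1]   # '587' from 'client.65007.0:587'
        with open(log_path) as log:
            for line in log:
                if 'osd_op_reply' not in line:
                    continue
                # match either the full request id or just its transaction id
                if request_id in line or 'osd_op_reply(%s ' % tid in line:
                    osd = re.search(r'osd\.\d+', line)
                    print(line.rstrip())
                    print('-> answered by %s'
                          % (osd.group(0) if osd else 'unknown osd'))
                    return
        print('no osd_op_reply found for %s' % request_id)

    if __name__ == '__main__':
        # usage: python find_osd.py radosgw.log client.65007.0:587
        find_osd_for_request(sys.argv[1], sys.argv[2])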