On 17.05.2017 at 21:21, Jason Dillaman wrote:
> Any chance you still have debug logs enabled on OSD 23 after you
> restarted it and the scrub froze again?

No, but I can do that ;-) Hopefully it freezes again.

Stefan

>
> On Wed, May 17, 2017 at 3:19 PM, Stefan Priebe - Profihost AG
> <s.priebe@xxxxxxxxxxxx> wrote:
>> Hello,
>>
>> now it shows again:
>>>> 4095 active+clean
>>>> 1 active+clean+scrubbing
>>
>> and:
>> # ceph pg dump | grep -i scrub
>> dumped all in format plain
>> pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
>> 2.aa 4040 0 0 0 0 10128667136 3010 3010 active+clean+scrubbing 2017-05-11 09:37:37.962700 181936'11196478 181936:8688051 [23,41,9] 23 [23,41,9] 23 176730'10793226 2017-05-10 03:43:20.849784 171715'10548192 2017-05-04 14:27:39.210713
>>
>> So it seems the same scrub is stuck again, even after restarting the
>> OSD. It just took some time until the scrub of this PG happened again.
>>
>> Greets,
>> Stefan
>> On 17.05.2017 at 21:13, Jason Dillaman wrote:
>>> Can you share your current OSD configuration? It's very curious that
>>> your scrub is getting randomly stuck on a few objects for hours at a
>>> time until an OSD is reset.
>>>
>>> On Wed, May 17, 2017 at 2:55 PM, Stefan Priebe - Profihost AG
>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>> Hello Jason,
>>>>
>>>> Minutes ago I had another case where I restarted the OSD that was shown
>>>> in the objecter_requests output.
>>>>
>>>> It seems other scrubs and deep scrubs were hanging as well.
>>>>
>>>> Output before:
>>>> 4095 active+clean
>>>> 1 active+clean+scrubbing
>>>>
>>>> Output after restart:
>>>> 4084 active+clean
>>>> 7 active+clean+scrubbing+deep
>>>> 5 active+clean+scrubbing
>>>>
>>>> Both values keep changing every few seconds, doing a lot of scrubs
>>>> and deep scrubs.
>>>>
>>>> Greets,
>>>> Stefan
>>>> On 17.05.2017 at 20:36, Stefan Priebe - Profihost AG wrote:
>>>>> Hi,
>>>>>
>>>>> That command does not exist.
>>>>>
>>>>> But at least ceph -s permanently reports 1 PG scrubbing, with no change.
>>>>>
>>>>> Log attached as well.
>>>>>
>>>>> Greets,
>>>>> Stefan
>>>>> On 17.05.2017 at 20:20, Jason Dillaman wrote:
>>>>>> Does your ceph status show pg 2.cebed0aa (still) scrubbing? Sure -- I
>>>>>> can quickly scan the new log if you directly send it to me.
>>>>>>
>>>>>> On Wed, May 17, 2017 at 2:18 PM, Stefan Priebe - Profihost AG
>>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>>> I can send the OSD log if you want.
>>>>>>>
>>>>>>> Stefan
>>>>>>>
>>>>>>> On 17.05.2017 at 20:13, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hello Jason,
>>>>>>>>
>>>>>>>> The command
>>>>>>>> # rados -p cephstor6 rm rbd_data.21aafa6b8b4567.0000000000000aaa
>>>>>>>>
>>>>>>>> hangs as well. It does absolutely nothing... waiting forever.
>>>>>>>>
>>>>>>>> Greets,
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>> On 17.05.2017 at 17:05, Jason Dillaman wrote:
>>>>>>>>> OSD 23 notes that object rbd_data.21aafa6b8b4567.0000000000000aaa is
>>>>>>>>> waiting for a scrub. What happens if you run "rados -p <rbd pool> rm
>>>>>>>>> rbd_data.21aafa6b8b4567.0000000000000aaa" (capturing the OSD 23 logs
>>>>>>>>> during this)? If that succeeds while your VM remains blocked on that
>>>>>>>>> remove op, it looks like there is some problem in the OSD where ops
>>>>>>>>> queued on a scrub are not properly awoken when the scrub completes.
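A minimal sketch of the capture described in the quoted message above, using the pool, OSD id and object name that appear in this thread; raising the log level via injectargs, and the exact levels (20/20 while testing, 1/5 afterwards), are assumptions and not something confirmed in the thread:

# ceph tell osd.23 injectargs '--debug_osd 20/20'
# rados -p cephstor6 rm rbd_data.21aafa6b8b4567.0000000000000aaa
# ceph tell osd.23 injectargs '--debug_osd 1/5'

Raising the level only around the rm keeps the OSD 23 log small enough to scan for the op being held back by the scrub.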
>>>>>>>>>
>>>>>>>>> On Wed, May 17, 2017 at 10:57 AM, Stefan Priebe - Profihost AG
>>>>>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>>>>>> Hello Jason,
>>>>>>>>>>
>>>>>>>>>> After enabling the log and generating a gcore dump, the request was
>>>>>>>>>> successful ;-(
>>>>>>>>>>
>>>>>>>>>> So the log only contains the successful request; I was only able to
>>>>>>>>>> catch the successful one. I can send you the log on request.
>>>>>>>>>>
>>>>>>>>>> Luckily I had another VM on another cluster behaving the same way.
>>>>>>>>>>
>>>>>>>>>> This time osd.23:
>>>>>>>>>> # ceph --admin-daemon /var/run/ceph/ceph-client.admin.22969.140085040783360.asok objecter_requests
>>>>>>>>>> {
>>>>>>>>>>     "ops": [
>>>>>>>>>>         {
>>>>>>>>>>             "tid": 18777,
>>>>>>>>>>             "pg": "2.cebed0aa",
>>>>>>>>>>             "osd": 23,
>>>>>>>>>>             "object_id": "rbd_data.21aafa6b8b4567.0000000000000aaa",
>>>>>>>>>>             "object_locator": "@2",
>>>>>>>>>>             "target_object_id": "rbd_data.21aafa6b8b4567.0000000000000aaa",
>>>>>>>>>>             "target_object_locator": "@2",
>>>>>>>>>>             "paused": 0,
>>>>>>>>>>             "used_replica": 0,
>>>>>>>>>>             "precalc_pgid": 0,
>>>>>>>>>>             "last_sent": "1.83513e+06s",
>>>>>>>>>>             "attempts": 1,
>>>>>>>>>>             "snapid": "head",
>>>>>>>>>>             "snap_context": "28a43=[]",
>>>>>>>>>>             "mtime": "2017-05-17 16:51:06.0.455475s",
>>>>>>>>>>             "osd_ops": [
>>>>>>>>>>                 "delete"
>>>>>>>>>>             ]
>>>>>>>>>>         }
>>>>>>>>>>     ],
>>>>>>>>>>     "linger_ops": [
>>>>>>>>>>         {
>>>>>>>>>>             "linger_id": 1,
>>>>>>>>>>             "pg": "2.f0709c34",
>>>>>>>>>>             "osd": 23,
>>>>>>>>>>             "object_id": "rbd_header.21aafa6b8b4567",
>>>>>>>>>>             "object_locator": "@2",
>>>>>>>>>>             "target_object_id": "rbd_header.21aafa6b8b4567",
>>>>>>>>>>             "target_object_locator": "@2",
>>>>>>>>>>             "paused": 0,
>>>>>>>>>>             "used_replica": 0,
>>>>>>>>>>             "precalc_pgid": 0,
>>>>>>>>>>             "snapid": "head",
>>>>>>>>>>             "registered": "1"
>>>>>>>>>>         }
>>>>>>>>>>     ],
>>>>>>>>>>     "pool_ops": [],
>>>>>>>>>>     "pool_stat_ops": [],
>>>>>>>>>>     "statfs_ops": [],
>>>>>>>>>>     "command_ops": []
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Logfile of OSD 23 attached.
>>>>>>>>>>
>>>>>>>>>> Greets,
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>> On 17.05.2017 at 16:26, Jason Dillaman wrote:
>>>>>>>>>>> On Wed, May 17, 2017 at 10:21 AM, Stefan Priebe - Profihost AG
>>>>>>>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>>>>>>>> You mean the request, no matter whether it is successful or not? Which log
>>>>>>>>>>>> level should be set to 20?
>>>>>>>>>>>
>>>>>>>>>>> I'm hoping you can re-create the hung remove op when OSD logging is
>>>>>>>>>>> increased -- "debug osd = 20" would be nice if you can turn it up that
>>>>>>>>>>> high while attempting to capture the blocked op.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
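For completeness, a hedged sketch of checking from the OSD side whether the remove op is actually parked behind the scrub, using the OSD and PG ids from this thread; run on the host carrying osd.23, and note that dump_blocked_ops may not be available on older releases:

# ceph daemon osd.23 dump_ops_in_flight
# ceph daemon osd.23 dump_blocked_ops
# ceph pg 2.aa query

If the delete op shows up there while ceph -s still reports the PG as active+clean+scrubbing, that would match the behaviour described in the thread.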