Any chance you still have debug logs enabled on OSD 23 after you
restarted it and the scrub froze again?

On Wed, May 17, 2017 at 3:19 PM, Stefan Priebe - Profihost AG
<s.priebe@xxxxxxxxxxxx> wrote:
> Hello,
>
> now it shows again:
>>> 4095 active+clean
>>> 1 active+clean+scrubbing
>
> and:
> # ceph pg dump | grep -i scrub
> dumped all in format plain
> pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
> 2.aa 4040 0 0 0 0 10128667136 3010 3010 active+clean+scrubbing 2017-05-11 09:37:37.962700 181936'11196478 181936:8688051 [23,41,9] 23 [23,41,9] 23 176730'10793226 2017-05-10 03:43:20.849784 171715'10548192 2017-05-04 14:27:39.210713
>
> So it seems the same scrub is stuck again... even after restarting the
> OSD. It just took some time until the scrub of this PG happened again.
>
> Greets,
> Stefan
>
> On 17.05.2017 21:13, Jason Dillaman wrote:
>> Can you share your current OSD configuration? It's very curious that
>> your scrub is getting randomly stuck on a few objects for hours at a
>> time until an OSD is reset.
>>
>> On Wed, May 17, 2017 at 2:55 PM, Stefan Priebe - Profihost AG
>> <s.priebe@xxxxxxxxxxxx> wrote:
>>> Hello Jason,
>>>
>>> Minutes ago I had another case where I restarted the OSD which was
>>> shown in the objecter_requests output.
>>>
>>> It seems other scrubs and deep scrubs were hanging as well.
>>>
>>> Output before:
>>> 4095 active+clean
>>> 1 active+clean+scrubbing
>>>
>>> Output after restart:
>>> 4084 active+clean
>>> 7 active+clean+scrubbing+deep
>>> 5 active+clean+scrubbing
>>>
>>> Both values are changing every few seconds; the cluster is again
>>> doing a lot of scrubs and deep scrubs.
>>>
>>> Greets,
>>> Stefan
>>>
>>> On 17.05.2017 20:36, Stefan Priebe - Profihost AG wrote:
>>>> Hi,
>>>>
>>>> that command does not exist.
>>>>
>>>> But at least ceph -s permanently reports 1 pg in scrubbing with no
>>>> change.
>>>>
>>>> Log attached as well.
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>> On 17.05.2017 20:20, Jason Dillaman wrote:
>>>>> Does your ceph status show pg 2.cebed0aa (still) scrubbing? Sure -- I
>>>>> can quickly scan the new log if you directly send it to me.
>>>>>
>>>>> On Wed, May 17, 2017 at 2:18 PM, Stefan Priebe - Profihost AG
>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>> I can send the OSD log - if you want?
>>>>>>
>>>>>> Stefan
>>>>>>
>>>>>> On 17.05.2017 20:13, Stefan Priebe - Profihost AG wrote:
>>>>>>> Hello Jason,
>>>>>>>
>>>>>>> the command
>>>>>>> # rados -p cephstor6 rm rbd_data.21aafa6b8b4567.0000000000000aaa
>>>>>>> hangs as well. It does absolutely nothing... waiting forever.
>>>>>>>
>>>>>>> Greets,
>>>>>>> Stefan
>>>>>>>
>>>>>>> On 17.05.2017 17:05, Jason Dillaman wrote:
>>>>>>>> OSD 23 notes that object rbd_data.21aafa6b8b4567.0000000000000aaa is
>>>>>>>> waiting for a scrub. What happens if you run "rados -p <rbd pool> rm
>>>>>>>> rbd_data.21aafa6b8b4567.0000000000000aaa" (capturing the OSD 23 logs
>>>>>>>> during this)? If that succeeds while your VM remains blocked on that
>>>>>>>> remove op, it looks like there is some problem in the OSD where ops
>>>>>>>> queued on a scrub are not properly awoken when the scrub completes.
>>>>>>>>
>>>>>>>> On Wed, May 17, 2017 at 10:57 AM, Stefan Priebe - Profihost AG
>>>>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>>>>> Hello Jason,
>>>>>>>>>
>>>>>>>>> after enabling the log and generating a gcore dump, the request was
>>>>>>>>> successful ;-(
>>>>>>>>>
>>>>>>>>> So the log only contains the successful request; that was the only
>>>>>>>>> one I was able to catch. I can send you the log on request.
>>>>>>>>>
>>>>>>>>> Luckily I had another VM on another cluster behaving the same.
>>>>>>>>>
>>>>>>>>> This time osd.23:
>>>>>>>>> # ceph --admin-daemon
>>>>>>>>> /var/run/ceph/ceph-client.admin.22969.140085040783360.asok
>>>>>>>>> objecter_requests
>>>>>>>>> {
>>>>>>>>>     "ops": [
>>>>>>>>>         {
>>>>>>>>>             "tid": 18777,
>>>>>>>>>             "pg": "2.cebed0aa",
>>>>>>>>>             "osd": 23,
>>>>>>>>>             "object_id": "rbd_data.21aafa6b8b4567.0000000000000aaa",
>>>>>>>>>             "object_locator": "@2",
>>>>>>>>>             "target_object_id": "rbd_data.21aafa6b8b4567.0000000000000aaa",
>>>>>>>>>             "target_object_locator": "@2",
>>>>>>>>>             "paused": 0,
>>>>>>>>>             "used_replica": 0,
>>>>>>>>>             "precalc_pgid": 0,
>>>>>>>>>             "last_sent": "1.83513e+06s",
>>>>>>>>>             "attempts": 1,
>>>>>>>>>             "snapid": "head",
>>>>>>>>>             "snap_context": "28a43=[]",
>>>>>>>>>             "mtime": "2017-05-17 16:51:06.0.455475s",
>>>>>>>>>             "osd_ops": [
>>>>>>>>>                 "delete"
>>>>>>>>>             ]
>>>>>>>>>         }
>>>>>>>>>     ],
>>>>>>>>>     "linger_ops": [
>>>>>>>>>         {
>>>>>>>>>             "linger_id": 1,
>>>>>>>>>             "pg": "2.f0709c34",
>>>>>>>>>             "osd": 23,
>>>>>>>>>             "object_id": "rbd_header.21aafa6b8b4567",
>>>>>>>>>             "object_locator": "@2",
>>>>>>>>>             "target_object_id": "rbd_header.21aafa6b8b4567",
>>>>>>>>>             "target_object_locator": "@2",
>>>>>>>>>             "paused": 0,
>>>>>>>>>             "used_replica": 0,
>>>>>>>>>             "precalc_pgid": 0,
>>>>>>>>>             "snapid": "head",
>>>>>>>>>             "registered": "1"
>>>>>>>>>         }
>>>>>>>>>     ],
>>>>>>>>>     "pool_ops": [],
>>>>>>>>>     "pool_stat_ops": [],
>>>>>>>>>     "statfs_ops": [],
>>>>>>>>>     "command_ops": []
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> OSD logfile of OSD 23 attached.
>>>>>>>>>
>>>>>>>>> Greets,
>>>>>>>>> Stefan
>>>>>>>>>
>>>>>>>>> On 17.05.2017 16:26, Jason Dillaman wrote:
>>>>>>>>>> On Wed, May 17, 2017 at 10:21 AM, Stefan Priebe - Profihost AG
>>>>>>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>>>>>>> You mean the request no matter if it is successful or not? Which
>>>>>>>>>>> log level should be set to 20?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I'm hoping you can re-create the hung remove op when OSD logging is
>>>>>>>>>> increased -- "debug osd = 20" would be nice if you can turn it up
>>>>>>>>>> that high while attempting to capture the blocked op.
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> ceph-users mailing list
>>>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>>>
>>
>>
>>
--
Jason

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
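
[Editor's note: for anyone hitting the same symptom, the objecter_requests dump
quoted in this thread can be reduced to "which OSD is the client stuck on" with
a few lines of Python. This is only a sketch against an abbreviated copy of the
JSON shown above; a real dump carries more fields per op (last_sent,
snap_context, mtime, ...), and the structure may differ between Ceph releases.]

```python
import json

# Abbreviated objecter_requests output from the thread above; a real dump
# carries more fields per op (last_sent, snap_context, mtime, ...).
dump = json.loads("""
{
  "ops": [
    {"tid": 18777,
     "pg": "2.cebed0aa",
     "osd": 23,
     "object_id": "rbd_data.21aafa6b8b4567.0000000000000aaa",
     "attempts": 1,
     "osd_ops": ["delete"]}
  ],
  "linger_ops": [],
  "pool_ops": [],
  "pool_stat_ops": [],
  "statfs_ops": [],
  "command_ops": []
}
""")

# Every entry under "ops" is a client request still in flight; list the OSD
# each one is waiting on so you know which daemon's logs to pull (here:
# osd.23, the same OSD serving the stuck scrub on pg 2.aa / 2.cebed0aa).
stuck = [(op["osd"], op["pg"], op["object_id"], "/".join(op["osd_ops"]))
         for op in dump["ops"]]
for osd, pg, obj, what in stuck:
    print(f"osd.{osd}: {what} on {obj} (pg {pg}) still in flight")
```

[An empty "ops" list after the scrub completes (or the OSD restarts) means the
blocked request was finally acked; a tid that stays there across samples is the
one to chase in the OSD log.]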