Hello Jason,

minutes ago I had another case where I restarted the OSD that was shown
in the objecter_requests output. It seems other scrubs and deep scrubs
were hanging as well.

Output before:

  4095 active+clean
     1 active+clean+scrubbing

Output after restart:

  4084 active+clean
     7 active+clean+scrubbing+deep
     5 active+clean+scrubbing

Both values are changing every few seconds again, doing a lot of scrubs
and deep scrubs.

Greets,
Stefan

On 17.05.2017 at 20:36, Stefan Priebe - Profihost AG wrote:
> Hi,
>
> that command does not exist.
>
> But at least ceph -s permanently reports 1 pg in scrubbing with no
> change.
>
> Log attached as well.
>
> Greets,
> Stefan
>
> On 17.05.2017 at 20:20, Jason Dillaman wrote:
>> Does your ceph status show pg 2.cebed0aa (still) scrubbing? Sure -- I
>> can quickly scan the new log if you directly send it to me.
>>
>> On Wed, May 17, 2017 at 2:18 PM, Stefan Priebe - Profihost AG
>> <s.priebe@xxxxxxxxxxxx> wrote:
>>> I can send the OSD log - if you want?
>>>
>>> Stefan
>>>
>>> On 17.05.2017 at 20:13, Stefan Priebe - Profihost AG wrote:
>>>> Hello Jason,
>>>>
>>>> the command
>>>>
>>>> # rados -p cephstor6 rm rbd_data.21aafa6b8b4567.0000000000000aaa
>>>>
>>>> hangs as well, doing absolutely nothing... waiting forever.
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>> On 17.05.2017 at 17:05, Jason Dillaman wrote:
>>>>> OSD 23 notes that object rbd_data.21aafa6b8b4567.0000000000000aaa
>>>>> is waiting for a scrub. What happens if you run "rados -p <rbd
>>>>> pool> rm rbd_data.21aafa6b8b4567.0000000000000aaa" (capturing the
>>>>> OSD 23 logs during this)? If that succeeds while your VM remains
>>>>> blocked on that remove op, it looks like there is some problem in
>>>>> the OSD where ops queued on a scrub are not properly awoken when
>>>>> the scrub completes.
>>>>>
>>>>> On Wed, May 17, 2017 at 10:57 AM, Stefan Priebe - Profihost AG
>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>> Hello Jason,
>>>>>>
>>>>>> after enabling the log and generating a gcore dump, the request
>>>>>> was successful ;-( So I was only able to catch the successful
>>>>>> request in the log. I can send it to you on request.
>>>>>>
>>>>>> Luckily I had another VM on another cluster behaving the same.
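>>>>>>
>>>>>> For reference, a minimal sketch of raising the debug level for
>>>>>> such a capture, as Jason asks below (osd.23 as in this thread;
>>>>>> restoring to 0/5 assumes the default debug_osd setting):
>>>>>>
>>>>>> # ceph tell osd.23 injectargs '--debug-osd 20/20'
>>>>>> # (reproduce the hanging op, then save
>>>>>> #  /var/log/ceph/ceph-osd.23.log)
>>>>>> # ceph tell osd.23 injectargs '--debug-osd 0/5'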
>>>>>>
>>>>>> This time osd.23:
>>>>>>
>>>>>> # ceph --admin-daemon \
>>>>>>     /var/run/ceph/ceph-client.admin.22969.140085040783360.asok \
>>>>>>     objecter_requests
>>>>>> {
>>>>>>     "ops": [
>>>>>>         {
>>>>>>             "tid": 18777,
>>>>>>             "pg": "2.cebed0aa",
>>>>>>             "osd": 23,
>>>>>>             "object_id": "rbd_data.21aafa6b8b4567.0000000000000aaa",
>>>>>>             "object_locator": "@2",
>>>>>>             "target_object_id": "rbd_data.21aafa6b8b4567.0000000000000aaa",
>>>>>>             "target_object_locator": "@2",
>>>>>>             "paused": 0,
>>>>>>             "used_replica": 0,
>>>>>>             "precalc_pgid": 0,
>>>>>>             "last_sent": "1.83513e+06s",
>>>>>>             "attempts": 1,
>>>>>>             "snapid": "head",
>>>>>>             "snap_context": "28a43=[]",
>>>>>>             "mtime": "2017-05-17 16:51:06.0.455475s",
>>>>>>             "osd_ops": [
>>>>>>                 "delete"
>>>>>>             ]
>>>>>>         }
>>>>>>     ],
>>>>>>     "linger_ops": [
>>>>>>         {
>>>>>>             "linger_id": 1,
>>>>>>             "pg": "2.f0709c34",
>>>>>>             "osd": 23,
>>>>>>             "object_id": "rbd_header.21aafa6b8b4567",
>>>>>>             "object_locator": "@2",
>>>>>>             "target_object_id": "rbd_header.21aafa6b8b4567",
>>>>>>             "target_object_locator": "@2",
>>>>>>             "paused": 0,
>>>>>>             "used_replica": 0,
>>>>>>             "precalc_pgid": 0,
>>>>>>             "snapid": "head",
>>>>>>             "registered": "1"
>>>>>>         }
>>>>>>     ],
>>>>>>     "pool_ops": [],
>>>>>>     "pool_stat_ops": [],
>>>>>>     "statfs_ops": [],
>>>>>>     "command_ops": []
>>>>>> }
>>>>>>
>>>>>> OSD logfile of OSD 23 attached.
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>>
>>>>>> On 17.05.2017 at 16:26, Jason Dillaman wrote:
>>>>>>> On Wed, May 17, 2017 at 10:21 AM, Stefan Priebe - Profihost AG
>>>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>>>> You mean the request, no matter if it is successful or not?
>>>>>>>> Which log level should be set to 20?
>>>>>>>
>>>>>>> I'm hoping you can re-create the hung remove op when OSD logging
>>>>>>> is increased -- "debug osd = 20" would be nice if you can turn it
>>>>>>> up that high while attempting to capture the blocked op.
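For anyone hitting the same symptom, a rough workaround sketch based on
the above (osd.23 is the blocking OSD from this thread; the systemd unit
name assumes a systemd-based install -- adjust both for your setup):

# on the OSD's host: look for client ops queued behind a scrub
ceph daemon osd.23 dump_ops_in_flight

# see which pgs are (still) scrubbing
ceph pg dump pgs_brief | grep scrubbing

# restarting the blocking OSD is what unblocked the queued ops here
systemctl restart ceph-osd@23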