Hi, can you share the exact command you used to block the watcher? To
get the lock list, run:
rbd lock list <pool>/<image>
There is 1 exclusive lock on this image.
Locker          ID                     Address
client.1211875  auto 139643345791728   192.168.3.12:0/2259335316
To blacklist the client, pass its address (the Address column above):
ceph osd blacklist add <address>
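For example, with the address from the lock list above and an optional
expiry in seconds (the 3600 here is just an illustration), and then
confirm the entry is listed:
ceph osd blacklist add 192.168.3.12:0/2259335316 3600
ceph osd blacklist ls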
Or try it with rbd as well; the lock ID and locker both come from the
lock list output (quote the lock ID, since it contains a space):
rbd lock rm <pool>/<image> "auto 139643345791728" client.1211875
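Once the client is blacklisted, its watch should eventually time out;
you can double-check with:
rbd status <pool>/<image>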
Hope that helps!
Quoting max@xxxxxxxxxx:
Hey all,
I recently had a k8s node failure in my homelab, and even though I
powered it off (and it's done for, so it won't come back up), it
still shows up as a watcher in rbd status.
```
root@node0:~# rbd status kubernetes/csi-vol-3e7af8ae-ceb6-4c94-8435-2f8dc29b313b
Watchers:
watcher=10.0.0.103:0/1520114202 client.1697844 cookie=140289402510784
watcher=10.0.0.103:0/39967552 client.1805496 cookie=140549449430704
root@node0:~# ceph osd blocklist ls
10.0.0.103:0/0 2023-04-15T13:15:39.061379+0200
listed 1 entries
```
Even though the node is down and I have blocklisted it multiple times
for hours, the watchers won't disappear. As a result, ceph-csi-rbd
claims the image is already mounted (binding it manually works fine,
and it can cleanly unbind as well, but it can't be unbound from a node
that no longer exists).
Is there any way to force-kick an RBD client / watcher from Ceph
(e.g. by switching the mgr / mon), or to see why this is not timing
out?
I found some historical mails & issues (related to Rook, which I
don't use) regarding a parameter `osd_client_watch_timeout`, but I
can't find how it relates to the RBD images.
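If it matters, I assume the current value can be checked with
something like:
```
ceph config get osd osd_client_watch_timeout
```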
Cheers,
Max.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx