On Wed, Jul 3, 2024 at 5:45 PM Reid Guyett <reid.guyett@xxxxxxxxx> wrote:
>
> Hi,
>
> I have a small script in a Docker container we use for a type of CRUD test
> to monitor availability. The script uses Python librbd/librados and is
> launched by Telegraf input.exec. It does the following:
>
>    1. Creates an rbd image
>    2. Writes a small amount of data to the rbd
>    3. Reads the data from the rbd
>    4. Deletes the rbd
>    5. Closes connections
>
> It works great for 99% of the time but there is a small chance that
> something happens and the script takes too long (1 min) to complete and it
> is killed. I don't have logging to know which step it happens at yet but
> will be adding some. Regardless, when the script is killed, sometimes the
> watcher on the rbd isn't going away. I use the same RBD name for each test
> and try to clean up the rbd if it exists prior to starting the next test
> but when the watcher is stuck, it can't.
>
> The only way to clean up the watcher is to restart the primary osd for the
> rbd_header. Blocklist and restarting the container free the watcher.
>
> When I look at the status of the image I can see the watcher.
> # rbd -p pool status crud-image
> Watchers:
>         watcher=<ipaddr>:0/3587274006 client.1053762394 cookie=140375838755648
>
> Looking up primary OSD
> # rbd -p pool info crud-image | grep id
>         id: cf235ae95099cb
> # ceph osd map pool rbd_header.cf235ae95099cb
> osdmap e332984 pool 'pool' (1) object 'rbd_header.cf235ae95099cb' -> pg
> 1.a76f353e (1.53e) -> up ([7,66,176], p7) acting ([7,66,176], p7)
>
> Checking watchers on primary OSD does NOT list rbd_header.cf235ae95099cb
> # ceph tell osd.7 dump_watchers
> [
>     {
>         "namespace": "",
>         "object": "rbd_header.70fa4f9b5c2cf8",
>         "entity_name": {
>             "type": "client",
>             "num": 998139266
>         },
>         "cookie": 140354859197312,
>         "timeout": 30,
>         "entity_addr_t": {
>             "type": "v1",
>             "addr": "<ipaddr>:0",
>             "nonce": 2665188958
>         }
>     }
> ]
>
> Is this a bug somewhere? I expect that if my script is killed its watcher
> should die out within a minute. New runs of the script would result in new
> watcher/client/cookie ids.

Hi Reid,

You might be hitting https://tracker.ceph.com/issues/58120. It looks
like the ticket wasn't moved to the appropriate state when the fix got
merged, so unfortunately the fix isn't available in any of the stable
releases -- only in 19.1.0 (release candidate for squid). I have just
tweaked the ticket and will stage backport PRs shortly.

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
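
The manual check in the quoted message (derive the header object name from
the image id, find the acting primary with `ceph osd map`, then look for that
object in `dump_watchers`) could be scripted. Below is a minimal sketch in
Python, assuming the command outputs have the same shape as the samples in
the message. The helper names header_object, primary_osd and watchers_for are
hypothetical and not part of any Ceph tooling; the code only parses text that
has already been captured and does not talk to the cluster.

```python
import json
import re


def header_object(image_id: str) -> str:
    """Build the name of the header object for an RBD image id.

    Assumes the rbd_header.<id> naming seen in the quoted message.
    """
    return f"rbd_header.{image_id}"


def primary_osd(osd_map_output: str) -> int:
    """Extract the acting primary OSD id from `ceph osd map` text output.

    Assumes the plain-text form shown in the message, e.g.
    '... acting ([7,66,176], p7)'.
    """
    match = re.search(r"acting \(\[[\d,]*\], p(\d+)\)", osd_map_output)
    if match is None:
        raise ValueError("no acting primary found in output")
    return int(match.group(1))


def watchers_for(dump_watchers_json: str, object_name: str) -> list:
    """Return the dump_watchers entries whose "object" equals object_name.

    Expects the JSON list printed by `ceph tell osd.N dump_watchers`.
    """
    entries = json.loads(dump_watchers_json)
    return [entry for entry in entries if entry.get("object") == object_name]
```

Fed the outputs from the message, primary_osd picks out OSD 7 and
watchers_for finds no entry for rbd_header.cf235ae95099cb, which matches
what was seen by hand.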