Hi,

It sounds similar. How would I best be able to confirm it? Logs? Which
log/message, if so?

Thanks

On Thu, Jul 25, 2024 at 6:11 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> On Wed, Jul 3, 2024 at 5:45 PM Reid Guyett <reid.guyett@xxxxxxxxx> wrote:
> >
> > Hi,
> >
> > I have a small script in a Docker container that we use as a CRUD test
> > to monitor availability. The script uses the Python librbd/librados
> > bindings and is launched by Telegraf inputs.exec. It does the following:
> >
> > 1. Creates an RBD image
> > 2. Writes a small amount of data to the image
> > 3. Reads the data back
> > 4. Deletes the image
> > 5. Closes the connections
> >
> > It works great 99% of the time, but occasionally something happens, the
> > script takes too long (1 minute) to complete, and it is killed. I don't
> > have logging yet to know which step this happens at, but I will be
> > adding some. Regardless, when the script is killed, the watcher on the
> > image sometimes doesn't go away. I use the same image name for each test
> > and try to clean up the image if it exists before starting the next
> > test, but when the watcher is stuck, the cleanup fails.
> >
> > The only way to clean up the watcher is to restart the primary OSD for
> > the rbd_header object. Blocklisting and restarting the container do not
> > free the watcher.
> >
> > When I look at the status of the image, I can see the watcher:
> >
> > # rbd -p pool status crud-image
> > Watchers:
> >     watcher=<ipaddr>:0/3587274006 client.1053762394 cookie=140375838755648
> >
> > Looking up the primary OSD:
> >
> > # rbd -p pool info crud-image | grep id
> > id: cf235ae95099cb
> > # ceph osd map pool rbd_header.cf235ae95099cb
> > osdmap e332984 pool 'pool' (1) object 'rbd_header.cf235ae95099cb' -> pg
> > 1.a76f353e (1.53e) -> up ([7,66,176], p7) acting ([7,66,176], p7)
> >
> > Checking the watchers on the primary OSD does NOT list
> > rbd_header.cf235ae95099cb:
> >
> > # ceph tell osd.7 dump_watchers
> > [
> >     {
> >         "namespace": "",
> >         "object": "rbd_header.70fa4f9b5c2cf8",
> >         "entity_name": {
> >             "type": "client",
> >             "num": 998139266
> >         },
> >         "cookie": 140354859197312,
> >         "timeout": 30,
> >         "entity_addr_t": {
> >             "type": "v1",
> >             "addr": "<ipaddr>:0",
> >             "nonce": 2665188958
> >         }
> >     }
> > ]
> >
> > Is this a bug somewhere? I would expect that if my script is killed, its
> > watcher should time out within a minute. New runs of the script result
> > in new watcher/client/cookie IDs.
>
> Hi Reid,
>
> You might be hitting https://tracker.ceph.com/issues/58120. It looks
> like the ticket wasn't moved to the appropriate state when the fix got
> merged, so unfortunately the fix isn't available in any of the stable
> releases -- only in 19.1.0 (the release candidate for Squid). I have just
> tweaked the ticket and will stage backport PRs shortly.
>
> Thanks,
>
>                 Ilya
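
For reference, below is a minimal sketch of the kind of librbd/librados CRUD
probe described above, using the python-rados and python-rbd bindings. The
pool and image names match the output shown in this thread; the conffile
path, image size, and payload are assumptions, not details from the actual
script.

#!/usr/bin/env python3
# Minimal sketch of a librbd/librados CRUD probe (assumptions noted above).
import rados
import rbd

POOL = "pool"            # pool name as shown in the thread
IMAGE = "crud-image"     # image name as shown in the thread
SIZE = 4 * 1024 * 1024   # 4 MiB image, arbitrary
PAYLOAD = b"crud-probe"  # arbitrary payload

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")  # assumed conf path
cluster.connect()
try:
    ioctx = cluster.open_ioctx(POOL)
    try:
        rbd_inst = rbd.RBD()

        # Clean up a leftover image from a previous (killed) run, if any.
        try:
            rbd_inst.remove(ioctx, IMAGE)
        except rbd.ImageNotFound:
            pass

        # 1. Create the image.
        rbd_inst.create(ioctx, IMAGE, SIZE)

        # 2. Write a small amount of data and 3. read it back.
        # Opening the image registers a watch on its rbd_header object;
        # leaving the "with" block closes the image and drops the watch.
        with rbd.Image(ioctx, IMAGE) as image:
            image.write(PAYLOAD, 0)
            assert image.read(0, len(PAYLOAD)) == PAYLOAD

        # 4. Delete the image.
        rbd_inst.remove(ioctx, IMAGE)
    finally:
        # 5. Close connections.
        ioctx.close()
finally:
    cluster.shutdown()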
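
If the leftover image still has the stuck watcher described above, the
initial cleanup remove() would presumably fail with rbd.ImageBusy (EBUSY),
which matches the cleanup failure in the original report.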