Re: Suggestions

Hi,

There are already commands to control watchers, e.g. 'rbd lock list pool/image' and 'rbd lock rm pool/image', and a few others. The manpage [0] contains more commands.
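For illustration, a typical sequence to identify and evict a stale client could look like the following (pool/image and the client address are placeholders taken from your example; note that 'ceph osd blocklist' was called 'blacklist' in older releases):

rbd status pool/image                              # list current watchers
rbd lock list pool/image                           # show lock id and locker (e.g. client.12541259)
rbd lock rm pool/image <lock-id> <locker>          # release the lock held by that client
ceph osd blocklist add 10.160.0.245:0/2076588905   # optionally evict the stale watcher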

The Prometheus alert might take two days to clear because it predicts based on a 2-day interval.
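As a rough sketch (metric names and thresholds may differ between alert rule versions), such a rule is typically built on predict_linear over a 2-day window, e.g.:

predict_linear(node_filesystem_free_bytes{mountpoint="/mnt/dst-volume"}[2d], 5 * 24 * 3600) < 0

i.e. the alert only clears once the trailing 2-day window no longer predicts the filesystem filling up.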

The 'ceph osd pool repair' command (if that is what you mean here, you haven't really clarified yet) basically issues a 'ceph pg repair' for each PG of the entire pool. If you only have one or a few inconsistent PGs, you might rather run 'ceph pg repair' only for the affected PG(s), not the entire pool. The question is, why would you want to stop a repair? You could also enable automatic repair during scrubbing (ceph config set osd osd_scrub_auto_repair true) if you run into inconsistencies often enough. The downside is that you'll miss the notification (health warning); you'd have to inspect the OSD logs regularly to be aware of automatic repairs.
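For a single inconsistent PG, a targeted workflow along these lines is usually sufficient (pool name and PG id are placeholders):

ceph health detail                                        # shows which PGs are inconsistent
rados list-inconsistent-pg <pool>                         # list inconsistent PGs of a pool
rados list-inconsistent-obj <pgid> --format=json-pretty   # inspect the affected objects
ceph pg repair <pgid>                                     # repair only that PG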

[0] https://docs.ceph.com/en/latest/man/8/rbd/


Quoting Devender Singh <devender@xxxxxxxxxx>:

Hello all

A few more suggestions, if they can be added to future releases.

1. We faced some issues; can we add more commands to control clients using watchers? For example:

rbd status pool/image

Watchers:
	watcher=10.160.0.245:0/2076588905 client.12541259 cookie=140446370329088

Some commands to control the watcher and kill the client.id, something like:

rbd lock remove <pool-name>/<image-name> <client_id>
Or
rbd watchers <pool-name>/<image-name>

Or something
rbd check <pool-name>/<image-name>

Or
rbd list watchers pool-name or pool/image



2. Also, as we have multiple Ceph clusters, today we have to go to Hosts on the dashboard each time and read the host names to identify the nodes and the cluster type, let's say dev or prod. Can we have a variable on the dashboard showing something like “Name: Location-Dev”? I think there is enough space to list the name in this area.
3. It seems the dashboard/mgr is not cleaning up after itself. Most of the time we need to fail over the manager to clear such errors, but this looks like a similar issue to point 1 above. I mounted this volume, then unmounted it and cleaned up everything, even the mount point. But this alert has been active for the last three days, and I have tried failing over to different mgrs.

CephNodeDiskspaceWarning
Mountpoint /mnt/dst-volume on prod-host1 will be full in less than 5 days based on the 48 hour trailing fill rate.

4. We need more commands to control pool repair.
If we have started a pool repair, how can we stop it?


Regards
Dev

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



