So! Here is a really mysterious resolution: the issue vanished the moment I
queried the OSD for its slow_ops history. I didn't have time to do anything
except look at the OSD ops history, which was actually empty :-) I'll keep
all your suggestions in case it ever comes back :-) Thanks a lot!

On Wed, 23 Feb 2022 at 12:51, Gaël THEROND <gael.therond@xxxxxxxxxxxx> wrote:

> Thanks a lot Eugen, I dumbly forgot about the rbd block prefix!
>
> I'll try that this afternoon and let you know how it goes.
>
> On Wed, 23 Feb 2022 at 11:41, Eugen Block <eblock@xxxxxx> wrote:
>
>> Hi,
>>
>> > How can I identify which operation this OSD is trying to achieve, as
>> > osd_op() is a bit large ^^ ?
>>
>> I would start by querying the OSD for its historic slow ops to see which
>> operation it is:
>>
>> ceph daemon osd.<OSD> dump_historic_slow_ops
>>
>> > How can I identify the images related to this data chunk?
>>
>> You could go through all rbd images and check for the line containing
>> block_name_prefix; this could take some time depending on how many
>> images you have:
>>
>> block_name_prefix: rbd_data.ca69416b8b4567
>>
>> I sometimes do that with this for loop:
>>
>> for i in `rbd -p <POOL> ls`; do if [ $(rbd info <POOL>/$i | grep -c <PREFIX>) -gt 0 ]; then echo "image: $i"; break; fi; done
>>
>> So in your case it would look something like this:
>>
>> for i in `rbd -p <POOL> ls`; do if [ $(rbd info <POOL>/$i | grep -c 89a4a940aba90b) -gt 0 ]; then echo "image: $i"; break; fi; done
>>
>> To see which clients are connected you can check the mon daemon:
>>
>> ceph daemon mon.<MON> sessions
>>
>> The mon daemon also has a history of slow ops:
>>
>> ceph daemon mon.<MON> dump_historic_slow_ops
>>
>> Regards,
>> Eugen
>>
>>
>> Quoting Gaël THEROND <gael.therond@xxxxxxxxxxxx>:
>>
>> > Hi everyone, I've been having a really nasty issue for around two days:
>> > our cluster reports a bunch of SLOW_OPS on one of our OSDs, as shown
>> > here:
>> >
>> > https://paste.openstack.org/show/b3DkgnJDVx05vL5o4OmY/
>> >
>> > Here is the cluster specification:
>> > * Used to store Openstack-related data (VMs/Snapshots/Volumes/Swift).
>> > * Based on CEPH Nautilus 14.2.8 installed using ceph-ansible.
>> > * Uses an EC-based storage profile.
>> > * We have separate, dedicated 10Gbps frontend and backend networks.
>> > * We don't have any network issues observed or reported by our
>> > monitoring system.
>> >
>> > Here is our current cluster status:
>> > https://paste.openstack.org/show/biVnkm9Yyog3lmSUn0UK/
>> > Here is a detailed view of our cluster status:
>> > https://paste.openstack.org/show/bgKCSVuow0JUZITo2Ndj/
>> >
>> > My main issue here is that this health alert is starting to fill the
>> > monitors' disks and so triggers a MON_DISK_BIG alert.
>> >
>> > I'm worried as I'm having a hard time identifying which OSD operation
>> > is actually slow and, especially, which image it concerns and which
>> > client is using it.
>> >
>> > So far I've tried:
>> > * To match this client ID with any watcher of our stored
>> > volumes/VMs/snapshots by extracting the whole list and then using the
>> > following command: rbd status <pool>/<image>
>> > Unfortunately none of the watchers matches the client reported by
>> > the OSD, on any pool.
>> > * To map this reported chunk of data to one of our stored images using:
>> > ceph osd map <pool>/rbd_data.5.89a4a940aba90b.00000000000000a0
>> > Unfortunately every pool name existing within our cluster gives me
>> > back an answer with no image information and a different watcher
>> > client ID.
>> >
>> > So my questions are:
>> >
>> > How can I identify which operation this OSD is trying to achieve, as
>> > osd_op() is a bit large ^^ ?
>> > Does the snapc information within the log relate to snapshots, or is
>> > that something totally different?
>> > How can I identify the images related to this data chunk?
>> > Is there official documentation about SLOW_OPS operation codes
>> > explaining how to read the logs, i.e. something that explains which
>> > block is the PG number, which is the ID of something, etc.?
>> >
>> > Thanks a lot everyone and feel free to ask for additional information!
>> > G.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
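
Putting Eugen's steps together, a rough end-to-end lookup could be scripted
as sketched below. This is an untested sketch, not something taken from the
thread itself: it assumes jq is available, that it runs on (or is adapted to)
the node hosting the OSD with rbd client access, that OSD_ID and POOLS are
placeholders you fill in, and that the .ops[].description JSON path for
dump_historic_slow_ops output is an assumption that may need adjusting for
your Ceph release.

  #!/usr/bin/env bash
  # Sketch: dump the slow op descriptions from one OSD, extract the
  # rbd_data prefix(es), then scan a list of pools for the image(s)
  # whose block_name_prefix matches, and print their watchers.
  # OSD_ID and POOLS are placeholders -- adjust them for your cluster.

  OSD_ID=12
  POOLS="volumes vms images"

  # 1) Dump historic slow ops; the .ops[].description path is an assumed
  #    layout, check the actual output of your release first.
  ceph daemon "osd.${OSD_ID}" dump_historic_slow_ops \
    | jq -r '.ops[].description' > /tmp/slow_ops.txt

  # 2) Extract the rbd_data prefixes, e.g.
  #    "rbd_data.5.89a4a940aba90b.00000000000000a0" -> "89a4a940aba90b".
  prefixes=$(grep -oE 'rbd_data\.([0-9]+\.)?[0-9a-f]+' /tmp/slow_ops.txt \
    | awk -F. '{print $NF}' | sort -u)

  # 3) Find the image owning each prefix and show its watchers.
  for prefix in $prefixes; do
    echo "== prefix $prefix =="
    for pool in $POOLS; do
      for img in $(rbd -p "$pool" ls); do
        if rbd info "$pool/$img" | grep -q "block_name_prefix.*${prefix}"; then
          echo "image: $pool/$img"
          rbd status "$pool/$img"   # watchers = connected client IDs
        fi
      done
    done
  done

From there, the reported watcher client IDs can be cross-checked against
ceph daemon mon.<MON> sessions on a monitor to see which clients are
currently connected, as Eugen suggested.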