On 13/06/2022 at 17:54, Eric Le Lay wrote:
On 10/06/2022 at 11:58, Stefan Kooman wrote:
On 6/10/22 11:41, Eric Le Lay wrote:
Hello list,
my ceph cluster was upgraded from nautilus to octopus last October,
causing snaptrims to overload the OSDs, so I had to disable them
(bluefs_buffered_io=false|true didn't help).
Now I've copied the data elsewhere, removed all clients, and am trying
to fix the cluster.
Scrapping it and starting over is possible, but it would be wonderful
if we could figure out what's wrong with it...
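For reference, snap trimming can be paused cluster-wide with the
nosnaptrim osdmap flag; a rough sketch of the relevant commands:

ceph osd set nosnaptrim      # pause snap trimming on all OSDs
ceph osd unset nosnaptrim    # resume trimming later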
FYI: osd_snap_trim_sleep <- adding some sleep might help alleviate the
impact on the cluster.
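For example (just a sketch; the value is a starting point to tune):

ceph config set osd osd_snap_trim_sleep 2.0   # seconds to sleep between snap trim operations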
If HEALTH is OK I would not expect anything to be wrong with your cluster.
Does "ceph osd dump | grep require_osd_release" give you
require_osd_release octopus?
Gr. Stefan
Hi Stefan,
thank you for your answer.
Even osd_snap_trim_sleep=10 was not sustainable with normal cluster load.
Following your email I've tested bluefs_buffered_io=true again and
indeed it dramatically reduces disk load, but not CPU load nor the slow ceph I/O.
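For reference, the change I'm testing looks roughly like this (a sketch;
depending on the point release the OSDs may need a restart to pick it up):

ceph config set osd bluefs_buffered_io true   # route BlueFS I/O through the kernel page cache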
Yes, require_osd_release=octopus.
What worries me is that the pool is now void of rbd images but still
holds 14TiB of object data.
Here are the pool contents; rbd_directory and rbd_trash are empty:
rados -p storage ls | sed 's/\(.*\..*\)\..*/\1/'|sort|uniq -c
1 rbd_children
6 rbd_data.13fc0d1d63c52b
2634 rbd_data.15ab844f62d5
258 rbd_data.15f1f2e2398dc7
133 rbd_data.17d93e1c5a4855
258 rbd_data.1af03e352ec460
2987 rbd_data.236cfc2474b020
206872 rbd_data.31c55ee49f0abb
604593 rbd_data.5b423b48a4643f
90 rbd_data.7b06b7abcc9441
81576 rbd_data.913b398f28d1
18 rbd_data.9662ade11235a
16051 rbd_data.e01609a7a07e20
278 rbd_data.e6b6f855b5172c
90 rbd_data.e85da37e044922
1 rbd_directory
1 rbd_info
1 rbd_trash
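For reference, a data prefix can be matched back to a live image via its
block_name_prefix; a sketch (<image> stands for any image name, and
rbd ls should return nothing here since the pool has no images left):

rbd ls -p storage
rbd info storage/<image> | grep block_name_prefix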
Eric
Those objects are deleted but still have snapshot clones, even though
the pool itself doesn't have any snapshots.
What could cause that?
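In case it helps, this is where leftover snapshot state should be
visible (a sketch; on octopus the per-pool field is removed_snaps_queue):

ceph osd dump | grep -E 'storage|removed_snaps'   # pool line plus any removed_snaps_queue entries
ceph pg dump pgs 2>/dev/null | grep -c snaptrim   # count of PGs currently in a snaptrim* state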
root@hpc1a:~# rados -p storage stat rbd_data.5b423b48a4643f.000000000006a4e5
error stat-ing storage/rbd_data.5b423b48a4643f.000000000006a4e5: (2) No such file or directory
root@hpc1a:~# rados -p storage lssnap
0 snaps
root@hpc1a:~# rados -p storage listsnaps
rbd_data.5b423b48a4643f.000000000006a4e5
rbd_data.5b423b48a4643f.000000000006a4e5:
cloneid snaps size overlap
1160 1160 4194304
[1048576~32768,1097728~16384,1228800~16384,1409024~16384,1441792~16384,1572864~16384,1720320~16384,1900544~16384,2310144~16384]
1364 1364 4194304 []
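If snap trimming is merely backed up rather than broken, these clones
should go away once trimming is allowed to run again; roughly (a sketch,
and the sleep value is only an assumption):

ceph osd unset nosnaptrim                     # only if that flag is what's holding trimming off
ceph config set osd osd_snap_trim_sleep 2.0   # throttle trimming so client I/O survives
ceph -s                                       # snaptrim / snaptrim_wait PG states show up here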
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx