Disks are filling up

Hi list,

We created a cluster to provide CephFS for a Kubernetes cluster. For a few weeks now the cluster has been filling up at an alarming rate
(about 100 GB per day).
This is happening while the most relevant PG is being deep scrubbed; that scrub has been interrupted a few times.

We use about 150 G on the CephFS filesystem (according to du on the mounted filesystem) and try not to use snapshots (.snap directories "exist" but are empty). We do not understand why the PGs keep getting bigger while CephFS stays about the same size (overwrites of files certainly happen).
I suspect some snapshot mechanism. Any ideas on how to debug this and stop it?
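
In case it helps, this is roughly what we plan to run next to look for leftover snapshot data (only a sketch; the mount point is a placeholder and I am assuming rados df and ceph osd pool ls detail are the right places to look):

ls /mnt/cephfs/.snap                                  # should list nothing if no CephFS snapshots exist
ceph osd pool ls detail | grep rancherFsPoolMainData  # check for removed_snaps / removed_snaps_queue entries
rados df                                              # CLONES should stay 0 for the data pools if no snapshot clones are kept
ceph df detail                                        # watch STORED vs USED per pool over time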

Maybe we should try to speed up the deep scrubbing somehow?
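
If speeding up scrubbing is the way to go, this is the kind of change we had in mind (again only a sketch, assuming these are the relevant options and that the values are sane for a 3-OSD cluster; 6.0 is my guess for the single PG of rancherFsPoolMainData):

ceph config set osd osd_max_scrubs 2              # allow a second concurrent scrub per OSD
ceph config set osd osd_scrub_load_threshold 5    # do not postpone scrubs because of load
ceph pg deep-scrub 6.0                            # kick the deep scrub of the big PG by hand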

ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)

  cluster:
    id:     ece0290c-cd32-11ec-a0e2-005056a9dd02
    health: HEALTH_WARN
            1 MDSs report slow metadata IOs
            3 nearfull osd(s)
            13 pool(s) nearfull

  services:
    mon: 3 daemons, quorum acdh-gluster-hdd3,acdh-gluster-hdd1,acdh-gluster-hdd2 (age 3d)
    mgr: acdh-gluster-hdd3.kzsplh(active, since 5d), standbys: acdh-gluster-hdd2.kiotbg, acdh-gluster-hdd1.ywgyfx
    mds: 1/1 daemons up, 2 standby
    osd: 3 osds: 3 up (since 4d), 3 in (since 7w)
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   13 pools, 292 pgs
    objects: 167.25M objects, 1.2 TiB
    usage:   3.3 TiB used, 1.2 TiB / 4.5 TiB avail
    pgs:     290 active+clean
             1   active+clean+scrubbing+deep
             1   active+clean+scrubbing

  io:
    client:   58 MiB/s rd, 3.6 MiB/s wr, 51 op/s rd, 148 op/s wr

rancher-ceph-fs - 227 clients
===============
RANK  STATE                 MDS                    ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active  ceph-mds.acdh-gluster-hdd1.pqydya  Reqs:   68 /s   793k   792k   102k   210k
          POOL              TYPE     USED  AVAIL
 rancherFsPoolMetadata    metadata   160G   329G
rancherFsPoolDefaultData    data    2268k   329G
 rancherFsPoolMainData      data    2584G   658G
           STANDBY MDS
ceph-mds.acdh-gluster-hdd2.zfleqe
ceph-mds.acdh-gluster-hdd3.etaobl
MDS version: ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)

(rancherFsPoolMainData is a 2+1 erasure encoded pool)
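
(The raw overhead itself adds up: 2+1 erasure coding writes 3 chunks for every 2 data chunks, a factor of 1.5, and 1.7 TiB stored × 1.5 ≈ 2.5 TiB used, which matches the pool listing below. What does not add up for us is the ~150 G visible via du versus the 1.7 TiB stored in that pool.)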

--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    4.5 TiB  1.2 TiB  3.3 TiB   3.3 TiB      73.46
TOTAL  4.5 TiB  1.2 TiB  3.3 TiB   3.3 TiB      73.46

--- POOLS ---
POOL                        ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
device_health_metrics        1    1      0 B        0      0 B      0    331 GiB
rancher-rbd-erasure          2   32  8.4 GiB    2.16k   13 GiB   1.25    661 GiB
rancher-rbd-meta             3   32     55 B       11   36 KiB      0    331 GiB
rancherFsPoolMetadata        4   32   53 GiB    5.18M  160 GiB  13.88    331 GiB
rancherFsPoolDefaultData     5    1   29 KiB   80.00M  2.2 MiB      0    331 GiB
rancherFsPoolMainData        6    1  1.7 TiB   82.08M  2.5 TiB  72.23    661 GiB
.rgw.root                    7   32  1.3 KiB        4   48 KiB      0    331 GiB
default.rgw.log              8   32  3.6 KiB      209  408 KiB      0    331 GiB
default.rgw.control          9   32      0 B        8      0 B      0    331 GiB
default.rgw.meta            10   32  3.8 KiB       11  124 KiB      0    331 GiB
default.rgw.buckets.index   11   32  2.4 MiB       33  7.2 MiB      0    331 GiB
default.rgw.buckets.non-ec  12   32      0 B        0      0 B      0    331 GiB
default.rgw.buckets.data    14    1   55 GiB   16.57k   83 GiB   7.70    661 GiB

HEALTH_WARN 1 MDSs report slow metadata IOs; 3 nearfull osd(s); 13 pool(s) nearfull
[WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
    mds.ceph-mds.acdh-gluster-hdd1.pqydya(mds.0): 100+ slow metadata IOs are blocked > 30 secs, oldest blocked for 306 secs
[WRN] OSD_NEARFULL: 3 nearfull osd(s)
    osd.0 is near full
    osd.2 is near full
    osd.3 is near full
[WRN] POOL_NEARFULL: 13 pool(s) nearfull
    pool 'device_health_metrics' is nearfull
    pool 'rancher-rbd-erasure' is nearfull
    pool 'rancher-rbd-meta' is nearfull
    pool 'rancherFsPoolMetadata' is nearfull
    pool 'rancherFsPoolDefaultData' is nearfull
    pool 'rancherFsPoolMainData' is nearfull
    pool '.rgw.root' is nearfull
    pool 'default.rgw.log' is nearfull
    pool 'default.rgw.control' is nearfull
    pool 'default.rgw.meta' is nearfull
    pool 'default.rgw.buckets.index' is nearfull
    pool 'default.rgw.buckets.non-ec' is nearfull
    pool 'default.rgw.buckets.data' is nearfull

(the nearfull ratio is set to 0.66)
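(For reference, as far as I know that is the value controlled by ceph osd set-nearfull-ratio; the default would be 0.85.)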

--
Mag. Ing. Omar Siam
Austrian Center for Digital Humanities and Cultural Heritage
Österreichische Akademie der Wissenschaften | Austrian Academy of Sciences
Stellvertretende Behindertenvertrauensperson | Deputy representative for disabled persons
Bäckerstraße 13, 1010 Wien, Österreich | Vienna, Austria
T: +43 1 51581-7295
omar.siam@xxxxxxxxxx | www.oeaw.ac.at/acdh
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



