Re: CephFS ghost usage/inodes

Any other ideas?

> On 15.01.2020 at 15:50, Oskar Malnowicz <oskar.malnowicz@xxxxxxxxxxxxxx> wrote:
> 
> The situation is:
> 
> health: HEALTH_WARN
>   1 pools have many more objects per pg than average
> 
> $ ceph health detail
> MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
>     pool cephfs_data objects per pg (315399) is more than 1227.23 times cluster average (257)
> 
> $ ceph df
> RAW STORAGE:
>     CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
>     hdd       7.8 TiB     7.4 TiB     326 GiB      343 GiB          4.30
>     TOTAL     7.8 TiB     7.4 TiB     326 GiB      343 GiB          4.30
> 
> POOLS:
>     POOL               ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
>     cephfs_data         6     2.2 TiB       2.52M     2.2 TiB     26.44       3.0 TiB
>     cephfs_metadata     7     9.7 MiB         379     9.7 MiB         0       3.0 TiB
>   
> The STORED value of the "cephfs_data" pool is 2.2 TiB, which must be wrong. When I execute "du -sh" from the CephFS root "/" I get this usage:
> 
> $ du -sh
> 31G     .
> 
> "df -h" shows:
> 
> $ df -h
> Filesystem       Size  Used Avail Use% Mounted on
> ip1,ip2,ip3:/    5.2T  2.2T  3.0T  43% /storage/cephfs
> 
> It says that "Used" is 2.2T, but "du" shows 31G.
> 
> The pg_num of the "cephfs_data" pool is currently 8. The autoscaler suggests setting it to 512:
> 
> $ ceph osd pool autoscale-status
> POOL                        SIZE TARGET SIZE RATE RAW CAPACITY  RATIO TARGET RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE
> cephfs_metadata            9994k              2.0        7959G 0.0000               1.0      8            off
> cephfs_data                2221G              2.0        7959G 0.5582               1.0      8        512 off
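> 
> (For reference, the change below was applied in the usual way, sketched here; on Nautilus, bumping pg_num should also adjust pgp_num automatically:)
> 
> $ ceph osd pool set cephfs_data pg_num 512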
> 
> After setting pg_num to 512, the situation is:
> 
> $ ceph health detail
> HEALTH_WARN 1 pools have many more objects per pg than average
> MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
>     pool cephfs_data objects per pg (4928) is more than 100.571 times cluster average (49)
> 
> $ ceph df
> RAW STORAGE:
>     CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
>     hdd       7.8 TiB     7.4 TiB     329 GiB      346 GiB          4.34
>     TOTAL     7.8 TiB     7.4 TiB     329 GiB      346 GiB          4.34
> 
> POOLS:
>     POOL                          ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
>     cephfs_data                    6      30 GiB       2.52M      61 GiB      0.99       3.0 TiB
>     cephfs_metadata                7     9.8 MiB         379      20 MiB         0       3.0 TiB
> 
> The "stored" value changed from 2.2TiB to 30GiB !!! This should be the correct usage/size.
> 
> When I execute "du -sh" from the CephFS root "/" I again get the same usage:
> 
> $ du -sh
> 31G
> 
> and "df -h" shows again
> 
> $ df -h
> Filesystem       Size  Used Avail Use% Mounted on
> ip1,ip2,ip3:/    5.2T  2.2T  3.0T  43% /storage/cephfs
> 
> It says that "Used" is 2.2T, but "du" shows 31G.
> 
> Can anybody explain to me what the problem is?
> 
> 
> 
> 
> On 14.01.20 at 11:15, Florian Pritz wrote:
>> Hi,
>> 
>> When we tried putting some load on our test cephfs setup by restoring a
>> backup in Artifactory, we eventually ran out of space (around 95% used
>> in `df`, about 3.5TB), which caused Artifactory to abort the restore and
>> clean up after itself. However, while a simple `find` no longer shows
>> the files, `df` still claims that we have around 2.1TB of data on the
>> cephfs. `df -i` also shows 2.4M used inodes. When using `du -sh` on a
>> top-level mountpoint, I get 31G used, which is data that really is
>> still there and is expected to be there.
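>> 
>> (For completeness, a sketch of how those numbers were gathered; /mnt/cephfs is just a placeholder for our mount point:)
>> 
>> $ df -h /mnt/cephfs     # ~2.1TB reported as used
>> $ df -i /mnt/cephfs     # ~2.4M inodes reported as used
>> $ du -sh /mnt/cephfs    # ~31G of data actually reachable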
>> 
>> Consequently, we also get the following warning:
>> 
>> 
>>> MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
>>>     pool cephfs_data objects per pg (38711) is more than 231.802 times cluster average (167)
>>> 
>> We are running ceph 14.2.5.
>> 
>> We have snapshots enabled on cephfs, but there are currently no active
>> snapshots listed by `ceph daemon mds.$hostname dump snaps --server` (see
>> below). I can't say for sure if we created snapshots during the backup
>> restore.
>> 
>> 
>>> {
>>>     "last_snap": 39,
>>>     "last_created": 38,
>>>     "last_destroyed": 39,
>>>     "pending_noop": [],
>>>     "snaps": [],
>>>     "need_to_purge": {},
>>>     "pending_update": [],
>>>     "pending_destroy": []
>>> }
>>> 
>> We only have a single CephFS.
>> 
>> We use the pool_namespace xattr for our various directory trees on the
>> cephfs.
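>> 
>> (In case it is relevant to the object counts, the namespaces are assigned per directory roughly like this; the directory and namespace names below are made up:)
>> 
>> $ setfattr -n ceph.dir.layout.pool_namespace -v artifactory /cephfs/artifactory
>> $ getfattr -n ceph.dir.layout.pool_namespace /cephfs/artifactory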
>> 
>> `ceph df` shows:
>> 
>> 
>>> POOL         ID STORED   OBJECTS   USED    %USED     MAX AVAIL
>>> cephfs_data  6  2.1 TiB  2.48M     2.1 TiB 24.97       3.1 TiB
>>> 
>> `ceph daemon mds.$hostname perf dump | grep stray` shows:
>> 
>> 
>>> "num_strays": 0,
>>> "num_strays_delayed": 0,
>>> "num_strays_enqueuing": 0,
>>> "strays_created": 5097138,
>>> "strays_enqueued": 5097138,
>>> "strays_reintegrated": 0,
>>> "strays_migrated": 0,
>>> 
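>> (A related check might be the purge queue counters, to confirm that purging actually finished; this is a sketch and the exact counter names may differ between releases:)
>> 
>> $ ceph daemon mds.$hostname perf dump purge_queue
>> 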
>> `rados -p cephfs_data df` shows:
>> 
>> 
>>> POOL_NAME      USED OBJECTS CLONES  COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED   RD_OPS      RD   WR_OPS     WR USED COMPR UNDER COMPR
>>> cephfs_data 2.1 TiB 2477540      0 4955080                  0       0        0 10699626 6.9 TiB 86911076 35 TiB        0 B         0 B
>>> 
>>> total_objects    29718
>>> total_used       329 GiB
>>> total_avail      7.5 TiB
>>> total_space      7.8 TiB
>>> 
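>> Since we use pool namespaces, a plain `rados ls` only lists the default namespace, so a cross-check could be to count objects across all namespaces, e.g. (a sketch; if I recall correctly, `--all` prints namespace, tab, object name):
>> 
>> $ rados -p cephfs_data ls --all | wc -l
>> $ rados -p cephfs_data ls --all | cut -f1 | sort | uniq -c
>> 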
>> When I combine the usage and the free space shown by `df`, we would
>> exceed our cluster size. Our test cluster currently has 7.8TB of total
>> space with a replication size of 2 for all pools. With 2.1TB "used" on
>> the cephfs according to `df` plus 3.1TB shown as "free", I get 5.2TB of
>> total size, which would mean more than 10TB of raw data once replication
>> is accounted for. Clearly that can't fit on a cluster with only 7.8TB of
>> capacity.
>> 
>> Do you have any ideas why we see so many objects and so much reported
>> usage? Is there any way to fix this without recreating the cephfs?
>> 
>> Florian
>> 
>> 
>> 
>> 
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


