Re: ceph df: Raw used vs. used vs. actual bytes in cephfs

Flemming Frandsen <flemming.frandsen@xxxxxxxxxxxxxxxx> · Tue, 20 Feb 2018 08:33:02 +0100

I didn't know about ceph df detail, that's quite useful, thanks.

I was thinking that the problem had to do with some sort of internal 
fragmentation, because the filesystem in question does have millions 
of files (2.9 M or thereabouts). However, even if 4k is lost for each 
file, that only amounts to about 23 GB of raw space lost, and I have 
3276 GB of raw space unaccounted for.

I've researched the min alloc size option a bit, and even though no 
documentation seems to exist, I've found that the default is 64k for 
hdd. But even if the lost space per file is 64k and that's mirrored, I 
can only account for 371 GB, so that doesn't really help a great deal.
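
For reference, the back-of-the-envelope numbers above can be reproduced
with a short Python sketch. It assumes decimal units (1k = 1000 bytes,
1 GB = 10^9 bytes), which is what matches the 23 GB and 371 GB figures,
and it assumes every file wastes one full allocation unit on each of the
2 replicas, so it is an upper bound rather than a measurement. The
function and variable names are made up for illustration and are not
anything from ceph.

```python
# Rough upper bound on raw space lost to allocation-unit rounding.
# Assumptions: decimal units, one wasted allocation unit per file per replica.

def alloc_overhead_gb(files, alloc_bytes, replicas):
    """Raw GB lost if every file wastes one allocation unit on each replica."""
    return files * alloc_bytes * replicas / 1e9

FILES = 2_900_000        # approximate file count in the filesystem
REPLICAS = 2             # pool size
RAW_USED_GB = 5918       # RAW USED from ceph df
JENKINS_USED_GB = 1321   # USED for fs_jenkins_data

# Raw space not explained by two copies of the data pool.
unaccounted_gb = RAW_USED_GB - REPLICAS * JENKINS_USED_GB
loss_4k_gb = alloc_overhead_gb(FILES, 4_000, REPLICAS)
loss_64k_gb = alloc_overhead_gb(FILES, 64_000, REPLICAS)

print(f"unaccounted raw space: {unaccounted_gb} GB")
print(f"lost at 4k per file:   {loss_4k_gb:.0f} GB")
print(f"lost at 64k per file:  {loss_64k_gb:.0f} GB")
```

Even the 64k case covers only a small part of the unaccounted space
under these assumptions.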

I have set up an experimental cluster with "bluestore min alloc size = 
4096" and so far I've been unable to make it lose space like the first 
cluster.

I'm very worried that ceph is unusable because of this issue.

On 19/02/18 19:38, Pavel Shub wrote:
Could you be running into block size (minimum allocation unit)
overhead? Default bluestore block size is 4k for hdd and 64k for ssd.
This is exacerbated if you have tons of small files. I tend to see
this when the sum of raw used across the pools in "ceph df detail" is
less than the global raw bytes used.
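
To illustrate why small files make this worse, here is a minimal Python
sketch of allocation-unit rounding. It assumes the store simply rounds
each object up to a whole number of allocation units and ignores
everything else (metadata, compression and so on). The object sizes are
hypothetical; 4096 and 65536 are the two unit sizes discussed in this
thread.

```python
# Space allocated for one object when storage is handed out in fixed units.
# Assumption: the size is simply rounded up to a whole number of units.

def allocated_bytes(size, alloc_unit):
    """Round size up to the next multiple of alloc_unit."""
    units = -(-size // alloc_unit)  # ceiling division on integers
    return units * alloc_unit

def wasted_bytes(size, alloc_unit):
    """Bytes allocated but not holding any data."""
    return allocated_bytes(size, alloc_unit) - size

# Hypothetical object sizes, compared across the two unit sizes.
for size in (100, 5_000, 70_000):
    for unit in (4096, 65536):
        print(f"size={size} unit={unit} wasted={wasted_bytes(size, unit)}")
```

The smaller the object is relative to the allocation unit, the larger
the share of the allocation that holds no data.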

On Mon, Feb 19, 2018 at 2:09 AM, Flemming Frandsen
<flemming.frandsen@xxxxxxxxxxxxxxxx> wrote:
Each OSD lives on a separate HDD in bluestore with the journals on 2GB
partitions on a shared SSD.

On 16/02/18 21:08, Gregory Farnum wrote:

What does the cluster deployment look like? Usually this happens when you’re
sharing disks with the OS, or have co-located file journals or something.
On Fri, Feb 16, 2018 at 4:02 AM Flemming Frandsen
<flemming.frandsen@xxxxxxxxxxxxxxxx> wrote:
I'm trying out cephfs and I'm in the process of copying over some
real-world data to see what happens.

I have created a number of cephfs file systems; the only one I've
started working on is the one named jenkins, which lives in
fs_jenkins_data and fs_jenkins_metadata.

According to ceph df I have about 1387 GB of data in all of the pools,
while the raw used space is 5918 GB, which gives a ratio of about 4.3. I
would have expected a ratio around 2, as the pool size has been set to 2.
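
As a cross-check, the ratio can be recomputed from the USED column of
the ceph df output in this message. This is only a sketch: it assumes
the k/M/G suffixes are binary multiples and that RAW USED is in the same
G unit as the pool figures. The helper name to_bytes is made up for
illustration. The sum comes out within a GB or two of the 1387 figure
quoted above.

```python
# Sum the USED column from the ceph df output and compare it with RAW USED.
# Assumption: k/M/G are binary multiples; RAW USED uses the same G unit.

UNITS = {"k": 1024, "M": 1024**2, "G": 1024**3}

def to_bytes(text):
    """Convert a ceph df size such as '66120M' or '1113' to bytes."""
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)

# USED values for pool IDs 1 through 12, in table order.
pool_used = ["1113", "0", "0", "0", "66120M", "39463k", "330", "567k",
             "1321G", "52178k", "0", "4181"]

total_used_g = sum(to_bytes(u) for u in pool_used) / UNITS["G"]
raw_used_g = 5918
ratio = raw_used_g / total_used_g

print(f"data in pools:  {total_used_g:.0f}G")
print(f"raw/used ratio: {ratio:.1f} (about 2 expected with pool size 2)")
```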

Can anyone explain where half my space has been squandered?

  > ceph df
GLOBAL:
      SIZE      AVAIL     RAW USED     %RAW USED
      8382G     2463G        5918G         70.61
POOLS:
      NAME                         ID     USED       %USED     MAX AVAIL     OBJECTS
      .rgw.root                    1        1113         0          258G            4
      default.rgw.control          2           0         0          258G            8
      default.rgw.meta             3           0         0          258G            0
      default.rgw.log              4           0         0          258G          207
      fs_docker-nexus_data         5      66120M     11.09          258G        22655
      fs_docker-nexus_metadata     6      39463k         0          258G         2376
      fs_meta_data                 7         330         0          258G            4
      fs_meta_metadata             8        567k         0          258G           22
      fs_jenkins_data              9       1321G     71.84          258G     28576278
      fs_jenkins_metadata          10     52178k         0          258G      2285493
      fs_nexus_data                11          0         0          258G            0
      fs_nexus_metadata            12       4181         0          258G           21

--
   Regards Flemming Frandsen - Stibo Systems - DK - STEP Release Manager
   Please use release@xxxxxxxxx for all Release Management requests

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
