Re: ceph df: Raw used vs. used vs. actual bytes in cephfs

Igor Fedotov <ifedotov@xxxxxxx> · Tue, 20 Feb 2018 15:02:23 +0300

Another space "leak" might be due to a BlueStore misbehavior: it takes
the DB partition(s) into account when calculating the total store size,
and all of that space is immediately marked as used, even for an empty
store. So if you have 3 OSDs with a 10 GB DB device each, you
unconditionally get 30 GB of used space in the report.

Plus an additional 1 GB (with default settings) per OSD, as BlueStore
unconditionally locks that much space on the block device for BlueFS.
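
As a rough illustration of the two effects above, here is a minimal
Python sketch that estimates what an otherwise empty store reports as
used. The function name is hypothetical, and the 1 GB per OSD default is
the figure quoted in this message, not something verified for every
release or configuration.

```python
# Estimate of the space an otherwise empty BlueStore cluster reports as used,
# per the behaviour described above. The 1 GB per OSD BlueFS reservation is
# the default quoted in this message; treat it as an assumption.


def phantom_used_gb(num_osds: int, db_size_gb: float,
                    bluefs_reserve_gb: float = 1.0) -> float:
    """GB shown as used before any data is written.

    db_size_gb:        DB partition size per OSD, counted as used
    bluefs_reserve_gb: block-device space locked per OSD for BlueFS
    """
    return num_osds * (db_size_gb + bluefs_reserve_gb)


# 3 OSDs with a 10 GB DB device each: 30 GB of DB plus 3 x 1 GB for BlueFS.
print(phantom_used_gb(3, 10))  # 33.0
```

This is only a floor: BlueFS may claim more of the block device later,
as described next.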

BlueStore might also allocate (and hence report as used) even more
space on the block device for BlueFS if the DB partition isn't large
enough. You can inspect the OSD performance counters in the "bluefs"
section to check that amount.
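
A sketch of pulling those counters out of the JSON printed by
"ceph daemon osd.<id> perf dump". The counter names (db_total_bytes,
db_used_bytes, slow_total_bytes, slow_used_bytes) are an assumption based
on Luminous-era output, so verify them against your own perf dump; the
slow_* pair should correspond to BlueFS space taken on the main block
device. The script name in the usage comment is hypothetical.

```python
import json
import sys

# Counter names assumed from the "bluefs" section of Luminous-era
# `ceph daemon osd.<id> perf dump` output; check them against your own.
COUNTERS = ("db_total_bytes", "db_used_bytes",
            "slow_total_bytes", "slow_used_bytes")


def bluefs_usage(perf_dump_json: str) -> dict:
    """Return the selected bluefs counters (bytes); missing ones read as 0."""
    bluefs = json.loads(perf_dump_json).get("bluefs", {})
    return {name: bluefs.get(name, 0) for name in COUNTERS}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: ceph daemon osd.0 perf dump > perf.json
    #        python3 bluefs_usage.py perf.json
    with open(sys.argv[1]) as fh:
        usage = bluefs_usage(fh.read())
    for name, value in usage.items():
        print(f"{name:>16}: {value / 2**30:8.2f} GiB")
```

A non-zero slow_used_bytes would indicate BlueFS data living on the
HDD itself rather than on the DB partition.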

Also please note that for 64K allocation
On 2/20/2018 10:33 AM, Flemming Frandsen wrote:
I didn't know about ceph df detail, that's quite useful, thanks.

I was thinking that the problem had to do with some sort of internal
fragmentation, because the filesystem in question does have millions of
files (2.9 M or thereabouts). However, even if 4k is lost for each file,
that only amounts to about 23 GB of raw space, and I have 3276 GB of raw
space unaccounted for.

I've researched the min alloc option a bit. Even though no documentation
seems to exist, I've found that the default is 64k for hdd. But even if
64k is lost per file and that's mirrored, I can only account for 371 GB,
so that doesn't really help a great deal.
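
The arithmetic in the two paragraphs above, as a small Python sketch.
The helper name is hypothetical; it assumes each file wastes at most one
allocation unit per replica and uses decimal units (1 GB = 10^6 kB) to
match the figures in the message.

```python
# Upper bound on raw space lost to allocation rounding, assuming each file
# wastes at most one allocation unit per replica. Decimal units
# (1 GB = 10**6 kB) to match the figures in the message.


def max_alloc_waste_gb(num_files: int, alloc_kb: int, replicas: int) -> float:
    return num_files * alloc_kb * replicas / 10**6


FILES = 2_900_000  # "2.9 M or thereabouts"
for alloc_kb in (4, 64):
    waste = max_alloc_waste_gb(FILES, alloc_kb, replicas=2)
    print(f"{alloc_kb}k min alloc: {waste:.1f} GB")
# prints "4k min alloc: 23.2 GB" and "64k min alloc: 371.2 GB"
```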

I have set up an experimental cluster with "bluestore min alloc size = 
4096" and so far I've been unable to make it lose space like the first 
cluster.

I'm very worried that ceph is unusable because of this issue.

On 19/02/18 19:38, Pavel Shub wrote:
Could you be running into block size (minimum allocation unit)
overhead? Default bluestore block size is 4k for hdd and 64k for ssd.
This is exacerbated if you have tons of small files. I tend to see
this when the sum of raw used across the pools in "ceph df detail" is
less than the global raw bytes used.

On Mon, Feb 19, 2018 at 2:09 AM, Flemming Frandsen
<flemming.frandsen@xxxxxxxxxxxxxxxx> wrote:
Each OSD lives on a separate HDD in bluestore with the journals on 2GB
partitions on a shared SSD.

On 16/02/18 21:08, Gregory Farnum wrote:

What does the cluster deployment look like? Usually this happens when
you’re sharing disks with the OS, or have co-located file journals or
something.
On Fri, Feb 16, 2018 at 4:02 AM Flemming Frandsen
<flemming.frandsen@xxxxxxxxxxxxxxxx> wrote:
I'm trying out cephfs and I'm in the process of copying over some
real-world data to see what happens.

I have created a number of cephfs file systems; the only one I've
started working on is the one named jenkins, which lives in the pools
fs_jenkins_data and fs_jenkins_metadata.

According to ceph df I have about 1387 GB of data in all of the pools,
while the raw used space is 5918 GB, which gives a ratio of about 4.3.
I would have expected a ratio of around 2, as the pool size has been
set to 2.

Can anyone explain where half my space has been squandered?

  > ceph df
GLOBAL:
      SIZE      AVAIL     RAW USED     %RAW USED
      8382G     2463G        5918G         70.61
POOLS:
      NAME                          ID      USED    %USED    MAX AVAIL     OBJECTS
      .rgw.root                      1      1113        0         258G           4
      default.rgw.control            2         0        0         258G           8
      default.rgw.meta               3         0        0         258G           0
      default.rgw.log                4         0        0         258G         207
      fs_docker-nexus_data           5    66120M    11.09         258G       22655
      fs_docker-nexus_metadata       6    39463k        0         258G        2376
      fs_meta_data                   7       330        0         258G           4
      fs_meta_metadata               8      567k        0         258G          22
      fs_jenkins_data                9     1321G    71.84         258G    28576278
      fs_jenkins_metadata           10    52178k        0         258G     2285493
      fs_nexus_data                 11         0        0         258G           0
      fs_nexus_metadata             12      4181        0         258G          21

--
   Regards Flemming Frandsen - Stibo Systems - DK - STEP Release Manager
   Please use release@xxxxxxxxx for all Release Management requests

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com