Re: Ceph cache pool full

You're missing most of the important bits: what the OSDs in your cluster look like, your CRUSH tree, and your cache pool settings. Please include the output of:

ceph df
ceph osd df
ceph osd tree
ceph osd pool get cephfs_cache all

You have your writeback cache on 3 NVMe drives, which looks like about 1.6TB available between them for the cache. I don't know the exact behavior of a writeback cache tier on CephFS with large files, but I would guess that it can only hold whole files and cannot flush partial files. That would mean your cache needs enough space for any file being written to the cluster. In this case, a 1.3TB file with 3x replication would require 3.9TB of space in your writeback cache, more than double what you have available.
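
One thing to look for in that pool output: a writeback tier only flushes and evicts against the limits you give it, so the relevant knobs look roughly like the following (the values here are purely illustrative, not recommendations for your cluster):

ceph osd pool set cephfs_cache target_max_bytes 1200000000000
ceph osd pool set cephfs_cache cache_target_dirty_ratio 0.4
ceph osd pool set cephfs_cache cache_target_full_ratio 0.8

If target_max_bytes (and target_max_objects) was never set, the tiering agent has nothing to size itself against and the cache just fills until an OSD hits the full ratio.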

There are very few use cases that benefit from a cache tier; the Luminous docs warn as much. What is your goal in implementing this cache? If the answer is to utilize extra space on the NVMes, then just remove it and say thank you. The better use of the NVMes in that case is as part of the BlueStore stack, giving your OSDs larger DB partitions. Keeping your metadata pool on NVMes is still a good idea.
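
If you do go that route, the teardown is roughly the following sketch from the cache-tiering docs; I haven't run it against your setup, so double-check the Luminous documentation before running any of it:

ceph osd tier cache-mode cephfs_cache forward --yes-i-really-mean-it
rados -p cephfs_cache cache-flush-evict-all
ceph osd tier remove-overlay cephfs_data
ceph osd tier remove cephfs_data cephfs_cache

That stops new writes from landing in the cache, flushes what is already there down to the base pool, and then detaches the tier from cephfs_data.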


On Thu, Oct 5, 2017, 7:45 PM Shawfeng Dong <shaw@xxxxxxxx> wrote:
Dear all,

We just set up a Ceph cluster, running the latest stable release Ceph v12.2.0 (Luminous):
# ceph --version
ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)

The goal is to serve a Ceph filesystem (CephFS), for which we created 3 pools:
# ceph osd lspools
1 cephfs_data,2 cephfs_metadata,3 cephfs_cache,
where
* cephfs_data is the data pool (36 OSDs on HDDs), which is erasure-coded;
* cephfs_metadata is the metadata pool
* cephfs_cache is the cache tier (3 OSDs on NVMes) for cephfs_data. The cache-mode is writeback.
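
For reference, the tier was attached following the standard cache-tiering steps, roughly as below (I may be misremembering the exact invocations):

# ceph osd tier add cephfs_data cephfs_cache
# ceph osd tier cache-mode cephfs_cache writeback
# ceph osd tier set-overlay cephfs_data cephfs_cache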

Everything had worked fine until today, when we tried to copy a 1.3TB file to the CephFS and got a "No space left on device" error.

'ceph -s' says some OSDs are full:
# ceph -s
  cluster:
    id:     e18516bf-39cb-4670-9f13-88ccb7d19769
    health: HEALTH_ERR
            full flag(s) set
            1 full osd(s)
            1 pools have many more objects per pg than average

  services:
    mon: 3 daemons, quorum pulpo-admin,pulpo-mon01,pulpo-mds01
    mgr: pulpo-mds01(active), standbys: pulpo-admin, pulpo-mon01
    mds: pulpos-1/1/1 up  {0=pulpo-mds01=up:active}
    osd: 39 osds: 39 up, 39 in
         flags full

  data:
    pools:   3 pools, 2176 pgs
    objects: 347k objects, 1381 GB
    usage:   2847 GB used, 262 TB / 265 TB avail
    pgs:     2176 active+clean

  io:
    client:   19301 kB/s rd, 2935 op/s rd, 0 op/s wr

And indeed the cache pool is full:
# rados df
POOL_NAME       USED  OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS   RD    WR_OPS  WR
cephfs_cache    1381G  355385      0 710770                  0       0        0 10004954 1522G 1398063  1611G
cephfs_data         0       0      0      0                  0       0        0        0     0       0      0
cephfs_metadata 8515k      24      0     72                  0       0        0        3  3072    3953 10541k

total_objects    355409
total_used       2847G
total_avail      262T
total_space      265T

However, the data pool is completely empty! So it seems that data has only been written to the cache pool, but not written back to the data pool.
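Is there something I should run to force a flush? I gather it would be something along the lines of:

# rados -p cephfs_cache cache-flush-evict-all

but I have not tried it yet, as I am not sure it is safe while clients have the filesystem mounted.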

I am really at a loss as to whether this is due to a setup error on my part or a Luminous bug. Could anyone shed some light on this? Please let me know if you need any further info.

Best,
Shaw
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com