Hello,

On Fri, 06 Oct 2017 03:30:41 +0000 David Turner wrote:

> You're missing almost all of the important bits: what the OSDs in your
> cluster look like, your tree, and your cache pool settings.
>
> ceph df
> ceph osd df
> ceph osd tree
> ceph osd pool get cephfs_cache all
>
Especially the last one.
My money is on not having set target_max_objects and target_max_bytes to
sensible values along with the ratios.
In short, not having read the (albeit spotty) documentation.
(An illustrative example of setting these is appended at the very end of
this mail.)

> You have your writeback cache on 3 nvme drives. It looks like you have
> 1.6TB available between them for the cache. I don't know the behavior of a
> writeback cache tier on cephfs for large files, but I would guess that it
> can only hold full files and not flush partial files.
>
I VERY much doubt that; if so, it would be a massive flaw.
One assumes that cache operations work on the RADOS object level, no
matter what.

> That would mean your
> cache needs to have enough space for any file being written to the cluster.
> In this case a 1.3TB file with 3x replication would require 3.9TB (more
> than double what you have available) of available space in your writeback
> cache.
>
> There are very few use cases that benefit from a cache tier. The docs for
> Luminous warn as much.
>
You keep repeating that like a broken record.
And while that is certainly not false, I for one wouldn't be able to use
(justify using) Ceph w/o cache tiers in our main use case.

In this case I assume they were following an old cheat sheet or such,
suggesting the previously required cache tier with EC pools.

Christian

> What is your goal by implementing this cache? If the
> answer is to utilize extra space on the nvmes, then just remove it and say
> thank you. The better use of nvmes in that case is as part of the
> bluestore stack, giving your osds larger DB partitions. Keeping your
> metadata pool on nvmes is still a good idea.
>
> On Thu, Oct 5, 2017, 7:45 PM Shawfeng Dong <shaw@xxxxxxxx> wrote:
>
> > Dear all,
> >
> > We just set up a Ceph cluster, running the latest stable release Ceph
> > v12.2.0 (Luminous):
> > # ceph --version
> > ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
> >
> > The goal is to serve the Ceph filesystem, for which we created 3 pools:
> > # ceph osd lspools
> > 1 cephfs_data,2 cephfs_metadata,3 cephfs_cache,
> > where
> > * cephfs_data is the data pool (36 OSDs on HDDs), which is erasure-coded;
> > * cephfs_metadata is the metadata pool;
> > * cephfs_cache is the cache tier (3 OSDs on NVMes) for cephfs_data. The
> >   cache-mode is writeback.
> >
> > Everything had worked fine, until today when we tried to copy a 1.3TB file
> > to the CephFS. We got the "No space left on device" error!
> >
> > 'ceph -s' says some OSDs are full:
> > # ceph -s
> >   cluster:
> >     id:     e18516bf-39cb-4670-9f13-88ccb7d19769
> >     health: HEALTH_ERR
> >             full flag(s) set
> >             1 full osd(s)
> >             1 pools have many more objects per pg than average
> >
> >   services:
> >     mon: 3 daemons, quorum pulpo-admin,pulpo-mon01,pulpo-mds01
> >     mgr: pulpo-mds01(active), standbys: pulpo-admin, pulpo-mon01
> >     mds: pulpos-1/1/1 up {0=pulpo-mds01=up:active}
> >     osd: 39 osds: 39 up, 39 in
> >          flags full
> >
> >   data:
> >     pools:   3 pools, 2176 pgs
> >     objects: 347k objects, 1381 GB
> >     usage:   2847 GB used, 262 TB / 265 TB avail
> >     pgs:     2176 active+clean
> >
> >   io:
> >     client: 19301 kB/s rd, 2935 op/s rd, 0 op/s wr
> >
> > And indeed the cache pool is full:
> > # rados df
> > POOL_NAME       USED  OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS   RD    WR_OPS  WR
> > cephfs_cache    1381G 355385  0      710770 0                  0       0        10004954 1522G 1398063 1611G
> > cephfs_data     0     0       0      0      0                  0       0        0        0     0       0
> > cephfs_metadata 8515k 24      0      72     0                  0       0        3        3072  3953    10541k
> >
> > total_objects    355409
> > total_used       2847G
> > total_avail      262T
> > total_space      265T
> >
> > However, the data pool is completely empty! So it seems that data has only
> > been written to the cache pool, but not written back to the data pool.
> >
> > I am really at a loss whether this is due to a setup error on my part, or
> > a Luminous bug. Could anyone shed some light on this? Please let me know if
> > you need any further info.
> >
> > Best,
> > Shaw
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Rakuten Communications

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
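
For reference, a minimal sketch of the cache-tier limits referred to above.
This is an illustration only: the pool name cephfs_cache is taken from the
thread, the byte/object values are assumptions (not recommendations) and
would need to be sized to the actual NVMe capacity; target_max_bytes counts
data stored in the pool before replication, and the ratios are relative to
these targets:

  ceph osd pool set cephfs_cache target_max_bytes 400000000000   # example value only
  ceph osd pool set cephfs_cache target_max_objects 1000000      # example value only
  ceph osd pool set cephfs_cache cache_target_dirty_ratio 0.4    # start flushing dirty objects at 40% of target
  ceph osd pool set cephfs_cache cache_target_dirty_high_ratio 0.6
  ceph osd pool set cephfs_cache cache_target_full_ratio 0.8     # start evicting clean objects at 80% of target

Without target_max_bytes/target_max_objects set, the ratios have nothing to
be relative to, so the tiering agent never flushes or evicts and the cache
simply fills up until the full flag is set. An already-full cache can be
drained by hand with:

  rados -p cephfs_cache cache-flush-evict-all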