Hi list! I have an interesting problem.
I have a Ceph cluster consisting of 3 nodes with a cache pool.
The cache pool consists of PLEXTOR PX-AG128M6e drives, 2 per node, 6
drives of 128 GB in total.
It was created with the following parameters:
size: 2
min_size: 1
crash_replay_interval: 0
pg_num: 512
pgp_num: 512
hit_set_type: bloom
hit_set_period: 3600
hit_set_count: 1
target_max_objects: 0
target_max_bytes: 300647710720
cache_target_dirty_ratio: 0.4
cache_target_full_ratio: 0.8
cache_min_flush_age: 0
cache_min_evict_age: 0
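For reference, this is roughly how those parameters map onto the
standard "ceph osd pool set" commands (a sketch, not the exact commands
I ran; "rbd" and "ssd-cache" stand in for my base and cache pool names):

# attach the cache pool to the base pool in writeback mode
ceph osd tier add rbd ssd-cache
ceph osd tier cache-mode ssd-cache writeback
ceph osd tier set-overlay rbd ssd-cache

# replication and cache-tier tunables on the cache pool
ceph osd pool set ssd-cache size 2
ceph osd pool set ssd-cache min_size 1
ceph osd pool set ssd-cache hit_set_type bloom
ceph osd pool set ssd-cache hit_set_period 3600
ceph osd pool set ssd-cache hit_set_count 1
ceph osd pool set ssd-cache target_max_bytes 300647710720
ceph osd pool set ssd-cache cache_target_dirty_ratio 0.4
ceph osd pool set ssd-cache cache_target_full_ratio 0.8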
This configuration worked fine for quite some time, until the cluster
began to be used more actively in production.
I should mention that we store virtual machine disks in Ceph, and every
night snapshots are automatically created for all virtual machines
stored in Ceph.
Everything was fine, but one night during snapshot creation the
following happened: all virtual machines froze and Ceph I/O stopped.
Ceph reported that some PGs were too full:
# ceph -s
health HEALTH_ERR
2 pgs backfill_toofull
2 pgs stuck unclean
2 requests are blocked > 32 sec
recovery 572/475366 objects misplaced (0.120%)
1 full osd(s)
3 near full osd(s)
monmap e1: 3 mons at
{HV-01=10.10.101.11:6789/0,HV-02=10.10.101.12:6789/0,HV-03=10.10.101.13:6789/0}
election epoch 150, quorum 0,1,2 HV-01,HV-02,HV-03
osdmap e1065: 15 osds: 15 up, 15 in; 2 remapped pgs
flags full
pgmap v1418948: 1024 pgs, 2 pools, 857 GB data, 231 kobjects
1832 GB used, 49019 GB / 50851 GB avail
572/475366 objects misplaced (0.120%)
1022 active+clean
2 active+remapped+backfill_toofull
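(Which OSDs were full and which PGs were stuck can be listed with the
usual commands; I am quoting them from memory, this is not output I
saved from that night:)

# which OSDs are full / near full and which requests are blocked
ceph health detail
# per-OSD utilisation
ceph osd df
# the PGs stuck unclean / backfill_toofull
ceph pg dump_stuck unclean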
I urgently needed to solve the problem, so I set size: 1 for the
ssd-cache pool, evicted most of the objects from it to the main pool,
and then set size: 2 back.
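In terms of commands, the emergency fix was essentially this (a
reconstruction, assuming the cache pool is called ssd-cache):

# temporarily drop replication on the cache pool to free space
ceph osd pool set ssd-cache size 1
# flush dirty objects and evict clean ones down to the base pool
rados -p ssd-cache cache-flush-evict-all
# restore replication
ceph osd pool set ssd-cache size 2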
After that, I began to study why this could happen.
It was strange, but I thought that maybe I had set too large a value
for target_max_bytes: 300647710720, so I changed it to target_max_bytes:
200000000000.
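In other words (again assuming the cache pool is called ssd-cache):

# 6 x 128 GB SSDs = ~768 GB raw; with size: 2 roughly 384 GB (~357 GiB)
# usable. 300647710720 bytes is exactly 280 GiB, i.e. close to 80% of
# that, so I lowered the target:
ceph osd pool set ssd-cache target_max_bytes 200000000000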
In this state, it continued to work for about two weeks.
Today, during the night snapshot, the situation repeated itself; this
time there were no too-full PGs, but one of the OSDs filled up:
# ceph -s
cluster 8a2e8300-9d27-4856-99ca-05d9a9a9009c
health HEALTH_ERR
1 full osd(s)
3 near full osd(s)
monmap e1: 3 mons at
{HV-01=10.10.101.11:6789/0,HV-02=10.10.101.12:6789/0,HV-03=10.10.101.13:6789/0}
election epoch 156, quorum 0,1,2 HV-01,HV-02,HV-03
osdmap e2185: 15 osds: 15 up, 15 in
flags full
pgmap v2070259: 1024 pgs, 2 pools, 882 GB data, 255 kobjects
2028 GB used, 48823 GB / 50851 GB avail
1024 active+clean
On the Zabbix graphs I can see that some of the OSDs in the ssd-cache
pool really are filled much more than the others. I do not understand
why this is happening. And how, in this case, do I calculate the
correct value for target_max_bytes?
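My current guess at the math (my own reasoning, please correct me if it
is wrong) is that target_max_bytes has to be sized for the fullest OSD
rather than for the average, because PGs are not spread perfectly
evenly across the SSDs:

# find the most-filled OSD in the cache tier
ceph osd df

# rough sizing, with the imbalance as a correction factor:
#   usable      = (num_ssd_osds * osd_capacity) / size          ~357 GiB here
#   safe target = usable * cache_target_full_ratio * (avg fill / max fill)
# e.g. with full_ratio 0.8 and the fullest OSD ~25% above the average:
#   357 GiB * 0.8 * (1 / 1.25) = ~228 GiB = ~245000000000 bytes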
For now, I have changed cache_target_full_ratio from 0.8 to 0.6 for the
ssd-cache pool, just in case.
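That is:

# make flushing/eviction kick in earlier on the cache pool
ceph osd pool set ssd-cache cache_target_full_ratio 0.6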
But how can I solve the root cause of this problem while still using
the SSD resources to the maximum?
Please write if you have any ideas on this subject.
Thanks!