Re: Cache tier weirdness

Hi Christian,

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Christian Balzer
> Sent: 26 February 2016 09:07
> To: ceph-users@xxxxxxxxxxxxxx
> Subject:  Cache tier weirdness
> 
> 
> Hello,
> 
> still my test cluster with 0.94.6.
> It's a bit fuzzy, but I don't think I saw this with Firefly, but then
> again that is totally broken when it comes to cache tiers (switching
> between writeback and forward mode).
> 
> goat is a cache pool for rbd:
> ---
> # ceph osd pool ls detail
> pool 2 'rbd' replicated size 3 min_size 1 crush_ruleset 2 object_hash
> rjenkins pg_num 512 pgp_num 512 last_change 11729 lfor 11662 flags
> hashpspool tiers 9 read_tier 9 write_tier 9 stripe_width 0
> 
> pool 9 'goat' replicated size 1 min_size 1 crush_ruleset 3 object_hash
> rjenkins pg_num 128 pgp_num 128 last_change 11730 flags
> hashpspool,incomplete_clones tier_of 2 cache_mode writeback
> target_bytes 524288000 hit_set bloom{false_positive_probability: 0.05,
> target_size: 0, seed: 0} 3600s x1 stripe_width 0
> ---
> 
> Initial state is this:
> ---
> # rados df
> pool name          KB          objects  clones  degraded  unfound  rd      rd KB     wr       wr KB
> goat               34          429      0       0         0        1051    4182046   145803   10617422
> rbd                164080702   40747    0       0         0        419664  71142697  4430922  531299267
>   total used       599461060   41176
>   total avail      5301740284
>   total space      5940328912
> ---
> 
> First we put some data in there with
> "rados -p rbd  bench 20 write -t 32 --no-cleanup"
> which easily exceeds the target bytes of 512MB and gives us:
> ---
> pool name                 KB      objects
> goat                  356386          372
> ---
> 
> For starters, that's not the number I would have expected given how this
> is configured:
> cache_target_dirty_ratio: 0.5
> cache_target_full_ratio: 0.9
> 
> Let's ignore (but not forget) that discrepancy for now.

One of the things I have noticed is that whilst target_max_bytes is set per
pool, it's actually acted on per PG, so each PG will flush/evict based on
its share of the pool capacity. Depending on where data lands, PGs will
normally hold differing amounts of data, which leads to inconsistent cache
pool flush/eviction behaviour. I believe there is also a "slop" factor in
the cache code, so that the caching agents are not always working to hard
limits. I think that with artificially small cache sizes, both of these
cause adverse effects.
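
As a back-of-envelope sketch (my arithmetic, assuming the target really is
divided evenly across PGs as described above):
---
# 524288000 bytes of target_max_bytes spread over the pool's 128 PGs
# leaves roughly one 4MB rados bench object per PG before the agent acts:
$ echo $((524288000 / 128))
4096000
---
So with this target size and 4MB bench objects, each PG is close to its
limit as soon as it holds a single object, which would explain aggressive
flushing/eviction.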

> 
> After doing a read with "rados -p rbd  bench 20 rand -t 32" to my utter
> bafflement I get:
> ---
> pool name                 KB      objects
> goat                    8226          199
> ---
> 
> And after a second read it's all gone; looking at the network traffic, it
> all originated from the base pool nodes and got relayed through the node
> hosting the cache pool:
> ---
> pool name                 KB      objects
> goat                      34          191
> ---
> 
> I verified that the actual objects are on the base pool with 4MB each,
> while their "copies" are on the cache pool OSDs with zero length.
> 
> Can anybody unbaffle me? ^o^

Afraid not, but I will try my best. In Hammer you still don't have proxy
writes, which is why the write test fills up your cache tier. Proxy reads
mean that if an object is not in cache, the read is served from the base
tier, and the object is only promoted once it has hits across the recent
hitsets, as governed by min_read_recency_for_promote.
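
If you want objects promoted on the first read again, the knob to look at
(assuming I have the Hammer option name right) is:
---
# check and, if needed, lower the recency requirement on the cache pool:
ceph osd pool get goat min_read_recency_for_promote
ceph osd pool set goat min_read_recency_for_promote 1
---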

I believe that the hitsets are stored as hidden objects in the pool. Hitsets
also only get created when you do IO; the agent just sleeps otherwise. I'm
wondering if the hitset creation is causing the cached bench_data objects to
be evicted, i.e. the eviction code is assuming a hitset object is bigger
than it actually is?
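
One thing you can at least confirm is how the hitsets are configured on the
cache pool; the "3600s x1" in your ls detail output should line up with
these two options:
---
# period (seconds) each hitset covers, and how many hitsets are retained:
ceph osd pool get goat hit_set_period
ceph osd pool get goat hit_set_count
---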

> 
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
