Re: Troubleshooting an erasure coded pool with a cache tier

Laurent GUERBY <laurent@xxxxxxxxxx> · Tue, 18 Nov 2014 08:38:45 +0100

On Tuesday 18 November 2014 at 10:11 +0900, Christian Balzer wrote:
> Hello,
> 
> On Mon, 17 Nov 2014 17:45:54 +0100 Laurent GUERBY wrote:
> 
> > Hi,
> > 
> > Just a follow-up on this issue, we're probably hitting:
> > 
> > http://tracker.ceph.com/issues/9285
> > 
> I wonder how much pressure was on that cache tier, though. 
> If I understand the bug report correctly, this should only happen if
> some object gets evicted before it was fully replicated.
> So I suppose if the cache pool is sized "correctly" for the working set in
> question (which of course is a bugger given a 4MB granularity), things
> should work. Until you hit the threshold and they don't anymore...

Hi,

Same experience here, with a 10 GB size=3 min=2 cache on a 1 TB 4+1 EC
pool and with a 500 GB size=3 min=2 cache on an 8 TB 3+1 EC pool
(5 hosts, 9 rotational disks in total).

We also noticed that, well after we deleted the cache and EC pools, we
still had frequent slow writes until we restarted some of the OSDs that
were reporting "slow write". Slow writes are now very rare: a short
episode of about ten seconds every few hours, according to the logs.

Let's hope the Ceph developers will fix this bug so that people can
test erasure coding more thoroughly. I have added a comment on the
ticket.

Sincerely,

Laurent
