Re: Troubleshooting an erasure coded pool with a cache tier

Gregory Farnum <greg@xxxxxxxxxxx> · Sat, 8 Nov 2014 15:43:31 -0800

On Sat, Nov 8, 2014 at 3:24 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>
>
> On 09/11/2014 00:03, Gregory Farnum wrote:
>> It's all about the disk accesses. What's the slow part when you dump historic and in-progress ops?
>
> This is what I see on g1 (6% iowait)

Yeah, you're going to need to do some data collation (at least in your
head). If one node consistently has way more ops and a higher iowait
than everybody else, it sounds to me like you've found your answer. If
not, look at the historic ops and see if there are any patterns.
I confess I don't recall what the reported status is when an op is
waiting for promotes to occur; you'll probably want to check that out
too and see how long that stage is taking.
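For the "data collation" step, a small script can stand in for doing it in your head. What follows is a minimal sketch under stated assumptions, not a Ceph tool. It assumes you have already pulled the ops out of each OSD's admin socket (e.g. with `ceph daemon osd.<id> dump_historic_ops` and `dump_ops_in_flight`) and reduced each op to a hypothetical record: the host it ran on plus its (timestamp, event name) pairs. That record layout, the placeholder event names, and the 2x-median outlier threshold are all illustrative assumptions. They are not Ceph's actual JSON schema. The sketch counts ops per host to spot a node that stands out, and measures the gap between consecutive events to show which stage (such as a wait on a promote) is eating the time.

```python
"""Hypothetical sketch: collate slow-op samples gathered from each OSD host.

The record layout below is an illustrative assumption, not Ceph's JSON schema.
"""
from collections import defaultdict
from statistics import median
from typing import Dict, List, Optional, Tuple

Event = Tuple[float, str]           # (timestamp in seconds, event name)
OpRecord = Tuple[str, List[Event]]  # (host name, events recorded for one op)


def ops_per_host(ops: List[OpRecord]) -> Dict[str, int]:
    """Count how many ops each host reported."""
    counts: Dict[str, int] = defaultdict(int)
    for host, _events in ops:
        counts[host] += 1
    return dict(counts)


def outlier_hosts(counts: Dict[str, int], factor: float = 2.0) -> List[str]:
    """Hosts whose op count exceeds `factor` times the median (hypothetical threshold)."""
    if not counts:
        return []
    mid = median(counts.values())
    return sorted(h for h, c in counts.items() if c > factor * mid)


def stage_durations(events: List[Event]) -> List[Tuple[str, float]]:
    """Seconds spent between each event and the next, in time order."""
    ordered = sorted(events)  # sort by timestamp
    return [
        (f"{e0} -> {e1}", t1 - t0)
        for (t0, e0), (t1, e1) in zip(ordered, ordered[1:])
    ]


def slowest_stage(events: List[Event]) -> Optional[Tuple[str, float]]:
    """The stage with the largest gap, or None if there are fewer than two events."""
    stages = stage_durations(events)
    if not stages:
        return None
    return max(stages, key=lambda stage: stage[1])


if __name__ == "__main__":
    # Toy input with placeholder event names, purely to show the call pattern.
    sample_ops: List[OpRecord] = [
        ("g1", [(0.0, "received"), (0.5, "waiting"), (3.0, "done")]),
        ("g1", [(1.0, "received"), (1.2, "waiting"), (1.4, "done")]),
        ("g2", [(0.0, "received"), (0.1, "waiting"), (0.3, "done")]),
    ]
    print(ops_per_host(sample_ops))
    for host, events in sample_ops:
        print(host, slowest_stage(events))
```

In practice you would feed in real counts from every host. A single host that shows up in `outlier_hosts` and also has high iowait is the pattern described above. If no host stands out, the per-op `slowest_stage` output is where to look for a recurring slow step.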

<snip>

> Also when I run ceph -w I see a new pgmap is created every second, which is also not a good sign.

That's normal, unless you've adjusted your config options to avoid it.
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com