On 03/25/17 23:01, Nick Fisk wrote:
>
>> I think I owe you another graph later when I put all my VMs on there
>> (probably finally fixed my rbd snapshot hanging VM issue ... worked
>> around it by disabling exclusive-lock,object-map,fast-diff). The
>> bandwidth-hungry ones (which hung the most often) were moved shortly
>> after the bcache change, and it's hard to explain how it affects the
>> graphs ... it was easier to see with iostat while changing it, with a
>> mix of cached and uncached disks, than in ganglia afterwards.
>
> Please do, I can't resist a nice graph. What I would be really
> interested in is answers to these questions, if you can:
>
> 1. Has your per-disk bandwidth gone up due to removing random writes?
>    I.e. I struggle to get more than about 50MB/s writes per disk due
>    to the extra random IO per request.
> 2. Any feeling on how it helps with dentry/inode lookups? As mentioned
>    above, I'm using 8TB disks, and cold data has an extra penalty for
>    reads/writes as it has to look up the FS metadata first.
> 3. I assume with the 4.9 kernel you don't have the bcache fix which
>    allows partitions. What method are you using to create OSDs?
> 4. As mentioned above, any stats on the percentage of MB/s hitting
>    your cache device vs the journal (assuming the journal sees 100% of
>    IO)? This is to calculate extra wear.
>
> Thanks,
> Nick

So it's graph time...

Here's basically what you saw before, but I made it stacked (so 900 on
the %util axis means roughly 18 of the 27 disks in the whole cluster
averaged 50% utilization in the sample period for that one pixel width
of the graph). Remove gtype=stack from the URL and it won't be stacked,
or go to http://www.brockmann-consult.de/ganglia/?c=ceph and fill out
the aggregate report form there yourself. I manually added dates (cs
and ce) copied from another URL, since that form doesn't have them and
only covers the last x time periods. You can also find more metrics in
the drop-downs on that page. (sda and sdb have always been the SSDs;
disk metrics are 30-second averages from iostat.)

With no bcache until a bit at the end, plus possibly some load in there
from migrating to bcache (I didn't record the dates on that):

%util - http://www.brockmann-consult.de/ganglia/graph.php?hreg[]=ceph.*&mreg[]=sd[c-z]_util&glegend=show&aggregate=1&_=1491205396888&cs=11%2F1%2F2016+21%3A18&ce=12%2F15%2F2016+4%3A21&z=xlarge&gtype=stack&x=1000
await - http://www.brockmann-consult.de/ganglia/graph.php?hreg[]=ceph.*&mreg[]=sd[c-z]_await&glegend=show&aggregate=1&_=1491205396888&cs=11%2F1%2F2016+21%3A18&ce=12%2F15%2F2016+4%3A21&z=xlarge&gtype=stack&x=1000
wMBps - http://www.brockmann-consult.de/ganglia/graph.php?hreg[]=ceph.*&mreg[]=sd[c-z]_wMBps&glegend=show&aggregate=1&_=1491205396888&cs=11%2F1%2F2016+21%3A18&ce=12%2F15%2F2016+4%3A21&z=xlarge&gtype=stack&x=300

And here is the period since most VMs were on ceph (more than in the
"before" graphs), with some osd reweight-by-utilization runs started a
few days ago (but with scrub disabled during that) making the last part
look higher. The last VMs were moved today, which is also visible on
the graph, plus some extra backup load some time later.

%util - http://www.brockmann-consult.de/ganglia/graph.php?hreg[]=ceph.*&mreg[]=sd[c-z]_util&glegend=show&aggregate=1&_=1491205396888&cs=3%2F24%2F2017+23%3A3&z=xlarge&gtype=stack&x=1000
await - http://www.brockmann-consult.de/ganglia/graph.php?hreg[]=ceph.*&mreg[]=sd[c-z]_await&glegend=show&aggregate=1&_=1491205396888&cs=3%2F24%2F2017+23%3A3&z=xlarge&gtype=stack&x=1000
wMBps - http://www.brockmann-consult.de/ganglia/graph.php?hreg[]=ceph.*&mreg[]=sd[c-z]_wMBps&glegend=show&aggregate=1&_=1491205396888&cs=3%2F24%2F2017+23%3A3&z=xlarge&gtype=stack&x=300

Looking at the wMBps graph, you can see the cluster doesn't really have
that high a load on average, only in bursts; but since the load is
similar before and after, the other graphs should be at least somewhat
comparable.
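For reference, here is a minimal sketch (not my actual collection script) of how the per-disk numbers behind those graphs could be pulled out of `iostat -x` output and summed into the stacked %util reading. The sample output and column layout below are made up for illustration; real sysstat output varies by version, so the parser keys off the header line:

```python
# Hypothetical iostat -x report text; real column sets differ by
# sysstat version, which is why we parse the header instead of
# hardcoding column positions.
SAMPLE = """\
Device:  rrqm/s wrqm/s  r/s   w/s  rkB/s  wkB/s await %util
sdc        0.00   1.20  3.5  40.0   56.0 5120.0  8.40 52.00
sdd        0.00   0.80  2.1  38.5   33.6 4900.0  7.90 48.50
"""

def parse_iostat(text):
    """Return {device: {column_name: float}} for one iostat -x report."""
    lines = [l for l in text.splitlines() if l.strip()]
    cols = lines[0].split()[1:]          # drop the 'Device:' label
    stats = {}
    for line in lines[1:]:
        parts = line.split()
        stats[parts[0]] = dict(zip(cols, [float(v) for v in parts[1:]]))
    return stats

stats = parse_iostat(SAMPLE)

# wkB/s -> MB/s, matching the ganglia wMBps metric name.
wmbps = {dev: s["wkB/s"] / 1024 for dev, s in stats.items()}

# Stacked %util as in the graphs: the sum over all spinning disks,
# so e.g. 18 disks averaging 50% would read 900 on the stacked axis.
stacked_util = sum(s["%util"] for s in stats.values())

print(wmbps)         # per-disk write MB/s
print(stacked_util)  # 52.0 + 48.5 = 100.5
```

In a real collector you would run `iostat -x 30` and feed each 30-second report through something like this, which is roughly what produces the 30-second averages mentioned above.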
I think the %util graph speaks for itself, but I don't know how to show
you what it does in VMs. I figure it will smooth out the performance at
times when lots of requests happen that hdds are bad at but ssds are
good at (snap trimming, directory splitting, etc.). Lots of issues I
find are clearly seen in %util.

Or both time ranges together in the main reports page:
http://www.brockmann-consult.de/ganglia/?r=year&cs=10%2F21%2F2016+20%3A33&ce=4%2F7%2F2017+7%3A6&c=ceph&h=&tab=m&vn=&hide-hf=false&m=load_one&sh=1&z=small&hc=4&host_regex=&max_graphs=0&s=by+name

And be sure to share some of your own results. :)
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com