On Sat, 18 Jul 2020 at 02:18, <DHilsbos@xxxxxxxxxxxxxx> wrote:
> Daniel;
> As I said, I don't actually KNOW most of this.

Seems correct in my view, though.

> As such, what I laid out was conceptual. Ceph would need to be
> implemented to perform these operations in parallel, or not.
> Conceptually, in those areas where operations can be parallelized,
> making them parallel would improve wall-clock performance in 80%-90%
> of cases, so making this configurable wouldn't make sense.
>
> That said, I don't know which route the developers went. All I know
> is that the client transfers each chunk to the master for its PG,
> and the master sends it on to the replicas. I suspect that replicas
> must acknowledge the chunk before the master finishes the
> synchronous operation. I suspect that all replicas are transferred
> (from the master) in parallel.

It probably is parallel, but if the traffic can max out your network,
the master will still have to wait until all replicas have received
their data. In this case it seems it was repl=2, so that is not an
issue, but if you had 1GbE and repl > 2, I'm sure you'd notice how the
network would make those transfers feel very non-parallel. ;)

> Given the maturity of Ceph, I suspect this has already been done,
> unless the developers ran into a significant issue, but I don't know.

One of the things to consider is that if you run one single stream,
you are basically not testing what the cluster can do, only what the
absolute smallest setup does. If you have hundreds of clients talking
to your cluster, they can't all just fire off a single copy, move
straight on to the next one, and leave all the background replication
traffic to happen later, times 100. Well, they can, but each client
will still see the lower "real" per-client bandwidth at that point.

Going async is the same as writing to RAM buffers, or to a small
WAL/journal on a faster drive, and so on. It helps with a short
temporary spike if nothing else was running at the same time, but it
doesn't really reflect the true capacity of the cluster for a single
client, nor will a single test show the total capacity of a Ceph
cluster, since that will be the sum of all the (lower) single-client
speeds. It will only show the performance you can get until your RAM
buffer or WAL/journal runs out of capacity, so you aren't really
benchmarking the right thing.

--
May the most significant bit of your life be positive.
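
P.S. A back-of-envelope sketch (in Python) of the fan-out arithmetic
above. The numbers are illustrative assumptions, not measurements; the
only point is that the primary OSD (the "master" in this thread)
shares one outbound link across size-1 replica copies:

  # Effective sustained client write rate when the primary OSD must
  # forward (size - 1) replica copies over a single full-duplex NIC.
  NIC_GBIT = 1.0  # assumed 1 GbE link, for illustration only

  def max_client_write_gbit(nic_gbit: float, size: int) -> float:
      # Inbound: one copy arriving from the client. Outbound:
      # (size - 1) copies to the replicas, all sharing the same link,
      # so the outbound side becomes the bottleneck once size > 2.
      return nic_gbit / max(1, size - 1)

  for size in (2, 3, 4):
      rate = max_client_write_gbit(NIC_GBIT, size)
      print(f"size={size}: ~{rate:.2f} Gbit/s sustained per client")

With size=2 the single replica copy still fits on the primary's
outbound link, which is why repl=2 hides the effect; at size=3 the
same link carries two copies and the sustained client rate halves.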
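
P.P.S. The buffer argument in equally rough numbers, with the same
caveat that everything here is made up for illustration: a fast buffer
(RAM, or a WAL/journal on a quicker device) absorbs writes at burst
rate W while draining to the backing store at sustained rate D, so any
benchmark shorter than the fill time reports W instead of D:

  # Time until a fast write buffer fills and throughput falls from
  # the burst rate to the sustained drain rate of the backing store.
  def seconds_until_buffer_full(buffer_gb: float,
                                burst_gbs: float,
                                drain_gbs: float) -> float:
      if burst_gbs <= drain_gbs:
          return float("inf")  # drain keeps up; buffer never fills
      return buffer_gb / (burst_gbs - drain_gbs)

  t = seconds_until_buffer_full(buffer_gb=4.0, burst_gbs=1.0,
                                drain_gbs=0.3)
  print(f"burst speed lasts ~{t:.0f} s")  # ~6 s with these numbers

A 5-second bench against that setup would happily report 1.0 GB/s; a
60-second one would land somewhere near 0.3 GB/s, which is the number
that actually matters under sustained load.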