Hi Lars!
Replies inline below
On 9/9/19 5:59 AM, Lars Marowsky-Bree wrote:
On 2019-08-20T08:53:50, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
Hi Mark,
sorry for the slow response. Got sidetracked into business travel ;-)
No worries, I've got too many plates in the air right now so the delay
is perfectly fine. :)
I looked over https://github.com/markhpc/hsbench features to consider
how it compares to the fio S3/Swift/DAV backend.
fio's so far can only target one bucket (it can only be pointed at one
http_host), but in theory that could be filled in via job-file variables
or separate jobs.
That also limits the number of endpoints - it doesn't round-robin across
multiple endpoints within the same job, but fio would easily support
kicking off the same job for multiple endpoints concurrently via one
control file, so that'd likely achieve something similar?
I think you could definitely craft something for fio that would give you
a similar effect if not work in exactly the same way.
The whole latency reporting and test parametrization is, I think, an
advantage of fio.
On the hsbench side, I did try to craft the latency reporting piece so
that you can ask for arbitrary percentiles. Right now we just report
min/50%/99%/max, but that could be fleshed out like fio's reporting.
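For the curious, computing a nearest-rank percentile over the recorded
latencies is only a few lines of Go; this is just an illustrative sketch,
not the actual hsbench code:

// Rough sketch of reporting arbitrary percentiles from recorded
// latencies; illustrative only, not hsbench's actual implementation.
package main

import (
	"fmt"
	"sort"
	"time"
)

// percentile returns the latency at percentile p (0-100) using the
// nearest-rank method on a sorted copy of the samples.
func percentile(samples []time.Duration, p float64) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	sorted := make([]time.Duration, len(samples))
	copy(sorted, samples)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := int(p/100*float64(len(sorted))+0.5) - 1
	if rank < 0 {
		rank = 0
	}
	if rank >= len(sorted) {
		rank = len(sorted) - 1
	}
	return sorted[rank]
}

func main() {
	lat := []time.Duration{3 * time.Millisecond, 9 * time.Millisecond,
		5 * time.Millisecond, 40 * time.Millisecond}
	for _, p := range []float64{50, 99} {
		fmt.Printf("%.0f%%: %v\n", p, percentile(lat, p))
	}
}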
One thing that I'm not super fond of in fio is the way interval logging
is done, but that's probably somewhat minor in the grand scheme of things.
With one exception - it doesn't really currently support mixing object
sizes (block sizes) within one job easily given how I hacked that into
fio. I think it's probably safe, since I assume fio only ever reads
blocks/objects back with the same size it wrote them, but it's kinda
awkward ;-)
hsbench is quite simple in this regard too. It doesn't do any kind of
special mapping of object sizes, or anything like fio's random map to
ensure that random reads don't hit the same object twice. In hsbench we
certainly could create some kind of object map (and even store and
retrieve that map in S3 for future read tests), but currently hsbench is
really targeted at put/get/list/delete on homogeneous data sets.
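For what it's worth, fio's random map is conceptually just a permutation
over object indices, so a hypothetical hsbench version might look
something like this (the key naming below is made up for illustration):

// Hypothetical sketch of a fio-style "random map": read objects back in
// a random order without hitting the same key twice.
package main

import (
	"fmt"
	"math/rand"
)

func main() {
	const numObjects = 8
	// rand.Perm yields every index exactly once, in random order, so a
	// random read pass covers each object a single time.
	for _, idx := range rand.Perm(numObjects) {
		key := fmt.Sprintf("Obj-%d", idx) // hypothetical key scheme
		fmt.Println("GET", key)
	}
}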
What fio also doesn't do is manage the buckets itself; it doesn't
support bucket creation/deletion, it only does PUT/GET/DELETE for objects
within existing buckets. (Given that bucket creation could potentially
require quite a number of options to be passed, that was sort-of
intentional - I expected buckets to be provisioned before benchmarking
the cluster.)
It turns out after talking to our RGW guys that a really important test
here is bucket list throughput. I added that in hsbench 0.2 and Matt
Benjamin has a PR in the works for testing unordered bucket listings as
well. This is especially important right now because with the way
sharding works in RGW, you get much higher parallelism in one bucket if
you shard it, but you hurt bucket list performance with higher
read-amplification.
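To give a feel for what a bucket-list timing pass looks like against the
AWS Go SDK that hsbench uses, here's a rough sketch; the endpoint,
region, and bucket name are placeholders, credentials are assumed to come
from the environment, and this is not hsbench's actual code:

// Time a full bucket listing through the AWS Go SDK (v1), paging until
// the listing is exhausted.
package main

import (
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{
		Endpoint:         aws.String("http://rgw.example:7480"), // placeholder RGW endpoint
		Region:           aws.String("us-east-1"),
		S3ForcePathStyle: aws.Bool(true),
	}))
	svc := s3.New(sess)

	start := time.Now()
	total := 0
	err := svc.ListObjectsV2Pages(&s3.ListObjectsV2Input{
		Bucket: aws.String("hsbench-bucket"), // placeholder bucket name
	}, func(page *s3.ListObjectsV2Output, last bool) bool {
		total += len(page.Contents)
		return true // keep paging
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("listed %d objects in %v\n", total, time.Since(start))
}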
It can, however, delete all objects (just "trim" everything afterwards;
see https://github.com/axboe/fio/blob/master/examples/http-s3.fio).
And it doesn't support multipart uploads. But I think that's the same
for hsbench so far.
Yeah, no special provisions for multipart yet.
Of course, hsbench gets to benefit from the AWS Go bindings. There
weren't any lightweight S3 libraries for C, so fio ended up with its own
tiny rewrite of one. (I think swearing might have been involved in
getting the authentication right :-D)
I believe it! hsbench is mostly just a glorified wrapper around the AWS
Go bindings with some timing and statistics aggregation thrown on top.
The whole thing is like 1K lines of code. You have more control if you
own the S3 library too, but so far the AWS bindings appear to be good
enough for hsbench.
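As a sketch of what I mean by a glorified wrapper, a timed PUT through
the SDK boils down to something like this (the endpoint and names below
are placeholders, not hsbench's actual structure):

// Minimal sketch of issuing a single PUT via the AWS Go SDK and
// recording how long it took.
package main

import (
	"bytes"
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func timedPut(svc *s3.S3, bucket, key string, data []byte) (time.Duration, error) {
	start := time.Now()
	_, err := svc.PutObject(&s3.PutObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
		Body:   bytes.NewReader(data),
	})
	return time.Since(start), err
}

func main() {
	sess := session.Must(session.NewSession(&aws.Config{
		Endpoint:         aws.String("http://rgw.example:7480"), // placeholder endpoint
		Region:           aws.String("us-east-1"),
		S3ForcePathStyle: aws.Bool(true),
	}))
	svc := s3.New(sess)

	lat, err := timedPut(svc, "hsbench-bucket", "Obj-0", make([]byte, 4096))
	if err != nil {
		panic(err)
	}
	fmt.Println("PUT latency:", lat)
}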
I wonder if you've been able to compare both of them yet; I'd be curious
which one is more lightweight and can get higher per client performance?
Honestly I haven't had time to try yet. I'm trying to be a little
clever about concurrency and am using atomics in Go to keep away from
the mutex/locking overhead that I suspect channels would bring, but I
wouldn't be terribly surprised if the fio/s3 implementation could be
faster if it's not doing anything silly. I've noticed that hsbench can
still use a ton of CPU at high throughput rates: 16K GETs/s can consume
6+ Xeon E5-2650V3 cores (pretty old these days, though). I suspect that
a big part of this is on the networking side and how connections are
recycled. I probably need to take a bit more time to look at this in detail.
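The atomics approach is basically just shared counters bumped with
sync/atomic from each worker goroutine; a toy sketch (not the real
hsbench stats code, field layout is made up):

// Lock-free stat aggregation with sync/atomic across worker goroutines.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

type stats struct {
	ops     int64 // completed operations
	totalNs int64 // summed latency in nanoseconds
}

func main() {
	var s stats
	var wg sync.WaitGroup

	for w := 0; w < 8; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				start := time.Now()
				// ... issue a GET/PUT here ...
				atomic.AddInt64(&s.ops, 1)
				atomic.AddInt64(&s.totalNs, int64(time.Since(start)))
			}
		}()
	}
	wg.Wait()
	fmt.Printf("ops=%d mean=%v\n", s.ops, time.Duration(s.totalNs/s.ops))
}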
In any event, I haven't been totally idle. Once I got hsbench working
well enough to run tests, we started collecting information from our lab
that led to a bunch of work improving RGW. You can see some of the
test results here:
https://docs.google.com/spreadsheets/d/1q8MZJo9rp_3Kvs8ARaIxXf_TZbnnB0dRmrq8bCw2uDM/edit?usp=sharing
https://docs.google.com/spreadsheets/d/17SiwtVWy3jJdeXocFXrcPFvn8F5x1m8525BwjTevvME/edit?usp=sharing
And some of the PRs to make RGW faster here:
https://github.com/ceph/ceph/pull/29980 (cls/rgw: [WIP] Make
rgw_bucket_list encode bls rather than dir entries, markhpc)
https://github.com/ceph/ceph/pull/29943 (rgw/rgw_op: Remove get_val from
hotpath via legacy options, markhpc)
https://github.com/ceph/ceph/pull/29894 (rgw/rgw_reshard: Don't dump
RGWBucketReshard JSON in process_single_logshard, markhpc)
https://github.com/ceph/ceph/pull/29852 (rgw: move bucket reshard checks
out of write path, cbodley)
Casey also recently posted a proposal to dev@xxxxxxx for allowing writes
during resharding which would fix some of those stalls seen in the
graphs. Matt also has a good idea regarding changing the shard hashing
behavior to preserve ordering so that bucket listing times should remain
fast even with large numbers of shards.
Also, if you look at those graphs you'll see a lot of fluctuation on the
put side even after those PRs are applied. Putting the DB/WAL on a
ramdisk (without bluefs) stabilizes throughput pretty significantly, so
it looks like, at least on the write path, hsbench is fast enough to
showcase that putting rocksdb on Optane or pmem devices might
stabilize write throughput rates and potentially improve overall small
object write performance. Once the new community performance nodes are
set up, that's one of the first tests I intend to run on them.
Now, I'd still like all of this to be in the same tool across all
benchmarks if feasible, but seeing the benefits of using a language for
which an Amazon SDK is available ...
https://medium.com/learning-the-go-programming-language/calling-go-functions-from-other-languages-4c7d8bcc69bf
A fio plugin wrapper around hsbench's S3 module ought to be possible?
I was actually excited to hear about the fio effort because it's a
second option to compare results against! I would be a little sad to
lose that if you adopted some of the hsbench code for the backend, but
on the other hand I understand why you might want to do it (certainly
the bindings are very convenient!).
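If someone did want to go the cgo route, my rough understanding is that
it would look something like the sketch below, built with
-buildmode=c-shared so a C-based fio engine could link against it; the
function and library names are made up and I haven't tested any of this:

// Hypothetical sketch of exporting a Go entry point for a C-based fio
// engine to call. Build with:
//   go build -buildmode=c-shared -o libhsbench.so
package main

import "C"

//export RunS3Put
func RunS3Put(bucket, key *C.char) C.int {
	// In a real plugin this would call into the S3 code; here it's a stub.
	_ = C.GoString(bucket)
	_ = C.GoString(key)
	return 0
}

func main() {} // required for c-shared builds, never called from C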
Mark