Re: "Testing Ceph: Pains & Pleasures" recording now online

Gregory Farnum <gfarnum@xxxxxxxxxx> · Wed, 27 Feb 2019 14:19:48 -0800



On Wed, Feb 27, 2019 at 1:55 PM Ken Dreyer <kdreyer@xxxxxxxxxx> wrote:
> At the end of the video someone brought up Jepsen. What did you think about it?

Jepsen (http://jepsen.io) is great! I haven't read every analysis, but
the Inktank team thoroughly enjoyed it when we ran across it in late
2013 and got to see the results on Mongo, ZooKeeper, etc. I've never
worked with it directly, though, because teuthology was already
well-established by then and covered the functionality Jepsen is good
at.
The main advantage Jepsen has is that its fault injection and testing
models are fairly black box. I presume its author grafts some API on
top of the libraries from any given distributed system, but the tests
run across them are pretty much the same. That's good for portability
of testing, but not so good for inducing specific states and failure
modes within the distributed system, as teuthology does. For instance,
we have the RadosModel class for sending off vast numbers of specific
operations to the cluster and validating that we get back the expected
results, but we also have the advantage of being able to trigger both
random and specific perturbations of the OSD processes themselves, to
trigger failures on specific input patterns, etc., whereas Jepsen can
only manipulate the input and the network.
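
A toy sketch of that pattern in Python, for illustration only. This is
not the actual RadosModel or teuthology code: ToyCluster, run_model,
and every method name below are hypothetical, and the "cluster" is just
a few in-memory dicts. It shows the general shape: send many random
operations, keep a model of the expected state, check every read
against it, and perturb the storage processes themselves in between.

```python
import random


class ToyCluster:
    """Hypothetical in-memory stand-in for a replicated object store."""

    def __init__(self, replicas=3):
        self.stores = [dict() for _ in range(replicas)]
        self.up = [True] * replicas

    def kill(self, i):
        # Process-level perturbation; always leave at least one replica up.
        if sum(self.up) > 1:
            self.up[i] = False

    def revive(self, i):
        # Toy recovery: copy state from a live replica before rejoining.
        if not self.up[i]:
            src = next(s for s, ok in zip(self.stores, self.up) if ok)
            self.stores[i] = dict(src)
            self.up[i] = True

    def write(self, key, value):
        # Only live replicas receive the write.
        for store, ok in zip(self.stores, self.up):
            if ok:
                store[key] = value

    def read(self, key):
        # Serve the read from the first live replica.
        src = next(s for s, ok in zip(self.stores, self.up) if ok)
        return src.get(key)


def run_model(ops=1000, seed=0):
    """Issue random ops and faults; validate each read against a model."""
    rng = random.Random(seed)
    cluster, model = ToyCluster(), {}
    checked = 0
    for _ in range(ops):
        action = rng.choice(["write", "read", "kill", "revive"])
        key = "obj%d" % rng.randrange(8)
        if action == "write":
            value = rng.randrange(1 << 16)
            cluster.write(key, value)
            model[key] = value  # record what a correct store must return
        elif action == "read":
            assert cluster.read(key) == model.get(key), key
            checked += 1
        elif action == "kill":
            cluster.kill(rng.randrange(3))
        else:
            cluster.revive(rng.randrange(3))
    return checked  # number of reads that were validated
```

Calling run_model(ops=500, seed=1) either returns the number of
validated reads or fails an assertion on the first mismatch. A purely
black-box harness would have to drive the same workload without access
to anything like kill() and revive() on the store's own processes.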
-Greg