On Wed, Feb 27, 2019 at 1:55 PM Ken Dreyer <kdreyer@xxxxxxxxxx> wrote:
> At the end of the video someone brought up Jepsen. What did you think about it?

Jepsen (http://jepsen.io) is great! I haven't read every analysis, but the Inktank team thoroughly enjoyed it when we ran across it in late 2013 and got to see the results on Mongo, Zookeeper, etc. I've never worked with it directly or personally, though, because teuthology was already well established by then and covered the functionality Jepsen is good at.

The main advantage Jepsen has is that its fault injection and testing models are fairly black box. I presume there's some API its author grafts on top of the libraries for any given distributed system, but the tests run across them are pretty much the same. That's good for portability of testing, but not so good for inducing specific states and failure modes within the distributed system, as teuthology does. For instance, we have the RadosModel class for sending off vast numbers of specific operations to the cluster and validating that we get back the expected results, but we also have the advantage of being able to trigger both random and specific perturbations of the OSD processes themselves, of triggering failures on specific input patterns, etc., whereas Jepsen can only manipulate the input and the network.

-Greg
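To make the white-box approach described above a little more concrete, here is a minimal, hypothetical Python sketch of model-based testing with an injected fault. It is not the actual RadosModel class or any teuthology API: `InMemoryStore`, `LeakyStore`, `fail_next_write`, and `run_model_test` are invented names, and a toy in-process key-value store stands in for a real cluster. The tester issues many random operations, tracks the expected state in a separate model, and checks every read against it, while a hook inside the "process" forces writes to fail (something a purely black-box tool cannot do).

```python
import random


class InMemoryStore:
    """Hypothetical system under test: a toy key-value store standing in for a cluster."""

    def __init__(self):
        self._data = {}
        # Hypothetical white-box hook: make the next write fail inside the "process".
        self.fail_next_write = False

    def write(self, key, value):
        if self.fail_next_write:
            self.fail_next_write = False
            # Correct behavior: report the failure without mutating state.
            raise OSError("injected write failure")
        self._data[key] = value

    def read(self, key):
        return self._data.get(key)


class LeakyStore(InMemoryStore):
    """Deliberately buggy variant: applies the write before reporting failure."""

    def write(self, key, value):
        if self.fail_next_write:
            self.fail_next_write = False
            self._data[key] = value  # bug: partial effect of a "failed" write
            raise OSError("injected write failure")
        self._data[key] = value


def run_model_test(store, num_ops=1000, seed=0, fault_rate=0.05):
    """Issue random ops, track expected state in a model, and validate every read.

    Returns a list of (op_index, key, observed, expected) mismatches.
    """
    rng = random.Random(seed)
    model = {}  # expected contents, updated only for acknowledged writes
    mismatches = []
    for i in range(num_ops):
        key = f"obj{rng.randrange(10)}"
        if rng.random() < fault_rate:
            store.fail_next_write = True  # targeted perturbation
        if rng.random() < 0.5:
            value = rng.randrange(1_000_000)
            try:
                store.write(key, value)
                model[key] = value
            except OSError:
                pass  # a failed write must leave the previous value intact
        else:
            observed = store.read(key)
            expected = model.get(key)
            if observed != expected:
                mismatches.append((i, key, observed, expected))
    return mismatches


if __name__ == "__main__":
    print("correct store mismatches:", len(run_model_test(InMemoryStore(), fault_rate=1.0)))
    print("leaky store mismatches:", len(run_model_test(LeakyStore(), fault_rate=1.0)))
```

With every write forced to fail, the correct store never diverges from the model, while the leaky store is caught as soon as a key it "failed" to write is read back. The design choice worth noting is that the fault is triggered from inside the store rather than by dropping network traffic, which is the kind of targeted perturbation the email contrasts with Jepsen's input-and-network approach.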