On Mon, Apr 7, 2014 at 10:13 AM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>
>
> On 07/04/2014 18:55, Gregory Farnum wrote:
>> This would be really nice, but there are unfortunately even more
>> hiccups than you've noted here:
>> 1) Thrashing is both time- and disk-access-sensitive, and hardware differs.
>> 2) The teuthology thrashing is triggered largely based on PG state
>> events (e.g., "all PGs are clean, so restart an OSD").
>> 3) The actual failures tend to involve a combination of PG state and
>> inbound client operations, and I can't think of any realistic way to
>> coordinate those.
>>
>> Those problems look technically insurmountable to me, but maybe I'm
>> missing something?
>
> Is there no easy way to use the logs/events to significantly reduce
> the randomness of the workload? I honestly have no clue ;-)

I don't think so, no. :( We'd have to somehow order every event in the
system without losing any of the race conditions that were previously
triggered!
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
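
For illustration, here is a minimal sketch of the kind of PG-state-driven
thrashing loop point 2 above describes. The helper names (all_pgs_clean,
kill_osd, revive_osd) are assumptions for this sketch, not the actual
teuthology API. The point it shows: the kill timing depends on how quickly
this particular cluster converges to a clean state, which is why a recorded
run cannot simply be replayed on different hardware.

    import random
    import time

    def thrash(manager, osd_ids, min_delay=5, max_delay=30):
        """Repeatedly kill and revive a random OSD, gated on PG state."""
        while True:
            # The trigger is a cluster-state predicate, not a wall-clock
            # schedule: wait until every PG reports clean before acting.
            # This is the nondeterministic part that defeats log replay.
            while not manager.all_pgs_clean():   # assumed helper
                time.sleep(1)
            victim = random.choice(osd_ids)
            manager.kill_osd(victim)             # assumed helper
            time.sleep(random.uniform(min_delay, max_delay))
            manager.revive_osd(victim)           # assumed helper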