Re: Deterministic thrashing

Gregory Farnum <greg@xxxxxxxxxxx> · Mon, 7 Apr 2014 10:16:30 -0700



On Mon, Apr 7, 2014 at 10:13 AM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>
>
> On 07/04/2014 18:55, Gregory Farnum wrote:
>> This would be really nice but there are unfortunately even more
>> hiccups than you've noted here:
>> 1) Thrashing is sensitive to both timing and disk access, and
>> hardware differs from run to run
>> 2) The teuthology thrashing is triggered largely based on PG state
>> events (e.g., "all PGs are clean, so restart an OSD")
>> 3) The actual failures tend to involve a combination of PG state and
>> inbound client operations, and I can't think of any realistic way to
>> coordinate those.
>>
>> Those problems look technically insurmountable to me, but maybe I'm
>> missing something?
>
> Is there no easy way to use the logs / events to significantly reduce the randomness of the workload? I honestly have no clue ;-)

I don't think so, no. :( We'd have to somehow impose an order on every
event in the system without losing any of the race conditions that
were previously triggered!
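
As a toy illustration of the problem (a sketch only, not teuthology code: `thrash`, `recovery_times`, and `op_interval` are made-up names, and the model assumes client ops arrive at a fixed rate while recovery time depends on the hardware), seeding the thrasher's RNG fixes which OSD gets restarted but not which client operation is in flight when the restart lands:

```python
import random

def thrash(seed, recovery_times, op_interval=0.5):
    """Toy model of a state-triggered thrasher (hypothetical, for illustration).

    The thrasher only acts once "all PGs are clean"; how long that takes
    (recovery_times) is set by the hardware, not by the seed.
    """
    rng = random.Random(seed)      # seeded: the OSD choices are reproducible
    clock = 0.0
    events = []
    for recovery in recovery_times:
        clock += recovery          # wait for the "all PGs clean" event
        osd = rng.randrange(4)     # pick which OSD to restart
        # Client ops keep arriving on wall-clock time, independent of the seed.
        op_in_flight = int(clock / op_interval)
        events.append((osd, op_in_flight))
    return events

# Same seed, but the second recovery is slower on one machine.
fast_disks = thrash(42, [1.0, 1.0, 1.0])
slow_disks = thrash(42, [1.0, 2.5, 1.0])

same_osds = [o for o, _ in fast_disks] == [o for o, _ in slow_disks]
same_interleaving = fast_disks == slow_disks
print(same_osds, same_interleaving)
```

The OSD sequence matches across the two runs, but the restarts hit different client operations, and that interleaving is the part that tends to matter for the actual failures.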
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com