I thought we could do the same in run-time for vps'es only.

Sage?

On Mon, Aug 11, 2014 at 11:47 AM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>
>
> On 11/08/2014 19:34, Yuri Weinstein wrote:
>> Here is what we have in vps.yaml now:
>>
>> overrides:
>>   ceph:
>>     conf:
>>       global:
>>         osd heartbeat grace: 40
>>
>> What do we want to add?
>
> I think the idle_timeout values at
>
> https://github.com/ceph/ceph-qa-suite/pull/79/files
>
>
>>
>> On Mon, Aug 11, 2014 at 10:13 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>>> On Mon, 11 Aug 2014, Yehuda Sadeh wrote:
>>>> Yeah, looking at these logs, it really seems that it's just that things
>>>> are going slow on these machines and it's hitting timeouts. The fix is
>>>> ok with me, although I'd rather have it adjusted per machine type
>>>> (somehow).
>>>
>>> There is a vps.yaml that bumps up another timeout, so we could put it
>>> there. Right now it lives on the teuthology machine
>>> (~teuthworker/vps.yaml I think?), but perhaps we should stick it in
>>> ceph-qa-suite.git somewhere ...
>>>
>>> sage
>>>
>>>>
>>>> Yehuda
>>>>
>>>> On Mon, Aug 11, 2014 at 9:21 AM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>>>>> Hi Yehuda,
>>>>>
>>>>> It looks like increasing the rgw idle timeout makes the problem go away ( https://github.com/ceph/ceph-qa-suite/pull/79 and http://tracker.ceph.com/issues/8988 ). It previously was 300 sec which looks like a large value already. Does this fix / workaround make sense to you?
>>>>>
>>>>> Cheers
>>>>>
>>>>> On 10/08/2014 10:46, Loic Dachary wrote:
>>>>>> Hi Yehuda,
>>>>>>
>>>>>> In the past few months the swift tests failed randomly and I was unfortunately unable to figure out why. Here are a few examples:
>>>>>>
>>>>>> http://pulpito.ceph.com/loic-2014-08-08_12:17:30-upgrade:firefly-x:stress-split-wip-9025-chunk-remapping-testing-basic-vps/406944
>>>>>> http://pulpito.ceph.com/loic-2014-08-08_12:17:30-upgrade:firefly-x:stress-split-wip-9025-chunk-remapping-testing-basic-vps/406941
>>>>>> http://pulpito.ceph.com/loic-2014-08-08_12:17:30-upgrade:firefly-x:stress-split-wip-9025-chunk-remapping-testing-basic-vps/406946
>>>>>> http://pulpito.ceph.com/loic-2014-08-08_12:17:30-upgrade:firefly-x:stress-split-wip-9025-chunk-remapping-testing-basic-vps/406947
>>>>>>
>>>>>> and it has happened on every upgrade test run since I can remember. I fail to see a pattern and cannot figure out what the real problem is. It would be really great if you could take a look. Even a hunch or a tip would be greatly appreciated :-)
>>>>>>
>>>>>> You can find more context in
>>>>>>
>>>>>> http://tracker.ceph.com/issues/8988
>>>>>> http://tracker.ceph.com/issues/8016
>>>>>> http://tracker.ceph.com/issues/7799
>>>>>>
>>>>>> and discussions at
>>>>>>
>>>>>> http://www.spinics.net/lists/ceph-devel/msg19933.html
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>
>>>>> --
>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>
>
> --
> Loïc Dachary, Artisan Logiciel Libre
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html