On 11/08/2014 19:34, Yuri Weinstein wrote:
> Here is what we have in vps.yaml now:
>
> overrides:
>   ceph:
>     conf:
>       global:
>         osd heartbeat grace: 40
>
> What do we want to add?

I think the idle_timeout values at https://github.com/ceph/ceph-qa-suite/pull/79/files

> ~
>
> On Mon, Aug 11, 2014 at 10:13 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>> On Mon, 11 Aug 2014, Yehuda Sadeh wrote:
>>> Yeah, looking at these logs, it really seem that it's just that things
>>> are going slow on these machines and it's hitting timeouts. The fix is
>>> ok with me, although I'd rather have it adjusted per machine type
>>> (somehow).
>>
>> There is a vps.yaml that bumps up another timeout, so we could put it
>> there. Right now it lives on the teuthology machine
>> (~teuthworker/vps.yaml I think?), but perhaps we should stick it in
>> ceph-qa-suite.git somewhere ...
>>
>> sage
>>
>>>
>>> Yehuda
>>>
>>> On Mon, Aug 11, 2014 at 9:21 AM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>>>> Hi Yehuda,
>>>>
>>>> It looks like increasing the rgw idle timeout makes the problem go away
>>>> ( https://github.com/ceph/ceph-qa-suite/pull/79 and
>>>> http://tracker.ceph.com/issues/8988 ). It previously was 300 sec which
>>>> looks like a large value already. Does this fix / workaround make sense
>>>> to you ?
>>>>
>>>> Cheers
>>>>
>>>> On 10/08/2014 10:46, Loic Dachary wrote:
>>>>> Hi Yehuda,
>>>>>
>>>>> In the past few months the swift tests failed randomly and I was
>>>>> unfortunately unable to figure out why. Here are a few examples:
>>>>>
>>>>> http://pulpito.ceph.com/loic-2014-08-08_12:17:30-upgrade:firefly-x:stress-split-wip-9025-chunk-remapping-testing-basic-vps/406944
>>>>> http://pulpito.ceph.com/loic-2014-08-08_12:17:30-upgrade:firefly-x:stress-split-wip-9025-chunk-remapping-testing-basic-vps/406941
>>>>> http://pulpito.ceph.com/loic-2014-08-08_12:17:30-upgrade:firefly-x:stress-split-wip-9025-chunk-remapping-testing-basic-vps/406946
>>>>> http://pulpito.ceph.com/loic-2014-08-08_12:17:30-upgrade:firefly-x:stress-split-wip-9025-chunk-remapping-testing-basic-vps/406947
>>>>>
>>>>> and it has happened on every upgrade test run since I can remember. I
>>>>> fail to see a pattern and cannot figure out what the real problem is.
>>>>> It would be really great if you could take a look. Even a hunch or a
>>>>> tip would be greatly appreciated :-)
>>>>>
>>>>> You can find more context in
>>>>>
>>>>> http://tracker.ceph.com/issues/8988
>>>>> http://tracker.ceph.com/issues/8016
>>>>> http://tracker.ceph.com/issues/7799
>>>>>
>>>>> and discussions at
>>>>>
>>>>> http://www.spinics.net/lists/ceph-devel/msg19933.html
>>>>>
>>>>> Cheers
>>>>>
>>>>
>>>> --
>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html

-- 
Loïc Dachary, Artisan Logiciel Libre
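
P.S. To make the suggestion concrete, here is a rough sketch of how an extra override could sit next to the existing vps.yaml content. It is not teuthology's own merge code: deep_merge is a hypothetical helper written for this illustration, and the "rgw" section name, the placement of idle_timeout and the value 600 are assumptions (the only figure given in this thread is the previous 300 sec). The real keys and values are the ones in pull request 79.

```python
import copy


def deep_merge(base, override):
    """Return a new dict with `override` merged into `base`, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Otherwise the override value wins.
            merged[key] = copy.deepcopy(value)
    return merged


# Current content of vps.yaml, as quoted in this thread.
vps_overrides = {
    "overrides": {
        "ceph": {"conf": {"global": {"osd heartbeat grace": 40}}},
    }
}

# HYPOTHETICAL addition: the section name, key placement and value are
# assumptions for illustration only; see pull request 79 for the real ones.
idle_timeout_overrides = {
    "overrides": {
        "rgw": {"idle_timeout": 600},
    }
}

combined = deep_merge(vps_overrides, idle_timeout_overrides)
```

The point is only that the new timeout would live alongside the heartbeat grace setting without replacing it, so slow machine types get both adjustments from one file.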