Hi Loic,

We rebased our teuthology/ceph-qa-suite branches onto current master and re-ran the LRC test. Unfortunately, we got the same result as before (timeout error).

[test conditions]
Target: Ceph-9.0.0-971-gd49d816
https://github.com/kawaguchi-s/teuthology
https://github.com/kawaguchi-s/ceph-qa-suite/tree/wip-10886-lrc

[teuthology log]
2015-05-25 10:18:23 # start
2015-05-25 11:59:52,106.106 INFO:teuthology.orchestra.run.RX35-1:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph status --format=json-pretty'
2015-05-25 11:59:52,564.564 INFO:tasks.ceph.ceph_manager:no progress seen, keeping timeout for now
2015-05-25 11:59:52,565.565 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 635, in wrapper
    return func(self)
  File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 668, in do_thrash
    timeout=self.config.get('timeout')
  File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 1569, in wait_for_recovery
    'failed to recover before timeout expired'
AssertionError: failed to recover before timeout expired

Traceback (most recent call last):
  File "/root/work/teuthology/virtualenv/lib/python2.7/site-packages/gevent/greenlet.py", line 390, in run
    result = self._run(*self.args, **self.kwargs)
  File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 635, in wrapper
    return func(self)
  File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 668, in do_thrash
    timeout=self.config.get('timeout')
  File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 1569, in wait_for_recovery
    'failed to recover before timeout expired'
AssertionError: failed to recover before timeout expired
<Greenlet at 0x36cacd0: <bound method Thrasher.do_thrash of <tasks.ceph_manager.Thrasher instance at 0x36df3f8>>> failed with AssertionError
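For reference, the assertion is raised by wait_for_recovery() in tasks/ceph_manager.py once the thrashosds timeout (1200 seconds in our configuration) elapses without the placement groups recovering. Conceptually the check is similar to the simplified sketch below; this is only an illustration and not the actual ceph_manager.py code, and the helper name and the exact JSON fields read from 'ceph status --format=json-pretty' are our assumptions:

import json
import subprocess
import time

def wait_for_recovery_sketch(timeout=1200, interval=10):
    # Illustrative stand-in for wait_for_recovery() in tasks/ceph_manager.py.
    start = time.time()
    while True:
        # Same command the thrasher runs in the log above.
        out = subprocess.check_output(
            ['ceph', 'status', '--format=json-pretty'])
        pgmap = json.loads(out).get('pgmap', {})
        # For this sketch, treat recovery as done once every PG reports
        # active+clean; the real helper looks at recovery-specific PG states.
        unclean = [s for s in pgmap.get('pgs_by_state', [])
                   if s.get('state_name') != 'active+clean']
        if not unclean:
            return
        if time.time() - start > timeout:
            raise AssertionError('failed to recover before timeout expired')
        time.sleep(interval)

In this run the PGs never reached that state within the 1200 second timeout, which is why the thrasher aborted with the AssertionError above.
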
Best regards,
Takeshi Miyamae

-----Original Message-----
From: Loic Dachary [mailto:loic@xxxxxxxxxxx]
Sent: Thursday, May 21, 2015 6:38 PM
To: Miyamae, Takeshi/宮前 剛; Ceph Development
Cc: Kawaguchi, Shotaro/川口 翔太朗; Imai, Hiroki/今井 宏樹; Nakao, Takanori/中尾 鷹詔; Shiozawa, Kensuke/塩沢 賢輔
Subject: Re: teuthology timeout error

Hi,

[sorry the previous mail was sent by accident, here is the full mail]

On 21/05/2015 10:32, Miyamae, Takeshi wrote:
> Hi Loic,
>
>> Could you please share the teuthology/ceph-qa-suite repository you
>> are using to run these tests so I can try to reproduce / diagnose the problem ?
>
> https://github.com/kawaguchi-s/teuthology/tree/wip-10886
> https://github.com/kawaguchi-s/ceph-qa-suite/tree/wip-10886
>

When compared against master they show differences that indicate it would be good to rebase:

https://github.com/ceph/teuthology/compare/master...kawaguchi-s:wip-10886
https://github.com/ceph/ceph-qa-suite/compare/master...kawaguchi-s:wip-10886

I think the teuthology commit on top of wip-10886 is a mistake:

https://github.com/kawaguchi-s/teuthology/commit/348e54931f89c9b0ae7a84eb931576f8414017b5

Do you really need to modify teuthology ? It should just be necessary to use the latest master branch.

It looks like the https://github.com/kawaguchi-s/ceph-qa-suite/commit/f2e3ca5d12ceef742eae2a9cf4057c436e9040c3 commit in your ceph-qa-suite is not what you intended. However

https://github.com/kawaguchi-s/ceph-qa-suite/commit/4b39d6d4862f9091a849d224e880795be406815d
https://github.com/kawaguchi-s/ceph-qa-suite/commit/d16b4b058ae118931928541a2c8acd68f9703a44

look ok :-)

Instead of naming the test 4nodes16osds3mons1client.yaml, it would be better to use the same kind of naming you see at https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados/thrash-erasure-code/workloads, that is a file name made of the distinctive parameters for the shec plugin (parameters left at their default values can be omitted).
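For instance (just an illustration, not a file that exists today; the file name, the profile name shecprofile and the k=4 / m=3 / c=2 values are placeholders, adjust them to the profile you actually want to exercise), a workloads file could be called something like ec-rados-plugin=shec-k=4-m=3-c=2.yaml and contain only the rados task with the non-default profile settings, along the lines of:

tasks:
- rados:
    clients: [client.0]
    objects: 50
    ops: 400
    ec_pool: true
    erasure_code_profile:
      name: shecprofile
      plugin: shec
      k: 4
      m: 3
      c: 2
      ruleset-failure-domain: osd
    op_weights:
      read: 100
      write: 0
      append: 100
      delete: 50
      snap_create: 50
      snap_remove: 50
      rollback: 50
      copy_from: 50
      setattr: 25
      rmattr: 25

The cluster layout (nodes, mons, osds) would then come from the other fragments of the suite rather than from the workload file itself.
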
Cheers

> Here are our teuthology/ceph-qa-suite repositories. Thanks in advance.
>
> Best regards,
> Takeshi Miyamae
>
> -----Original Message-----
> From: Loic Dachary [mailto:loic@xxxxxxxxxxx]
> Sent: Wednesday, May 20, 2015 4:49 PM
> To: Miyamae, Takeshi/宮前 剛; Ceph Development
> Cc: Kawaguchi, Shotaro/川口 翔太朗; Imai, Hiroki/今井 宏樹; Nakao, Takanori/中尾 鷹詔; Shiozawa, Kensuke/塩沢 賢輔
> Subject: Re: teuthology timeout error
>
> Hi,
>
> On 20/05/2015 04:20, Miyamae, Takeshi wrote:
>> Hi Loic,
>>
>> When we fixed our own issue and restarted teuthology,
>
> Great !
>
>> we encountered another issue (timeout error) which occurs in case of LRC as well.
>> Do you have any information about that ?
>
> Could you please share the teuthology/ceph-qa-suite repository you are using to run these tests so I can try to reproduce / diagnose the problem ?
>
> Thanks
>
>>
>> [error messages (in case of LRC pool)]
>>
>> 2015-04-28 12:38:54,128.128 INFO:teuthology.orchestra.run.RX35-1:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph status --format=json-pretty'
>> 2015-04-28 12:38:54,516.516 INFO:tasks.ceph.ceph_manager:no progress seen, keeping timeout for now
>> 2015-04-28 12:38:54,516.516 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
>>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 632, in wrapper
>>     return func(self)
>>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 665, in do_thrash
>>     timeout=self.config.get('timeout')
>>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 1566, in wait_for_recovery
>>     'failed to recover before timeout expired'
>> AssertionError: failed to recover before timeout expired
>>
>> Traceback (most recent call last):
>>   File "/root/work/teuthology/virtualenv/lib/python2.7/site-packages/gevent/greenlet.py", line 390, in run
>>     result = self._run(*self.args, **self.kwargs)
>>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 632, in wrapper
>>     return func(self)
>>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 665, in do_thrash
>>     timeout=self.config.get('timeout')
>>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 1566, in wait_for_recovery
>>     'failed to recover before timeout expired'
>> AssertionError: failed to recover before timeout expired
>> <Greenlet at 0x2a7d550: <bound method Thrasher.do_thrash of <tasks.ceph_manager.Thrasher instance at 0x2bd12d8>>> failed with AssertionError
>>
>> [ceph version]
>> 0.93-952-gfe28daa
>>
>> [teuthology, ceph-qa-suite]
>> newest version at 3/25/2015
>>
>> [configurations]
>> check-locks: false
>> overrides:
>>   ceph:
>>     conf:
>>       global:
>>         ms inject socket failures: 5000
>>       osd:
>>         osd heartbeat use min delay socket: true
>>         osd sloppy crc: true
>>     fs: xfs
>> roles:
>> - - mon.a
>>   - osd.0
>>   - osd.4
>>   - osd.8
>>   - osd.12
>> - - mon.b
>>   - osd.1
>>   - osd.5
>>   - osd.9
>>   - osd.13
>> - - mon.c
>>   - osd.2
>>   - osd.6
>>   - osd.10
>>   - osd.14
>> - - osd.3
>>   - osd.7
>>   - osd.11
>>   - osd.15
>>   - client.0
>> targets:
>>   ubuntu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:
>>   ubuntu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:
>>   ubuntu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:
>>   ubuntu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:
>> tasks:
>> - ceph:
>>     conf:
>>       osd:
>>         osd debug reject backfill probability: 0.3
>>         osd max backfills: 1
>>         osd scrub max interval: 120
>>         osd scrub min interval: 60
>>     log-whitelist:
>>     - wrongly marked me down
>>     - objects unfound and apparently lost
>> - thrashosds:
>>     chance_pgnum_grow: 1
>>     chance_pgpnum_fix: 1
>>     min_in: 4
>>     timeout: 1200
>> - rados:
>>     clients:
>>     - client.0
>>     ec_pool: true
>>     erasure_code_profile:
>>       k: 4
>>       l: 3
>>       m: 2
>>       name: lrcprofile
>>       plugin: lrc
>>       ruleset-failure-domain: osd
>>     objects: 50
>>     op_weights:
>>       append: 100
>>       copy_from: 50
>>       delete: 50
>>       read: 100
>>       rmattr: 25
>>       rollback: 50
>>       setattr: 25
>>       snap_create: 50
>>       snap_remove: 50
>>       write: 0
>>     ops: 190000
>>
>> Best regards,
>> Takeshi Miyamae
>>
>

--
Loïc Dachary, Artisan Logiciel Libre