Re: teuthology timeout error

Hi Takeshi,

I'm trying to reproduce your problem at https://github.com/ceph/ceph-qa-suite/pull/445. To be continued :-)

Cheers

On 26/05/2015 04:39, Miyamae, Takeshi wrote:
> Hi Loic,
> 
> We rebased our teuthology/ceph-qa-suite and retried the LRC test against the current master.
> Unfortunately, we got the same result as before (a timeout error).
> 
> [test conditions]
> Target : Ceph-9.0.0-971-gd49d816
> https://github.com/kawaguchi-s/teuthology
> https://github.com/kawaguchi-s/ceph-qa-suite/tree/wip-10886-lrc
> 
> [teuthology log]
> 
> 2015-05-25 10:18:23	# start
> 
> 2015-05-25 11:59:52,106.106 INFO:teuthology.orchestra.run.RX35-1:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph status --format=json-pretty'
> 2015-05-25 11:59:52,564.564 INFO:tasks.ceph.ceph_manager:no progress seen, keeping timeout for now
> 2015-05-25 11:59:52,565.565 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 635, in wrapper
>     return func(self)
>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 668, in do_thrash
>     timeout=self.config.get('timeout')
>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 1569, in wait_for_recovery
>     'failed to recover before timeout expired'
> AssertionError: failed to recover before timeout expired
> 
> Traceback (most recent call last):
>   File "/root/work/teuthology/virtualenv/lib/python2.7/site-packages/gevent/greenlet.py", line 390, in run
>     result = self._run(*self.args, **self.kwargs)
>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 635, in wrapper
>     return func(self)
>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 668, in do_thrash
>     timeout=self.config.get('timeout')
>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 1569, in wait_for_recovery
>     'failed to recover before timeout expired'
> AssertionError: failed to recover before timeout expired <Greenlet at 0x36cacd0: <bound method Thrasher.do_thrash of <tasks.ceph_manager.Thrasher instance at 0x36df3f8>>> failed with AssertionError
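The traceback shows `wait_for_recovery` in tasks/ceph_manager.py raising right after the log line "no progress seen, keeping timeout for now". That suggests a wait loop whose timer is restarted whenever recovery advances and which fails only once `timeout` seconds pass without progress. The following is a simplified, hypothetical Python sketch of that pattern, not the actual ceph_manager.py code; `get_num_active_clean` and `get_num_pgs` are assumed stand-ins for whatever the real task extracts from `ceph status --format=json-pretty`.

```python
import time


def wait_for_recovery(get_num_active_clean, get_num_pgs, timeout=None,
                      poll_interval=3.0, clock=time.monotonic,
                      sleep=time.sleep, log=print):
    """Hypothetical sketch: wait until every PG is active+clean.

    The timer restarts whenever the active+clean count grows; an
    AssertionError is raised only after `timeout` seconds without progress.
    `get_num_active_clean` and `get_num_pgs` are assumed callables, not the
    real ceph_manager API.
    """
    start = clock()
    last_clean = get_num_active_clean()
    while last_clean < get_num_pgs():
        sleep(poll_interval)
        current = get_num_active_clean()
        if current > last_clean:
            # More PGs reached active+clean: count it as progress.
            log("making progress, resetting timeout")
            start = clock()
        else:
            log("no progress seen, keeping timeout for now")
            if timeout is not None and clock() - start >= timeout:
                raise AssertionError("failed to recover before timeout expired")
        last_clean = current
    log("recovered!")
```

Injecting `clock` and `sleep` keeps the sketch testable without a cluster. With `timeout=1200` (the thrashosds value in the configuration quoted further down), this sketch would fail only after 20 minutes during which no additional PG became active+clean.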
> 
> Best regards,
> Takeshi Miyamae
> 
> -----Original Message-----
> From: Loic Dachary [mailto:loic@xxxxxxxxxxx] 
> Sent: Thursday, May 21, 2015 6:38 PM
> To: Miyamae, Takeshi/宮前 剛; Ceph Development
> Cc: Kawaguchi, Shotaro/川口 翔太朗; Imai, Hiroki/今井 宏樹; Nakao, Takanori/中尾 鷹詔; Shiozawa, Kensuke/塩沢 賢輔
> Subject: Re: teuthology timeout error
> 
> Hi,
> 
> [sorry, the previous mail was sent by accident; here is the full mail]
> 
> On 21/05/2015 10:32, Miyamae, Takeshi wrote:
>> Hi Loic,
>>
>>> Could you please share the teuthology/ceph-qa-suite repository you 
>>> are using to run these tests so I can try to reproduce / diagnose the problem ?
>>
>> https://github.com/kawaguchi-s/teuthology/tree/wip-10886
>> https://github.com/kawaguchi-s/ceph-qa-suite/tree/wip-10886
>>
> 
> Compared against master, they show differences suggesting it would be good to rebase:
> 
> https://github.com/ceph/teuthology/compare/master...kawaguchi-s:wip-10886
> https://github.com/ceph/ceph-qa-suite/compare/master...kawaguchi-s:wip-10886
> 
> I think the teuthology commit on top of wip-10886 is a mistake
> 
> https://github.com/kawaguchi-s/teuthology/commit/348e54931f89c9b0ae7a84eb931576f8414017b5
> 
> Do you really need to modify teuthology? Using the latest master branch should be enough.
> 
> It looks like the
> 
> https://github.com/kawaguchi-s/ceph-qa-suite/commit/f2e3ca5d12ceef742eae2a9cf4057c436e9040c3
> 
> commit in your ceph-qa-suite is not what you intended. However
> 
> https://github.com/kawaguchi-s/ceph-qa-suite/commit/4b39d6d4862f9091a849d224e880795be406815d
> https://github.com/kawaguchi-s/ceph-qa-suite/commit/d16b4b058ae118931928541a2c8acd68f9703a44
> 
> look ok :-) Instead of naming the test 4nodes16osds3mons1client.yaml, it would be better to follow the naming used at https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados/thrash-erasure-code/workloads, that is, a file name made of the distinctive parameters of the shec plugin (parameters left at their default values can be omitted).
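The naming suggestion above can be sketched as a small helper that builds a workload file name from the plugin and the parameters that differ from the plugin defaults. This is a hypothetical sketch: the `ec-rados-plugin=<plugin>-<key>=<value>.yaml` shape is an assumption to check against the existing files in the workloads directory, and the default values must be supplied by the caller (those in the example are illustrative, not verified shec defaults).

```python
def workload_filename(plugin, params, defaults=None, prefix="ec-rados-plugin"):
    """Hypothetical helper: name a workload after its non-default parameters.

    `params` keeps its insertion order, so the caller controls the order in
    which parameters appear in the name.
    """
    defaults = defaults or {}
    parts = [f"{prefix}={plugin}"]
    for key, value in params.items():
        if key in defaults and defaults[key] == value:
            continue  # values equal to the plugin default are left out
        parts.append(f"{key}={value}")
    return "-".join(parts) + ".yaml"


if __name__ == "__main__":
    # Illustrative values only; check the real shec defaults before relying on them.
    assumed_shec_defaults = {"k": 4, "m": 3, "c": 2}
    print(workload_filename("shec", {"k": 6, "m": 3, "c": 2}, assumed_shec_defaults))
    # prints ec-rados-plugin=shec-k=6.yaml
```

Dropping default-valued parameters keeps names short while still making each workload file distinguishable from its siblings.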
> 
> Cheers
> 
>> Here are our teuthology/ceph-qa-suite repositories. Thanks in advance.
>>
>> Best regards,
>> Takeshi Miyamae
>>
>> -----Original Message-----
>> From: Loic Dachary [mailto:loic@xxxxxxxxxxx]
>> Sent: Wednesday, May 20, 2015 4:49 PM
>> To: Miyamae, Takeshi/宮前 剛; Ceph Development
>> Cc: Kawaguchi, Shotaro/川口 翔太朗; Imai, Hiroki/今井 宏樹; Nakao, Takanori/中尾 鷹詔; Shiozawa, Kensuke/塩沢 賢輔
>> Subject: Re: teuthology timeout error
>>
>> Hi,
>>
>> On 20/05/2015 04:20, Miyamae, Takeshi wrote:
>>> Hi Loic,
>>>
>>> After we fixed our own issue and restarted teuthology,
>>
>> Great !
>>
>>> we encountered another issue (a timeout error), which occurs with LRC as well.
>>> Do you have any information about that ?
>>
>> Could you please share the teuthology/ceph-qa-suite repository you are using to run these tests so I can try to reproduce / diagnose the problem ?
>>
>> Thanks
>>
>>>
>>> [error messages (with an LRC pool)]
>>>
>>> 2015-04-28 12:38:54,128.128 INFO:teuthology.orchestra.run.RX35-1:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph status --format=json-pretty'
>>> 2015-04-28 12:38:54,516.516 INFO:tasks.ceph.ceph_manager:no progress seen, keeping timeout for now
>>> 2015-04-28 12:38:54,516.516 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
>>>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 632, in wrapper
>>>     return func(self)
>>>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 665, in do_thrash
>>>     timeout=self.config.get('timeout')
>>>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 1566, in wait_for_recovery
>>>     'failed to recover before timeout expired'
>>> AssertionError: failed to recover before timeout expired
>>>
>>> Traceback (most recent call last):
>>>   File "/root/work/teuthology/virtualenv/lib/python2.7/site-packages/gevent/greenlet.py", line 390, in run
>>>     result = self._run(*self.args, **self.kwargs)
>>>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 632, in wrapper
>>>     return func(self)
>>>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 665, in do_thrash
>>>     timeout=self.config.get('timeout')
>>>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 1566, in wait_for_recovery
>>>     'failed to recover before timeout expired'
>>> AssertionError: failed to recover before timeout expired <Greenlet at 0x2a7d550: <bound method Thrasher.do_thrash of <tasks.ceph_manager.Thrasher instance at 0x2bd12d8>>> failed with AssertionError
>>>
>>> [ceph version]
>>> 0.93-952-gfe28daa
>>>
>>> [teuthology, ceph-qa-suite]
>>> latest versions as of 3/25/2015
>>>
>>> [configurations]
>>>   check-locks: false
>>>   overrides:
>>>     ceph:
>>>       conf:
>>>         global:
>>>           ms inject socket failures: 5000
>>>         osd:
>>>           osd heartbeat use min delay socket: true
>>>           osd sloppy crc: true
>>>       fs: xfs
>>>   roles:
>>>   - - mon.a
>>>     - osd.0
>>>     - osd.4
>>>     - osd.8
>>>     - osd.12
>>>   - - mon.b
>>>     - osd.1
>>>     - osd.5
>>>     - osd.9
>>>     - osd.13
>>>   - - mon.c
>>>     - osd.2
>>>     - osd.6
>>>     - osd.10
>>>     - osd.14
>>>   - - osd.3
>>>     - osd.7
>>>     - osd.11
>>>     - osd.15
>>>     - client.0
>>>   targets:
>>>     ubuntu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:
>>>     ubuntu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:
>>>     ubuntu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:
>>>     ubuntu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:
>>>   tasks:
>>>   - ceph:
>>>       conf:
>>>         osd:
>>>           osd debug reject backfill probability: 0.3
>>>           osd max backfills: 1
>>>           osd scrub max interval: 120
>>>           osd scrub min interval: 60
>>>       log-whitelist:
>>>       - wrongly marked me down
>>>       - objects unfound and apparently lost
>>>   - thrashosds:
>>>       chance_pgnum_grow: 1
>>>       chance_pgpnum_fix: 1
>>>       min_in: 4
>>>       timeout: 1200
>>>   - rados:
>>>       clients:
>>>       - client.0
>>>       ec_pool: true
>>>       erasure_code_profile:
>>>         k: 4
>>>         l: 3
>>>         m: 2
>>>         name: lrcprofile
>>>         plugin: lrc
>>>         ruleset-failure-domain: osd
>>>       objects: 50
>>>       op_weights:
>>>         append: 100
>>>         copy_from: 50
>>>         delete: 50
>>>         read: 100
>>>         rmattr: 25
>>>         rollback: 50
>>>         setattr: 25
>>>         snap_create: 50
>>>         snap_remove: 50
>>>         write: 0
>>>       ops: 190000
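As a rough sanity check on this configuration, one can compute how many chunks each object in the LRC profile is split into; with `ruleset-failure-domain: osd`, each chunk of a PG has to land on a distinct OSD. The sketch below assumes the simple k/m/l form of the LRC plugin, in which the k + m data and coding chunks are grouped into sets of l and each set gets one extra local parity chunk (so k + m must be a multiple of l). It also assumes thrashosds `min_in` is the minimum number of OSDs the thrasher keeps in. Comparing the two numbers is only a suggestion for where to look; nothing in the thread establishes that it causes the timeout.

```python
def lrc_chunk_count(k, m, l):
    """Total chunks per object for the simple k/m/l LRC form (assumed layout).

    The k + m data and coding chunks are split into groups of l, and each
    group adds one local parity chunk.
    """
    if (k + m) % l != 0:
        raise ValueError("k + m must be a multiple of l")
    return k + m + (k + m) // l


if __name__ == "__main__":
    chunks = lrc_chunk_count(k=4, m=2, l=3)  # erasure_code_profile values from the config
    min_in = 4                               # thrashosds min_in from the config
    print(f"{chunks} chunks per object, min_in={min_in}")  # prints "8 chunks per object, min_in=4"
    if chunks > min_in:
        # Assuming min_in bounds how many OSDs stay in, a PG could need more
        # distinct OSDs than the thrasher guarantees to keep in.
        print("the thrasher may keep fewer OSDs in than one PG needs")
```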
>>>
>>> Best regards,
>>> Takeshi Miyamae
>>>
>>
> 
> --
> Loïc Dachary, Artisan Logiciel Libre
> 
> 
> 

-- 
Loïc Dachary, Artisan Logiciel Libre
