Hi Xiaoxi,

sorry, I thought you were testing within VMs with caching turned on (an assumption on my part; you didn't tell us whether you really ran your benchmark within VMs, and if not, how you tested RBD outside of VMs). It just set off an alarm for me because we had also experienced issues with benchmarking within a VM (it didn't crash, but it responded extremely slowly).

Wolfgang

On 03/25/2013 10:15 AM, Chen, Xiaoxi wrote:
>
> Hi Wolfgang,
>
> Thanks for the reply, but why is my problem related to issue #3737? I
> cannot find any direct link between them. I didn't turn on the qemu
> cache, and my qemu/VM works fine.
>
> Xiaoxi
>
> On 2013-3-25, at 17:07, "Wolfgang Hennerbichler" <wolfgang.hennerbichler@xxxxxxxxxxxxxxxx> wrote:
>
>> Hi,
>>
>> this could be related to the issue here, which has been reported
>> multiple times:
>>
>> http://tracker.ceph.com/issues/3737
>>
>> In short: they're working on it, they know about it.
>>
>> Wolfgang
>>
>> On 03/25/2013 10:01 AM, Chen, Xiaoxi wrote:
>>> Hi list,
>>>
>>> We have hit and reproduced this issue several times: Ceph commits
>>> suicide because "FileStore: sync_entry timed out" after very heavy
>>> random IO on top of RBD.
>>>
>>> My test environment is:
>>>
>>> A 4-node Ceph cluster with 20 HDDs for OSDs and 4 Intel DC S3700
>>> SSDs for journals per node, that is 80 spindles in total.
>>>
>>> 48 VMs spread across 12 physical nodes, with 48 RBDs attached to
>>> the VMs 1:1 via Qemu.
>>>
>>> Ceph @ 0.58.
>>>
>>> XFS is used.
>>>
>>> I am using aiostress (something like fio) to produce random write
>>> requests on top of each RBD.
>>>
>>> From ceph -w, Ceph reports very high ops (10,000+/s), but
>>> technically 80 spindles can provide at most 150*80/2 = 6,000 IOPS
>>> for 4K random writes (see the back-of-envelope sketch below).
>>>
>>> When digging into the code, I found that the OSD writes data to the
>>> page cache and then returns. Although it calls ::sync_file_range,
>>> that syscall does not guarantee the data has reached disk by the
>>> time it returns; it is an asynchronous call (see the second sketch
>>> below). So the situation is: random writes are extremely fast,
>>> since they only hit the journal and the page cache, but once a sync
>>> starts, it takes a very long time. Because of the speed gap between
>>> the journal and the OSD disks, the amount of data that needs to be
>>> synced keeps growing, and eventually the sync exceeds the 600 s
>>> timeout.
>>>
>>> For what it's worth, I have tried to reproduce this with rados
>>> bench, but failed.
>>>
>>> Could you please let me know if you need any more information, and
>>> whether you have a solution?
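
A back-of-envelope sketch of the 150*80/2 figure quoted above. It assumes the usual rule of thumb of roughly 150 IOPS per 7,200 rpm spindle and reads the divisor of 2 as 2x replication; both assumptions are mine, not stated in the original report:

    /* Back-of-envelope estimate of sustainable 4K random-write IOPS.
     * Assumptions (mine, not from the report above): ~150 IOPS per
     * 7200 rpm spindle, and the "/2" read as 2x replication, i.e.
     * every client write costs two disk writes. */
    #include <stdio.h>

    int main(void)
    {
        const int spindles      = 80;  /* 4 nodes x 20 HDDs */
        const int iops_per_disk = 150; /* rule-of-thumb figure per HDD */
        const int replication   = 2;   /* each client write hits 2 OSDs */

        printf("max sustainable 4K random-write IOPS: ~%d\n",
               spindles * iops_per_disk / replication); /* ~6000 */
        return 0;
    }

At roughly 6,000 IOPS the cluster ceiling is well below the 10,000+ ops/s that ceph -w reports, which is consistent with writes being absorbed by the journal and page cache rather than reaching the spindles.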
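
On the ::sync_file_range point, a minimal standalone sketch of the syscall's semantics, illustrative only and not the actual FileStore code path: with SYNC_FILE_RANGE_WRITE alone the kernel merely initiates writeback of the range and returns immediately, which is the asynchronous behaviour described above. Only the WAIT flags make the call block, and even then it flushes neither metadata nor the disk's volatile write cache the way fsync()/fdatasync() do:

    /* Illustrative only -- not the actual FileStore code path. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        char buf[4096];
        memset(buf, 'x', sizeof(buf));
        if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
            perror("write");
            return 1;
        }
        /* At this point the 4K lives in the page cache, not on disk. */

        /* Asynchronous: initiate writeback of the range and return
         * immediately. The pages may still be dirty when this returns,
         * which is the behaviour described in the report above. */
        sync_file_range(fd, 0, sizeof(buf), SYNC_FILE_RANGE_WRITE);

        /* Blocking variant: wait for any in-flight writeback, start
         * writeback, then wait for it to complete. Note that even this
         * does not flush metadata or the disk's volatile write cache
         * the way fsync()/fdatasync() do. */
        sync_file_range(fd, 0, sizeof(buf),
                        SYNC_FILE_RANGE_WAIT_BEFORE |
                        SYNC_FILE_RANGE_WRITE |
                        SYNC_FILE_RANGE_WAIT_AFTER);

        close(fd);
        return 0;
    }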
>>> Thanks,
>>>
>>> Xiaoxi

--
DI (FH) Wolfgang Hennerbichler
Software Development
Unit Advanced Computing Technologies
RISC Software GmbH
A company of the Johannes Kepler University Linz

IT-Center
Softwarepark 35
4232 Hagenberg
Austria

Phone: +43 7236 3343 245
Fax: +43 7236 3343 250
wolfgang.hennerbichler@xxxxxxxxxxxxxxxx
http://www.risc-software.at

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com