Re: Ceph Crash at sync_thread_timeout after heavy random writes.

Hi Wolfgang,

        Thanks for the reply, but why is my problem related to issue #3737? I cannot find any direct link between them. I didn't turn on the qemu cache, and my qemu/VMs work fine.


                Xiaoxi

On 2013-3-25, at 17:07, "Wolfgang Hennerbichler" <wolfgang.hennerbichler@xxxxxxxxxxxxxxxx> wrote:

> Hi,
> 
> this could be related to the issue here, which has been reported multiple
> times:
> 
> http://tracker.ceph.com/issues/3737
> 
> In short: They're working on it, they know about it.
> 
> Wolfgang
> 
> On 03/25/2013 10:01 AM, Chen, Xiaoxi wrote:
>> Hi list,
>> 
>>         We have hit and reproduced this issue several times: ceph will
>> commit suicide because "FileStore: sync_entry timed out" after very heavy
>> random IO on top of RBD.
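>> 
>>         For reference, if I read the code right, the 600s here seems to
>> come from the filestore commit timeout option, which appears to default to
>> 600 seconds. A minimal ceph.conf sketch (the option names and defaults are
>> my assumption, and raising the timeout would only hide the growing sync
>> backlog, not remove it):
>> 
>>         [osd]
>>             ; assumed default: suicide if a filestore sync takes longer than this
>>             filestore commit timeout = 600
>>             ; assumed default: try to sync the filestore roughly every 5 seconds
>>             filestore max sync interval = 5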
>> 
>>         My test environment is:
>> 
>>                            A 4-node ceph cluster with 20 HDDs for OSDs
>> and 4 Intel DC S3700 SSDs for journals per node, that is 80 spindles in total.
>> 
>>                            48 VMs spread across 12 physical nodes, with 48
>> RBDs attached to the VMs 1:1 via Qemu.
>> 
>>                            Ceph @ 0.58
>> 
>>                            XFS is used as the OSD filesystem.
>> 
>>         I am using Aiostress (something like FIO) to produce random
>> write requests on top of each RBD.
>> 
>> 
>> 
>>         From ceph -w, ceph reports a very high op rate (10000+ ops/s), but
>> technically, 80 spindles can provide at most 150*80/2 = 6000 IOPS for 4K
>> random writes.
>> 
>>         When digging into the code, I found that the OSD writes data to
>> the page cache and then returns. Although it calls ::sync_file_range, that
>> syscall does not actually sync data to disk by the time it returns; it is
>> an asynchronous call. So the situation is: random writes are extremely
>> fast, since they only go to the journal and the page cache, but once a
>> sync starts it takes a very long time. Because of the speed gap between
>> the journal and the OSD disks, the amount of data that needs to be synced
>> keeps growing, and the sync will eventually exceed 600s.
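>> 
>>         To illustrate the ::sync_file_range behaviour, here is a minimal
>> sketch of the raw syscall (not the actual FileStore code; the flags are the
>> standard Linux ones): with only SYNC_FILE_RANGE_WRITE the call merely
>> queues writeback and returns, and it only blocks when the WAIT flags are
>> added.
>> 
>>         #define _GNU_SOURCE
>>         #include <sys/types.h>
>>         #include <fcntl.h>
>> 
>>         // Queue writeback for [off, off+len) and return immediately;
>>         // dirty pages can still sit in the page cache afterwards.
>>         int start_writeback(int fd, off64_t off, off64_t len) {
>>             return sync_file_range(fd, off, len, SYNC_FILE_RANGE_WRITE);
>>         }
>> 
>>         // Wait until the range has actually reached disk (still weaker
>>         // than fsync: file metadata is not flushed).
>>         int wait_writeback(int fd, off64_t off, off64_t len) {
>>             return sync_file_range(fd, off, len,
>>                                    SYNC_FILE_RANGE_WAIT_BEFORE |
>>                                    SYNC_FILE_RANGE_WRITE |
>>                                    SYNC_FILE_RANGE_WAIT_AFTER);
>>         }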
>> 
>> 
>> 
>>         For your information, I have tried to reproduce this with rados
>> bench, but failed.
>> 
>> 
>> 
>>         Could you please let me know if you need any more information,
>> and whether you have any solutions? Thanks.
>> 
>> 
>>         Xiaoxi
>> 
>> 
>> 
> 
> 
> -- 
> DI (FH) Wolfgang Hennerbichler
> Software Development
> Unit Advanced Computing Technologies
> RISC Software GmbH
> A company of the Johannes Kepler University Linz
> 
> IT-Center
> Softwarepark 35
> 4232 Hagenberg
> Austria
> 
> Phone: +43 7236 3343 245
> Fax: +43 7236 3343 250
> wolfgang.hennerbichler@xxxxxxxxxxxxxxxx
> http://www.risc-software.at
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




