Re: CephFS and slow requests

On Sat, Feb 22, 2014 at 12:04 AM, Dan van der Ster
<daniel.vanderster@xxxxxxx> wrote:
> Hi Greg,
> Yes, this still happens after the updatedb fix.
>
> [root@xxx dan]# mount
> ...
> zzz:6789:/ on /mnt/ceph type ceph (name=cephfs,key=client.cephfs)
>
> [root@xxx dan]# pwd
> /mnt/ceph/dan
>
> [root@xxx dan]# dd if=/dev/zero of=yyy bs=4M count=2000
> 2000+0 records in
> 2000+0 records out
> 8388608000 bytes (8.4 GB) copied, 9.21217 s, 911 MB/s
>
>
> Then 30s later:
>
> 2014-02-21 16:16:11.315110 osd.326 x:6836/31929 683 : [WRN] 1 slow requests,
> 1 included below; oldest blocked for > 32.432401 secs
> 2014-02-21 16:16:11.315317 osd.326 x:6836/31929 684 : [WRN] slow request
> 32.432401 seconds old, received at 2014-02-21 16:15:38.882584:
> osd_op(client.16735018.1:22522476 100000352bf.000002a4 [write 0~4194304
> [8@0],startsync 0~0] 0.5447d769 snapc 1=[] e42655) v4 currently waiting for
> subops from [357,191]
>
> And no slow requests for other active clients.
>
> Reminder, this is 1GigE client, 64GB RAM, kernel 3.13.0-1.el6.elrepo.x86_64,
> kernel mounted cephfs. I can't reproduce this on a 1GigE client with only
> 8GB ram, 3.11.0-15-generic and 3.13.4-031304-generic. (The smaller RAM
> client is writing at 110-120MB/s vs the 900MB/s writes seen on the big RAM
> machine -- obviously the writes are all buffered on the big ram machine).
> Maybe the RAM isn't related, though, as with fdatasync mode we still see the
> slow requests:
>
> [root@xxx dan]# dd if=/dev/zero of=yyy bs=4M count=2000 conv=fdatasync
> 2000+0 records in
> 2000+0 records out
> 8388608000 bytes (8.4 GB) copied, 78.26 s, 107 MB/s

This issue is likely related to the large amount of RAM. More RAM allows
the kernel to cache a larger amount of dirty data, so the kernel creates
lots of OSD requests when it flushes that dirty data. (conv=fdatasync
doesn't help here because dd calls fdatasync only after all the buffered
writes have finished.)
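
To put rough numbers on this, here is a small Python sketch. It assumes
the writer is only throttled once dirty data reaches about vm.dirty_ratio
percent of total RAM, and that the ratio is 20. That is a common default
but I have not checked it on these machines, and the kernel really
computes the limit from available memory, so treat this as an
approximation. The function names are made up for illustration; the
107 MB/s figure is the fdatasync throughput from Dan's test.

```python
GIB = 1024 ** 3


def dirty_limit_bytes(ram_bytes, dirty_ratio_percent=20):
    """Approximate amount of dirty page cache allowed before writers block.

    ASSUMPTION: limit = dirty_ratio percent of *total* RAM, with a ratio
    of 20. The real kernel uses available memory, so this overestimates.
    """
    return ram_bytes * dirty_ratio_percent // 100


def fits_in_page_cache(file_bytes, ram_bytes, dirty_ratio_percent=20):
    """True if the whole file can sit in the page cache as dirty data."""
    return file_bytes <= dirty_limit_bytes(ram_bytes, dirty_ratio_percent)


def drain_seconds(file_bytes, link_bytes_per_sec):
    """Time needed to push the buffered data out over the network."""
    return file_bytes / link_bytes_per_sec


# dd if=/dev/zero of=yyy bs=4M count=2000
file_bytes = 2000 * 4 * 1024 * 1024

for ram_gib in (64, 8):
    buffered = fits_in_page_cache(file_bytes, ram_gib * GIB)
    print(f"{ram_gib} GiB client: whole file buffered = {buffered}")

# 107 MB/s is the rate dd reported with conv=fdatasync on the 1GigE client.
print(f"time to drain: {drain_seconds(file_bytes, 107e6):.1f} s")
```

Under those assumptions the whole 8.4 GB file fits below the dirty limit
on the 64GB client but not on the 8GB one, and draining it at 107 MB/s
takes well over the 30 seconds after which the OSDs start reporting slow
requests.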

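One way to test this hypothesis would be to bound the amount of dirty
data on the client by syncing periodically instead of once at the end,
which is all that dd conv=fdatasync does. The Python sketch below does
that. The function name, the 4M chunk size (matching bs=4M) and the
sync interval of 16 chunks are arbitrary choices for illustration; I
have not run this against CephFS, so it is an experiment to try and not
a confirmed fix.

```python
import os


def write_with_periodic_sync(path, total_bytes,
                             chunk_bytes=4 * 1024 * 1024, sync_every=16):
    """Write zeros to path, calling fdatasync every sync_every chunks.

    Hypothetical helper: keeps at most sync_every * chunk_bytes of dirty
    data outstanding for this file. Returns the number of fdatasync calls.
    """
    chunk = bytes(chunk_bytes)  # a buffer of zeros, like /dev/zero
    written = 0
    chunks_since_sync = 0
    syncs = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            view = memoryview(chunk)[:n]
            # os.write may write fewer bytes than requested, so loop.
            while len(view) > 0:
                k = os.write(fd, view)
                view = view[k:]
            written += n
            chunks_since_sync += 1
            if chunks_since_sync == sync_every:
                os.fdatasync(fd)  # flush this file's dirty data now
                syncs += 1
                chunks_since_sync = 0
        if chunks_since_sync:
            os.fdatasync(fd)  # flush whatever is left at the end
            syncs += 1
    finally:
        os.close(fd)
    return syncs


# Example (writes 8.4 GB, same size as the dd test):
# write_with_periodic_sync("/mnt/ceph/dan/yyy", 2000 * 4 * 1024 * 1024)
```

If the slow requests go away with a small sync interval, that would
support the idea that they come from the client flushing a large
backlog of buffered writes.
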
Regards
Yan, Zheng

>
> 2014-02-21 16:26:15.202047 osd.818 x:6803/128164 1219 : [WRN] 1 slow
> requests, 1 included below; oldest blocked for > 30.446683 secs
> 2014-02-21 16:26:15.202194 osd.818 x:6803/128164 1220 : [WRN] slow request
> 30.446683 seconds old, received at 2014-02-21 16:25:44.754914:
> osd_op(client.16735018.1:22524842 100000352bf.00000355 [write 0~4194304
> [12@0],startsync 0~0] 0.c36d4557 snapc 1=[] e42655) v4 currently waiting for
> subops from [558,827]
>
>
> Cheers, Dan
>
>
>
> -- Dan van der Ster || Data & Storage Services || CERN IT Department --
>
>
> On Thu, Feb 20, 2014 at 4:02 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>> Arne,
>> Sorry this got dropped -- I had it marked in my mail but didn't have
>> the chance to think about it seriously when you sent it. Does this
>> still happen after the updatedb config change you guys made recently?
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> On Fri, Jan 31, 2014 at 5:52 AM, Arne Wiebalck <Arne.Wiebalck@xxxxxxx>
>> wrote:
>> > Hi,
>> >
>> > We observe that we can easily create slow requests with a simple dd on
>> > CephFS:
>> >
>> > -->
>> > [root@p05153026953834 dd]# dd if=/dev/zero of=xxx bs=4M count=1000
>> > 1000+0 records in
>> > 1000+0 records out
>> > 4194304000 bytes (4.2 GB) copied, 4.27824 s, 980 MB/s
>> >
>> > ceph -w:
>> > 2014-01-31 14:28:44.009543 osd.450 [WRN] 1 slow requests, 1 included below; oldest blocked for > 31.088950 secs
>> > 2014-01-31 14:28:44.009676 osd.450 [WRN] slow request 31.088950 seconds old, received at 2014-01-31 14:28:12.920423: osd_op(client.16735018.1:22493091 100000352b3.000002e9 [write 0~4194304,startsync 0~0] 0.518f2eef snapc 1=[] e32400) v4 currently waiting for subops from [87,1190]
>> > <---
>> >
>> > From what we see, the OSDs are not busy, so we suspect that the client
>> > is starting all the requests, but the requests then take longer than
>> > 30 secs to finish writing, i.e. flushing the client-side buffers.
>> >
>> > Is our understanding correct?
>> > Do these slow requests have an impact on requests from other clients,
>> > i.e. are some OSD resources consumed by these clients?
>> >
>> > The setup is:
>> > Client: kernel 3.13.0, 1GbE
>> > MDS Emperor 0.72.2
>> > OSDs Dumpling 0.67.5
>> >
>> > Thanks!
>> >  Dan & Arne
>> >
>> >
>> > --
>> > Arne Wiebalck
>> > CERN IT
>> >
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com