Good catch!  It sounds like what is needed here is for the deb and rpm
packages to add /var/lib/ceph to the PRUNEPATHS in /etc/updatedb.conf.
Unfortunately there isn't a /etc/updatedb.conf.d type file, so that
promises to be annoying.  Has anyone done this before?

sage
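Something along these lines in the package postinst might do it (an
untested sketch, assuming the stock single-line PRUNEPATHS="..." format;
having to edit another package's conffile from a maintainer script is
part of what makes this annoying):

    # add /var/lib/ceph to updatedb's prune list, unless it's already there
    if [ -f /etc/updatedb.conf ] && ! grep -q '/var/lib/ceph' /etc/updatedb.conf; then
        sed -i 's|^PRUNEPATHS="\(.*\)"|PRUNEPATHS="\1 /var/lib/ceph"|' /etc/updatedb.conf
    fi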
On Sun, 16 Feb 2014, Dan van der Ster wrote:
> After some further digging I realized that updatedb was running over
> the PGs, indexing all the objects. (According to iostat, updatedb was
> keeping the indexed disk 100% busy!) Oops!
> Since the disks are using the deadline elevator (which by default
> prioritizes reads over writes, and gives writes a deadline of 5
> seconds!), it is perhaps conceivable (yet still surprising) that the
> queues on a few disks were so full of reads that the writes were
> starved for many tens of seconds.
>
> I've killed updatedb everywhere and now the rados bench below isn't
> triggering slow requests.
> So now I'm planning to tune deadline so it doesn't prioritize reads so
> much, namely by decreasing write_expire to equal read_expire at 500ms
> and setting writes_starved to 1. Initial tests are showing that this
> further decreases latency a bit -- but my hope is that this will
> eliminate the possibility of a very long tail of writes. I hope that
> someone will chip in if they've already been down this path and have
> advice/warnings.
>
> Cheers,
> dan
>
> -- Dan van der Ster || Data & Storage Services || CERN IT Department --
>
> On Sat, Feb 15, 2014 at 11:48 PM, Dan van der Ster
> <daniel.vanderster@xxxxxxx> wrote:
> > Dear Ceph experts,
> >
> > We've found that a single client running rados bench can drive other
> > users, e.g. RBD users, into slow requests.
> >
> > Starting with a cluster that is not particularly busy, e.g.:
> >
> > 2014-02-15 23:14:33.714085 mon.0 xx:6789/0 725224 : [INF] pgmap
> > v6561996: 27952 pgs: 27952 active+clean; 66303 GB data, 224 TB used,
> > 2850 TB / 3075 TB avail; 4880KB/s rd, 28632KB/s wr, 271op/s
> >
> > We then start a rados bench writing many small objects:
> >
> >   rados bench -p test 60 write -t 500 -b 1024 --no-cleanup
> >
> > which gives these results (note the >60s max latency!!):
> >
> >   Total time run:         86.351424
> >   Total writes made:      91425
> >   Write size:             1024
> >   Bandwidth (MB/sec):     1.034
> >   Stddev Bandwidth:       1.26486
> >   Max bandwidth (MB/sec): 7.14941
> >   Min bandwidth (MB/sec): 0
> >   Average Latency:        0.464847
> >   Stddev Latency:         3.04961
> >   Max latency:            66.4363
> >   Min latency:            0.003188
> >
> > 30 seconds into this bench we start seeing slow requests, not only
> > from bench writes but also from some poor RBD clients, e.g.:
> >
> > 2014-02-15 23:16:02.820507 osd.483 xx:6804/46799 2201 : [WRN] slow
> > request 30.195634 seconds old, received at 2014-02-15 23:15:32.624641:
> > osd_sub_op(client.18535427.0:3922272 4.d42
> > 4eb00d42/rbd_data.11371325138b774.0000000000006577/head//4 [] v
> > 42083'71453 snapset=0=[]:[] snapc=0=[]) v7 currently commit sent
> >
> > During a longer, many-hour instance of this small-write test, some of
> > these slow RBD writes became very user visible, with disk flushes
> > being blocked long enough (>120s) for the VM kernels to start
> > complaining.
> >
> > A rados bench from a 10Gig-e client writing 4MB objects doesn't have
> > the same long tail of latency, namely:
> >
> >   # rados bench -p test 60 write -t 500 --no-cleanup
> >   ...
> >   Total time run:         62.811466
> >   Total writes made:      8553
> >   Write size:             4194304
> >   Bandwidth (MB/sec):     544.678
> >   Stddev Bandwidth:       173.163
> >   Max bandwidth (MB/sec): 1000
> >   Min bandwidth (MB/sec): 0
> >   Average Latency:        3.50719
> >   Stddev Latency:         0.309876
> >   Max latency:            8.04493
> >   Min latency:            0.166138
> >
> > and there are zero slow requests, at least during this 60s run.
> >
> > While the vast majority of small writes complete with a reasonable
> > sub-second latency, what is causing the very long tail (60-120s!)
> > seen by a few writes? Can someone advise us where to look in the
> > perf dump, etc. to find which resource/queue is being exhausted
> > during these tests?
> >
> > Oh yeah, we're running the latest dumpling stable, 0.67.5, on the
> > servers.
> >
> > Best regards, and thanks in advance!
> > Dan
> >
> > -- Dan van der Ster || Data & Storage Services || CERN IT Department --
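For anyone who wants to try the same deadline tuning Dan describes
above: the tunables are per-device sysfs knobs, so the change amounts to
something like the sketch below (the device names are placeholders for
the OSD data disks; the expire values are in milliseconds, and none of
this persists across reboots unless you also add a udev rule or init
script):

    # deadline defaults are read_expire=500, write_expire=5000, writes_starved=2;
    # make writes expire as quickly as reads and stop letting reads starve them
    for dev in sdb sdc sdd; do
        echo 500 > /sys/block/$dev/queue/iosched/write_expire
        echo 1   > /sys/block/$dev/queue/iosched/writes_starved
    done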
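On the "where to look" question: the per-OSD counters and the op tracker
can be queried through the admin socket on the OSD host. Something like
the following (the socket path assumes a default install, and the exact
command set varies by release, so check "help" first):

    # list the commands this OSD's admin socket supports
    ceph --admin-daemon /var/run/ceph/ceph-osd.483.asok help

    # all perf counters: op latencies, queue lengths, journal stats, ...
    ceph --admin-daemon /var/run/ceph/ceph-osd.483.asok perf dump

    # ops currently in flight, with per-stage timestamps
    ceph --admin-daemon /var/run/ceph/ceph-osd.483.asok dump_ops_in_flight

    # recent slowest ops retained by the op tracker, if your release has it
    ceph --admin-daemon /var/run/ceph/ceph-osd.483.asok dump_historic_ops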