Good catch!  It sounds like what is needed here is for the deb and rpm
packages to add /var/lib/ceph to the PRUNEPATHS in /etc/updatedb.conf.
Unfortunately there isn't a /etc/updatedb.conf.d type file, so that
promises to be annoying.  Has anyone done this before?

sage
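Something along these lines in the package postinst might do it (an
untested sketch, assuming the stock single-line PRUNEPATHS="..." format;
having to edit another package's conffile from a maintainer script is
part of what makes this annoying):

    # add /var/lib/ceph to updatedb's prune list, unless it's already there
    if [ -f /etc/updatedb.conf ] && ! grep -q '/var/lib/ceph' /etc/updatedb.conf; then
        sed -i 's|^PRUNEPATHS="\(.*\)"|PRUNEPATHS="\1 /var/lib/ceph"|' /etc/updatedb.conf
    fi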
On Sun, 16 Feb 2014, Dan van der Ster wrote:
> After some further digging I realized that updatedb was running over
> the PGs, indexing all the objects. (According to iostat, updatedb was
> keeping the indexed disk 100% busy!) Oops!
> Since the disks are using the deadline elevator (which by default
> prioritizes reads over writes, and gives writes a deadline of 5
> seconds!), it is perhaps conceivable (yet still surprising) that the
> queues on a few disks were so full of reads that the writes were
> starved for many tens of seconds.
>
> I've killed updatedb everywhere and now the rados bench below isn't
> triggering slow requests.
> So now I'm planning to tune deadline so it doesn't prioritize reads so
> much, namely by decreasing write_expire to equal read_expire at 500ms
> and setting writes_starved to 1. Initial tests are showing that this
> further decreases latency a bit -- but my hope is that this will
> eliminate the possibility of a very long tail of writes. I hope that
> someone will chip in if they've already been down this path and have
> advice/warnings.
>
> Cheers,
> dan
>
> -- Dan van der Ster || Data & Storage Services || CERN IT Department --
>
> On Sat, Feb 15, 2014 at 11:48 PM, Dan van der Ster
> <daniel.vanderster@xxxxxxx> wrote:
> > Dear Ceph experts,
> >
> > We've found that a single client running rados bench can drive other
> > users, e.g. RBD users, into slow requests.
> >
> > Starting with a cluster that is not particularly busy, e.g.:
> >
> > 2014-02-15 23:14:33.714085 mon.0 xx:6789/0 725224 : [INF] pgmap
> > v6561996: 27952 pgs: 27952 active+clean; 66303 GB data, 224 TB used,
> > 2850 TB / 3075 TB avail; 4880KB/s rd, 28632KB/s wr, 271op/s
> >
> > We then start a rados bench writing many small objects:
> >
> >   rados bench -p test 60 write -t 500 -b 1024 --no-cleanup
> >
> > which gives these results (note the >60s max latency!!):
> >
> >   Total time run:         86.351424
> >   Total writes made:      91425
> >   Write size:             1024
> >   Bandwidth (MB/sec):     1.034
> >   Stddev Bandwidth:       1.26486
> >   Max bandwidth (MB/sec): 7.14941
> >   Min bandwidth (MB/sec): 0
> >   Average Latency:        0.464847
> >   Stddev Latency:         3.04961
> >   Max latency:            66.4363
> >   Min latency:            0.003188
> >
> > 30 seconds into this bench we start seeing slow requests, not only
> > from bench writes but also from some poor RBD clients, e.g.:
> >
> > 2014-02-15 23:16:02.820507 osd.483 xx:6804/46799 2201 : [WRN] slow
> > request 30.195634 seconds old, received at 2014-02-15 23:15:32.624641:
> > osd_sub_op(client.18535427.0:3922272 4.d42
> > 4eb00d42/rbd_data.11371325138b774.0000000000006577/head//4 [] v
> > 42083'71453 snapset=0=[]:[] snapc=0=[]) v7 currently commit sent
> >
> > During a longer, many-hour instance of this small-write test, some of
> > these slow RBD writes became very user visible, with disk flushes
> > being blocked long enough (>120s) for the VM kernels to start
> > complaining.
> >
> > A rados bench from a 10Gig-e client writing 4MB objects doesn't have
> > the same long tail of latency, namely:
> >
> >   # rados bench -p test 60 write -t 500 --no-cleanup
> >   ...
> >   Total time run:         62.811466
> >   Total writes made:      8553
> >   Write size:             4194304
> >   Bandwidth (MB/sec):     544.678
> >   Stddev Bandwidth:       173.163
> >   Max bandwidth (MB/sec): 1000
> >   Min bandwidth (MB/sec): 0
> >   Average Latency:        3.50719
> >   Stddev Latency:         0.309876
> >   Max latency:            8.04493
> >   Min latency:            0.166138
> >
> > and there are zero slow requests, at least during this 60s run.
> >
> > While the vast majority of small writes complete with a reasonable
> > sub-second latency, what is causing the very long tail (60-120s!)
> > seen by a few writes? Can someone advise us where to look in the
> > perf dump, etc. to find which resource/queue is being exhausted
> > during these tests?
> >
> > Oh yeah, we're running the latest dumpling stable, 0.67.5, on the
> > servers.
> >
> > Best regards, and thanks in advance!
> > Dan
> >
> > -- Dan van der Ster || Data & Storage Services || CERN IT Department --
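For anyone who wants to try the same deadline tuning Dan describes
above: the tunables are per-device sysfs knobs, so the change amounts to
something like the sketch below (the device names are placeholders for
the OSD data disks; the expire values are in milliseconds, and none of
this persists across reboots unless you also add a udev rule or init
script):

    # deadline defaults are read_expire=500, write_expire=5000, writes_starved=2;
    # make writes expire as quickly as reads and stop letting reads starve them
    for dev in sdb sdc sdd; do
        echo 500 > /sys/block/$dev/queue/iosched/write_expire
        echo 1   > /sys/block/$dev/queue/iosched/writes_starved
    done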
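On the "where to look" question: the per-OSD counters and the op tracker
can be queried through the admin socket on the OSD host. Something like
the following (the socket path assumes a default install, and the exact
command set varies by release, so check "help" first):

    # list the commands this OSD's admin socket supports
    ceph --admin-daemon /var/run/ceph/ceph-osd.483.asok help

    # all perf counters: op latencies, queue lengths, journal stats, ...
    ceph --admin-daemon /var/run/ceph/ceph-osd.483.asok perf dump

    # ops currently in flight, with per-stage timestamps
    ceph --admin-daemon /var/run/ceph/ceph-osd.483.asok dump_ops_in_flight

    # recent slowest ops retained by the op tracker, if your release has it
    ceph --admin-daemon /var/run/ceph/ceph-osd.483.asok dump_historic_ops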