Yeah, I'm not seeing stuff being moved at all. Perhaps we should file a ticket to request a way to tell an OSD to rebalance its directory structure.

On Fri, Sep 4, 2015 at 5:08 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> I've just made the same change (4 and 40 for now) on my cluster, which is a similar size to yours. I didn't see any merging happening, although most of the directories I looked at had more files in them than the new merge threshold, so I guess this is to be expected.
>
> I'm currently splitting my PGs from 1024 to 2048 to see if that helps to bring things back into order.
>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Wang, Warren
>> Sent: 04 September 2015 01:21
>> To: Mark Nelson <mnelson@xxxxxxxxxx>; Ben Hines <bhines@xxxxxxxxx>
>> Cc: ceph-users <ceph-users@xxxxxxxxxxxxxx>
>> Subject: Re: Ceph performance, empty vs part full
>>
>> I'm about to change it on a big cluster too. It totals around 30 million, so I'm a bit nervous about changing it. As far as I understood, it would indeed move them around if you can get underneath the threshold, but that may be hard to do. These are two more settings that I highly recommend changing on a big prod cluster; I'm in favor of bumping these two up in the defaults.
>>
>> Warren
>>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Mark Nelson
>> Sent: Thursday, September 03, 2015 6:04 PM
>> To: Ben Hines <bhines@xxxxxxxxx>
>> Cc: ceph-users <ceph-users@xxxxxxxxxxxxxx>
>> Subject: Re: Ceph performance, empty vs part full
>>
>> Hrm, I think it will follow the merge/split rules if it's out of whack given the new settings, but I don't know that I've ever tested it on an existing cluster to see that it actually happens. I guess let it sit for a while and then check the OSD PG directories to see if the object counts make sense given the new settings? :D
>>
>> Mark
>>
>> On 09/03/2015 04:31 PM, Ben Hines wrote:
>> > Hey Mark,
>> >
>> > I've just tweaked these filestore settings for my cluster -- after
>> > changing this, is there a way to make Ceph move existing objects
>> > around to new filestore locations, or will this only apply to newly
>> > created objects? (I would assume the latter...)
>> >
>> > thanks,
>> >
>> > -Ben
>> >
>> > On Wed, Jul 8, 2015 at 6:39 AM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
>> >> Basically, for each PG there's a directory tree where only a certain
>> >> number of objects are allowed in a given directory before it splits
>> >> into new branches/leaves. The problem is that this has a fair amount
>> >> of overhead, and there are also extra associated dentry lookups to get at any given object.
>> >>
>> >> You may want to try something like:
>> >>
>> >> "filestore merge threshold = 40"
>> >> "filestore split multiple = 8"
>> >>
>> >> This will dramatically increase the number of objects allowed per directory.
>> >>
>> >> Another thing you may want to try is telling the kernel to greatly
>> >> favor retaining dentries and inodes in cache:
>> >>
>> >> echo 1 | sudo tee /proc/sys/vm/vfs_cache_pressure
>> >>
>> >> Mark
>> >>
>> >> On 07/08/2015 08:13 AM, MATHIAS, Bryn (Bryn) wrote:
>> >>>
>> >>> If I create a new pool it is generally fast for a short amount of time.
>> >>> Not as fast as if I had a blank cluster, but close to it.
>> >>>
>> >>> Bryn
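For reference, a minimal sketch of where the settings Mark suggests above would go, assuming they are added to the [osd] section of ceph.conf on each OSD host (the values are simply the ones quoted in this thread, not tuned recommendations):

    [osd]
    filestore merge threshold = 40
    filestore split multiple = 8

The vfs_cache_pressure change only applies until reboot; to persist it, a sysctl.d entry along these lines could be used (the file name is arbitrary):

    echo 1 | sudo tee /proc/sys/vm/vfs_cache_pressure
    echo "vm.vfs_cache_pressure = 1" | sudo tee /etc/sysctl.d/90-vfs-cache.conf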
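A rough way to do the check Mark describes (whether per-directory object counts in the PG trees match the new thresholds), sketched against a default filestore layout; the OSD id and PG directory below are hypothetical placeholders:

    # Count files in each subdirectory of one PG on one OSD and sort by count.
    # ceph-0 and 3.1f_head are placeholders; substitute a real OSD id and PG directory.
    for d in $(find /var/lib/ceph/osd/ceph-0/current/3.1f_head -type d); do
        printf '%8d %s\n' "$(find "$d" -maxdepth 1 -type f | wc -l)" "$d"
    done | sort -n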
>> >>>>
>> >>>> On 8 Jul 2015, at 13:55, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> >>>>
>> >>>> I think you're probably running into the internal PG/collection
>> >>>> splitting here; try searching for those terms and seeing what your
>> >>>> OSD folder structures look like. You could test by creating a new
>> >>>> pool and seeing if it's faster or slower than the one you've already filled up.
>> >>>> -Greg
>> >>>>
>> >>>> On Wed, Jul 8, 2015 at 1:25 PM, MATHIAS, Bryn (Bryn) <bryn.mathias@xxxxxxxxxxxxxxxxxx> wrote:
>> >>>>>
>> >>>>> Hi All,
>> >>>>>
>> >>>>> I'm perf testing a cluster again; this time I have rebuilt the cluster and am filling it for testing.
>> >>>>>
>> >>>>> On a 10 min run I get the following results from 5 load generators, each writing through 7 iocontexts, with a queue depth of 50 async writes.
>> >>>>>
>> >>>>> Gen1
>> >>>>> Percentile 100 = 0.729775905609
>> >>>>> Max latencies = 0.729775905609, Min = 0.0320818424225, mean = 0.0750389684542
>> >>>>> Total objects writen = 113088 in time 604.259738207s gives 187.151307376/s (748.605229503 MB/s)
>> >>>>>
>> >>>>> Gen2
>> >>>>> Percentile 100 = 0.735981941223
>> >>>>> Max latencies = 0.735981941223, Min = 0.0340068340302, mean = 0.0745198070711
>> >>>>> Total objects writen = 113822 in time 604.437897921s gives 188.310495407/s (753.241981627 MB/s)
>> >>>>>
>> >>>>> Gen3
>> >>>>> Percentile 100 = 0.828994989395
>> >>>>> Max latencies = 0.828994989395, Min = 0.0349340438843, mean = 0.0745455575197
>> >>>>> Total objects writen = 113670 in time 604.352181911s gives 188.085694736/s (752.342778944 MB/s)
>> >>>>>
>> >>>>> Gen4
>> >>>>> Percentile 100 = 1.06834602356
>> >>>>> Max latencies = 1.06834602356, Min = 0.0333499908447, mean = 0.0752239764659
>> >>>>> Total objects writen = 112744 in time 604.408732891s gives 186.536020849/s (746.144083397 MB/s)
>> >>>>>
>> >>>>> Gen5
>> >>>>> Percentile 100 = 0.609658002853
>> >>>>> Max latencies = 0.609658002853, Min = 0.032968044281, mean = 0.0744482759499
>> >>>>> Total objects writen = 113918 in time 604.671534061s gives 188.396498897/s (753.585995589 MB/s)
>> >>>>>
>> >>>>> example ceph -w output:
>> >>>>> 2015-07-07 15:50:16.507084 mon.0 [INF] pgmap v1077: 2880 pgs: 2880 active+clean; 1996 GB data, 2515 GB used, 346 TB / 348 TB avail; 2185 MB/s wr, 572 op/s
>> >>>>>
>> >>>>> However, when the cluster gets over 20% full I see the following results, and this gets worse as the cluster fills up:
>> >>>>>
>> >>>>> Gen1
>> >>>>> Percentile 100 = 6.71176099777
>> >>>>> Max latencies = 6.71176099777, Min = 0.0358741283417, mean = 0.161760483485
>> >>>>> Total objects writen = 52196 in time 604.488474131s gives 86.347386648/s (345.389546592 MB/s)
>> >>>>>
>> >>>>> Gen2
>> >>>>> Max latencies = 4.09169006348, Min = 0.0357890129089, mean = 0.163243938477
>> >>>>> Total objects writen = 51702 in time 604.036739111s gives 85.5941313704/s (342.376525482 MB/s)
>> >>>>>
>> >>>>> Gen3
>> >>>>> Percentile 100 = 7.32526683807
>> >>>>> Max latencies = 7.32526683807, Min = 0.0366668701172, mean = 0.163992217926
>> >>>>> Total objects writen = 51476 in time 604.684302092s gives 85.1287189397/s (340.514875759 MB/s)
>> >>>>>
>> >>>>> Gen4
>> >>>>> Percentile 100 = 7.56094503403
>> >>>>> Max latencies = 7.56094503403, Min = 0.0355761051178, mean = 0.162109421231
>> >>>>> Total objects writen = 52092 in time 604.769910812s gives 86.1352376642/s (344.540950657 MB/s)
>> >>>>>
>> >>>>> Gen5
>> >>>>> Percentile 100 = 6.99595499039
>> >>>>> Max latencies = 6.99595499039, Min = 0.0364680290222, mean = 0.163651215426
>> >>>>> Total objects writen = 51566 in time 604.061977148s gives 85.3654127404/s (341.461650961 MB/s)
>> >>>>>
>> >>>>> Cluster details:
>> >>>>> 5 x HP DL380s with 13 x 6 TB OSDs
>> >>>>> 128 GB RAM
>> >>>>> 2 x Intel 2620 v3
>> >>>>> 10 Gbit Ceph public network
>> >>>>> 10 Gbit Ceph private network
>> >>>>>
>> >>>>> Load generators connected via a 20 Gbit bond to the Ceph public network.
>> >>>>>
>> >>>>> Is this likely to be something happening to the journals?
>> >>>>>
>> >>>>> Or is there something else going on?
>> >>>>>
>> >>>>> I have run FIO and iperf tests and the disk and network performance is very high.
>> >>>>>
>> >>>>> Kind Regards,
>> >>>>> Bryn Mathias
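As a footnote to Greg's suggestion earlier in the thread (comparing a fresh pool against the part-full one), a rough sketch using stock tools; the pool name, PG count and run length are arbitrary placeholders:

    # Write benchmark against a throwaway pool, then the same run against the
    # existing pool, and compare the reported bandwidth and latency.
    ceph osd pool create benchtest 256
    rados bench -p benchtest 60 write
    rados bench -p <existing-pool> 60 write
    # Remove the throwaway pool when finished.
    ceph osd pool delete benchtest benchtest --yes-i-really-really-mean-it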