On Sun, Mar 29, 2015 at 1:12 AM, Barclay Jameson <almightybeeij@xxxxxxxxx> wrote:
> I redid my entire Ceph build, going back to CentOS 7, hoping to get
> the same performance I did last time.
> The rados bench test was the best I have ever had, with 740 MB/s
> write and 1300 MB/s read. This was even better than the first rados
> bench test that had performance equal to PanFS. I find that this does
> not translate to my CephFS. Even with the following tweaking it is
> still at least twice as slow as PanFS and my first *Magical* build
> (that had absolutely no tweaking):
>
> OSD
> osd_op_threads 8
> /sys/block/sd*/queue/nr_requests 4096
> /sys/block/sd*/queue/read_ahead_kb 4096
>
> Client
> rsize=16777216
> readdir_max_bytes=16777216
> readdir_max_entries=16777216
>
> ~160 mins to copy 100000 (1MB) files for CephFS vs ~50 mins for PanFS.
> Throughput on CephFS is about 10 MB/s vs 30 MB/s for PanFS.
>
> The strange thing is that none of the resources are taxed.
> CPU, RAM, network, and disks are not even close to being taxed on
> the client, the mon/mds, or the OSD nodes.
> The PanFS client node was on a 10Gb network, the same as the CephFS
> client, but you can see the huge difference in speed.
>
> As per Greg's questions before:
> There is only one client reading and writing (time cp Small1/*
> Small2/.) but three clients have cephfs mounted, although they aren't
> doing anything on the filesystem.
>
> I have done another test where I stream data into a file as fast as
> the processor can put it there.
> (for (i=0; i < 1000000001; i++){ fprintf (out_file, "I is : %d\n",i);}
> ) and it is faster than PanFS: CephFS writes 16GB in 105 seconds with
> the above tuning vs 130 seconds for PanFS. Without the tuning it takes
> 230 seconds for CephFS, although the first build did it in 130 seconds
> without any tuning.
>
> This leads me to believe the bottleneck is the mds. Does anybody have
> any thoughts on this?
> Are there any tuning parameters that I would need to speed up the mds?
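The figures quoted above can be turned into per-file numbers, which makes the gap easier to reason about. The following is only a rough sketch: the inputs (100000 files of 1 MB, ~160 and ~50 minutes) are taken from the message, while the helper names `throughput_mb_s` and `per_file_ms` are made up for illustration.

```python
# Back-of-the-envelope numbers for the small-file copy test quoted above.
# Inputs come from the message; the helper names are illustrative only.

FILES = 100_000       # number of files copied
FILE_SIZE_MB = 1      # size of each file in MB


def throughput_mb_s(minutes: float) -> float:
    """Aggregate throughput in MB/s when all files are copied in `minutes`."""
    return FILES * FILE_SIZE_MB / (minutes * 60)


def per_file_ms(minutes: float) -> float:
    """Average wall-clock time spent per file, in milliseconds."""
    return minutes * 60 * 1000 / FILES


for name, minutes in (("CephFS", 160), ("PanFS", 50)):
    print(f"{name}: {throughput_mb_s(minutes):.1f} MB/s, "
          f"{per_file_ms(minutes):.0f} ms per file")
# CephFS: 10.4 MB/s, 96 ms per file
# PanFS: 33.3 MB/s, 30 ms per file
```

Roughly 96 ms per 1 MB file, on a cluster that streams a single 16GB file in 105 seconds (about 150 MB/s), is consistent with a fixed per-file cost dominating the copy rather than raw data bandwidth. That fits the MDS hypothesis but does not prove it; client-side latency could produce the same picture.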
Could you enable MDS debugging for a few seconds:

  ceph daemon mds.x config set debug_mds 10; sleep 10; ceph daemon mds.x config set debug_mds 0

and upload /var/log/ceph/mds.x.log somewhere?

Regards
Yan, Zheng

>
> On Fri, Mar 27, 2015 at 4:50 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> On Fri, Mar 27, 2015 at 2:46 PM, Barclay Jameson
>> <almightybeeij@xxxxxxxxx> wrote:
>>> Yes, it's the exact same hardware except for the MDS server (although I
>>> tried using the MDS on the old node).
>>> I have not tried moving the MON back to the old node.
>>>
>>> My default cache size is "mds cache size = 10000000"
>>> The OSDs (3 of them) have 16 Disks with 4 SSD Journal Disks.
>>> I created 2048 PGs each for data and metadata:
>>> ceph osd pool create cephfs_data 2048 2048
>>> ceph osd pool create cephfs_metadata 2048 2048
>>>
>>>
>>> To your point on clients competing against each other... how would I check that?
>>
>> Do you have multiple clients mounted? Are they both accessing files in
>> the directory(ies) you're testing? Were they accessing the same
>> pattern of files for the old cluster?
>>
>> If you happen to be running a hammer rc or something pretty new you
>> can use the MDS admin socket to explore a bit what client sessions
>> there are and what they have permissions on and check; otherwise
>> you'll have to figure it out from the client side.
>> -Greg
>>
>>>
>>> Thanks for the input!
>>>
>>>
>>> On Fri, Mar 27, 2015 at 3:04 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>>> So this is exactly the same test you ran previously, but now it's on
>>>> faster hardware and the test is slower?
>>>>
>>>> Do you have more data in the test cluster? One obvious possibility is
>>>> that previously you were working entirely in the MDS' cache, but now
>>>> you've got more dentries and so it's kicking data out to RADOS and
>>>> then reading it back in.
>>>>
>>>> If you've got the memory (you appear to) you can pump up the "mds
>>>> cache size" config option quite dramatically from its default 100000.
>>>>
>>>> Other things to check are that you've got an appropriately-sized
>>>> metadata pool, that you've not got clients competing against each
>>>> other inappropriately, etc.
>>>> -Greg
>>>>
>>>> On Fri, Mar 27, 2015 at 9:47 AM, Barclay Jameson
>>>> <almightybeeij@xxxxxxxxx> wrote:
>>>>> Oops, I should have said that I am not just writing the data but copying it:
>>>>>
>>>>> time cp Small1/* Small2/*
>>>>>
>>>>> Thanks,
>>>>>
>>>>> BJ
>>>>>
>>>>> On Fri, Mar 27, 2015 at 11:40 AM, Barclay Jameson
>>>>> <almightybeeij@xxxxxxxxx> wrote:
>>>>>> I did a Ceph cluster install 2 weeks ago where I was getting great
>>>>>> performance (~= PanFS) where I could write 100,000 1MB files in 61
>>>>>> mins (it took PanFS 59 mins). I thought I could increase the
>>>>>> performance by adding a better MDS server, so I redid the entire build.
>>>>>>
>>>>>> Now it takes 4 times as long to write the same data as it did before.
>>>>>> The only thing that changed was the MDS server. (I even tried moving
>>>>>> the MDS back on the old slower node and the performance was the same.)
>>>>>>
>>>>>> The first install was on CentOS 7. I tried going down to CentOS 6.6
>>>>>> and got the same results.
>>>>>> I use the same scripts to install the OSDs (which I created because I
>>>>>> can never get ceph-deploy to behave correctly, although I did use
>>>>>> ceph-deploy to create the MDS and MON and for initial cluster creation).
>>>>>>
>>>>>> I use btrfs on the OSDs as I can get 734 MB/s write and 1100 MB/s read
>>>>>> with rados bench -p cephfs_data 500 write --no-cleanup && rados bench
>>>>>> -p cephfs_data 500 seq (xfs was 734 MB/s write but only 200 MB/s read)
>>>>>>
>>>>>> Could anybody think of a reason as to why I am now getting a huge regression?
>>>>>>
>>>>>> Hardware Setup:
>>>>>> [OSDs]
>>>>>> 64 GB 2133 MHz
>>>>>> Dual Proc E5-2630 v3 @ 2.40GHz (16 Cores)
>>>>>> 40Gb Mellanox NIC
>>>>>>
>>>>>> [MDS/MON new]
>>>>>> 128 GB 2133 MHz
>>>>>> Dual Proc E5-2650 v3 @ 2.30GHz (20 Cores)
>>>>>> 40Gb Mellanox NIC
>>>>>>
>>>>>> [MDS/MON old]
>>>>>> 32 GB 800 MHz
>>>>>> Dual Proc E5472 @ 3.00GHz (8 Cores)
>>>>>> 10Gb Intel NIC
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
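The copy test used in this thread (time cp Small1/* Small2/.) reports only one total, which hides whether every file is uniformly slow or a few files stall. A minimal sketch of a per-file variant follows; the function names `timed_copy` and `summarize` are hypothetical, and it uses only the Python standard library rather than anything Ceph-specific, so it measures what the client sees and nothing more.

```python
import shutil
import time
from pathlib import Path


def timed_copy(src_dir: Path, dst_dir: Path) -> list[float]:
    """Copy each regular file in src_dir to dst_dir; return seconds per file."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    timings = []
    for src in sorted(p for p in src_dir.iterdir() if p.is_file()):
        start = time.perf_counter()
        shutil.copyfile(src, dst_dir / src.name)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings: list[float]) -> dict[str, float]:
    """Reduce per-file timings to a count, total, median and worst case."""
    if not timings:
        raise ValueError("no files were copied")
    ordered = sorted(timings)
    return {
        "files": len(ordered),
        "total_s": sum(ordered),
        "median_ms": ordered[len(ordered) // 2] * 1000,
        "max_ms": ordered[-1] * 1000,
    }
```

Calling `summarize(timed_copy(Path("Small1"), Path("Small2")))` from the mount point would give a median and a worst-case per-file time to set beside the MDS debug log. A median close to the overall average suggests a constant per-file cost, while a low median with a large maximum points at occasional stalls.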