On Wed, Apr 1, 2015 at 12:31 AM, Barclay Jameson <almightybeeij@xxxxxxxxx> wrote:
> Here is the mds output from the command you requested. I did this
> during the small data run (time cp small1/* small2/).
> It is 20MB in size so I couldn't find a place online that would accept
> that much data.
>
> Please find attached file.
>
> Thanks,

In the log file, each 'create' request is followed by several
'getattr' requests. I guess these 'getattr' requests result from
some kind of permission check, but I can't reproduce this situation
locally.

Which version of ceph/kernel are you using? Do you use ceph-fuse or
the kernel client, and what are the mount options?

Regards
Yan, Zheng

>
> Beeij
>
>
> On Mon, Mar 30, 2015 at 10:59 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>> On Sun, Mar 29, 2015 at 1:12 AM, Barclay Jameson
>> <almightybeeij@xxxxxxxxx> wrote:
>>> I redid my entire Ceph build, going back to CentOS 7, hoping to
>>> get the same performance I did last time.
>>> The rados bench test was the best I have ever had, with 740 MB/s
>>> write and 1300 MB/s read. This was even better than the first rados
>>> bench test that had performance equal to PanFS. I find that this does
>>> not translate to my CephFS. Even with the following tweaking it is
>>> still at least twice as slow as PanFS and my first *Magical* build
>>> (that had absolutely no tweaking):
>>>
>>> OSD
>>> osd_op_threads 8
>>> /sys/block/sd*/queue/nr_requests 4096
>>> /sys/block/sd*/queue/read_ahead_kb 4096
>>>
>>> Client
>>> rsize=16777216
>>> readdir_max_bytes=16777216
>>> readdir_max_entries=16777216
>>>
>>> ~160 mins to copy 100000 (1MB) files for CephFS vs ~50 mins for PanFS.
>>> Throughput on CephFS is about 10 MB/s vs 30 MB/s for PanFS.
>>>
>>> Strange thing is none of the resources are taxed.
>>> CPU, RAM, network, and disks are not even close to being taxed on
>>> the client, mon/mds, or the OSD nodes.
>>> The PanFS client node was on a 10Gb network, the same as the CephFS
>>> client, but you can see the huge difference in speed.
>>>
>>> As per Greg's questions before:
>>> There is only one client reading and writing (time cp Small1/*
>>> Small2/.) but three clients have cephfs mounted, although they aren't
>>> doing anything on the filesystem.
>>>
>>> I have done another test where I stream data into a file as fast as
>>> the processor can put it there
>>> (for (i=0; i < 1000000001; i++){ fprintf (out_file, "I is : %d\n",i);})
>>> and it is faster than the PanFS. CephFS 16GB in 105 seconds with the
>>> above tuning vs 130 seconds for PanFS. Without the tuning it takes 230
>>> seconds for CephFS although the first build did it in 130 seconds
>>> without any tuning.
>>>
>>> This leads me to believe the bottleneck is the mds. Does anybody have
>>> any thoughts on this?
>>> Are there any tuning parameters that I would need to speed up the mds?
>>
>> Could you enable mds debugging for a few seconds (ceph daemon mds.x
>> config set debug_mds 10; sleep 10; ceph daemon mds.x config set
>> debug_mds 0) and upload /var/log/ceph/mds.x.log somewhere?
>>
>> Regards
>> Yan, Zheng
>>
>>>
>>> On Fri, Mar 27, 2015 at 4:50 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>>> On Fri, Mar 27, 2015 at 2:46 PM, Barclay Jameson
>>>> <almightybeeij@xxxxxxxxx> wrote:
>>>>> Yes it's the exact same hardware except for the MDS server (although I
>>>>> tried using the MDS on the old node).
>>>>> I have not tried moving the MON back to the old node.
>>>>>
>>>>> My default cache size is "mds cache size = 10000000"
>>>>> The OSDs (3 of them) have 16 disks with 4 SSD journal disks.
>>>>> I created the data and metadata pools with 2048 PGs each:
>>>>> ceph osd pool create cephfs_data 2048 2048
>>>>> ceph osd pool create cephfs_metadata 2048 2048
>>>>>
>>>>>
>>>>> To your point on clients competing against each other... how would I check that?
>>>>
>>>> Do you have multiple clients mounted?
>>>> Are they both accessing files in
>>>> the directory(ies) you're testing? Were they accessing the same
>>>> pattern of files for the old cluster?
>>>>
>>>> If you happen to be running a hammer rc or something pretty new you
>>>> can use the MDS admin socket to explore what client sessions there
>>>> are and what they have permissions on; otherwise you'll have to
>>>> figure it out from the client side.
>>>> -Greg
>>>>
>>>>>
>>>>> Thanks for the input!
>>>>>
>>>>>
>>>>> On Fri, Mar 27, 2015 at 3:04 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>>>>> So this is exactly the same test you ran previously, but now it's on
>>>>>> faster hardware and the test is slower?
>>>>>>
>>>>>> Do you have more data in the test cluster? One obvious possibility is
>>>>>> that previously you were working entirely in the MDS' cache, but now
>>>>>> you've got more dentries and so it's kicking data out to RADOS and
>>>>>> then reading it back in.
>>>>>>
>>>>>> If you've got the memory (you appear to) you can pump up the "mds
>>>>>> cache size" config option quite dramatically from its default 100000.
>>>>>>
>>>>>> Other things to check are that you've got an appropriately-sized
>>>>>> metadata pool, that you've not got clients competing against each
>>>>>> other inappropriately, etc.
>>>>>> -Greg
>>>>>>
>>>>>> On Fri, Mar 27, 2015 at 9:47 AM, Barclay Jameson
>>>>>> <almightybeeij@xxxxxxxxx> wrote:
>>>>>>> Oops, I should have said that I am not just writing the data but copying it:
>>>>>>>
>>>>>>> time cp Small1/* Small2/*
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> BJ
>>>>>>>
>>>>>>> On Fri, Mar 27, 2015 at 11:40 AM, Barclay Jameson
>>>>>>> <almightybeeij@xxxxxxxxx> wrote:
>>>>>>>> I did a Ceph cluster install 2 weeks ago where I was getting great
>>>>>>>> performance (~= PanFS) where I could write 100,000 1MB files in 61
>>>>>>>> mins (PanFS took 59 mins). I thought I could increase the performance
>>>>>>>> by adding a better MDS server so I redid the entire build.
>>>>>>>>
>>>>>>>> Now it takes 4 times as long to write the same data as it did before.
>>>>>>>> The only thing that changed was the MDS server. (I even tried moving
>>>>>>>> the MDS back on the old slower node and the performance was the same.)
>>>>>>>>
>>>>>>>> The first install was on CentOS 7. I tried going down to CentOS 6.6
>>>>>>>> and got the same results.
>>>>>>>> I use the same scripts to install the OSDs (which I created because I
>>>>>>>> can never get ceph-deploy to behave correctly, although I did use
>>>>>>>> ceph-deploy to create the MDS and MON and for the initial cluster
>>>>>>>> creation).
>>>>>>>>
>>>>>>>> I use btrfs on the OSDs as I can get 734 MB/s write and 1100 MB/s read
>>>>>>>> with rados bench -p cephfs_data 500 write --no-cleanup && rados bench
>>>>>>>> -p cephfs_data 500 seq (xfs was 734 MB/s write but only 200 MB/s read)
>>>>>>>>
>>>>>>>> Can anybody think of a reason why I am now getting a huge regression?
>>>>>>>>
>>>>>>>> Hardware Setup:
>>>>>>>> [OSDs]
>>>>>>>> 64 GB 2133 MHz
>>>>>>>> Dual Proc E5-2630 v3 @ 2.40GHz (16 Cores)
>>>>>>>> 40Gb Mellanox NIC
>>>>>>>>
>>>>>>>> [MDS/MON new]
>>>>>>>> 128 GB 2133 MHz
>>>>>>>> Dual Proc E5-2650 v3 @ 2.30GHz (20 Cores)
>>>>>>>> 40Gb Mellanox NIC
>>>>>>>>
>>>>>>>> [MDS/MON old]
>>>>>>>> 32 GB 800 MHz
>>>>>>>> Dual Proc E5472 @ 3.00GHz (8 Cores)
>>>>>>>> 10Gb Intel NIC
>>>>>>> _______________________________________________
>>>>>>> ceph-users mailing list
>>>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
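
For anyone following the numbers in this thread, the quoted copy times can be cross-checked with a little arithmetic. The sketch below is only illustrative: it assumes 1 MB per file as stated, treats the quoted minute counts as wall-clock time for the whole run, and the helper names (throughput_mb_s, per_file_ms) are made up here rather than taken from any Ceph tool.

```python
def throughput_mb_s(total_mb: float, seconds: float) -> float:
    """Average throughput in MB/s for total_mb megabytes moved in `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return total_mb / seconds


def per_file_ms(num_files: int, seconds: float) -> float:
    """Average wall-clock time per file, in milliseconds."""
    if num_files <= 0:
        raise ValueError("num_files must be positive")
    return seconds * 1000 / num_files


# Figures quoted in the thread (assumption: every file is exactly 1 MB).
NUM_FILES = 100_000
TOTAL_MB = NUM_FILES * 1

# Label -> elapsed minutes for the small-file copy test.
runs_minutes = {
    "CephFS, rebuilt cluster": 160,
    "PanFS": 50,
    "CephFS, first build": 61,
}

for label, minutes in runs_minutes.items():
    seconds = minutes * 60
    print(
        f"{label}: {throughput_mb_s(TOTAL_MB, seconds):.1f} MB/s, "
        f"{per_file_ms(NUM_FILES, seconds):.0f} ms per file"
    )
```

The 160-minute and 50-minute runs work out to roughly 10.4 MB/s and 33.3 MB/s, which agrees with the "about 10 MB/s vs 30 MB/s" quoted above. The per-file figure is the more telling one for a single-threaded cp of many small files, since each file pays its own create and metadata round trips; that would be consistent with the suspicion that the MDS, not raw bandwidth, is the limit, though it does not prove it.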