Ryan,

10 (storage) nodes. I did some tests with 1 brick per node, and another round with 4 per node. Each is FDR connected, but all on the same switch. I'd love to hear about your setup, gluster version, OFED stack, etc.

--
Matthew Nicholson
Research Computing Specialist
Harvard FAS Research Computing
matthew_nicholson at harvard.edu


On Wed, Jul 10, 2013 at 4:33 PM, Ryan Aydelott <ryade at mcs.anl.gov> wrote:

> How many nodes make up that volume that you were using for testing?
>
> Over 100 nodes running at QDR/IPoIB using 100 threads, we ran around
> 60GB/s read and somewhere in the 40GB/s range for writes (iirc).
>
> On Jul 10, 2013, at 1:49 PM, Matthew Nicholson
> <matthew_nicholson at harvard.edu> wrote:
>
> Well, first of all, thanks for the responses. The volume WAS failing over
> to tcp just as predicted, though WHY is unclear, as the fabric is known
> working (it has about 28K compute cores on it, all doing heavy MPI
> testing), and the OFED/verbs stack is consistent across all client/storage
> systems (actually, the OS image is identical).
>
> It's quite sad that RDMA isn't going to make 3.4. We put a good deal of
> hope and effort into planning for 3.4 for this storage system, specifically
> for RDMA support (well, with warnings to the team that it wasn't in/tested
> for 3.3 and that all we could do was HOPE it would be in 3.4 in time for
> when we want to go live). We're getting "okay" performance out of IPoIB
> right now, and our bottleneck actually seems to be the fabric
> design/layout, as we're peaking at about 4.2GB/s writing 10TB over 160
> threads to this distributed volume.
>
> When it IS ready and in 3.4.1 (hopefully!), having good docs around it,
> and maybe even a simple printf on the tcp failover, would be huge for us.
>
>
> --
> Matthew Nicholson
> Research Computing Specialist
> Harvard FAS Research Computing
> matthew_nicholson at harvard.edu
>
>
> On Wed, Jul 10, 2013 at 3:18 AM, Justin Clift <jclift at redhat.com> wrote:
>
>> Hi guys,
>>
>> As an FYI, from discussion on gluster-devel IRC yesterday, the RDMA code
>> still isn't in a good enough state for production usage with 3.4.0. :(
>>
>> There are still outstanding bugs with it, and I'm working to make the
>> Gluster Test Framework able to work with RDMA so we can help shake out
>> more of them:
>>
>> http://www.gluster.org/community/documentation/index.php/Using_the_Gluster_Test_Framework
>>
>> Hopefully RDMA will be ready for 3.4.1, but don't hold me to that at
>> this stage. :)
>>
>> Regards and best wishes,
>>
>> Justin Clift
>>
>>
>> On 09/07/2013, at 8:36 PM, Ryan Aydelott wrote:
>> > Matthew,
>> >
>> > Personally, I have experienced this same problem (even with the mount
>> > being something.rdma). Running 3.4beta4, if I mounted a volume via RDMA
>> > that also had TCP configured as a transport option (which obviously you
>> > do, based on the mounts you gave below), then if there is ANY issue with
>> > RDMA not working, the mount will silently fall back to TCP. This problem
>> > is described here: https://bugzilla.redhat.com/show_bug.cgi?id=982757
>> >
>> > The way to test for this behavior is to create a new volume specifying
>> > ONLY RDMA as the transport. If you mount this and your RDMA is broken
>> > for whatever reason, it will simply fail to mount.
>> >
>> > Assuming this test fails, I would then tail the logs for the volume to
>> > get a hint of what's going on. In my case there was an RDMA_CM kernel
>> > module that was not loaded, which started to matter as of 3.4beta2 IIRC,
>> > as they did a complete rewrite of this transport because of poor
>> > performance in prior releases. The clue in my volume log file was
>> > "no such file or directory" preceded by an rdma_cm.
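For reference, a minimal sketch of the RDMA-only check described above, using a hypothetical test volume name and one of the brick hosts from the setup quoted further down; the brick path and client log file name are assumptions and will differ on other systems:

    # Create and start a throwaway volume that can ONLY use RDMA
    gluster volume create rdmaonly transport rdma holyscratch01-ib:/holyscratch01/rdmaonly
    gluster volume start rdmaonly

    # Mount it over RDMA; if RDMA is broken, this should fail outright
    # instead of silently falling back to tcp
    mkdir -p /mnt/rdmaonly
    mount -t glusterfs -o transport=rdma holyscratch01-ib:/rdmaonly /mnt/rdmaonly

    # Look for rdma/rdma_cm errors in the client mount log, and confirm the
    # rdma_cm kernel module is loaded
    grep -i rdma /var/log/glusterfs/mnt-rdmaonly.log
    lsmod | grep rdma_cm || modprobe rdma_cm

If the RDMA-only mount works but the rdma,tcp volume still pushes traffic over tcp, that points back at the silent fallback described in the bug report above.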
>> >
>> > Hope that helps!
>> >
>> > -ryan
>> >
>> >
>> > On Jul 9, 2013, at 2:03 PM, Matthew Nicholson
>> > <matthew_nicholson at harvard.edu> wrote:
>> >
>> >> Hey guys,
>> >>
>> >> So, we're testing Gluster RDMA storage and are having some issues.
>> >> Things are working... just not as we expected them to. There isn't a
>> >> whole lot that I've found in the way of docs for gluster rdma, aside
>> >> from basically "install gluster-rdma", create a volume with
>> >> transport=rdma, and mount with transport=rdma...
>> >>
>> >> I've done that... and the IB fabric is known to be good... however, a
>> >> volume created with transport=rdma,tcp and mounted with transport=rdma
>> >> still seems to go over tcp?
>> >>
>> >> A little more info about the setup:
>> >>
>> >> We've got 10 storage nodes/bricks, each of which has a single 1GbE NIC
>> >> and an FDR IB port. Likewise for the test clients. The 1GbE NIC is for
>> >> management only, and all of the systems on this fabric are configured
>> >> with IPoIB, so there is eth0 and ib0 on each node.
>> >>
>> >> All storage nodes are peered using the ib0 interface, i.e.:
>> >>
>> >> gluster peer probe storage_node01-ib
>> >> etc.
>> >>
>> >> That's all well and good.
>> >>
>> >> The volume was created with:
>> >>
>> >> gluster volume create holyscratch transport rdma,tcp holyscratch01-ib:/holyscratch01/brick
>> >> for i in `seq -w 2 10` ; do gluster volume add-brick holyscratch holyscratch${i}-ib:/holyscratch${i}/brick; done
>> >>
>> >> yielding:
>> >>
>> >> Volume Name: holyscratch
>> >> Type: Distribute
>> >> Volume ID: 788e74dc-6ae2-4aa5-8252-2f30262f0141
>> >> Status: Started
>> >> Number of Bricks: 10
>> >> Transport-type: tcp,rdma
>> >> Bricks:
>> >> Brick1: holyscratch01-ib:/holyscratch01/brick
>> >> Brick2: holyscratch02-ib:/holyscratch02/brick
>> >> Brick3: holyscratch03-ib:/holyscratch03/brick
>> >> Brick4: holyscratch04-ib:/holyscratch04/brick
>> >> Brick5: holyscratch05-ib:/holyscratch05/brick
>> >> Brick6: holyscratch06-ib:/holyscratch06/brick
>> >> Brick7: holyscratch07-ib:/holyscratch07/brick
>> >> Brick8: holyscratch08-ib:/holyscratch08/brick
>> >> Brick9: holyscratch09-ib:/holyscratch09/brick
>> >> Brick10: holyscratch10-ib:/holyscratch10/brick
>> >> Options Reconfigured:
>> >> nfs.disable: on
>> >>
>> >> For testing, we wanted to see how rdma stacked up against tcp over
>> >> IPoIB, so we mounted it like this:
>> >>
>> >> [root at holy2a01202 holyscratch.tcp]# df -h | grep holyscratch
>> >> holyscratch:/holyscratch
>> >>                   273T  4.1T  269T   2% /n/holyscratch.tcp
>> >> holyscratch:/holyscratch.rdma
>> >>                   273T  4.1T  269T   2% /n/holyscratch.rdma
>> >>
>> >> So, 2 mounts, same volume, different transports. fstab looks like:
>> >>
>> >> holyscratch:/holyscratch /n/holyscratch.tcp glusterfs transport=tcp,fetch-attempts=10,gid-timeout=2,acl,_netdev 0 0
>> >> holyscratch:/holyscratch /n/holyscratch.rdma glusterfs transport=rdma,fetch-attempts=10,gid-timeout=2,acl,_netdev 0 0
>> >>
>> >> where holyscratch is a round-robin DNS entry across all the IPoIB
>> >> interfaces, used for fetching the volfile (something that, just like
>> >> peering, it seems MUST be tcp?)
>> >>
>> >> But, again, when running just dumb, dumb, dumb tests (160 threads of dd
>> >> over 8 nodes, with each thread writing 64GB, so a 10TB throughput test),
>> >> I'm seeing all the traffic on the IPoIB interface for both the RDMA and
>> >> TCP transports... when I really shouldn't be seeing ANY tcp traffic on
>> >> the IPoIB interface when using RDMA as a transport, aside from volfile
>> >> fetches/management... right? As a result, in early tests (the bigger
>> >> 10TB ones are running now), the tcp and rdma speeds were basically the
>> >> same... when I would expect the RDMA one to be at least slightly faster.
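One way to confirm which transport the data path is actually using during those dd runs is to compare the IPoIB netdev counters with the HCA port counters: TCP over IPoIB shows up in the ib0 byte counters, while native RDMA bypasses the netdev and only appears in the InfiniBand port counters. A rough sketch; the HCA name mlx4_0 is an assumption and may differ:

    # IPoIB netdev counters: should stay nearly flat during an RDMA-transport
    # run, apart from volfile fetches and management traffic
    cat /sys/class/net/ib0/statistics/tx_bytes
    cat /sys/class/net/ib0/statistics/rx_bytes

    # HCA port counters, which also include native RDMA traffic
    # (values are reported in units of 4 octets)
    cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_xmit_data
    cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_data

Sampling these before and after a dd run and comparing the deltas should show whether the rdma mount is really bypassing IPoIB.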
>> >>
>> >> Oh, and this is all 3.4beta4, on both the clients and the storage nodes.
>> >>
>> >> So, I guess my questions are:
>> >>
>> >> Is this expected/normal?
>> >> Is peering/volfile fetching always tcp based?
>> >> How should one peer nodes in an RDMA setup?
>> >> Should this be tried with only RDMA as a transport on the volume?
>> >> Are there more detailed docs for RDMA gluster coming with the 3.4 release?
>> >>
>> >> --
>> >> Matthew Nicholson
>> >> Research Computing Specialist
>> >> Harvard FAS Research Computing
>> >> matthew_nicholson at harvard.edu
>> >>
>> >> _______________________________________________
>> >> Gluster-users mailing list
>> >> Gluster-users at gluster.org
>> >> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>> >
>> > _______________________________________________
>> > Gluster-users mailing list
>> > Gluster-users at gluster.org
>> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>> --
>> Open Source and Standards @ Red Hat
>>
>> twitter.com/realjustinclift