On 07/10/2013 11:51 AM, Joe Landman wrote:
> On 07/10/2013 02:36 PM, Joe Julian wrote:
>
>> 1) http://www.solarflare.com makes sub-microsecond latency adapters that
>> can utilize a userspace driver pinned to the CPU doing the request,
>> eliminating a context switch
>
> We've used open-onload in the past on Solarflare hardware. And with
> GlusterFS.
>
> Just say no. Seriously. You don't want to go there.

Bummer. That sounded like an interesting idea.

>
>> 2) http://www.aristanetworks.com/en/products/7100t is a 2.5 microsecond
>> switch
>
> Neither choice will impact overall performance much for GlusterFS,
> even in heavily loaded situations.
>
> What impacts performance more than anything else is node/brick design,
> implementation, and specific choices in that mix. Storage latency,
> bandwidth, and overall design will be more impactful than low-latency
> networking. Distribution, kernel, and filesystem choices (including
> layout, lower-level features, etc.) will matter significantly more
> than low-latency networking. You can completely remove the networking
> impact by trying your changes out on localhost and seeing what impact
> your design changes have.
>
> If you don't start out with a fast box, you are not going to have fast
> aggregated storage. This observation has not changed since the pre-2.0
> GlusterFS days (it's as true today as it was years ago).
>

The "small file" complaint is all about latency, though. There's very
little disk overhead (all inode lookups) to doing a self-heal check.
Run "ls -l" on a 50k-file directory and nearly all the delay is from
network RTT for the self-heal checks (verify that with Wireshark).
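
To make that concrete, here's a rough back-of-envelope estimate. The
figures are illustrative assumptions (0.2 ms RTT, one round trip per
entry; replicated volumes may need more per self-heal check), not
measurements:

    # Rough estimate of "ls -l" wall time when per-file network RTT
    # dominates. All numbers below are assumed for illustration.
    files = 50_000            # entries in the directory
    rtt_ms = 0.2              # one network round trip, in milliseconds
    trips_per_file = 1        # assumed; varies with volume configuration

    total_s = files * trips_per_file * rtt_ms / 1000.0
    print(f"~{total_s:.1f} s of pure RTT for {files} files")  # ~10.0 s

Even at a modest 0.2 ms RTT, that's already on the order of ten seconds
of pure waiting, and the disk-side inode lookups are a rounding error
next to it.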
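And a minimal sketch for checking this yourself, along the lines of the
localhost test suggested above. It times lstat() per entry, the same
metadata call "ls -l" issues for each file; the mount path is
hypothetical, and the interpretation assumes each lstat on a GlusterFS
FUSE mount triggers a lookup/self-heal check:

    #!/usr/bin/env python3
    """Measure mean per-entry lstat() latency in a directory.

    Run it once against a brick's local filesystem and once against
    the GlusterFS FUSE mount (e.g. /mnt/gluster, path assumed); the
    difference approximates the per-file network/self-heal overhead.
    """
    import os
    import sys
    import time

    def mean_lstat_latency_ms(directory: str) -> float:
        """Return mean lstat() latency per directory entry, in ms."""
        entries = os.listdir(directory)
        if not entries:
            raise SystemExit(f"no entries in {directory}")
        start = time.perf_counter()
        for name in entries:
            os.lstat(os.path.join(directory, name))
        elapsed = time.perf_counter() - start
        return elapsed / len(entries) * 1000.0

    if __name__ == "__main__":
        path = sys.argv[1] if len(sys.argv) > 1 else "."
        print(f"{mean_lstat_latency_ms(path):.3f} ms per lstat in {path}")

If the mean on the FUSE mount is close to your measured ping RTT to the
bricks while the local run is microseconds, that's the latency, not the
disks.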