Hello Harald,

Thanks for taking an interest. Answers are in line.

On 01/31/2012 02:12 AM, Harald Stürzebecher wrote:

> Are there some network admins or someone with networking experience
> you could talk to?

Only in the university's overstretched central IT Services department. I
daren't distract them from doing various other things I'm waiting for them
to do for me.

> Do you use the management functions that your current switches support?

Not at the moment. I bought managed switches in case I had to set up a
VLAN, but that turned out not to be necessary.

> How many servers and clients do you have now?

The 8 compute nodes in my Rocks cluster are the important ones.

> Do you plan to increase the numbers in the near future?

I might be buying another 4 compute nodes some time this year.

> In that case I'd suggest to get stackable switches for easier expansion.

I agree that stackable switches are a good idea even if there is no
immediate need for extra switches.

> How are the servers and clients distributed to the switches now?

The clients are all connected to one switch, but some of the servers are
connected to switches in adjacent racks.

> How are the switches connected to each other?

Single cables. There aren't enough spare ports to connect the other rack
switches via a LAG, so I am planning to buy a 48-port switch and connect
all the clients and servers to that.

> Can you tell where your bottleneck is?
> Is it the connection between your switches or is it something else?

I'm pretty sure it's the connection between the switches. I tested the
application that has been causing the most concern on a conventional NFS
server, which was connected to the main Rocks compute cluster switch. I
then connected the NFS server to a different switch and the application
ran 5-6 times more slowly, even more slowly than when its data is on a
GlusterFS volume, in fact (when it only runs 3 times more slowly than
conventional NFS...). I don't think all the applications that run on the
cluster are affected so severely, but this is the one my boss has heard
about...

> Could you plug all the servers and some of the clients into one switch
> and the rest of the clients into the other switch(es) for a short
> period of time? There should be a big difference in speed between the
> two groups of clients. How does that compare to the speed you have
> now?

Good idea; I'll give that a try.

> What happens if one switch fails? Can the remaining equipment do some
> useful work until a replacement arrives?
> Depending on the answers it might be better to have two or more
> smaller stackable switches instead of one big switch, even if the big
> one might be cheaper.

I hadn't thought about redundancy, I must admit, but buying two stackable
24-port switches instead of one 48-port switch is an interesting idea. I
would have to connect one set of servers to one switch and the GlusterFS
replica servers to the other. The clients would be distributed across both
switches, and half of them would be able to connect to half the servers in
the event of a switch failure.

One thing that worries me about this scenario is the time it would take to
self-heal all the volumes after running with the replica servers missing
for a day or two. In theory GlusterFS should be able to cope with this, but
there is a possibility of a server failing during the mammoth self-heal,
and if it were one of the up-to-date servers that was connected to the live
switch when the other switch failed, then the users would find themselves
looking at old data. The only way to avoid this would be to have two
stackable 48-port switches. I think I'll have to put this on my "nice to
have" list.
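(As an aside, below is a rough sketch of the sort of thing I might run to
keep an eye on the self-heal backlog after a failed switch comes back. It
assumes a GlusterFS release whose CLI provides "gluster volume heal
<volume> info" (the exact command and output format vary between versions,
and ours may not have it), and "gv0" is just a placeholder volume name, not
one of my real volumes.)

    #!/usr/bin/env python
    # watch_heal.py - poll the self-heal backlog of a GlusterFS volume and
    # print how many entries are still waiting to be healed.
    import subprocess
    import time

    VOLUME = "gv0"    # placeholder volume name
    INTERVAL = 60     # seconds between polls

    def pending_entries(volume):
        """Count the 'heal info' output lines that look like pending entries."""
        out = subprocess.check_output(
            ["gluster", "volume", "heal", volume, "info"]
        ).decode("utf-8", "replace")
        count = 0
        for line in out.splitlines():
            line = line.strip()
            # Skip blank lines and per-brick header/status lines such as
            # "Brick server1:/export/brick1", "Number of entries: 3" or
            # "Status: Connected"; everything else is treated as a pending file.
            if (not line or line.startswith("Brick")
                    or line.startswith("Number") or line.startswith("Status")):
                continue
            count += 1
        return count

    if __name__ == "__main__":
        while True:
            print("%s  pending heal entries: %d"
                  % (time.strftime("%H:%M:%S"), pending_entries(VOLUME)))
            time.sleep(INTERVAL)

I would run that from one of the servers and only consider taking anything
else down once the count has stayed at zero for a while.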
> I don't have much experience with network administration so I cannot
> recommend a brand or type of switch.
>
> I just looked at the Netgear website and googled for prices:
> Two Netgear GS724TS stackable switches seem to cost nearly the same as
> one GS748TS, both are supposed to have "a 20 Gbps, dual-ring, highly
> redundant stacking bus".

I'll have a look at those.

> Regards,
>
> Harald

Thanks again.

Regards,

Dan.