Re: RAID performance - new kernel results

Hi Adam,

Thanks for all your postings and continued updates on your travails
getting performance out of your Xen systems and storage.  A complete
eye-opener in a lot of ways.  

Now, from looking over your report, it strongly smells of a problem in
the network switch.  I think you have just one in the core of your
network, correct?  I'd probably try to bring up a test network (if you
have the spare systems in a lab) and try to replicate the packet
drops.

But in general, I'd probably:

  - remove the iSCSI bonding, go to a single 1Gb link.
  - get rid of jumbo frames, if you're using them.
  - can you reduce the number of links in bond0 on the storage box?  

I wonder if the switch is having some sort of table overflow, or is
just having some sort of brain fart and dropping a packet and then
needs time to rebuild its tables internally to get things going
again?

I'd try to borrow a similarly sized switch from another vendor and try
using that instead if you can.  Another thing is to try and use SNMP
to grab stats from the switch and look for patterns.  When you see
connectivity problems, do you see a corresponding drop on one of the
links on the bond0 connection?  Or on another bond?  
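
If polling the switch over SNMP is awkward, a host-side complement is
to watch the per-interface counters Linux exposes in /proc/net/dev on
the storage box and the Xen hosts.  Below is a minimal sketch, assuming
Linux and the usual 16-column layout of that file.  The function names
and the 10 second interval are my own choices, nothing from Adam's
setup.  It takes two snapshots and reports any interface (bond0 or its
individual links) whose error or drop counters went up in between.

```python
import time

# Column order in /proc/net/dev after the "iface:" prefix (Linux).
RX_FIELDS = ["bytes", "packets", "errs", "drop",
             "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop",
             "fifo", "colls", "carrier", "compressed"]


def parse_net_dev(text):
    """Parse /proc/net/dev text into {iface: {"rx_drop": int, ...}}."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        try:
            values = [int(v) for v in rest.split()]
        except ValueError:
            continue  # not a counter line
        if len(values) < 16:
            continue
        counters = {}
        for key, val in zip(RX_FIELDS, values[:8]):
            counters["rx_" + key] = val
        for key, val in zip(TX_FIELDS, values[8:16]):
            counters["tx_" + key] = val
        stats[name.strip()] = counters
    return stats


def counter_increases(before, after,
                      keys=("rx_errs", "rx_drop", "tx_errs", "tx_drop")):
    """Return {iface: {counter: delta}} for counters that grew."""
    changes = {}
    for iface, new in after.items():
        old = before.get(iface)
        if old is None:
            continue  # interface appeared between snapshots
        grew = {k: new[k] - old[k] for k in keys if new[k] > old[k]}
        if grew:
            changes[iface] = grew
    return changes


def watch(rounds, interval=10, path="/proc/net/dev"):
    """Print a timestamped line whenever a watched counter increases."""
    with open(path) as fh:
        previous = parse_net_dev(fh.read())
    for _ in range(rounds):
        time.sleep(interval)
        with open(path) as fh:
            current = parse_net_dev(fh.read())
        for iface, grew in sorted(counter_increases(previous, current).items()):
            print(time.strftime("%H:%M:%S"), iface, grew)
        previous = current
```

Running something like watch(360) during a busy period gives you
timestamps to line up against the outages.  If only one link in a bond
ever shows up, that points at a specific port or cable rather than the
switch as a whole.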

But, thinking about it more, you don't mention whether you're dropping
packets on the iSCSI side of things, or just on the regular network.
That's a key observation, since it will either support or refute my
idea of the problem being in the bond(s).  

Do you see any errors in the dmesg logs on the Xen/Linux/Windows
boxes?  And when you have an outage between two hosts, do pings to
*other* hosts still work just fine, or does all network traffic on
that host come to a stop?
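
To answer the "other hosts" question without having to be at the
console when it happens, something like the following could run on one
of the affected boxes.  It is only a sketch: it assumes the Linux
iputils ping (-c for count, -W for the reply timeout in seconds), and
the function names are made up for illustration.  The host list is
whatever peers you care about.  It pings each host once and says
whether everything answered, nothing answered, or only some did.

```python
import subprocess


def ping_once(host, timeout_s=1):
    """Return True if one ICMP echo to host is answered (iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def classify(results):
    """Summarise a {host: reachable} mapping as a short verdict."""
    if not results:
        return "no data"
    reachable = [host for host, ok in results.items() if ok]
    if len(reachable) == len(results):
        return "all ok"
    if not reachable:
        return "total outage"
    return "partial outage"


def sweep(hosts, probe=ping_once):
    """Probe every host once; return (results, verdict)."""
    results = {host: probe(host) for host in hosts}
    return results, classify(results)
```

A "partial outage" while one pair of hosts can't talk would suggest
the switch is misbehaving on particular ports or paths, whereas a
"total outage" points more at that host's own NIC, bond or driver.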

It really smells of a switch problem.  Have you checked that the
switch firmware is up to date?  It might just be that Netgear makes a
crappy switch (cue people to chime in on this! :-) which can't handle
the load you're tossing at it.  Which is why I suggest you try another
vendor's switch.

Cisco is probably reliable but expensive.  Dell has some ok switches
in my experience, but nothing recent.  I've heard good things about
other brands such as Juniper, Force10 (now Dell) and others.  

Please keep posting, it's great information for the rest of us to keep
in the back of our heads.

John