On 08/02/13 10:48, Chris Murphy wrote:
>
> On Feb 7, 2013, at 6:08 AM, Adam Goryachev
> <mailinglists@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> Basically, on occasion, when a user copies a large file from disk
>> to disk, or when a user is using Outlook (frequently data files are
>> over 2G), or just general workload, the system will "stall",
>> sometimes causing user level errors, which mostly affects Outlook.
>
> Does this concern anyone else? In particular the user doing "disk to
> disk" large file copies. What is this exactly? LV to LV with iSCSI
> over 1gigE? Why did you reject NFS for these physical Windows boxes
> and their VMs to access this storage, rather than what I assume is
> NTFS over iSCSI, because of this statement?

This isn't a common occurrence (well, it happens once a week when a
user logs in after hours to do some sort of backup/DB maintenance),
but it is the easiest way to reproduce the problem, and it matches the
evidence we have. That is, the problem is generally characterised as:
1) a large amount of read and write activity on one iSCSI device, and
2) users complaining about write failures, slow response, etc,
even when 1) and 2) involve different VMs (which are on different
physical machines). See the PS below for what I plan to watch while
reproducing this.

>> Each LV is then exported via iSCSI
>
> That block device needs a file system for Windows to use it.
>
> It also seems to me one or more of these physical servers running
> VMs, with only 1gigE to the storage server, need either additional
> pipes (LACP or bonded ethernet), or 10gigE. I can just imagine one
> person doing a large file copy disk to disk, which is a single pipe
> doing a pull and push, with double NTFS packet overhead, while all
> other activities get immensely hit with network latency as a result.

However, that should only cause issues for users on the server which
is doing the copy. That is, if a user logs into terminal server 1 and
copies a large file from the desktop to another folder on the same C:
drive, then that terminal server will get busy, possibly using the
full 1Gbps through the VM, the physical machine, and the switch to
the storage server. The storage server, though, still has another
3Gbps with which to serve all the other systems. Also, 100MB/s is not
an unreasonable performance level for a single system: 1Gbps is
125MB/s raw, so that allows for protocol overhead, and even 60MB/s
would probably equal what they had before with 10 year old SCSI
disks.

> On Feb 7, 2013, at 4:07 AM, Dave Cundiff <syshackmin@xxxxxxxxx>
> wrote:
>
>> See page 17 for a block diagram of your motherboard… Your SSDs
>> alone could saturate that if you performed a local operation. Get
>> your NICs going at 4Gig and all of a sudden you'll really want
>> that SATA card in slot 4 or 5.
>
> Yeah, I think it needs all the network performance and reduced
> latency he can get. I'll be surprised if the SSD tuning alone makes
> much of a dent with this.

I still need to go in (tomorrow night) and pull the machine apart
physically to confirm which slot the network cards are in, but based
on the other comments, I don't think this is the limiting factor...
Slap me if it is and I'll drive in tonight and check it sooner
(though see the PS below; I may be able to confirm the slots without
opening the case).

Thanks,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au
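
PS: For the record, here's what I plan to watch on the storage server
while reproducing the stall (just a sketch; it assumes the sysstat
package is installed, and the device/interface names here will
obviously differ on the real box):

    # per-device utilisation, queue size and await, 1s intervals
    iostat -x 1
    # per-NIC rx/tx throughput, to see if a single GigE link is pegged
    sar -n DEV 1

If one iSCSI-backed device sits near 100% util with large await while
the others stall, that points at the disk/SSD side; if one NIC sits at
wire speed, that points at the network.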
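
Also, if we do go down the LACP/bonding path Chris suggested, I'd
expect it to look something like the following (a sketch only; eth0
and eth1 are assumptions, the address is an example, and the switch
ports would need matching 802.3ad/LACP configuration):

    # load the bonding driver in 802.3ad (LACP) mode with link monitoring
    modprobe bonding mode=802.3ad miimon=100
    # give the bond an address and bring it up
    ip addr add 192.168.1.10/24 dev bond0
    ip link set bond0 up
    # enslave two GigE ports to the bond
    ifenslave bond0 eth0 eth1

Worth remembering that 802.3ad hashes each flow onto a single slave,
so any one iSCSI session still tops out at 1Gbps; the win is aggregate
throughput across multiple clients.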
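
And before anyone drives anywhere: it may be possible to confirm
which slots the NICs are in without opening the case (again a sketch;
the bus address below is just an example, and dmidecode output varies
by BIOS):

    # list NICs with their PCI bus addresses
    lspci | grep -i ethernet
    # check negotiated PCIe link speed/width for one of them
    lspci -vv -s 02:00.0 | grep -i 'LnkCap\|LnkSta'
    # map physical slot numbers to bus addresses (needs root)
    dmidecode -t slot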