A key factor is the need for >2TB file systems that can be snapshotted and reverted quickly. We have other FC arrays attached to compute nodes without this requirement; those have XFS directly on the FC logical volumes, which are made accessible to native nodes and VM nodes via RDM.
Our FC arrays do not have native snapshot features, so we must use a software layer, whether that is Linux LVM, ESXi, or something else. Because of our unique usage patterns and constraints, we have settled on VMware over other virtualization technologies. We are using ESXi (the free version) but can upgrade to ESX if necessary; however, the upgrade wouldn't fix the 2TB snapshot limit.
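For concreteness, the software-snapshot layer we're sketching with Linux LVM would look roughly like this (the volume group and LV names are placeholders, and we haven't settled on snapshot sizes yet):

  # Hypothetical names: VG vg_data sits on the FC LUN, with a 3TB XFS volume on top.
  lvcreate -L 3T -n lv_data vg_data
  mkfs.xfs /dev/vg_data/lv_data

  # Before a risky job, take a snapshot; the CoW area only has to hold changed blocks.
  lvcreate -s -L 200G -n lv_data_snap /dev/vg_data/lv_data

  # To revert, merge the snapshot back into the origin. The origin has to be
  # unmounted, otherwise the merge is deferred until the LV is next activated.
  umount /dev/vg_data/lv_data
  lvconvert --merge vg_data/lv_data_snap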
We are certainly not in the true HPC realm, but we do have about 20 physical compute nodes doing both random and sequential I/O. An example query might identify a 10-500GB data set comprising 100-500KB files. Some work sets are processor bound, with disk I/O accounting for less than 5% of run time; others spend about 50% of their time on disk I/O, so improving performance would be helpful - again, in the context of the snapshot requirement.
Point well understood about the risks of striping multiple 2TB VMDK files together. But given the constraints, it's either 2TB VMDKs or 2TB RDMs in virtual compatibility mode, and they seem about equally risky. Do you have better suggestions?
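To make sure we're talking about the same thing, the "stitching" I mean is the guest-side construct below (sdb through sde standing in for four ~2TB virtual disks) - i.e. exactly what I understand you to be warning against:

  # Hypothetical guest view: four ~2TB virtual disks presented as sdb..sde.
  pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde
  vgcreate vg_big /dev/sdb /dev/sdc /dev/sdd /dev/sde

  # Concatenate them into one large LV (or stripe with -i 4 -I 512); either way,
  # losing or corrupting any one underlying disk takes out the whole filesystem.
  lvcreate -l 100%FREE -n lv_big vg_big
  mkfs.xfs /dev/vg_big/lv_big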
Back to XFS: in this context, is there any benefit to tuning some parameters for better performance, or will it all be so overshadowed by the poor performance of the VMDKs that tuning isn't worthwhile?
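The sort of tuning I had in mind is roughly the following; the geometry values are guesses for illustration, not something we've measured against our arrays:

  # Tell mkfs.xfs the real RAID geometry behind the VMDK/RDM, e.g. a 64KB stripe
  # unit across 8 data spindles, rather than letting it guess from the virtual disk.
  mkfs.xfs -d su=64k,sw=8 -l size=128m /dev/vg_data/lv_data

  # Mount options we'd consider for a large volume with mixed random/sequential I/O.
  mount -o inode64,logbsize=256k,largeio,swalloc /dev/vg_data/lv_data /data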
Jan.
On Thu, Mar 28, 2013 at 8:56 PM, Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx> wrote:
On 3/28/2013 4:45 PM, Ralf Gross wrote:
> Stan Hoeppner schrieb:
>> VMFS volumes are not intended for high performance IO. Unless things
>> have changed recently, VMware has always recommended housing only OS
>> images and the like in VMDKs, not user data. They've always recommended
>> using RDMs for everything else. IIRC VMDKs have a huge block (sector)
>> size, something like 1MB. That's going to make XFS alignment difficult,
>> if not impossible.
>
> I can't remember that I've ever found this recommendation on a vmware
> page.
>
> http://blogs.vmware.com/vsphere/2013/01/vsphere-5-1-vmdk-versus-rdm.html

If you drill down through that you find this:

http://www.vmware.com/files/pdf/performance_char_vmfs_rdm.pdf

RDMs have better large sequential performance, and lower CPU burn than
VMDKs. The OP mentioned "compute node" in his post, which suggests an
HPC application workload, which suggests large sequential IO.

Also note that VMware is Microsoft centric, so they always run their
tests using an MS Server guest. Also note they always test with tiny
volumes, in this case 20GB. NTFS isn't going to have any trouble at
this size, but at, say, 20TB it probably will, and these published
results would likely be quite different at that scale. XFS performance
characteristics on a 2TB or 20TB or ?? TB volume will likely be
substantially different than NTFS. Their tests show 5-8% lower CPU burn
for RDM vs VMDK. Not a huge difference, but again they're testing only
20GB.

>> I cannot stress emphatically enough that you should not stitch 2TB VMDKs
>> together and use them in the manner you described. This is a recipe for
>> disaster. Find another solution.
>
> I'm seeing more and more requests for VMs with large disks lately in my
> env. Right now the max. is ~2 TB. I'm also thinking about where to go;
> > 2 TB is only possible with pRDMs, which can't be snapshotted. You
> have to use the snapshot features of your storage array.
>
> Snapshots are possible with RDM in virtual compatibility mode, not
> physical mode (> 2 TB).

So 2TB is the kicker here. I haven't used ESX since 3.x, and none of
our RDMs back then were close to 2TB. IIRC our largest was 500GB.

And more and more folks are using midrange FC/iSCSI arrays that don't
have snapshot features, others are using DAS with RAID HBAs, in both
cases forcing them to rely on ESX snapshots. Sounds like VMware needs
to bump this artificial 2TB limit quite a bit higher.
--
Stan
_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs