Re: xfs + 100TB+ storage + lots of small files + NFS

On Sat, Jul 09, 2016 at 01:14:37PM +0200, Marcin Sura wrote:
> Hi,
> 
> A friend of mine asked me to evaluate XFS for their purposes. Currently
> I don't have physical access to their system, but here is the info I've
> gathered so far:
> 
> SAN:
> - physical storage is an FSC array, with a thin-provisioned RAID 6 volume,
> - volumes are 100TB+ in size
> - there are SSD disks in the array, which could potentially be used for
> the journal
> - storage is connected to the host via 10GbE iSCSI
> 
> Host:
> - They are using CentOS 6.5, with the stock 2.6.32-* kernel

I'd suggest the NFS server use a kernel/distro that is as recent as
possible. That doesn't affect the client/application side OS choices,
so at minimum I'd suggest you use a kernel that supports metadata
CRCs. You'll have hundreds of terabytes of data indexed by hundreds
of gigabytes of metadata, and you're going to want things like free
inode btree indexing to keep inode allocation as fast as possible as
counts build up to the hundreds of millions of inodes.
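
As a rough sketch (assuming xfsprogs 3.2.1 or later on the server; the
device path is purely a placeholder), turning those features on at
mkfs time looks something like:

    # v5 format: metadata CRCs plus the free inode btree (finobt).
    # Both are the default on recent xfsprogs; spelled out here for
    # older versions. /dev/mapper/bigvol is a placeholder device.
    mkfs.xfs -m crc=1,finobt=1 /dev/mapper/bigvol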

> - System uses all default values, no optimization has been done
> - OS installed on SSD
> - Don't know the exact CPU details, but I assume some recent multicore CPU
> - Don't know the amount of RAM installed; I assume 32GB+

With a peaky random read workload, you're going to want to cache
tens of millions of inodes in RAM to get performance out of the
machine. RAM is cheap compared to storage costs - I'd suggest
hundreds of GB of RAM in the server....
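
Two things worth looking at on that front (standard Linux knobs; the
value below is just a starting guess, not a recommendation):

    # see how many XFS inodes are actually cached in the slab
    grep xfs_inode /proc/slabinfo

    # bias memory reclaim towards keeping inode/dentry caches
    # (default is 100; lower values retain more cached metadata)
    sysctl vm.vfs_cache_pressure=50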

> NFS:
> - they are exporting the filesystem via NFS to 10-20 clients (services), some
> VMs, some bare metal
> - clients are connected via 1GbE or 10GbE links
> 
> Workload:
> - they are storing tens or hundreds of millions of small files
> - files are not in a single directory

How big are the directories?

> - files are under 1K, usually 200 - 500 bytes
> - I assume that some NFS clients constantly write files
> - some NFS clients initiate massive reads of millions of random files
> - those reads are on demand, but during peak hours there can be many
> such requests

This sort of "efficiently indexing hundreds of millions of tiny
objects" workload is what databases were designed for, not
filesystems. Yes, you can use a filesystem for this sort of
workload, but IMO it's not the right tool for this job.

> So far they have been using ext4; after some basic tests they observed a
> 40% improvement in application counters. But I'm afraid that those tests
> were done in an environment not even close to production (a much smaller
> filesystem, far fewer files).
> 
> I want to ask you what the best mkfs.xfs settings for such a setup would be.

How long is a piece of string?

Working out how to optimise storage to this sort of workload
requires an iterative measure/analyse/tweak approach. Anything else
is just guesswork. i.e. start with the defaults, then measure
performance, identify the bottlenecks in the system and then tweak
the appropriate knob to alleviate the bottleneck.
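
For the measurement side, sampling a few standard counters during the
peak workload is usually enough to see where the bottleneck sits, e.g.:

    # per-device utilisation, queue depth and latency
    iostat -xm 5

    # NFS server-side operation counts and mix
    nfsstat -s

    # raw XFS internal counters (allocation, inode and log activity)
    cat /proc/fs/xfs/stat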

i.e. you may find that there are things you have to change in the
NFS server config to get it to scale before you even start looking
at XFS performance....
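
For example, the default of 8 nfsd threads is nowhere near enough for
10-20 busy clients. On CentOS/RHEL 6 that knob lives in
/etc/sysconfig/nfs (the 64 below is a starting guess, not a tuned
value):

    # /etc/sysconfig/nfs
    RPCNFSDCOUNT=64

    # or change it on a running server
    echo 64 > /proc/fs/nfsd/threads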

> I assume that they should use the inode64 mount option for such a large
> filesystem with that number of files, but I'm a bit worried about
> compatibility with NFS (the default shipped with CentOS 6.5). I think
> inode32 is totally out of scope here.

inode32 will not support hundreds of millions of inodes - you'll
ENOSPC the first AG long before that, and performance will be very
bad as all inode/directory allocation will single thread. And it
will only get worse as the inode count goes up.

As it is, inode64 will be fine for 64bit NFS clients. It's only 32
bit clients that have problems with 64 bit inode numbers, and even
then it is only a problem on older linux and non-linux clients.
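
A sketch of the mount (paths are placeholders; on v3.7+ kernels
inode64 is the default and doesn't need to be stated, but it must be
explicit on the 2.6.32-era kernel mentioned above):

    # /etc/fstab entry for the exported filesystem; other options left
    # at the defaults per the "start with defaults, then measure" advice
    /dev/mapper/bigvol  /export/data  xfs  inode64  0 0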

> Also, do you know of any benchmark which can be used to simulate such a
> workload?

Test against your production workload. It's the only way to be sure
you are optimising the right things.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs


