I made a simple test with XFS on dm-sdf6-sdg1:

-----------------------------------------------------
||  sdf6 : SSD partition  ||  sdg1 : HDD (4 TB)    ||
-----------------------------------------------------

[root@aotest ~]# mkfs.xfs -f -i maxpct=0.2 /dev/mapper/dm-sdf6-sdg1
[root@aotest ~]# mount -o inode32 /dev/mapper/dm-sdf6-sdg1 /mnt

The workload: 8 top-level directories, each with 16, 32, ..., 128
sub-directories and 16, 32, ..., 128 files of 82 bytes per
sub-directory (one run per size), with 1 xattr per directory and
3 xattrs per file (user.cephosd...). About 3 800 000 files and
directories in total; 16 GiB was written to the SSD. (A sketch of
the device-mapper setup and of the workload appears after the
quoted thread below.)

------------------------------------------
||            find | wc -l              ||
------------------------------------------
||  Objects per dir  ||  % IOPS on SSD  ||
------------------------------------------
||        16         ||        99       ||
||        32         ||       100       ||
||        48         ||        93       ||
||        64         ||        88       ||
||        80         ||        88       ||
||        96         ||        86       ||
||       112         ||        87       ||
||       128         ||        88       ||
------------------------------------------

------------------------------------------
||     find -exec getfattr '{}' \;      ||
------------------------------------------
||  Objects per dir  ||  % IOPS on SSD  ||
------------------------------------------
||        16         ||        96       ||
||        32         ||        97       ||
||        48         ||        96       ||
||        64         ||        95       ||
||        80         ||        94       ||
||        96         ||        93       ||
||       112         ||        94       ||
||       128         ||        95       ||
------------------------------------------

It is true that filestore was not designed for big data, and the
inode/xattr caches have to do the heavy lifting. I hope to see
Bluestore in production soon :)

2016-02-19 18:06 GMT+01:00 Eric Sandeen <esandeen@xxxxxxxxxx>:
>
>
> On 2/15/16 9:35 PM, Dave Chinner wrote:
>> On Mon, Feb 15, 2016 at 04:18:28PM +0100, David Casier wrote:
>>> Hi Dave,
>>> 1TB is very large for an SSD.
>>
>> It fills from the bottom, so you don't need 1TB to make it work
>> in a similar manner to the ext4 hack being described.
>
> I'm not sure it will work for smaller filesystems, though - we essentially
> ignore the inode32 mount option for sufficiently small filesystems.
>
> i.e. if inode numbers > 32 bits can't exist, we don't change the allocator,
> at least not until the filesystem (possibly) gets grown later.
>
> So for inode32 to impact behavior, it needs to be on a filesystem
> of sufficient size (at least 1 or 2T, depending on block size, inode
> size, etc). Otherwise it will have no effect today.
>
> Dave, I wonder if we need another mount option to essentially mean
> "invoke the inode32 allocator regardless of filesystem size?"
>
> -Eric
>
>>> Example with only 10GiB :
>>> https://www.aevoo.fr/2016/02/14/ceph-ext4-optimisation-for-filestore/
>>
>> It's a nice toy, but it's not something that is going to scale reliably
>> for production. That caveat at the end:
>>
>> "With this model, filestore rearrange the tree very
>> frequently : + 40 I/O every 32 objects link/unlink."
>>
>> Indicates how bad the IO patterns will be when modifying the
>> directory structure, and says to me that it's not a useful
>> optimisation at all when you might be creating several thousand
>> files/s on a filesystem. That will end up IO bound, SSD or not.
>>
>> Cheers,
>>
>> Dave.
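
For reference, a minimal sketch of how this setup and the measurement
can be reproduced. The device names and the mkfs/mount options are the
ones shown above; the dm-linear table, the tree-building loop and the
user.cephosd.* attribute names and values are illustrative, not the
exact script I ran:

# Concatenate the SSD partition in front of the HDD with dm-linear, so
# that the low block addresses (where the inode32 allocator keeps
# inodes, and with them directory/xattr metadata) land on flash.
SSD=$(blockdev --getsz /dev/sdf6)   # lengths in 512-byte sectors
HDD=$(blockdev --getsz /dev/sdg1)
dmsetup create dm-sdf6-sdg1 <<EOF
0 $SSD linear /dev/sdf6 0
$SSD $HDD linear /dev/sdg1 0
EOF

mkfs.xfs -f -i maxpct=0.2 /dev/mapper/dm-sdf6-sdg1
mount -o inode32 /dev/mapper/dm-sdf6-sdg1 /mnt

# One run of the tree: 8 top-level dirs, N sub-dirs each, N files of
# 82 bytes per sub-dir, 1 xattr per dir and 3 xattrs per file.
N=16    # 16, 32, ..., 128 across the runs in the tables above
for d in $(seq 1 8); do
  for s in $(seq 1 "$N"); do
    dir=/mnt/dir$d/sub$s
    mkdir -p "$dir"
    setfattr -n user.cephosd.dir -v "$s" "$dir"
    for f in $(seq 1 "$N"); do
      head -c 82 /dev/urandom > "$dir/obj$f"
      setfattr -n user.cephosd.a -v 1 "$dir/obj$f"
      setfattr -n user.cephosd.b -v 2 "$dir/obj$f"
      setfattr -n user.cephosd.c -v 3 "$dir/obj$f"
    done
  done
done

# Drop caches, then watch which disk serves the reads during the walk.
echo 3 > /proc/sys/vm/drop_caches
iostat -dx 1 sdf sdg > iostat.log &
find /mnt | wc -l
find /mnt -exec getfattr '{}' \; > /dev/null
kill $!

From iostat.log, the share of IOPS served by the SSD is then simply
the reads on sdf divided by the reads on sdf plus sdg.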
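
To make the "at least 1 or 2T" threshold Eric mentions concrete, my
back-of-the-envelope arithmetic, from the way XFS packs the inode
number as the block number shifted by log2(inodes per block) plus the
inode's index within the block:

  inodes per block = blocksize / inodesize = 4096 / 256 = 16 -> 4 bits
  highest block reachable with a 32-bit inode number = 2^(32-4) = 2^28
  2^28 blocks * 4 KiB = 1 TiB

With 512-byte inodes there are only 8 inodes per block (3 bits), so
the limit doubles to 2 TiB. Below that size, inode numbers can never
exceed 32 bits, and today the inode32 mount option changes nothing.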
--
________________________________________________________

Regards,

David CASIER
3B Rue Taylor, CS20004
75481 PARIS Cedex 10

Direct line: 01 75 98 53 85
Email: david.casier@xxxxxxxx
________________________________________________________