On 2/22/16 10:12 AM, David Casier wrote:
> I have carried out tests very quickly and I have not had time to
> concentrate fully on XFS.
> maxpct=0.2 => 0.2% of 4TB = 8GB
> Because my existing SSD partitions are small
>
> If I'm not mistaken, and with what Dave says:
> By default, data is written to 2^32 inodes of 256 bytes (= 1TiB).
> With maxpct, you set the maximum size used by inodes, depending on the
> percentage of the disk.

Yes, that's reasonable, I just wanted to be sure.  I hadn't seen it
stated that your SSD was that small.

Thanks,
-Eric

> 2016-02-22 16:56 GMT+01:00 Eric Sandeen <sandeen@xxxxxxxxxx>:
>> On 2/21/16 4:56 AM, David Casier wrote:
>>> I made a simple test with XFS
>>>
>>> dm-sdf6-sdg1 :
>>> ------------------------------------------------
>>> ||  sdf6 : SSD part   ||   sdg1 : HDD (4TB)   ||
>>> ------------------------------------------------
>>
>> If this is in response to my concern about not working on small
>> filesystems, the above is sufficiently large that inode32
>> won't be ignored.
>>
>>> [root@aotest ~]# mkfs.xfs -f -i maxpct=0.2 /dev/mapper/dm-sdf6-sdg1
>>
>> Hm, why set maxpct?  This does affect how the inode32 allocator
>> works, but I'm wondering if that's why you set it.  How did you arrive
>> at 0.2%?  Just want to be sure you understand what you're tuning.
>>
>> Thanks,
>> -Eric
>>
>>> [root@aotest ~]# mount -o inode32 /dev/mapper/dm-sdf6-sdg1 /mnt
>>>
>>> 8 directories with 16, 32, ..., 128 sub-directories and
>>> 16, 32, ..., 128 files (82 bytes each)
>>> 1 xattr per dir and 3 xattrs per file (user.cephosd...)
>>>
>>> 3,800,000 files and directories
>>> 16 GiB was written to the SSD
>>>
>>> ------------------------------------------------
>>> ||               find | wc -l                 ||
>>> ------------------------------------------------
>>> ||  Objects per dir   ||    % IOPS on SSD     ||
>>> ------------------------------------------------
>>> ||        16          ||         99           ||
>>> ||        32          ||        100           ||
>>> ||        48          ||         93           ||
>>> ||        64          ||         88           ||
>>> ||        80          ||         88           ||
>>> ||        96          ||         86           ||
>>> ||       112          ||         87           ||
>>> ||       128          ||         88           ||
>>> ------------------------------------------------
>>>
>>> ------------------------------------------------
>>> ||         find -exec getfattr '{}' \;        ||
>>> ------------------------------------------------
>>> ||  Objects per dir   ||    % IOPS on SSD     ||
>>> ------------------------------------------------
>>> ||        16          ||         96           ||
>>> ||        32          ||         97           ||
>>> ||        48          ||         96           ||
>>> ||        64          ||         95           ||
>>> ||        80          ||         94           ||
>>> ||        96          ||         93           ||
>>> ||       112          ||         94           ||
>>> ||       128          ||         95           ||
>>> ------------------------------------------------
>>>
>>> It is true that filestore is not designed for big data, and the
>>> inode/xattr cache has to do the work.
>>>
>>> I hope to see Bluestore in production quickly :)
>>>
>>> 2016-02-19 18:06 GMT+01:00 Eric Sandeen <esandeen@xxxxxxxxxx>:
>>>>
>>>> On 2/15/16 9:35 PM, Dave Chinner wrote:
>>>>> On Mon, Feb 15, 2016 at 04:18:28PM +0100, David Casier wrote:
>>>>>> Hi Dave,
>>>>>> 1TB is very wide for SSD.
>>>>>
>>>>> It fills from the bottom, so you don't need 1TB to make it work
>>>>> in a similar manner to the ext4 hack being described.
>>>>
>>>> I'm not sure it will work for smaller filesystems, though - we essentially
>>>> ignore the inode32 mount option for sufficiently small filesystems.
>>>>
>>>> i.e. if inode numbers > 32 bits can't exist, we don't change the allocator,
>>>> at least not until the filesystem (possibly) gets grown later.
>>>>
>>>> So for inode32 to impact behavior, it needs to be on a filesystem
>>>> of sufficient size (at least 1 or 2T, depending on block size, inode
>>>> size, etc).  Otherwise it will have no effect today.
>>>>
>>>> Dave, I wonder if we need another mount option to essentially mean
>>>> "invoke the inode32 allocator regardless of filesystem size?"
>>>>
>>>> -Eric
>>>>
>>>>>> Example with only 10GiB:
>>>>>> https://www.aevoo.fr/2016/02/14/ceph-ext4-optimisation-for-filestore/
>>>>>
>>>>> It's a nice toy, but it's not something that is going to scale reliably
>>>>> for production.  That caveat at the end:
>>>>>
>>>>> "With this model, filestore rearrange the tree very
>>>>> frequently : + 40 I/O every 32 objects link/unlink."
>>>>>
>>>>> Indicates how bad the IO patterns will be when modifying the
>>>>> directory structure, and says to me that it's not a useful
>>>>> optimisation at all when you might be creating several thousand
>>>>> files/s on a filesystem.  That will end up IO bound, SSD or not.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Dave.
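For reference, here is a minimal sketch of the kind of setup the thread
describes: a linear device-mapper concatenation with the SSD partition
(sdf6) at the start of the address space and the 4TB HDD (sdg1) after it,
formatted with a small maxpct and mounted with inode32 so that inodes (and
their xattrs) stay on the SSD.  Only the mkfs.xfs and mount commands appear
in the thread; the dmsetup invocation is an assumption about how the
dm-sdf6-sdg1 device could have been built.

  # Assumed reconstruction of the dm-sdf6-sdg1 linear mapping: SSD first,
  # so the low allocation groups (where inode32 places inodes) land on it.
  SSD_SECTORS=$(blockdev --getsz /dev/sdf6)   # size in 512-byte sectors
  HDD_SECTORS=$(blockdev --getsz /dev/sdg1)
  dmsetup create dm-sdf6-sdg1 <<EOF
  0 $SSD_SECTORS linear /dev/sdf6 0
  $SSD_SECTORS $HDD_SECTORS linear /dev/sdg1 0
  EOF

  # From the thread: maxpct=0.2 caps inode space at 0.2% of ~4TB, i.e. ~8GB,
  # or roughly 8GB / 256 bytes per inode = ~31 million inodes, which fits
  # comfortably inside the SSD portion at the front of the device.
  mkfs.xfs -f -i maxpct=0.2 /dev/mapper/dm-sdf6-sdg1
  mount -o inode32 /dev/mapper/dm-sdf6-sdg1 /mnt

Note that, as Eric points out above, inode32 only changes the allocator on
filesystems large enough for inode numbers above 32 bits to exist (roughly
1-2TB depending on block size and inode size); on a smaller device the
mount option is currently ignored.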