I made a simple test with XFS on dm-sdf6-sdg1:

-----------------------------------------------------
||  sdf6 : SSD partition  ||  sdg1 : HDD (4 TB)    ||
-----------------------------------------------------

[root@aotest ~]# mkfs.xfs -f -i maxpct=0.2 /dev/mapper/dm-sdf6-sdg1
[root@aotest ~]# mount -o inode32 /dev/mapper/dm-sdf6-sdg1 /mnt

The workload: 8 top-level directories, each with 16, 32, ..., 128
sub-directories and 16, 32, ..., 128 files of 82 bytes per
sub-directory (one run per size), with 1 xattr per directory and
3 xattrs per file (user.cephosd...). About 3 800 000 files and
directories in total; 16 GiB was written to the SSD. (A sketch of
the device-mapper setup and of the workload appears after the
quoted thread below.)

------------------------------------------
||            find | wc -l              ||
------------------------------------------
||  Objects per dir  ||  % IOPS on SSD  ||
------------------------------------------
||        16         ||        99       ||
||        32         ||       100       ||
||        48         ||        93       ||
||        64         ||        88       ||
||        80         ||        88       ||
||        96         ||        86       ||
||       112         ||        87       ||
||       128         ||        88       ||
------------------------------------------

------------------------------------------
||     find -exec getfattr '{}' \;      ||
------------------------------------------
||  Objects per dir  ||  % IOPS on SSD  ||
------------------------------------------
||        16         ||        96       ||
||        32         ||        97       ||
||        48         ||        96       ||
||        64         ||        95       ||
||        80         ||        94       ||
||        96         ||        93       ||
||       112         ||        94       ||
||       128         ||        95       ||
------------------------------------------

It is true that filestore was not designed for big data, and the
inode/xattr caches have to do the heavy lifting. I hope to see
Bluestore in production soon :)

2016-02-19 18:06 GMT+01:00 Eric Sandeen <esandeen@xxxxxxxxxx>:
>
>
> On 2/15/16 9:35 PM, Dave Chinner wrote:
>> On Mon, Feb 15, 2016 at 04:18:28PM +0100, David Casier wrote:
>>> Hi Dave,
>>> 1TB is very large for an SSD.
>>
>> It fills from the bottom, so you don't need 1TB to make it work
>> in a similar manner to the ext4 hack being described.
>
> I'm not sure it will work for smaller filesystems, though - we essentially
> ignore the inode32 mount option for sufficiently small filesystems.
>
> i.e. if inode numbers > 32 bits can't exist, we don't change the allocator,
> at least not until the filesystem (possibly) gets grown later.
>
> So for inode32 to impact behavior, it needs to be on a filesystem
> of sufficient size (at least 1 or 2T, depending on block size, inode
> size, etc). Otherwise it will have no effect today.
>
> Dave, I wonder if we need another mount option to essentially mean
> "invoke the inode32 allocator regardless of filesystem size?"
>
> -Eric
>
>>> Example with only 10GiB :
>>> https://www.aevoo.fr/2016/02/14/ceph-ext4-optimisation-for-filestore/
>>
>> It's a nice toy, but it's not something that is going to scale reliably
>> for production. That caveat at the end:
>>
>> "With this model, filestore rearrange the tree very
>> frequently : + 40 I/O every 32 objects link/unlink."
>>
>> Indicates how bad the IO patterns will be when modifying the
>> directory structure, and says to me that it's not a useful
>> optimisation at all when you might be creating several thousand
>> files/s on a filesystem. That will end up IO bound, SSD or not.
>>
>> Cheers,
>>
>> Dave.
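
For reference, a minimal sketch of how this setup and the measurement
can be reproduced. The device names and the mkfs/mount options are the
ones shown above; the dm-linear table, the tree-building loop and the
user.cephosd.* attribute names and values are illustrative, not the
exact script I ran:

# Concatenate the SSD partition in front of the HDD with dm-linear, so
# that the low block addresses (where the inode32 allocator keeps
# inodes, and with them directory/xattr metadata) land on flash.
SSD=$(blockdev --getsz /dev/sdf6)   # lengths in 512-byte sectors
HDD=$(blockdev --getsz /dev/sdg1)
dmsetup create dm-sdf6-sdg1 <<EOF
0 $SSD linear /dev/sdf6 0
$SSD $HDD linear /dev/sdg1 0
EOF

mkfs.xfs -f -i maxpct=0.2 /dev/mapper/dm-sdf6-sdg1
mount -o inode32 /dev/mapper/dm-sdf6-sdg1 /mnt

# One run of the tree: 8 top-level dirs, N sub-dirs each, N files of
# 82 bytes per sub-dir, 1 xattr per dir and 3 xattrs per file.
N=16    # 16, 32, ..., 128 across the runs in the tables above
for d in $(seq 1 8); do
  for s in $(seq 1 "$N"); do
    dir=/mnt/dir$d/sub$s
    mkdir -p "$dir"
    setfattr -n user.cephosd.dir -v "$s" "$dir"
    for f in $(seq 1 "$N"); do
      head -c 82 /dev/urandom > "$dir/obj$f"
      setfattr -n user.cephosd.a -v 1 "$dir/obj$f"
      setfattr -n user.cephosd.b -v 2 "$dir/obj$f"
      setfattr -n user.cephosd.c -v 3 "$dir/obj$f"
    done
  done
done

# Drop caches, then watch which disk serves the reads during the walk.
echo 3 > /proc/sys/vm/drop_caches
iostat -dx 1 sdf sdg > iostat.log &
find /mnt | wc -l
find /mnt -exec getfattr '{}' \; > /dev/null
kill $!

From iostat.log, the share of IOPS served by the SSD is then simply
the reads on sdf divided by the reads on sdf plus sdg.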
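
To make the "at least 1 or 2T" threshold Eric mentions concrete, my
back-of-the-envelope arithmetic, from the way XFS packs the inode
number as the block number shifted by log2(inodes per block) plus the
inode's index within the block:

  inodes per block = blocksize / inodesize = 4096 / 256 = 16 -> 4 bits
  highest block reachable with a 32-bit inode number = 2^(32-4) = 2^28
  2^28 blocks * 4 KiB = 1 TiB

With 512-byte inodes there are only 8 inodes per block (3 bits), so
the limit doubles to 2 TiB. Below that size, inode numbers can never
exceed 32 bits, and today the inode32 mount option changes nothing.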
--
________________________________________________________

Regards,

David CASIER
3B Rue Taylor, CS20004
75481 PARIS Cedex 10

Direct line: 01 75 98 53 85
Email: david.casier@xxxxxxxx
________________________________________________________