On 2/15/16 9:35 PM, Dave Chinner wrote:
> On Mon, Feb 15, 2016 at 04:18:28PM +0100, David Casier wrote:
>> Hi Dave,
>> 1TB is very wide for SSD.
>
> It fills from the bottom, so you don't need 1TB to make it work
> in a similar manner to the ext4 hack being described.

I'm not sure it will work for smaller filesystems, though - we
essentially ignore the inode32 mount option for sufficiently small
filesystems. i.e. if inode numbers > 32 bits can't exist, we don't
change the allocator, at least not until the filesystem (possibly)
gets grown later.

So for inode32 to impact behavior, it needs to be on a filesystem of
sufficient size (at least 1 or 2T, depending on block size, inode
size, etc). Otherwise it will have no effect today.

Dave, I wonder if we need another mount option to essentially mean
"invoke the inode32 allocator regardless of filesystem size?"

-Eric

>> Example with only 10GiB :
>> https://www.aevoo.fr/2016/02/14/ceph-ext4-optimisation-for-filestore/
>
> It's a nice toy, but it's not something that is going to scale
> reliably for production. That caveat at the end:
>
> "With this model, filestore rearrange the tree very
> frequently : + 40 I/O every 32 objects link/unlink."
>
> Indicates how bad the IO patterns will be when modifying the
> directory structure, and says to me that it's not a useful
> optimisation at all when you might be creating several thousand
> files/s on a filesystem. That will end up IO bound, SSD or not.
>
> Cheers,
>
> Dave.
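
As a rough illustration of where that "1 or 2T" figure comes from: XFS
inode numbers encode the inode's location on disk, so the largest
possible inode number grows roughly as filesystem size divided by inode
size, and inode numbers can only exceed 32 bits once the filesystem is
bigger than 2^32 * inode_size. The Python sketch below is a
back-of-the-envelope estimate only (inode32_threshold_bytes is a
hypothetical helper, not part of xfsprogs), and it ignores AG geometry
and inode alignment, which can shift the exact crossover point.

    # Back-of-the-envelope: smallest filesystem at which XFS inode
    # numbers can need more than 32 bits, assuming max inode number
    # ~= fs_size / inode_size (AG geometry and alignment ignored).
    def inode32_threshold_bytes(inode_size):
        return (1 << 32) * inode_size

    for isize in (256, 512):
        tib = inode32_threshold_bytes(isize) / float(1 << 40)
        print("inode size %3d bytes -> ~%d TiB" % (isize, int(tib)))

    # inode size 256 bytes -> ~1 TiB
    # inode size 512 bytes -> ~2 TiB

With 256-byte inodes that works out to about 1 TiB and with 512-byte
inodes about 2 TiB, which is why inode32 is effectively a no-op on the
10GiB filesystem in the blog post above.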