jim owens wrote:
> Jamie Lokier wrote:
> >
> >Writing in place or new-place on a *non-shared* (i.e. non-snapshotted)
> >file is the choice which is useful.  It's a filesystem implementation
> >detail, not a semantic difference.  I'm suggesting writing in place
> >may do no harm and be more like the expected behaviour with programs
> >that use O_DIRECT, which are usually databases.
> >
> >How about a btrfs mount option?
> >in_place_write=never/always/direct_only.  (Default direct_only).
>
> The harm is creating a special guarantee for just one case
> of "don't move my data" based on a transient file open mode.
>
> What about defragmenting or moving the extent to another
> device for performance or for (failing) device removal?
>
> We are on a slippery slope for presumed expectations.

Don't make it a guarantee, just a hint to filesystem write strategy.

It's ok to move data around when useful; we're not talking about a
hard requirement, but a performance knob.

The question is just what performance and fragmentation
characteristics programs that use O_DIRECT have.

They are nearly all databases, filesystems-in-a-file, or virtual
machine disks.  I'm guessing virtually all of those _particular_
application programs would perform significantly differently with a
write-in-place strategy for most writes, although you'd still want
access to the bells and whistles of snapshots and COW and so on when
requested.

Note I said differently :-)  I'm not sure write-in-place performs
better for those sorts of applications.  It's just a guess.

Oracle probably has a really good idea how it performs on ZFS compared
with a block device (which is always in place) - and knows whether ZFS
does in-place writes with O_DIRECT or not.  Chris?

-- Jamie
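
To make the proposed knob concrete, here is a rough sketch in Python of
how the three values might map to a write-strategy hint.  This is not
btrfs code: in_place_write is only the option suggested above, and the
names parse_in_place_write and prefer_in_place are made up for
illustration.  It assumes shared (snapshotted) extents are always copied
on write, and it only returns a preference, so the filesystem stays free
to move data for defragmentation or device removal.

```python
from enum import Enum


class InPlaceWrite(Enum):
    """Values of the hypothetical in_place_write mount option."""
    NEVER = "never"
    ALWAYS = "always"
    DIRECT_ONLY = "direct_only"


def parse_in_place_write(options: str) -> InPlaceWrite:
    """Parse a comma-separated mount option string.

    Falls back to direct_only, the default suggested in the proposal.
    Raises ValueError if the option carries an unknown value.
    """
    for opt in options.split(","):
        key, _, value = opt.strip().partition("=")
        if key == "in_place_write":
            return InPlaceWrite(value)
    return InPlaceWrite.DIRECT_ONLY


def prefer_in_place(policy: InPlaceWrite, extent_shared: bool,
                    o_direct: bool) -> bool:
    """Return True if overwriting in place is the preferred strategy.

    This is a hint, not a guarantee: the caller may still relocate data.
    """
    if extent_shared:
        # Assumption: snapshotted data must be copied to keep the snapshot intact.
        return False
    if policy is InPlaceWrite.ALWAYS:
        return True
    if policy is InPlaceWrite.DIRECT_ONLY:
        return o_direct
    return False  # InPlaceWrite.NEVER
```

With the default policy, a database writing through O_DIRECT to a
non-snapshotted file would get the in-place hint, while the same write
to a snapshotted file would still go through COW.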