Jamie Lokier wrote:
jim owens wrote:
Jamie Lokier wrote:
Writing in place or new-place on a *non-shared* (i.e. non-snapshotted)
file is the choice which is useful. It's a filesystem implementation
detail, not a semantic difference. I'm suggesting writing in place
may do no harm and be more like the expected behaviour with programs
that use O_DIRECT, which are usually databases.
How about a btrfs mount option?
in_place_write=never/always/direct_only. (Default direct_only).
The harm is creating a special guarantee for just one case
of "don't move my data" based on a transient file open mode.
What about defragmenting or moving the extent to another
device for performance or for (failing) device removal?
We are on a slippery slope for presumed expectations.
Don't make it a guarantee, just a hint to filesystem write strategy.
It's ok to move data around when useful, we're not talking about a
hard requirement, but a performance knob.
The question is just what performance and fragmentation
characteristics do programs that use O_DIRECT have?
They are nearly all databases, filesystems-in-a-file, or virtual
machine disks. I'm guessing virtually all of those _particular_
application programs would perform significantly differently with a
write-in-place strategy for most writes, although you'd still want
access to the bells and whistles of snapshots and COW and so on when
requested.
Note I said differently :-) I'm not sure write-in-place performs
better for those sorts of applications. It's just a guess.
I'm very certain that write-in-place performs much better
than COW because, as we all know, doing storage allocation
is expensive. That is why many databases preallocate their files.
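
For illustration, the preallocate-then-overwrite pattern mentioned above might look roughly like the sketch below. It is not taken from any particular database. The helper names `preallocate_file` and `overwrite_block_direct` are hypothetical, and the 4096-byte alignment in `DIO_ALIGN` is an assumption; the real O_DIRECT alignment requirement depends on the filesystem and device.

```c
#define _GNU_SOURCE             /* for O_DIRECT */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define DIO_ALIGN 4096          /* assumed O_DIRECT alignment, not queried */

/* Reserve `size` bytes for `path` up front, so later writes need no
 * new allocation. Returns 0 on success or an errno value. */
static int preallocate_file(const char *path, off_t size)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return errno;
    /* posix_fallocate() returns the error number directly, not -1. */
    int err = posix_fallocate(fd, 0, size);
    close(fd);
    return err;
}

/* Overwrite one aligned block of an existing file through O_DIRECT.
 * Returns 0 on success or an errno value. */
static int overwrite_block_direct(const char *path, off_t block_no, int fill)
{
    assert(block_no >= 0);

    void *buf;
    int err = posix_memalign(&buf, DIO_ALIGN, DIO_ALIGN);
    if (err)
        return err;
    memset(buf, fill, DIO_ALIGN);

    int fd = open(path, O_WRONLY | O_DIRECT);
    if (fd < 0) {
        err = errno;            /* e.g. EINVAL if O_DIRECT is unsupported */
        free(buf);
        return err;
    }
    ssize_t n = pwrite(fd, buf, DIO_ALIGN, block_no * DIO_ALIGN);
    if (n == DIO_ALIGN)
        err = 0;
    else
        err = (n < 0) ? errno : EIO;   /* short write treated as an error */
    close(fd);
    free(buf);
    return err;
}
```

Whether the overwrite lands in place or in a new location is exactly the filesystem policy question under discussion; nothing in the calls above can request either behaviour.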
Oracle probably has a really good idea how it performs on ZFS compared
with a block device (which is always in place) - and knows whether ZFS
does in-place writes with O_DIRECT or not. Chris?
We only disagree how the rule to write-in-place is defined
and more importantly documented so it is easy to understand.
Btrfs allows each individual file to have "nodatacow" set
as an attribute. That is an easy rule to document for
the db admin. Much easier than "if nothing else takes
precedence to make it COW, O_DIRECT will write-in-place".
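
As a rough sketch of the per-file rule, the snippet below sets a no-COW attribute on a single file through the generic inode-flags ioctl. It assumes a kernel whose `<linux/fs.h>` exposes `FS_NOCOW_FL` (the flag behind `chattr +C`); the helper name `set_nocow` is hypothetical. On btrfs the flag is only reliable when set on a new or empty file, so the assumption here is that it is called before any data is written.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_NOCOW_FL */

/* Mark `path` as no-COW, preserving its other inode flags.
 * Returns 0 on success or an errno value (for example ENOTTY or
 * EOPNOTSUPP on a filesystem without these flags). */
static int set_nocow(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;

    int flags = 0;
    int err = 0;
    if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0) {
        err = errno;
    } else {
        flags |= FS_NOCOW_FL;   /* keep existing flags, add no-COW */
        if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
            err = errno;
    }
    close(fd);
    return err;
}
```

The attribute is stored with the file rather than derived from how it is opened, which is what makes the rule simple to state to an administrator.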
jim
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html