Re: [PATCH v3 0/3] Add XIP support to ext4

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Dec 17, 2013 at 07:31:43PM -0700, Matthew Wilcox wrote:
> On Wed, Dec 18, 2013 at 09:30:50AM +1100, Dave Chinner wrote:
> > No, you haven't addressed the problem. There is nothing in this
> > patch set that converts an unwritten extent after it is written to.
> > Hence on every subsequent read will return zeros because the block
> > is still marked as unwritten.
> 
> I don't understand.  Here's the path as I understand it:
> 
> xip_file_write -> __xip_file_write -> ext4_get_xip_mem(create=0),
> returns -ENODATA.  So we call ext4_get_xip_mem again, this time with
> create=1 which causes ext4_get_block() to allocate blocks.

When Dave says that the extent is unwritten, what he means is that the
block as been allocated, but it is marked as being uninitialized.
Since the block is uninitialized we must not read from that block;
instead, if the user issues a read request to an uninitialized block,
we must return all zero's for that block (lest we reveal stale data).
And if we try to write to an uninitialized block, *after* we write to
the data block, we have to clear the uninitalized block, which in some
cases might mean splitting the extent --- if we have an extent which
maps logical blocks 0 to 5 to physical blocks 100 to 105, and we write
to block #2, will need to change that single uninitialized extent to
three extents --- one covering blocks logical blocks 0-1, one covering
logical block 2, and one covering logical blocks 3-5, where the first
and third would be marked uninitialized, and the second would be
marked initialized.  Since we potentially need to convert one extent
to three extents, this might involve an extent tree node split.

You keep talking about allocated vs unallocated, and create=0 and
create=1, but even for an allocated block, that block may be marked
initialized or uninitialized --- and if it is marked uninitialized,
xip_file_write must call a file system-specific callback to allow this
conversion to take place.

Now for persistent memory, if writes are infinitely fast and free
(i.e., there are no significant write cycle limitations), there is a
terrible, terrible hack we could do, which would involve fallocate()
never creating uninitialized extents, but instead writing zero's to
all of the allocated blocks.  And whenever we open a file, before we
return a file descriptor, we could scan all of the extent blocks
looking for uninitialized extents, and if so, convert them to
initialized extents by writing zeros to all of the blocks at open
time.

This would be horribly slow for hard drives, but typically the devices
that use XIP are either read-only, or fairly fast.  It's really ugly,
and if it turns out that persistent memory is not infinitely fast and
have infinite write cycles (and my understanding is that persistent
memory is either extraordinarily expensive, or not as perfect in terms
speed and wear resistenace as some of their boosters claim), it's not
clear that trying to evade dealing with uninitialized extents is a
smart way to go.

In other words, suppose somone calls fallocate on a 2GB region on an
XIP mounted file system.  Would you be happy forcing 2GB's worth of
writes at fallocate time(), just because we don't want to deal with
adding a file system callback in xip_file_write()?

Regards,

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux