Re: [PATCH 0/4] Fiemap, an extent mapping ioctl - round 2

Theodore Tso wrote:
> Let's take step back and ask ourselves what tools will want to do with
> FIEMAP in the first place, shall we?
> 
> As far as I know, it's basically only useful for bootloaders like lilo
> and to a limited extent grub (for its stage2 loader) and for debugging
> tools that are interested in knowing how fragmented a file might be.
> I can't think of any other really good uses, anyway.  Someone want to
> enlighten me?

Yes:

   1. Databases.  FIEMAP indicates where O_DIRECT will probably access.

      a. I/O strategy.  Database engines can use this as a hint to
         reduce seeks and speed up large queries or many concurrent
         ones.  Merely emitting thousands of AIOs and letting the
         kernel elevator sort them out is not as good: higher-level
         optimisations are possible, and in any case AIO and the
         elevator both have limitations.

      b. The hints can also guide new data allocation, or reorganisation.

   2. Filesystems in user space, e.g. NTFS-3G.  See above.

   3. Virtual machines use compact representations of large virtual
      disks.  Some of them add COW capabilities.  Both types are
      effectively filesystems-in-a-file.  See above.

   4. Programs which read data from lots of files, but don't care
      about the order, can reduce seeking if they can FIEMAP all the
      files and read the data in roughly block order (without getting
      too pedantic about it).  E.g. something which indexes the
      content of /home.  (Related: See my (little used) "treescan"
      program which is sometimes much faster than "find" for scanning
      names and stat() information, due mostly to seek optimisation.)
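
As a rough illustration of use 4, the ordering step might look like the
Python sketch below.  The names order_by_first_block and first_physical
are hypothetical; the callback stands in for whatever FIEMAP wrapper
reports the physical offset of a file's first extent.  Only the rough
order matters here, so one offset per file is assumed to be enough.

```python
from typing import Callable, Iterable, List, Optional


def order_by_first_block(
    paths: Iterable[str],
    first_physical: Callable[[str], Optional[int]],
) -> List[str]:
    """Return paths sorted by the physical offset of their first extent.

    first_physical is a hypothetical callback (for example a FIEMAP
    wrapper) returning a byte offset on the device, or None when no
    mapping is known.  Unmapped files keep their input order, at the end.
    """
    keyed = []
    for index, path in enumerate(paths):
        offset = first_physical(path)
        # (is_unmapped, offset, input index): unmapped files sort last,
        # and the index keeps ties in their original order.
        keyed.append((offset is None, offset or 0, index, path))
    keyed.sort()
    return [path for _, _, _, path in keyed]


# Made-up layout standing in for real FIEMAP results.
layout = {"a": 9000, "b": 100, "c": None, "d": 4096}
reading_order = order_by_first_block(["a", "b", "c", "d"], layout.get)
```

A reader would then open and read the files in reading_order; a stale
or missing offset costs only a seek, not correctness.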

In all these uses, I notice that the _exact_ values are _not_ required.
It is enough that they are usually accurate enough to use as I/O
hints.
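
For concreteness, a hedged sketch of fetching such hints from user
space follows.  It assumes the struct fiemap layout and FS_IOC_FIEMAP
number of the interface as later merged into mainline Linux, which may
differ from the patch round discussed here.  The helper names fiemap
and parse_extents are made up for illustration.

```python
import fcntl
import struct
from typing import List, Tuple

# Assumed from mainline linux/fs.h and linux/fiemap.h, not from this patch set.
FS_IOC_FIEMAP = 0xC020660B           # _IOWR('f', 11, struct fiemap)
FIEMAP_MAX_OFFSET = 0xFFFFFFFFFFFFFFFF

# fm_start, fm_length, fm_flags, fm_mapped_extents, fm_extent_count, fm_reserved
HEADER = struct.Struct("=QQIIII")
# fe_logical, fe_physical, fe_length, fe_reserved64[2], fe_flags, fe_reserved[3]
EXTENT = struct.Struct("=QQQ2QI3I")

Extent = Tuple[int, int, int, int]   # (logical, physical, length, flags)


def parse_extents(buf: bytes) -> List[Extent]:
    """Decode the extents the kernel reported in a filled fiemap buffer."""
    mapped = HEADER.unpack_from(buf, 0)[3]
    extents = []
    for i in range(mapped):
        fields = EXTENT.unpack_from(buf, HEADER.size + i * EXTENT.size)
        extents.append((fields[0], fields[1], fields[2], fields[5]))
    return extents


def fiemap(fd: int, max_extents: int = 32, flags: int = 0) -> List[Extent]:
    """Ask for up to max_extents extents covering the whole file.

    Raises OSError when the filesystem does not implement the ioctl.
    """
    buf = bytearray(HEADER.size + max_extents * EXTENT.size)
    HEADER.pack_into(buf, 0, 0, FIEMAP_MAX_OFFSET, flags, 0, max_extents, 0)
    fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)   # kernel fills buf in place
    return parse_extents(bytes(buf))
```

Since the values are only hints, a caller can treat an OSError or an
empty list as "no hint" and fall back to ordinary sequential reads.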

It would make sense, I think, to merge this with the other work being
done on I/O hints, for RAIDs and other media with sub-structure.

> However, how many filesystems beyond reiserfs3 actually will move a
> file around on disk once it has been mapped to specific disk blocks
> and written to disk?  Does XFS do this?  I didn't think so.  If it
> does, then for bootloaders like LILO it will also need a flag that
> prevents a block from being moved around.

Isn't "chattr +t" effectively a suitable generic flag for that, even
though it doesn't exactly say so in the manual?

Btw, I imagine quite a few future filesystems will move data around on
disk once it is mapped.  Probably not the majority.

> There are however plenty of filesystems (XFS, ext4, etc.) that play
> the delayed allocation game, where the FIEMAP information returned
> could change from "location not yet determined on disk" to "here's
> where we decided to put it on disk".  And I assume that's what the
> SYNC flag does, right?  So it's really just syntactic sugar for doing
> fsync; get fiemap; check to see if an unmapped extent was still
> returned (due to a race condition; if so, go back and repeat the fsync
> and then retry the fiemap loop).

I think you said two different things there.  "Here's where we decided
to put it" is not the same as "we _have_ put it here".  So sync is
stronger than removing delalloc extents.  (There's also a middle
strength where data is all committed, but not necessarily atomically
with getting all the extents at once).
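
The weaker, "syntactic sugar" reading of the flag (fsync, map, retry
while anything is still delalloc) can be sketched in Python as below.
This is an illustration under assumptions, not the patch's behaviour:
mapped_extents and the get_extents callback are hypothetical names,
extents are assumed to be (logical, physical, length, flags) tuples,
and the FIEMAP_EXTENT_DELALLOC value is the one in mainline headers.

```python
import os
from typing import Callable, List, Tuple

FIEMAP_EXTENT_DELALLOC = 0x00000004   # assumed: value from mainline linux/fiemap.h

Extent = Tuple[int, int, int, int]    # (logical, physical, length, flags)


def mapped_extents(
    fd: int,
    get_extents: Callable[[int], List[Extent]],
    sync: Callable[[int], None] = os.fsync,
    max_tries: int = 5,
) -> List[Extent]:
    """fsync, fetch the map, and retry while any extent is still delalloc.

    get_extents is a hypothetical FIEMAP wrapper.  A concurrent writer can
    dirty the file between the sync and the map, hence the bounded retry.
    """
    for _ in range(max_tries):
        sync(fd)
        extents = get_extents(fd)
        if not any(flags & FIEMAP_EXTENT_DELALLOC for *_, flags in extents):
            return extents
    raise RuntimeError("extents still unallocated after %d tries" % max_tries)
```

Note that a clean return only says that locations have been chosen and
reported.  It does not, by itself, give the atomic "all data is at
these blocks right now" guarantee.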

I'm not sure which semantics the XFS utilities need.  If they don't
access the raw blocks directly, they don't really need sync, they just
need "here's where we decided to put it".  If they do access raw
blocks directly, they need that xfs_freeze stuff too, at which point
it's using XFS ioctls anyway, so it raises the question of whether it
should be using FIEMAP at all.

> So I think perhaps the talking-at-cross-purposes is that Jim is
> thinking about how to support filesystems that will in fact relocate
> file data on disk (for example, as part of an online shrink or when
> moving a file from one volume to another in a filesystem like advfs or
> btrfs), and other folks have been assuming a simpler world where data
> is either mapped to a location on disk or still in a delayed
> allocation state.

There was a flag FIEMAP_EXTENT_NO_DIRECT which should presumably be
set on filesystems where data is not mapped at stable (or even single)
blocks.

That's why I suggested requiring that _not_ setting
FIEMAP_EXTENT_NO_DIRECT (really, define its complement!) should mean
"the data is at this physical location _only while no process modifies
the file_".  Filesystems with stable data locations, and some which
move the file only when it's modified, could unset the flag.  Other
filesystems (maybe including BTRFS) would always set it.  But that
suggestion was not really understood at the time.

Otherwise, if you think that no useful program will access the blocks
directly, then why do we have !FIEMAP_EXTENT_NO_DIRECT at all?  And
what does it mean?

-- Jamie