Theodore Tso wrote:
> Let's take a step back and ask ourselves what tools will want to do with
> FIEMAP in the first place, shall we?
>
> As far as I know, it's basically only useful for bootloaders like lilo
> and to a limited extent grub (for its stage2 loader) and for debugging
> tools that are interested in knowing how fragmented a file might be.
> I can't think of any other really good uses, anyway. Someone want to
> enlighten me?

Yes:

1. Databases. FIEMAP indicates where O_DIRECT reads and writes will probably go.

   a. I/O strategy. Database engines can use this as a hint to reduce seeks and speed up large queries, or many concurrent ones. Merely emitting thousands of AIOs and leaving the ordering to the kernel elevator is not as good, because higher-level optimisations are possible, and in any case AIO and the elevator have their own limitations.

   b. The hints can also guide allocation of new data, or reorganisation.

2. Filesystems in user space, e.g. NTFS-3G. See above.

3. Virtual machines use compact representations of large virtual disks, and some of them add COW capabilities. Both types are effectively filesystems-in-a-file. See above.

4. Programs which read data from lots of files, but don't care about the order, can reduce seeking if they can FIEMAP all the files and read the data in roughly block order (without getting too pedantic about it). E.g. something which indexes the content of /home. (Related: see my little-used "treescan" program, which is sometimes much faster than "find" for scanning names and stat() information, mostly due to seek optimisation.)

In all these uses, I notice that the _exact_ values are _not_ required. It is enough that they are usually accurate enough to use as I/O hints.

It would make sense, I think, to merge this with the other work being done on I/O hints for RAIDs and other media with sub-structure.

> However, how many filesystems beyond reiserfs3 actually will move a
> file around on disk once it has been mapped to specific disk blocks
> and written to disk?
> Does XFS do this? I didn't think so. If it
> does, then for bootloaders like LILO it will also need a flag that
> prevents a block from being moved around.

Isn't "chattr +t" effectively a suitable generic flag for that, even though the manual doesn't quite say so?

Btw, I imagine quite a few future filesystems will move data around on disk once it is mapped. Probably not the majority.

> There are however plenty of filesystems (XFS, ext4, etc.) that play
> the delayed allocation game, where the FIEMAP information returned
> could change from "location not yet determined on disk" to "here's
> where we decided to put it on disk". And I assume that's what the
> SYNC flag does, right? So it's really just syntactic sugar for doing
> fsync; get fiemap; check to see if an unmapped extent was still
> returned (due to a race condition; if so, go back and repeat the fsync
> and then retry the fiemap loop).

I think you said two different things there. "Here's where we decided to put it" is not the same as "we _have_ put it here". So sync is stronger than merely eliminating delalloc extents. (There's also a middle strength, where the data is all committed, but not necessarily atomically with getting all the extents at once.)

I'm not sure which semantics the XFS utilities need. If they don't access the raw blocks directly, they don't really need sync; they just need "here's where we decided to put it". If they do access raw blocks directly, they need the xfs_freeze stuff too, at which point they are using XFS ioctls anyway, which raises the question of whether they should be using FIEMAP at all.
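For concreteness, here is a minimal sketch of the "fsync; get fiemap; retry if anything is still unmapped" loop described above. It assumes the FIEMAP interface as later merged into Linux (FS_IOC_FIEMAP, struct fiemap and the FIEMAP_* flags from <linux/fs.h> and <linux/fiemap.h>), which may differ from the draft under discussion; the helper names extents_unsettled() and get_settled_map() and the retry bound are hypothetical. Per the distinction drawn above, a map with no delalloc or unknown extents only says "here's where we decided to put it", not that the data will stay there.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, struct fiemap_extent, FIEMAP_* */

/* Hypothetical helper: returns 1 if any returned extent has no settled
 * physical location yet (delayed allocation or location unknown). */
int extents_unsettled(const struct fiemap *fm)
{
    for (uint32_t i = 0; i < fm->fm_mapped_extents; i++) {
        uint32_t flags = fm->fm_extents[i].fe_flags;
        if (flags & (FIEMAP_EXTENT_DELALLOC | FIEMAP_EXTENT_UNKNOWN))
            return 1;
    }
    return 0;
}

/* Hypothetical helper: map the whole file and retry up to max_tries times
 * while any extent is still unsettled (e.g. a race with new writes).
 * Only the first max_extents extents are requested. Returns a malloc'd
 * map the caller must free(), or NULL on error or if it never settled. */
struct fiemap *get_settled_map(int fd, uint32_t max_extents, int max_tries)
{
    size_t size = sizeof(struct fiemap) +
                  (size_t)max_extents * sizeof(struct fiemap_extent);
    struct fiemap *fm = malloc(size);
    if (fm == NULL)
        return NULL;

    for (int attempt = 0; attempt < max_tries; attempt++) {
        /* Explicit fsync() mirrors the loop above; FIEMAP_FLAG_SYNC
         * additionally asks the kernel to flush before mapping. */
        if (fsync(fd) != 0)
            break;
        memset(fm, 0, size);
        fm->fm_start = 0;
        fm->fm_length = FIEMAP_MAX_OFFSET;   /* whole file */
        fm->fm_flags = FIEMAP_FLAG_SYNC;
        fm->fm_extent_count = max_extents;
        if (ioctl(fd, FS_IOC_FIEMAP, fm) != 0)
            break;
        if (!extents_unsettled(fm))
            return fm;
    }
    free(fm);
    return NULL;
}
```

A caller would open the file, call get_settled_map(fd, 64, 3), and treat NULL as "no reliable map yet". Even a settled result is best used as an I/O hint rather than as permission to access the raw blocks.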
> So I think perhaps the talking-at-cross-purposes is that Jim is
> thinking about how to support filesystems that will in fact relocate
> file data on disk (for example, as part of an online shrink or when
> moving a file from one volume to another in a filesystem like advfs or
> btrfs), and other folks have been assuming a simpler world where data
> is either mapped to a location on disk or still in a delayed
> allocation state.

There was a flag, FIEMAP_EXTENT_NO_DIRECT, which should presumably be set on filesystems where data is not mapped at stable (or even single) blocks.

That's why I suggested requiring that _not_ setting FIEMAP_EXTENT_NO_DIRECT (really, define its complement!) should mean "the data is at this physical location _only while no process modifies the file_". Filesystems with stable data locations, and some which move the file only when it's modified, could leave the flag clear. Other filesystems (maybe including BTRFS) would always set it. But that suggestion was not really understood at the time.

Otherwise, if you think that no useful program will access the blocks directly, then why do we have !FIEMAP_EXTENT_NO_DIRECT at all? And what does it mean?

-- 
Jamie