Re: [PATCH 1/3] tmpfs: revert SEEK_DATA and SEEK_HOLE

Hugh Dickins wrote:
> On Thu, 12 Jul 2012, Jeff Liu wrote:
> > On 07/12/2012 07:01 AM, Dave Chinner wrote:
> > > On Wed, Jul 11, 2012 at 11:55:34AM -0700, Hugh Dickins wrote:
> > >>
> > >> But your vote would count for a lot more if you know of some app which
> > >> would really benefit from this functionality in tmpfs: I've heard
> > >> of none.
...
[Jeff mentioned "cp"]

grep is another tool that would benefit.
I often put very large files (often sparse, too) on tmpfs file systems
and would like "grep -r PAT /tmp" to work well in spite of those files.

Please consider restoring SEEK_HOLE/SEEK_DATA support for tmpfs.

The lack of cross-FS SEEK_HOLE/SEEK_DATA support is a bit of a
thorn in our sides.  FIEMAP is not a viable option, and SEEK_HOLE
works only if you happen to be using btrfs, xfs, ocfs2 or 3.5.0-rcN tmpfs.
That is not something we can rely on for a feature whose absence can turn
grep -r into a memory-hogging, apparently hung job or an OOM-killer target.

What would you like to happen when you run
(deliberately or inadvertently) grep on a large sparse file?
I want it to search only the non-HOLE sections of that file,
especially when examining a hole involves accumulating a
"line" that may be so long that it exhausts virtual memory.
We're not quite there, but for now we can at least avoid the
VM-abusing behavior with the --binary-files=without-match option,
which says to treat "binary" (sparse) files as if they contain no match.
Sometimes.
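
To make "search only the non-HOLE sections" concrete, here is a rough
Python sketch of the idea.  It is not grep's code: the names data_extents
and sparse_contains are made up for illustration, and it assumes Linux
with os.SEEK_DATA/os.SEEK_HOLE available.  On a file system without real
support the kernel typically reports the whole file as one data extent,
so the sketch still works there, just without any speedup.

```python
import errno
import os


def data_extents(fd):
    """Yield (start, end) byte ranges that the file system reports as data."""
    size = os.fstat(fd).st_size
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:
                return  # nothing but a hole between offset and EOF
            raise
        # There is always an implicit hole at EOF, so this succeeds.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield start, end
        offset = end


def sparse_contains(path, needle, chunk=1 << 20):
    """Return True if needle (bytes) occurs in a data extent of path.

    Holes are never read.  A needle that spans two extents is not
    considered; that only matters if it contains NUL bytes.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        for start, end in data_extents(fd):
            pos, tail = start, b""
            while pos < end:
                buf = os.pread(fd, min(chunk, end - pos), pos)
                if not buf:
                    break
                window = tail + buf
                if needle in window:
                    return True
                # Keep just enough bytes to catch a match across chunks.
                keep = len(needle) - 1
                tail = window[-keep:] if keep > 0 else b""
                pos += len(buf)
        return False
    finally:
        os.close(fd)
```

Memory use here is bounded by the chunk size plus the needle length,
however large the holes are, which is the property I'm after.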

With working SEEK_HOLE support, grep does the right thing here:

    (${AWK-awk} 'BEGIN{ for (i=0;i<1000;i++) printf "%080d\n", 0 }' < /dev/null
     echo x | dd bs=1024k seek=8000000
    ) >8T-or-so

    $ env time --format=%e grep x 8T-or-so
    0.00

But without SEEK_HOLE support, and with a lot of memory, grep takes a
long time to allocate all of that space before it finally chokes or is killed.
Here, it takes 46 seconds before running out of memory:

    $ env time grep --binary-file=without-match x 8T-or-so
    grep: memory exhausted
    3.15user 25.48system 0:46.46elapsed 61%CPU\
      (0avgtext+0avgdata 12583712maxresident)k
    0inputs+8outputs (0major+2733623minor)pagefaults 0swaps
    [Exit 2]

Until very recently, grep tried to guess whether an input
has a hole using st_blocks and st_size, but with file systems now
using compression, that method is too subject to false positives.
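
For reference, the st_blocks/st_size guess amounts to something like the
following Python sketch.  This is my paraphrase of the general heuristic,
not a copy of grep's code; the function names are invented, and it assumes
st_blocks is counted in 512-byte units, as on Linux.

```python
import os

ST_BLOCK_UNIT = 512  # assumption: st_blocks is in 512-byte units


def looks_sparse(st_size, st_blocks):
    """Guess "has a hole" when fewer bytes are allocated than the size."""
    return st_blocks * ST_BLOCK_UNIT < st_size


def file_looks_sparse(path):
    """Apply the guess to a real file via stat."""
    st = os.stat(path)
    return looks_sparse(st.st_size, st.st_blocks)
```

The false positive is easy to see: a 1 MiB file with no holes that a
compressing file system stores in 256 KiB has st_blocks * 512 < st_size,
so looks_sparse(1 << 20, 512) is true even though there is nothing to skip.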

Ideally we would use SEEK_HOLE/SEEK_DATA, but until that is useful on
more Linux file systems, I suspect we'll have to choose our method based
on the file system type (at the cost of a statvfs call for each st_dev),
possibly in combination with the Linux kernel version.

Here's some background/discussion on the topic, including the
original report about the st_blocks-based heuristic not working:

    http://thread.gmane.org/gmane.comp.gnu.grep.bugs/4604/focus=4610

In case you want to see the SEEK_HOLE-using code, grep's file_is_binary
function is here:

    http://git.savannah.gnu.org/cgit/grep.git/tree/src/main.c#n439