Re: Beagle and logging inotify events

Jon Smirl wrote:
On 11/14/07, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
Jon Smirl wrote:
On 11/14/07, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
On Nov 13, 2007, at 7:04 PM, Jon Smirl wrote:
Is it feasible to do something like this in the linux file system
architecture?

Beagle beats on my disk for an hour when I reboot. Of course I don't
like that and I shut Beagle off.
Leopard, by the way, does exactly this: it has a daemon that starts
at boot time and taps FSEvents, then journals file system changes to a
well-known file on local disk.
Logging file systems have all of the needed info. Plus they know what
is going on with rollback/replay after a crash.
True, but not all file systems have a journal.  Consider ext2 or FAT32,
both of which are still common.

ext2/FAT32 can use the daemon approach you describe below, which also
works as a short-term solution. The Beagle people do have a daemon, but
it can be turned off. Holes where you don't record the inotify events
and update the index are really bad because they can make files that
you know are on the disk disappear from the index.  I don't believe
Beagle distinguishes between someone turning it off for a day and then
turning it back on vs. a reboot. In both cases it says there was a
window where untracked changes could have happened, and it triggers a
full rescan.

The root problem here is needing a bullet-proof inotify log with no
windows.

I disagree: we don't need a "bullet-proof" log. We can get a significant performance improvement even with a permanent dnotify log implemented in user-space. We already have well-defined fallback behavior if such a log is missing or incomplete.

The problem with a permanent inotify log is that it can become unmanageably enormous, and a performance problem to boot. Recording at that level of detail makes it more likely that the logger won't be able to keep up with file system activity.

A lightweight solution gets us most of the way there, is simple to implement, and doesn't introduce many new issues. As long as it can tell us precisely where the holes are, it shouldn't be a problem.
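A minimal sketch of how a user-space log could report its own holes, assuming a hypothetical record format (this is not an existing Beagle or kernel interface): the logging daemon appends a START record when it begins watching and a STOP record on clean shutdown, with wall-clock timestamps for simplicity and records assumed to be in time order. Any interval after the indexer's last checkpoint that is not covered by a running session is a hole. A session followed by another START without a STOP is treated as having died right after its last record, which is the conservative assumption.

```python
from typing import List, Tuple

# Hypothetical log record: (kind, timestamp), kind is "START", "STOP" or "CHANGE".
# Records are assumed to be in time order.
Record = Tuple[str, float]
Interval = Tuple[float, float]


def find_holes(records: List[Record], since: float) -> List[Interval]:
    """Return the intervals after `since` during which no logger was recording."""
    holes: List[Interval] = []
    running = False
    gap_start = float("-inf")  # start of the current unlogged interval
    last_ts = float("-inf")    # timestamp of the most recent record
    for kind, ts in records:
        if kind == "START":
            if running:
                # No STOP for the previous session: assume it died just after
                # the last record it managed to write.
                holes.append((last_ts, ts))
            else:
                holes.append((gap_start, ts))
            running = True
        elif kind == "STOP":
            running = False
            gap_start = ts
        last_ts = ts
    if not running:
        # Nothing is logging right now; everything since the last STOP is unknown.
        holes.append((gap_start, float("inf")))
    # Clip to the indexer's checkpoint and drop empty intervals.
    clipped = [(max(begin, since), end) for begin, end in holes]
    return [(begin, end) for begin, end in clipped if begin < end]
```

An indexer would replay only the CHANGE records after its checkpoint when `find_holes` returns an empty list, and fall back to today's full rescan otherwise, so a crash or a disabled daemon costs a rescan rather than leaving files missing from the index.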

The only place that is going to happen is inside the file
system logs.

As Andi points out, existing block-based journaling implementations won't easily provide this. And most fs journals are actually pretty limited in size.

Alternately, you could insert a stackable file system layer between the VFS and the on-disk fs to provide more seamless information about updates.

We just need an API that says "recreate the inotify stream
from this checkpoint forward." Things like FAT/ext2 will always return
a "no data available" error from this API.
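To make the shape of that API concrete, here is a toy in-memory model. Everything in it is hypothetical (the class, the method names and the retention limit are illustrative; no such kernel interface exists): sequence numbers act as checkpoint tokens, the log keeps only a bounded window of records, much as fs journals are limited in size, and a replay request fails with an explicit "no data available" error whenever completeness cannot be guaranteed, either because the token is older than the retained window or because the file system keeps no log at all, as with FAT or ext2.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


class NoDataAvailable(Exception):
    """Raised when the events since a checkpoint cannot be reproduced completely."""


@dataclass
class ChangeLog:
    capacity: int = 1000    # hypothetical retention limit, in records
    journaled: bool = True  # False models ext2/FAT: nothing is ever retained

    def __post_init__(self) -> None:
        self._records: Deque[Tuple[int, str, str]] = deque()
        self._next_seq = 1

    def record(self, event: str, path: str) -> None:
        """Append one change; the oldest record falls out once capacity is exceeded."""
        if self.journaled:
            self._records.append((self._next_seq, event, path))
            if len(self._records) > self.capacity:
                self._records.popleft()
        self._next_seq += 1

    def checkpoint(self) -> int:
        """Token meaning 'everything before this point has been indexed'."""
        return self._next_seq

    def events_since(self, token: int) -> List[Tuple[str, str]]:
        """Replay every change recorded at or after `token`, or refuse."""
        if not self.journaled:
            raise NoDataAvailable("file system keeps no change log")
        oldest = self._records[0][0] if self._records else self._next_seq
        if token < oldest:
            raise NoDataAvailable("checkpoint is older than the retained log")
        return [(event, path) for seq, event, path in self._records if seq >= token]
```

On `NoDataAvailable` the indexer does a full rescan, taking a fresh token with `checkpoint()` before it starts so that changes made during the rescan are replayed next time.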

How about a fs API
where Beagle has a token for a checkpoint, and then it can ask for a
recreation of inotify events from that point forward.  It's always
possible for the file system to say I can't do that and trigger a full
rebuild from Beagle. Daemons that aren't coordinated with the file
system have a window during crash/reboot where they can get confused.
A reasonably effective solution can be implemented in user space without
changes to the file system APIs or implementations.  IOW we already have
the tools to make something useful.

For example, you don't need to record every file system event to make
this useful.  Listing only directory-level changes (i.e., "some file in
this directory has changed") is enough to prune most of Beagle's work
when it starts up.
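As a rough illustration that this needs nothing beyond the existing inotify interface, here is a sketch of a daemon that keeps only directory-level information. It assumes Linux and calls inotify_init1(2) and inotify_add_watch(2) from libc via ctypes; the class name and the one-path-per-line log format are made up for the example. inotify watches are not recursive, so every directory of interest has to be watched individually. IN_ATTRIB is deliberately left out of the mask, since an indexer that writes xattrs would otherwise keep re-marking the directories it has just indexed, and an IN_Q_OVERFLOW event is recorded as a hole rather than ignored.

```python
import ctypes
import ctypes.util
import os
import struct
from typing import Dict, Set

# Event bits from <sys/inotify.h> (Linux).
IN_MODIFY = 0x00000002
IN_CLOSE_WRITE = 0x00000008
IN_MOVED_FROM = 0x00000040
IN_MOVED_TO = 0x00000080
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200
IN_Q_OVERFLOW = 0x00004000
IN_NONBLOCK = os.O_NONBLOCK  # the inotify_init1() flag shares O_NONBLOCK's value

# IN_ATTRIB is intentionally not watched, so xattr updates do not re-dirty directories.
WATCH_MASK = (IN_MODIFY | IN_CLOSE_WRITE | IN_MOVED_FROM | IN_MOVED_TO
              | IN_CREATE | IN_DELETE)

_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_EVENT_HEADER = struct.Struct("iIII")  # struct inotify_event: wd, mask, cookie, len


class DirectoryChangeLog:
    """Remember which watched directories had changes, not the individual events."""

    def __init__(self) -> None:
        self.fd = _libc.inotify_init1(IN_NONBLOCK)
        if self.fd < 0:
            raise OSError(ctypes.get_errno(), "inotify_init1 failed")
        self.wd_to_dir: Dict[int, str] = {}
        self.dirty: Set[str] = set()
        self.overflowed = False  # True means events were lost: a hole in the log

    def watch(self, directory: str) -> None:
        wd = _libc.inotify_add_watch(self.fd, os.fsencode(directory), WATCH_MASK)
        if wd < 0:
            raise OSError(ctypes.get_errno(), "inotify_add_watch failed", directory)
        self.wd_to_dir[wd] = directory

    def drain(self) -> None:
        """Consume all queued events and mark their directories dirty."""
        while True:
            try:
                buf = os.read(self.fd, 64 * 1024)
            except BlockingIOError:
                return  # the event queue is empty
            offset = 0
            while offset < len(buf):
                wd, mask, _cookie, name_len = _EVENT_HEADER.unpack_from(buf, offset)
                offset += _EVENT_HEADER.size + name_len  # skip the NUL-padded name
                if mask & IN_Q_OVERFLOW:
                    self.overflowed = True
                elif wd in self.wd_to_dir:
                    self.dirty.add(self.wd_to_dir[wd])

    def flush(self, log_path: str) -> None:
        """Append dirty directories to an on-disk log (assumes no newlines in paths)."""
        with open(log_path, "a", encoding="utf-8") as log:
            for directory in sorted(self.dirty):
                log.write(directory + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.dirty.clear()

    def close(self) -> None:
        os.close(self.fd)
```

At startup the indexer would then rescan only the directories named in the log, non-recursively, and fall back to a full rescan if the daemon reported an overflow or was not running at all.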

Without low level support like this Beagle is forced to do a rescan on
every boot. Since I crash my machine all of the time the disk load
from rebooting is intolerable and I turn Beagle off. Even just turning
the machine on in the morning generates an annoyingly large load on
the disk.
Understood.  The need is clear.

My Dad's WinXP system takes 10 minutes after every start-up before it's
usable, simply because the virus scanner has to check every file in the
system.  Same problem!

I don't see why this couldn't be done on Linux as well.

---------- Forwarded message ----------
From: Jon Smirl <jonsmirl@xxxxxxxxx>
Date: Nov 13, 2007 4:44 PM
Subject: Re: Strange "beagle" interaction..
To: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: "J. Bruce Fields" <bfields@xxxxxxxxxxxx>, Junio C Hamano
<gitster@xxxxxxxxx>, Git Mailing List <git@xxxxxxxxxxxxxxx>, Johannes
Schindelin <Johannes.Schindelin@xxxxxx>


On 11/13/07, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
On Tue, 13 Nov 2007, J. Bruce Fields wrote:
Last I ran across this, I believe I found it was adding extended
attributes to the file.
Yeah, I just straced it and found the same thing. It's saving
fingerprints and mtimes to files in the extended attributes.
Things like Beagle need a guaranteed log of global inotify events.
That would let them efficiently find changes made since the last time
they updated their index.

Right now every time Beagle starts it hasn't got a clue what has
changed in the file system since it was last run. This forces Beagle
to rescan the entire filesystem every time it is started. The xattrs
are used as a cache to reduce this load somewhat.

A better solution would be for the kernel to log inotify events to
disk in a manner that survives reboots. When Beagle starts it would
locate its last checkpoint and then process the logged inotify events
from that time forward. This inotify logging needs to be bullet-proof
or it will mess up your Beagle index.

Logging file systems already contain the logged inotify data (in their
own internal form). There's just no universal API for retrieving it in
a file-system-independent manner.

Yeah, I just turned off beagle.  It looked to me like it was doing
something wrongheaded.
Gaah. The problem is, setting xattrs does actually change ctime. Which
means that if we want to make git play nice with beagle, I guess we have
to just remove the comparison of ctime.

Oh, well. Git doesn't *require* it, but I like the notion of checking the
inode really really carefully. But it looks like it may not be an option,
because of file indexers hiding stuff behind our backs.
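To see why a ctime-only change matters, here is a simplified model of a stat-based "is this file unchanged?" check of the kind described above. It is a sketch, not Git's actual code: the cached fields and the `trust_ctime` switch are illustrative (later Git releases do offer a `core.trustctime` setting with this effect). An indexer that rewrites xattrs bumps ctime but not mtime or size, so a check that compares ctime reports the file as possibly modified, and Git then has to re-read its contents to find out.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class StatInfo:
    """The subset of stat(2) fields a stat-based cache might record (illustrative)."""
    mtime_ns: int
    ctime_ns: int
    size: int
    ino: int

    @classmethod
    def of(cls, path: str) -> "StatInfo":
        st = os.stat(path)
        return cls(st.st_mtime_ns, st.st_ctime_ns, st.st_size, st.st_ino)


def stat_changed(cached: StatInfo, current: StatInfo, trust_ctime: bool = True) -> bool:
    """True if the file must be re-read to know whether its content changed."""
    if (cached.mtime_ns, cached.size, cached.ino) != (current.mtime_ns, current.size, current.ino):
        return True
    # An xattr write by an indexer changes only ctime, so it lands here.
    return trust_ctime and cached.ctime_ns != current.ctime_ns
```

Dropping the ctime comparison (`trust_ctime=False`) gives up one inode check in exchange for not re-reading every file an indexer has touched, which is the trade-off being weighed here.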

Or we could just tell people not to run beagle on their git trees, but I
suspect some people will actually *want* to. Even if it flushes their disk
caches.

                Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Jon Smirl
jonsmirl@xxxxxxxxx


--
Jon Smirl
jonsmirl@xxxxxxxxx
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com








begin:vcard
fn:Chuck Lever
n:Lever;Chuck
org:Oracle Corporation;Corporate Architecture: Linux Projects Group
adr:;;1015 Granger Avenue;Ann Arbor;MI;48104;USA
title:Principal Member of Staff
tel;work:+1 248 614 5091
x-mozilla-html:FALSE
version:2.1
end:vcard

