Re: [PATCH v2 5/6] fsmonitor: add documentation for the fsmonitor extension.


On 5/20/2017 8:10 AM, Ævar Arnfjörð Bjarmason wrote:
+== File System Monitor cache
+
+  The file system monitor cache tracks files for which the query-fsmonitor
+  hook has told us about changes.  The signature for this extension is
+  { 'F', 'S', 'M', 'N' }.
+
+  The extension starts with
+
+  - 32-bit version number: the current supported version is 1.
+
+  - 64-bit time: the extension data reflects all changes through the given
+       time which is stored as the seconds elapsed since midnight, January 1, 1970.
+
+  - 32-bit bitmap size: the size of the CE_FSMONITOR_DIRTY bitmap.
+
+  - An ewah bitmap, the n-th bit indicates whether the n-th index entry
+    is CE_FSMONITOR_DIRTY.
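
For readers who want to see the layout above as code, here is a minimal sketch of reading the three fixed-size fields. It is not the code from the patch: struct fsmn_header, read_be32(), read_be64() and parse_fsmn_header() are names made up for illustration, and it assumes the fields are stored big-endian (as the rest of the index is) and that the bitmap size is a byte count.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of the fixed-size FSMN fields. */
struct fsmn_header {
	uint32_t version;     /* currently supported version is 1 */
	uint64_t last_update; /* seconds since midnight, January 1, 1970 */
	uint32_t bitmap_size; /* assumed: size in bytes of the ewah bitmap */
};

#define FSMN_HEADER_LEN (4 + 8 + 4)

/* Read a 32-bit big-endian value (byte order is an assumption here). */
uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read a 64-bit big-endian value as two 32-bit halves. */
uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/*
 * Returns 0 on success, -1 if the data is too short, the version is
 * not 1, or the bitmap size exceeds the remaining data.
 */
int parse_fsmn_header(const unsigned char *data, size_t len,
		      struct fsmn_header *out)
{
	if (len < FSMN_HEADER_LEN)
		return -1;
	out->version = read_be32(data);
	if (out->version != 1)
		return -1;
	out->last_update = read_be64(data + 4);
	out->bitmap_size = read_be32(data + 12);
	if (out->bitmap_size > len - FSMN_HEADER_LEN)
		return -1;
	return 0;
}
```

The ewah bitmap itself would start at data + FSMN_HEADER_LEN; decoding it is left out of the sketch.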

We already have a uint64_t in one place in the codebase (getnanotime)
which uses a 64 bit time for nanosecond accuracy, and numerous
filesystems already support nanosecond timestamps (ext4, that new
Apple thingy...).

I don't know if any of the inotify/fsmonitor APIs support that yet,
but it seems inevitable that that'll be added if not, in some
pathological cases we can have a lot of files modified in 1 second, so
using nanosecond accuracy means there'll be a lot less data to
consider in some cases.

It does mean this'll only work until the year ~2500, but that seems
like an acceptable trade-off.
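
As a rough check of that estimate, the range of a 64-bit nanosecond counter starting in 1970 can be worked out directly. This is only back-of-the-envelope arithmetic; years_covered() is a made-up helper and it assumes an average year of 365.2425 days.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define SEC_PER_YEAR 31556952ULL /* 365.2425 days * 86400 seconds */

/* Whole years that fit in a counter whose largest value is max_ns. */
uint64_t years_covered(uint64_t max_ns)
{
	return (max_ns / NSEC_PER_SEC) / SEC_PER_YEAR;
}
```

years_covered(UINT64_MAX) is 584, so an unsigned counter lasts until about 2554; a signed one (INT64_MAX) covers 292 years, i.e. until about 2262.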


I really don't think nanosecond resolution is needed in this case, for a few reasons.

The number of files that can change within a given second is limited by the IO throughput of the underlying device. Even assuming a very fast device, very small files, and very small changes, that won't be many files.

Without this patch, git would have scanned all of those files every time. With this patch, the only files git scans a second time unnecessarily are those modified in the same second as the first scan but *before that scan started* (the "lots of files modified" section in the one-second timeline below).

|------------------------- one second ----------------------|
|-lots of files modified - git status - more files modified-|

Yes, some duplicate status checks can be made, but it's still a significant win in any reasonable scenario, especially when you consider that it is pretty unusual to run git status/add/commit in the middle of making lots of changes to files.
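
To make the one-second window concrete, here is a small sketch of the comparison I have in mind. It is an illustration, not the code in the patch: must_report() is a hypothetical helper, and it assumes that a "changes since" query with whole-second timestamps has to be inclusive so that nothing modified after the scan is missed.

```c
#include <stdint.h>

/*
 * Hypothetical helper: should a change stamped change_sec be reported
 * to a query asking for everything since since_sec?
 *
 * With whole-second timestamps, a change in the same second as the
 * previous scan may have happened before or after that scan, so the
 * comparison is inclusive. Changes made before the scan in that second
 * are the ones that get checked twice.
 */
int must_report(uint64_t change_sec, uint64_t since_sec)
{
	return change_sec >= since_sec;
}
```

For a scan taken in second 100, a file changed in second 99 is skipped on the next query, while anything stamped 100 is reported again whether it changed before or after the scan.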

In addition, the backing file system monitor (Watchman) works with timestamps expressed as the number of seconds since the Unix epoch (time_t style). This means any nanosecond support in git is academic until someone provides a file system watcher that supports nanosecond granularity.

Finally, the fsmonitor index extension is versioned so that we can seamlessly upgrade to nanosecond resolution later if we desire.
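
A sketch of what such an upgrade could look like on the reading side, purely as an illustration: fsmn_time_to_ns() is a made-up name, and version 2 with nanosecond timestamps does not exist; only version 1 (seconds) is defined by this patch.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Normalize the stored time to nanoseconds based on the extension
 * version. Returns 0 on success, -1 for an unknown version or a value
 * that would overflow.
 */
int fsmn_time_to_ns(uint32_t version, uint64_t stored, uint64_t *ns_out)
{
	switch (version) {
	case 1: /* seconds since the epoch, as in this patch */
		if (stored > UINT64_MAX / NSEC_PER_SEC)
			return -1;
		*ns_out = stored * NSEC_PER_SEC;
		return 0;
	case 2: /* hypothetical future version: already nanoseconds */
		*ns_out = stored;
		return 0;
	default:
		return -1;
	}
}
```

A reader that does not recognize the version can simply discard the extension and fall back to a full scan.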
