On Sat, May 23, 2015 at 08:19:03AM +0700, Duy Nguyen wrote:

> On Sat, May 23, 2015 at 6:51 AM, Jeff King <peff@xxxxxxxx> wrote:
> > The other problem is that I'm not sure stat data is enough to notice
> > when a directory changes. Certainly the mtime should change, but if you
> > have only one-second resolution on your mtimes, we can be fooled.
>
> mtime may or may not change. I based my untracked-cache series
> entirely on this directory mtime and did some research about it. For
> UNIXy filesystems on UNIXy OSes, mtime should work as you expect. FAT
> on Windows does not (but FAT on Linux does..). NTFS works fine
> according to some MS document. No idea about CIFS. But people often
> just do open operation of a time and this racy is not an issue. It is
> only an issue on the busy server side, and you can make sure you run
> on the right fs+OS.

Even on Linux+ext4, I think there is some raciness. For instance, the program at the end of this mail will loop forever, running "stat" on an open directory fd, then enumerating the entries in the directory. If we ever get the same stat data as the last iteration but different contents, then it complains. If you run it alongside something simple, like touching 10,000 files in the directory, it fails pretty quickly.

This is because we have no atomicity guarantees with directories. We can stat() them, and then while we are reading them, somebody else can be changing the entries. Whether we see the "before" or "after" state depends on the timing.

I'm not 100% sure this translates into real-world problems for packfiles. If you notice that an object is missing and re-scan the pack directory (stat-ing it during the re-scan), then the change that made the object go missing must have happened _before_ the stat, and would presumably be reflected in it (modulo mtime resolution issues). But I haven't thought that hard about it yet, and I have a nagging feeling that there will be problem cases.
It might be that you could get an "atomic" read of the directory by doing a stat before and after and making sure they're the same (and if not, repeating the readdir() calls). But I think that suffers from the same mtime-resolution problems. Linux+ext4 _does_ have nanosecond mtimes, which perhaps is enough to assume that any change will be reflected. I dunno.

I guess the most interesting test would be something closer to the real world: one process repeatedly making sure the object pointed to by "master" exists, and another one constantly rewriting "master" and then repacking the object.

-- >8 --
#include "cache.h"
#include "string-list.h"

/*
 * Snapshot a directory: record the stat data of the open directory fd,
 * then collect every entry name (sorted) into "entries".
 */
static void get_data(const char *path, struct string_list *entries,
		     struct stat_validity *validity)
{
	DIR *dir = opendir(path);
	struct dirent *de;

	if (!dir)
		die_errno("unable to open directory '%s'", path);

	stat_validity_update(validity, dirfd(dir));
	while ((de = readdir(dir)))
		string_list_insert(entries, de->d_name);
	closedir(dir);
}

/* Both lists are kept sorted by string_list_insert(), so compare in order. */
static int list_eq(const struct string_list *a, const struct string_list *b)
{
	int i;

	if (a->nr != b->nr)
		return 0;
	for (i = 0; i < a->nr; i++)
		if (strcmp(a->items[i].string, b->items[i].string))
			return 0;
	return 1;
}

/*
 * Compare the stat data itself; a memcmp() of the two stat_validity
 * structs would only compare their "sd" pointers.
 */
static int validity_eq(const struct stat_validity *a,
		       const struct stat_validity *b)
{
	if (!a->sd || !b->sd)
		return a->sd == b->sd;
	return !memcmp(a->sd, b->sd, sizeof(*a->sd));
}

static void monitor(const char *path)
{
	struct string_list last_entries = STRING_LIST_INIT_DUP;
	struct stat_validity last_validity = { NULL };

	get_data(path, &last_entries, &last_validity);
	while (1) {
		struct string_list cur_entries = STRING_LIST_INIT_DUP;
		struct stat_validity cur_validity = { NULL };

		get_data(path, &cur_entries, &cur_validity);

		/* Same stat data but different contents: we lost the race. */
		if (validity_eq(&last_validity, &cur_validity) &&
		    !list_eq(&cur_entries, &last_entries))
			die("mismatch");

		/* The current snapshot becomes the baseline for the next pass. */
		string_list_clear(&last_entries, 0);
		memcpy(&last_entries, &cur_entries, sizeof(last_entries));
		stat_validity_clear(&last_validity);
		memcpy(&last_validity, &cur_validity, sizeof(last_validity));
	}
}

int main(int argc, const char **argv)
{
	if (argc != 2)
		die("usage: %s <directory>", argv[0]);
	monitor(argv[1]);
	return 0;
}