On Wed, Mar 9, 2022 at 4:21 PM Junio C Hamano <gitster@xxxxxxxxx> wrote: > > "Neeraj Singh via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > > > +/* > > + * These values are used to help identify parts of a repository to fsync. > > + * FSYNC_COMPONENT_NONE identifies data that will not be a persistent part of the > > + * repository and so shouldn't be fsynced. > > + */ > > +enum fsync_component { > > + FSYNC_COMPONENT_NONE, > > + FSYNC_COMPONENT_LOOSE_OBJECT = 1 << 0, > > + FSYNC_COMPONENT_PACK = 1 << 1, > > + FSYNC_COMPONENT_PACK_METADATA = 1 << 2, > > + FSYNC_COMPONENT_COMMIT_GRAPH = 1 << 3, > > +}; > > OK, so the idea is that Patrick's "we need to fsync refs" will be > done by adding a new component to this list, and sprinkling a call > to fsync_component_or_die() in the code of ref-files backend? > Yes. Patrick will need to add fsync_component_or_die wherever his patch series has already added fsync_or_die. If he follows Ævar's suggestion of treating remote refs differently from local refs, he might want to define multiple components. > I am wondering if fsync_or_die() interface is abstracted well > enough, or we need things like "the fd is inside this directory; in > addition to doing the fsync of the fd, please sync the parent > directory as well" support before we start adding more components > (if there is such a need, perhaps it comes before this step). > I think syncing the parent directory is a separate fsyncMethod that would require changes across the codebase to obtain an appropriate directory fd. I'd prefer to treat that as a separable concern. > > +#define FSYNC_COMPONENTS_DEFAULT (FSYNC_COMPONENT_PACK | \ > > + FSYNC_COMPONENT_PACK_METADATA | \ > > + FSYNC_COMPONENT_COMMIT_GRAPH) > > IOW, everything other than loose object, which already has a > separate core.fsyncObjectFiles knob to loosen. Everything else we > currently sync unconditionally and the default keeps that > arrangement? > Yes, trying to keep default behavior identical on non-Windows platforms. Windows will be expected to adopt batch mode, and have loose objects in this set. > > +static inline void fsync_component_or_die(enum fsync_component component, int fd, const char *msg) > > +{ > > + if (fsync_components & component) > > + fsync_or_die(fd, msg); > > +} > > Do we have a compelling reason to have this as a static inline > function? We are talking about concluding an I/O operation and > I doubt there is a good performance argument for it. > Yeah, this is meant to optimize the case where the component isn't being fsynced. I'll move this function to write-or-die.c below fsync_or_die. > > +static const struct fsync_component_entry { > > + const char *name; > > + enum fsync_component component_bits; > > +} fsync_component_table[] = { > > thing[] is an array of "thing" (and thing[4] is the "fourth" such > thing), but this is not an array of a table (it is a name-to-bit > mapping). > > I wonder if this array works without "_table" suffix in its name. This is modeled after whitespace_rule_names. What if I change this to the following? static const struct fsync_component_name { ... } fsync_component_names[] > > > + { "loose-object", FSYNC_COMPONENT_LOOSE_OBJECT }, > > + { "pack", FSYNC_COMPONENT_PACK }, > > + { "pack-metadata", FSYNC_COMPONENT_PACK_METADATA }, > > + { "commit-graph", FSYNC_COMPONENT_COMMIT_GRAPH }, > > +}; > > + > > +static enum fsync_component parse_fsync_components(const char *var, const char *string) > > +{ > > + enum fsync_component output = 0; > > + > > + if (!strcmp(string, "none")) > > + return FSYNC_COMPONENT_NONE; > > + > > + while (string) { > > + int i; > > + size_t len; > > + const char *ep; > > + int negated = 0; > > + int found = 0; > > + > > + string = string + strspn(string, ", \t\n\r"); > > + ep = strchrnul(string, ','); > > + len = ep - string; > > + > > + if (*string == '-') { > > + negated = 1; > > + string++; > > + len--; > > + if (!len) > > + warning(_("invalid value for variable %s"), var); > > + } > > + > > + if (!len) > > + break; > > + > > + for (i = 0; i < ARRAY_SIZE(fsync_component_table); ++i) { > > + const struct fsync_component_entry *entry = &fsync_component_table[i]; > > + > > + if (strncmp(entry->name, string, len)) > > + continue; > > + > > + found = 1; > > + if (negated) > > + output &= ~entry->component_bits; > > + else > > + output |= entry->component_bits; > > + } > > + > > + if (!found) { > > + char *component = xstrndup(string, len); > > + warning(_("ignoring unknown core.fsync component '%s'"), component); > > + free(component); > > + } > > + > > + string = ep; > > + } > > + > > + return output; > > +} > > Hmph. I would have expected, with built-in default of > pack,pack-metadata,commit-graph, > At the conclusion of this series, I defined 'default' as an aggregate option that includes the platform default. I'd prefer not to have any statefulness of the core.fsync setting so that there is less confusion about the final fsync configuration. My colleagues had a fair amount of confusion internally when testing Git performance internally with regards to the core.fsyncObjectFiles setting. Inline this is how your configs would be written: > - "none,pack" would choose only "pack" by first clearing the > built-in default (or whatever was set in configuration files that > are lower precedence than what we are reading) and then OR'ing > the "pack" bit in. > "pack" would choose only "pack" > - "-pack" would choose "pack-metadata,commit-graph" by first > starting from the built-in default and then CLR'ing the "pack" > bit out. If there were already changes made by the lower > precedence configuration files like /etc/gitconfig, the result > might be different and the only definite thing we can say is that > the pack bit is cleared. > "default,-pack" would be the platform default, but not packfiles. > - "loose-object" would choose all of the bits by first starting > from the built-in default and then OR'ing the "loose-object" bit > in. > "default,loose-object" would add loose objects to the platform config. > Otherwise, parsing "none" is more or less pointless, as the above > parser always start from 0 and OR's in or CLR's out the named bit. > Whoever writes "none" can just write an empty string, no? I think the empty string would be disallowed since "core.fsync=" would be entirely missing a value. But on testing this doesn't seem to be the case. I'll change this to be more strict in that the user has to pass an explicit value, such as 'none'. > > I wonder you'd rather want to do it this way? > > parse_fsync_components(var, value, current) { > enum fsync_component positive = 0, negative = 0; > > while (string) { > int negated = 0; > enum fsync_component bits; > > parse out a single component into <negated, bits>; > > if (bits == 0) { /* "none" given */ > current = 0; > } else if (negated) { > negative |= bits; > } else { > positive |= bits; > } > advance <string> pointer; > } > > return (current | positive) & ~negative; > } > > And then ... > > > + if (!strcmp(var, "core.fsync")) { > > + if (!value) > > + return config_error_nonbool(var); > > + fsync_components = parse_fsync_components(var, value); > > + return 0; > > + } > > + > > ... this part would pass the current value of fsync_components as > the third parameter to the parse_fsync_components(). The variable > would be initialized to the FSYNC_COMPONENTS_DEFAULT we saw earlier. > I'm afraid that this method would lead to statefulness between different levels configuring core.fsync. I'd prefer that the user could know what will happen by just inspecting the value returned by `git config core.fsync`. > > > @@ -1613,7 +1684,7 @@ static int git_default_core_config(const char *var, const char *value, void *cb) > > } > > > > if (!strcmp(var, "core.fsyncobjectfiles")) { > > - fsync_object_files = git_config_bool(var, value); > > + warning(_("core.fsyncobjectfiles is deprecated; use core.fsync instead")); > > This is not deprecating but removing the support, which I am not > sure is a sensible thing to do. Rather we should pretend that > core.fsync = "loose-object" (or "-loose-object") were found in the > configuration, shouldn't we? > The problem I anticipate is that figuring out the final configuration becomes pretty complex in the face of conflicting configurations of fsyncObjectFiles and core.fsync. The user won't know what will happen without reading the Git code or doing performance experiments. I thought we can avoid all of this complexity by just having a simple warning that pushes users toward the new configuration value. Aside from seeing a warning, a user's actual usage of Git functionality shouldn't be affected. Alternatively, what if we just silently ignore the old core.fsyncObjectFiles setting? If neither of these is an option, I'll put back some support for the old setting. Thanks, Neeraj