On Fri, Feb 11, 2022 at 03:15:20PM -0800, Junio C Hamano wrote: > Neeraj Singh <nksingh85@xxxxxxxxx> writes: > > > In practice, almost all users have core.fsyncObjectFiles set to the > > platform default, which is 'false' everywhere besides Windows. So at > > minimum, we have to take default to mean that we maintain behavior no > > weaker than the current version of Git, otherwise users will start > > losing their data. > > Would they? If they think platform default is performant and safe > enough for their use, as long as our adjustment is out outrageously > more dangerous or less performant, I do not think "no weaker than" > is a strict requirement. If we were overly conservative in some > areas than the "platform default", making it less conservative in > those areas to match the looseness of other areas should be OK and > vice versa. > > > One path to get to your suggestion from the current patch series would > > be to remove the component-specific options and only provide aggregate > > options. Alternatively, we could just not document the > > component-specific options and leave them available to be people who > > read source code. So if I rename the aggregate options in terms of > > 'levels of durability', and only document those, would that be > > acceptable? > > In any case, if others who reviewed the series in the past are happy > with the "two knobs" approach and are willing to jump in to help new > users who will be confused with one knob too many, I actually am OK > with the series that I called "overly complex". Let me let them > weigh in before I can answer that question. > > Thanks. I wonder whether it makes sense to distinguish client- and server-side requirements. While we probably want to "do the right thing" on the client-side by default so that we don't have to teach users that "We may use data in some cases on some systems, but not in other cases on other systems by default." On the server-side folks who implement the Git hosting are typically a lot more knowledgeable with regards to how Git behaves, and it can be expected of them to dig a lot deeper than we can and should reasonably expect from a user. One point I really care about and that I think is extremely important in the context of both client and server is data consistency. While it is bad to lose data, the reality is that it can simply happen when systems hit exceptional cases like a hard-reset. But what we really must not let happen is that as a result, a repository gets corrupted because of this hard reset. In the context of client-side repositories a user will be left to wonder how to fix the repository, while on the server-side we need to go in and somehow repair the repository fast or otherwise users will come to us and complain their repository stopped working. Our current defaults can end up with repository corruption though. We don't sync object files before renaming them into place by default, and furthermore we don't even give a knob to fsync loose references before renaming them. On gitlab.com we enable "core.fsyncObjectFiles" and thus to the best of my knowledge never hit a corrupted ODB until now. But what we do regularly hit is corrupted references because there is no knob to fsync them. Furthermore, I really think that the advice in git-config(1) with regards to loose syncing loose objects that "This is a total waste of time and effort" is wrong. Filesystem developers have repeatedly stated that for proper atomicity guarantees, a file must be synced to disk before renaming it into place [1][2][3]. So from the perspective of the people who write Linux-based filesystems our application is "broken" right now, even though there are mechanisms in place which try to work around our brokenness by using heuristics to detect what we're doing. To summarize my take: while the degree of durability may be something that's up for discussions, I think that the current defaults for atomicity are bad for users because they can and do lead to repository corruption. Patrick [1]: https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/ [2]: https://btrfs.wiki.kernel.org/index.php/FAQ (What are the crash guarantees of overwrite-by-rename) [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/ext4.rst (see auto_da_alloc)
Attachment:
signature.asc
Description: PGP signature