"Neeraj Singh via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > Batched fsync will fit into bulk-checkin by taking advantage of the > plug/unplug functionality to determine the appropriate time to fsync > and make newly-added objects available in the primary object database. > > * Rename 'state' variable to 'bulk_checkin_state', since we will later > be adding 'bulk_fsync_objdir'. This also makes the variable easier to > find in the debugger, since the name is more unique. > > * Move the 'plugged' data member of 'bulk_checkin_state' into a separate > static variable. Doing this avoids resetting the variable in > finish_bulk_checkin when zeroing the 'bulk_checkin_state'. As-is, we > seem to unintentionally disable the plugging functionality the first > time a new packfile must be created due to packfile size limits. While > disabling the plugging state only results in suboptimal behavior for > the current code, it would be fatal for the bulk-fsync functionality > later in this patch series. Paraphrasing to make sure I understand your reasoning here... In the "plug and then perform as many changes to the repository and finally unplug" flow, before or after this series, the "perform" step in the middle is unaware of which "bulk_checkin_state" instance is being used to keep track of what is done to optimize by deferring some operations until the "unplug" time. So bulk_checkin_state is not there to allow us to create multiple instances of it, pass them around to different sequences of "plug, perform, unplug". Each of its members is inherently a singleton, so in the extreme, we could turn these members into separate file-scope global variables if we wanted to. The "plugged" bit happens to be the only one getting ejected by this patch, because it is inconvenient to "clear" other members otherwise. Is that what is going on? If it is, I am mildly opposed to the flow of thought, from at least two reasons. 
It makes it hard for the next developer to decide whether the new
members they are adding should be in or out of the struct.

More importantly, I think the call of finish_bulk_checkin() we make in
deflate_to_pack() that you found (and there may possibly be other places
where we do so; I didn't check) may not appear to be a bug in the
original context, but it already is a bug. And when we change the
semantics of plug-unplug to be more "transaction-like", it becomes a
more serious bug, as you said.

There is NO reason to end the ongoing transaction there inside the
while() loop that tries to limit the size of the packfile being used. We
may want to flush the "packfile part", which may have been almost
synonymous with the entirety of bulk_checkin_state, but as you found
out, the "plugged" bit is *outside* the "packfile part", and that makes
it a bug to call finish_bulk_checkin() from there.

We should add a new function, flush_bulk_checkin_packfile(), to flush
only the packfile part of the bulk_checkin_state without affecting other
things. The "plugged" bit is the only such thing in the current code
before this series, but it does not have to stay that way. When you
start plugging the loose ref transactions, you may find it handy (this
is me handwaving) to keep in the struct a list of refs that you may have
to do something about at "unplug" time, and you do not want
deflate_to_pack() affecting the ongoing "plugged" ref operations by
calling finish_bulk_checkin() and reinitializing that list, for example.

And then we should examine the existing calls to finish_bulk_checkin()
and replace the ones that should not be finishing, i.e. the ones that
wanted "flush" but called "finish".