On August 10, 2019 12:06 PM, Gregory Szorc wrote: > I tracked down a source of Git corrupting repositories to lock file design not > being robust when containers / PID namespaces are present. > > In my case, the corruption stemmed from premature release of the `git gc` > lock in the gc.pid file. But since the lock file code for that file is in gc.c, there > could be other lock files in Git affected by the same design limitation as well. > > The lock design of gc.pid stores the current hostname and PID of the locking > process in the file. If another process comes along and its hostname matches > the stored hostname, it checks to see if the listed PID exists. If the PID is > missing, it assumes the lock is stale and releases the lock. > > A limitation with this approach is it isn't robust in the presence of containers > / PID namespaces. In containers, it is common for the hostname to match > the container host's hostname. Or the hostname will be static string. In > Kubernetes, all containers within a pod share the same hostname. Containers > (almost always) run in separate PID namespaces, so PIDs from outside the > container aren't visible to the container itself. > This means that if e.g. 2 `git gc` processes are running with the same > hostname in separate containers / PID namespaces, Git could prematurely > release the lock file because it thinks the "other" PID is dead and repo > corruption could ensue due to the 2 `git gc` processes racing with each other. > > The on-disk format of lock files obviously needs to be backwards compatible > with older clients. One backwards compatible solution is to append > something to the hostname to disambiguate containers / PID namespaces. > Mercurial appends the current PID namespace identifier to the hostname [1] > and my experience is that this is sufficient to mitigate the issue. It is possible > more robust solutions are achievable. While I like the idea personally, many platforms, including NonStop (TNS/E), do not support pid namespaces. In particular setns(2) may not be implemented. Please make sure that any changes detect this condition properly and omit the use of namespaces. Regards, Randall