On Thu, 2016-03-17 at 15:43 +0100, Johannes Schindelin wrote:
> Hi Duy,
>
> On Thu, 17 Mar 2016, Duy Nguyen wrote:
>
> > On Thu, Mar 17, 2016 at 1:27 AM, Johannes Schindelin
> > <Johannes.Schindelin@xxxxxx> wrote:
> > > I am much more concerned about concurrent accesses and the
> > > communication between the Git processes and the index-helper.
> > > Writing to the .pid file sounds very fragile to me, in particular
> > > when multiple processes can poke the index-helper in succession
> > > and some readers are unaware that the index is being refreshed.
> >
> > It's not that bad.
>
> Well, the way I read the code it is possible that:
>
> 1. Git process 1 starts, reading the index
> 2. Git process 2 starts, poking the index-helper
> 3. The index-helper updates the .pid file (why not set a bit in the
>    shared memory?) with a prefix "W"
> 4. Git process 2 reads the .pid file and waits for the "W" to go away
>    (what if index-helper is not fast enough to write the "W"?)
> 5. Git process 1 access the index, happily oblivious that it is being
>    updated and the data is in an inconsistent state

That's not quite how I understand it.

It's more like MVCC.  Writes to the index go to a new index file.
Index files are identified by their SHA.  Reads from the index go into
a new shm, identified by SHA.

The "W" is set only once -- it just means "this index helper knows how
to talk to watchman".  It's a compile-time option.  (I'm going to
change this anyway when I switch to named pipes).

The watchman data is shared independently; if it's not ready in time
(whatever that means -- it's 1s in the current code), then read-cache
should fall back to brute-force checking every file.

> > We should have protection in place to deal with this and fall back
> > to reading directly from file when things get suspicious.
>
> I really want to prevent that. I know of use cases where the index
> weighs 300MB, and falling back to reading it directly *really* hurts.
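
As a rough sketch of the watchman fallback described above (not the
actual patch): wait a bounded time for the watchman data, and check every
file if it does not arrive. The names `files_to_check` and
`watchman_results` are hypothetical, and a Python queue merely stands in
for however the helper shares its data; only the 1s timeout comes from
the description of the current code.

```python
import queue

WATCHMAN_TIMEOUT_S = 1.0  # "1s in the current code", per the message above


def files_to_check(all_files, watchman_results, timeout=WATCHMAN_TIMEOUT_S):
    """Return the files that need to be re-checked.

    watchman_results is a hypothetical queue.Queue on which the helper
    would post the set of paths watchman reported as changed.
    """
    try:
        changed = watchman_results.get(timeout=timeout)
    except queue.Empty:
        # Watchman data not ready in time: brute-force check every file.
        return list(all_files)
    # Watchman data arrived: only the reported paths need checking.
    return [path for path in all_files if path in changed]
```

Either way the caller gets a usable list; a late watchman only costs
time, not correctness.
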
> > But I agree that sending UNIX signals (or PostMessage) is not
> > really good communication.
>
> Yeah, I really would like two-way communication instead. Named pipes?
> They'd have the advantage that you could use the full path to the
> index as identifier.
>
> The way I read the current code, we would actually create a different
> shared memory every time the index changes because its checksum is
> part of the shared memory's "path"...
>
> Ciao,
> Dscho
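
To make the MVCC comparison concrete, here is a toy model of why one
shared memory per index version gives readers a consistent view. It is a
sketch under assumptions: `IndexStore` and its methods are hypothetical,
a dict stands in for the shm namespace, and SHA-1 over the raw bytes
stands in for the index checksum.

```python
import hashlib


class IndexStore:
    """Toy model: immutable index snapshots keyed by their SHA-1."""

    def __init__(self):
        self.segments = {}   # stand-in for shm segments: name -> bytes
        self.current = None  # SHA of the most recently written index

    def write(self, data):
        """Publish a new index version; existing versions are untouched."""
        sha = hashlib.sha1(data).hexdigest()
        self.segments.setdefault(sha, data)
        self.current = sha
        return sha

    def read(self, sha):
        """Read the snapshot that belongs to one specific index version."""
        return self.segments[sha]


store = IndexStore()
v1 = store.write(b"index contents, version 1")
snapshot = store.read(v1)                        # process 1 reads version 1
v2 = store.write(b"index contents, version 2")   # process 2 causes an update

# Process 1 still sees exactly what it started with.
assert snapshot == store.read(v1)
assert v1 != v2 and store.current == v2
```

A reader that attached to the old SHA keeps its view until it asks for
the current version again. Cleaning up segments that nobody references
any more is left out of this sketch.
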