On 3/11/07, Junio C Hamano <junkio@xxxxxxx> wrote:
"Jon Smirl" <jonsmirl@xxxxxxxxx> writes: > Reading the other thread on tracking temporary changes made me think > of using inotify with git. The basic idea would be to a daemon running > that uses inotify to listen for changes in the working tree. As these > changes happen they get committed to a tracking tree. I think it is an interesting idea, but can be used with any SCM not just git ;-).

As for the part about 'git grep', Shawn and I have been talking off and on about experimenting with an inverted index for a packfile format. The basic idea is that you tokenize the input and turn a source file into a list of tokens. You diff the lists of tokens the same way you would normally diff text. There is a universal dictionary for tokens; a token's id is its position in the dictionary.

Tokenized text is one of the most compact compression schemes known. It can get even more compact by tokenizing common phrases and using variable-length token ids. Compression schemes like this are used in web search engines. Of course you keep a check in place for input that doesn't tokenize (binary) and fall back to gzip.

To build 'git grep' you make a bitmap index for each token in the dictionary, with one bit per file, and set the bit to one if that file contains the token. Gzip these indexes, and then there are algorithms for doing and/or operations on the zipped indexes without expanding them. grep is almost instant over gigabytes of text if indexes like this are available.

Keeping everything up to date on a dual-core system is pretty much free, since the second core is rarely doing anything while you are editing.

-- 
Jon Smirl
jonsmirl@xxxxxxxxx
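
A rough sketch of the dictionary-plus-bitmap idea described above, in Python. Everything here is an assumption made for illustration: the TokenIndex class, the tokenizer regex, and the method names are hypothetical and are not part of git. The bitmaps are plain uncompressed Python integers, so this shows only the token ids and the and/or queries, not operations on compressed indexes or the binary fallback.

```python
import re

# Hypothetical tokenizer: identifiers, runs of digits, or any single
# non-space character. Whitespace is dropped, so this is lossy.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")


class TokenIndex:
    """Toy universal dictionary with one bitmap per token."""

    def __init__(self):
        self.dictionary = {}  # token text -> token id
        self.tokens = []      # token id -> token text (id == position)
        self.bitmaps = []     # token id -> int; bit i set if file i has it
        self.files = []       # file id -> file name

    def token_id(self, token):
        """Return the id for a token, appending it if it is new."""
        tid = self.dictionary.get(token)
        if tid is None:
            tid = len(self.tokens)
            self.dictionary[token] = tid
            self.tokens.append(token)
            self.bitmaps.append(0)
        return tid

    def add_file(self, name, text):
        """Tokenize a file, update the bitmaps, return its token id list."""
        file_id = len(self.files)
        self.files.append(name)
        ids = [self.token_id(t) for t in TOKEN_RE.findall(text)]
        for tid in set(ids):
            self.bitmaps[tid] |= 1 << file_id
        return ids

    def bitmap(self, token):
        """Bitmap for a token, or 0 if the token is not in the dictionary."""
        tid = self.dictionary.get(token)
        return 0 if tid is None else self.bitmaps[tid]

    def names(self, bitmap):
        """Expand a bitmap back into the list of file names."""
        return [n for i, n in enumerate(self.files) if (bitmap >> i) & 1]

    def query_all(self, *tokens):
        """Files containing every token (bitwise AND of the bitmaps)."""
        if not tokens:
            return []
        result = self.bitmap(tokens[0])
        for token in tokens[1:]:
            result &= self.bitmap(token)
        return self.names(result)

    def query_any(self, *tokens):
        """Files containing at least one token (bitwise OR of the bitmaps)."""
        result = 0
        for token in tokens:
            result |= self.bitmap(token)
        return self.names(result)


index = TokenIndex()
index.add_file("a.c", "int main(void) { return 0; }")
index.add_file("b.c", "int helper(int x) { return x; }")
index.add_file("README", "main helper notes")

print(index.query_all("main", "int"))     # files that have both tokens
print(index.query_any("main", "helper"))  # files that have either token
```

A query here only narrows the candidate files; a real grep would still scan those files for the actual match, and a real index would keep the bitmaps in a compressed form.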