That was a very rude reply. :( Please refrain from interacting with me in the ML in the future. I'l do accordingly. Thanks! vmg On Thu, Jun 27, 2013 at 3:11 AM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote: > On Tue, Jun 25, 2013 at 4:08 PM, Vicent Martí <tanoku@xxxxxxxxx> wrote: >> On Tue, Jun 25, 2013 at 11:17 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote: >>> What case are you talking about? >>> >>> The n-th object must be one of these four types and can never be of >>> more than one type at the same time, so a natural expectation from >>> the reader is "If you OR them together, you will get the same set". >>> If you say "If you XOR them", that forces the reader to wonder when >>> these bitmaps ever can overlap at the same bit position. >> >> I guess this is just wording. I don't particularly care about the >> distinction, but I'll change it to OR. > > Hmm, OK. If you think XOR and OR are the same operation, I also have a > bridge to sell you. Its in Brooklyn. Its a great value. > > The correct operation is OR. Not XOR. OR. Drop the X. > >> It cannot be mmapped not particularly because of endianness issues, >> but because the original format is not indexed and requires a full >> parse of the whole index before it can be accessed programatically. >> The wrong endianness just increases the parse time. > > Wrong endianness has nothing to do with the parse time. Modern CPUs > can flip a word around very quickly. In JGit we chose to parse the > file at load time because its simpler than having an additional index > segment, and we do what you did which is to toss the object SHA-1s > into a hashtable for fast lookup. By the time we look for the SHA-1s > and toss them into a hashtable we can stride through the file and find > the bitmap regions. Simple. > > In other words, the least complex solution possible that still > provides good performance. I'd say we have pretty good performance. > >>>> and I'm going to try to make it run fast enough in that >>>> encoding. >>> >>> Hmph. Is it an option to start from what JGit does, so that people >>> can use both JGit and your code on the same repository? > > I'm afraid I agree here with Junio. The JGit format is already > shipping in JGit 3.0, Gerrit Code Review 2.6, and in heavy production > use for almost a year on android.googlesource.com, and Google's own > internal Git trees. > > I would prefer to see a series adding bitmap support to C Git start > with the existing format, make it run, taking advantage of the > optimizations JGit uses (many of which you ignored and tried to "fix" > in other ways), and then look at improving the file format itself if > load time is still the largest low hanging fruit in upload-pack. I'm > guessing its not. You futzed around with the object table, but JGit > sped itself up considerably by simply not using the object table when > the bitmap is used. I think there are several such optimizations you > missed in your rush to redefine the file format. > >>> And then if >>> you do not succeed, after trying to optimize in-core processing >>> using that on-disk format to make it fast enough, start thinking >>> about tweaking the on-disk format? >> >> I'm afraid this is not an option. I have an old patchset that >> implements JGit v1 bitmap loading (and in fact that's how I initially >> developed these series -- by loading the bitmaps from JGit for >> debugging), but I discarded it because it simply doesn't pan out in >> production. ~3 seconds time to spawn `upload-pack` is not an option >> for us. I did not develop a tweaked on-disk format out of boredom. > > I think your code or experiments are bogus. Even on our systems with > JGit a cold start for the Linux kernel doesn't take 3s. And this is > JGit where Java is slow because "Jesus it has a lot of factories", and > without mmap'ing the file into the server's address space. Hell the > file has to come over the network from a remote disk array. > >> I could dig up the patch if you're particularly interested in >> backwards compatibility, but since it was several times slower than >> the current iteration, I have no interest (time, actually) to maintain >> it, brush it up, and so on. I have already offered myself to port the >> v2 format to JGit as soon as it's settled. It sounds like a better >> investment of all our times. > > Actually, I think the format you propose here is inferior to the JGit > format. In particular the idx-ordering means the EWAH code is useless. > You might as well not use the EWAH format and just store 2.6M bits per > commit. The idx-ordering also makes *much* harder to emit a pack file > a reasonable order for the client. Colby and I tried idx-ordering and > discarded it when it didn't perform as well as the pack-ordering that > JGit uses. > >> Following up on Shawn's comments, I removed the little-endian support >> from the on-disk format and implemented lazy loading of the bitmaps to >> make up for it. The result is decent (slowed down from 250ms to 300ms) >> and it lets us keep the whole format as NWO on disk. I think it's a >> good tradeback. > > The maintenance burden of two endian formats in a single file is too > high to justify. I'm glad to see you saw that. > >> As it stands right now, the only two changes from v1 of the on-disk format are: >> >> - There is an index at the end. This is a good idea. > > I don't think the index is necessary if you plan to build a hashtable > at runtime anyway. If you mmap the file you can quickly skip over a > bitmap and find the next SHA-1 using this thing called "pointer > arithmetic". I am not sure if you are familiar with the term, perhaps > you could search the web for it. > >> - The bitmaps are sorted in packfile-index order, not in packfile >> order. This is a good idea. > > As Colby and I have repeatedly tried to explain, this is not a good idea. > >> German kisses, > > Strawberry and now German kisses? What's next, Mango kisses? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html