Re: On data structures and parallelism

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Sun, 17 May 2009 13:35:44 -0700 (PDT)

On Sun, 17 May 2009, david@xxxxxxx wrote:
> 
> do things change with SSDs? I've heard that even (especially??) with the Intel
> SSDs you want to have several operations going in parallel to get the best
> out of them.

There's a slight, but noticeable, improvement.

This is with "echo 3 > /proc/sys/vm/drop_caches; time git diff" run in a loop. 
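The same measurement can be scripted. What follows is a hypothetical Python harness, not something from the post, and it rests on a few assumptions. It needs Linux and root, because writing to /proc/sys/vm/drop_caches requires both. It has to run from inside a git work tree. Instead of editing the config, it toggles the setting per invocation with `git -c core.preloadindex=...`. It also calls `os.sync()` before dropping caches, which the one-liner above does not do. Each cold-cache `git diff` is timed with a wall-clock timer, roughly what the `real` line of `time` reports.

```python
import os
import subprocess
import time

# Hypothetical harness: assumes Linux, root, and a git work tree as the cwd.
DROP_CACHES = "/proc/sys/vm/drop_caches"


def build_cmd(preload: bool) -> list[str]:
    """Build a `git diff` command with core.preloadindex forced on or off."""
    value = "true" if preload else "false"
    return ["git", "-c", f"core.preloadindex={value}", "diff"]


def drop_caches() -> None:
    """Like `echo 3 > /proc/sys/vm/drop_caches`, but syncing dirty pages first."""
    os.sync()
    with open(DROP_CACHES, "w") as f:
        f.write("3\n")


def time_cold_diff(preload: bool) -> float:
    """Return wall-clock seconds for one cache-cold `git diff`."""
    drop_caches()
    start = time.perf_counter()
    subprocess.run(build_cmd(preload), stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


def run_benchmark(runs: int = 6) -> dict[bool, list[float]]:
    """Time `runs` cold-cache diffs with preloading on, then off."""
    results: dict[bool, list[float]] = {}
    for preload in (True, False):
        results[preload] = [time_cold_diff(preload) for _ in range(runs)]
        print(f"core.preloadindex = {str(preload).lower()}")
        for seconds in results[preload]:
            print(f"\treal\t{seconds:.3f}s")
    return results
```

Calling `run_benchmark()` as root from a work tree prints one `real` line per run for each setting, in the same layout as the figures below. Cold-cache numbers are noisy, so several runs per setting are worth more than a single pair.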

With 'core.preloadindex = true':

	real	0m1.138s
	real	0m1.116s
	real	0m1.132s
	real	0m1.120s
	real	0m1.106s
	real	0m1.132s

and with it set to 'false':

	real	0m1.256s
	real	0m1.258s
	real	0m1.242s
	real	0m1.240s
	real	0m1.244s
	real	0m1.242s

so it's about a 10% improvement, which is pretty good considering 
that

 (a) those disks are fast enough that even in the totally cache-cold 
     case, the single-threaded run gets about 35% CPU utilization.

     And that's despite this being a 3.2GHz Nehalem box, so 35% CPU is 
     really quite remarkably good. On my (much slower) laptop with a 
     1.2GHz Core 2, I get 2-3% CPU time (and the whole operation takes 20 
     seconds).

 (b) not all the IO ends up being parallelized: there is a per-directory 
     mutex, so even though we start 20 threads, we probably get much less 
     real parallelism than that because of locking.
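Here is a quick check of the "about 10%" figure. It uses only the twelve timings quoted above, and the helper name `relative_saving` is just for illustration:

```python
from statistics import mean

# Cache-cold "real" times from the runs above, in seconds.
preload_on = [1.138, 1.116, 1.132, 1.120, 1.106, 1.132]
preload_off = [1.256, 1.258, 1.242, 1.240, 1.244, 1.242]


def relative_saving(fast: list[float], slow: list[float]) -> float:
    """Fraction of the slower configuration's mean time that is saved."""
    return (mean(slow) - mean(fast)) / mean(slow)


on, off = mean(preload_on), mean(preload_off)
saving = relative_saving(preload_on, preload_off)
print(f"on: {on:.3f}s  off: {off:.3f}s  saving: {saving:.1%}")
# prints "on: 1.124s  off: 1.247s  saving: 9.9%"
```

Measured as time saved, the gain is just under 10%. Expressed as a speedup (1.247 / 1.124), it is roughly 1.11x, so "about 10%" holds either way.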

In general, the IO parallelization obviously helps most when the IO is 
slow _and_ overlaps perfectly. Perfect overlap doesn't end up happening 
due to the per-directory lookup semaphore (think of it like a bank 
conflict in trying to parallelize memory accesses), but with a slow NFS 
connection you should get reasonably close to that optimal situation.
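Why don't 20 threads give 20-way overlap? A deliberately simplified, hypothetical model helps; it is not git's or the kernel's actual code. In this model, every lookup takes one time unit and at most `threads` lookups run per step. Lookups in the same directory are serialized, as if each directory had its own lock. The step count is then bounded below by both `ceil(n / threads)` and the size of the busiest directory. This mirrors a bank conflict, where accesses to the same memory bank go one after another no matter how many are issued.

```python
import heapq
from collections import Counter


def makespan(dirs: list[str], threads: int) -> int:
    """Steps needed to do one unit-time lookup per entry in `dirs`.

    Hypothetical model: each step runs at most `threads` lookups and at most
    one lookup per directory (a per-directory lock serializes the rest).
    Each step greedily serves the directories with the most pending lookups.
    """
    # Max-heap via negated counts: (-remaining lookups, directory).
    heap = [(-count, d) for d, count in Counter(dirs).items()]
    heapq.heapify(heap)
    steps = 0
    while heap:
        # Up to `threads` distinct directories make progress this step.
        chosen = [heapq.heappop(heap) for _ in range(min(threads, len(heap)))]
        for neg_count, d in chosen:
            if neg_count + 1 < 0:  # lookups still pending in this directory
                heapq.heappush(heap, (neg_count + 1, d))
        steps += 1
    return steps


flat = ["src"] * 20_000                             # one huge directory
spread = [f"dir{i % 1000}" for i in range(20_000)]  # 1000 dirs, 20 entries each
print(makespan(flat, 20))    # 20000: the lock serializes every lookup
print(makespan(spread, 20))  # 1000: full 20-way overlap
```

Real trees fall somewhere between these two extremes. That matches the point above: slow IO such as NFS gains the most, but only to the extent that concurrent lookups land in different directories.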

But with a single spindle, and rotating media, there really is sadly very 
little room for optimization. I suspect a SATA disk with TCQ might be able 
to do _somewhat_ better than my old PATA-only laptop (discounting the fact 
that my PATA laptop hard disk is extra slow due to being just 4200rpm: any 
desktop disk will be much faster), but I doubt the index preloading is 
really all that noticeable.

In fact, I just tested on another machine and saw no difference 
whatsoever. If anything, it was slightly slower. I suspect TCQ is a 
bigger win with writes.

			Linus