RE: Optimize scsi_cmnd initialization and bug fix

Arjan van de Ven <arjan@xxxxxxxxxxxxx> · Tue, 29 Nov 2005 08:37:33 +0100

> 
> 30 out of 38 member variables are initialized to non-zero values, about 4
> are initialized by the LLDD.  Another 4 get written in the I/O return
> path, though these 4 are sprinkled through the structure.  Even though
> memset doesn't write-allocate, there is enough code which will bring the
> cache line into the cpu anyway.

But it's ALREADY in cache after the memset; that's the entire point of
it. You put zeros in the cache without first fetching from RAM the data
that is about to be overwritten in a few cycles anyway, and the line is
then in cache, so the next uses of it are really cheap. I.e. the only
cache traffic is writing the data back to RAM eventually, which is
asynchronous. By avoiding the write allocate altogether you avoid 1)
having to wait for the RAM and 2) the memory bandwidth needed for the
fetch. Both are important, and both are avoided by a memset.
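
To make the two alternatives concrete, here is a minimal sketch in plain
user-space C. Everything in it is an assumption for illustration: struct
demo_cmd and both init functions are hypothetical stand-ins, not the real
struct scsi_cmnd or the code in the patch. It only shows the two
initialization styles being compared; whether the memset variant actually
avoids the write allocate depends on the CPU and the memset
implementation and cannot be seen from the C source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a command structure; NOT struct scsi_cmnd. */
struct demo_cmd {
    int state;
    int retries;
    int allowed;             /* set to a non-zero value at init */
    unsigned long timeout;   /* set to a non-zero value at init */
    unsigned char sense[96];
    void *host_data;         /* assumed to be filled in later by the driver */
    int result;              /* assumed to be written on the I/O return path */
};

/* Style 1: zero the whole structure, then set the non-zero members. */
static void demo_cmd_init_memset(struct demo_cmd *c, int allowed,
                                 unsigned long timeout)
{
    memset(c, 0, sizeof(*c));
    c->allowed = allowed;
    c->timeout = timeout;
}

/*
 * Style 2: no whole-structure memset; each member is written
 * individually, and host_data is left alone because (by assumption)
 * the driver overwrites it before anything reads it.
 */
static void demo_cmd_init_fields(struct demo_cmd *c, int allowed,
                                 unsigned long timeout)
{
    c->state = 0;
    c->retries = 0;
    c->allowed = allowed;
    c->timeout = timeout;
    memset(c->sense, 0, sizeof(c->sense));
    c->result = 0;
}
```

Both functions leave the members they touch in the same state; the
disagreement in this thread is only about which style is cheaper in
terms of cache and memory traffic.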

> Since this is arch independent code, I'm also
> trying to optimize not just x86-64, but other archs like ia64 which don't
> have the same memset cache behavior.

Sounds like an unoptimized CPU to me; I'm sure alpha and others will
avoid this as well, and I'd be highly surprised if ppc64 didn't too.

> Our experiments with an I/O intensive db workload showed that removing
> the memset call has a net performance gain.

Which architectures?
