> 30 out of 38 member variables are initialized to non-zero value, about 4
> are initialized by the LLDD. Another 4 got written in the I/O return
> path, though these 4 are sprinkled in the structure. Even though memset
> doesn't write-allocate, there are enough code which will bring the cache
> line into the cpu anyway.

But it's ALREADY in cache after the memset. That's the entire point of it.
You put zeros in the cache without needing to fetch the
about-to-be-overwritten data from RAM, and you make sure the data is in
cache so that the next uses of it are really cheap. I.e. the only cache
traffic is writing the data back to RAM eventually, which is asynchronous.

By avoiding the write allocate altogether you avoid
1) having to wait for the RAM and
2) the memory bandwidth needed for it.
Both are important, and both are avoided by a memset.

> Since this is arch independent code, I'm also
> trying to optimize not just x86-64, but other arch like ia64 which doesn't
> have the same memset cache behavior.

Sounds like an unoptimized CPU to me; alpha and others I'm sure will avoid
this as well. I'd be highly surprised if ppc64 didn't too.

> Our experiments with I/O intensive db workload showed that removing memset
> call has a net performance gain for db workload.

Which architectures?
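
To make the two options under discussion concrete, here is a minimal sketch in C. Everything in it is an assumption for illustration: `struct demo_cmd`, its fields and the initial values are hypothetical and are not the real command structure from this thread. It only shows that the two ways of initializing end up with the same contents; it says nothing about which one is faster, which depends on the CPU and has to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical structure for illustration only, not the real one. */
struct demo_cmd {
    int state;               /* set to a non-zero value at init */
    int retries;             /* set to a non-zero value at init */
    int timeout;             /* set to a non-zero value at init */
    int result;              /* written later, on the I/O return path */
    void *driver_data;       /* filled in later by the low-level driver */
    unsigned char sense[32]; /* has to start out zeroed */
};

/* Variant A: clear the whole structure, then set the non-zero fields. */
void demo_cmd_init_memset(struct demo_cmd *c)
{
    assert(c != NULL);
    memset(c, 0, sizeof(*c));
    c->state = 1;
    c->retries = 5;
    c->timeout = 30;
}

/* Variant B: no blanket memset; every field is assigned explicitly. */
void demo_cmd_init_fields(struct demo_cmd *c)
{
    assert(c != NULL);
    c->state = 1;
    c->retries = 5;
    c->timeout = 30;
    c->result = 0;
    c->driver_data = NULL;
    memset(c->sense, 0, sizeof(c->sense));
}

/* Returns 1 if both structures hold the same field values. */
int demo_cmd_equal(const struct demo_cmd *a, const struct demo_cmd *b)
{
    return a->state == b->state &&
           a->retries == b->retries &&
           a->timeout == b->timeout &&
           a->result == b->result &&
           a->driver_data == b->driver_data &&
           memcmp(a->sense, b->sense, sizeof(a->sense)) == 0;
}
```

Variant B has to be kept in sync by hand whenever a field is added to the structure, while variant A cannot miss one. That maintenance cost is separate from the cache argument above.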