Arjan van de Ven wrote on Thursday, November 24, 2005 12:06 AM
> On Wed, 2005-11-23 at 14:50 -0800, Chen, Kenneth W wrote:
> > struct scsi_cmnd is a fairly large data structure and is used
> > to construct scsi command. On x86-64 arch, it is 448 bytes.
> >
> > In scsi_get_command(), it is a bit too paranoid in zeroing the
> > data structure. Since most of the member variables will be
> > initialized in various stage of I/O submission, i.e., in
> > scsi_prep_fn, scsi_init_io, sd_init_command, scsi_init_cmd_errh,
> > etc. So instead of blindly zeroing the whole structure, initialize
> > to some other value later on. All it needs in scsi_get_command
> > is to zero out a few member variables that aren't initialized in
> > the I/O path.
>
> actually I question this optimisation. memset uses the rep stosl code
> sequence, which mean that the cpu can avoid write-allocate on the
> cachelines in question, and just plain zero them in cache. If you
> initialize the parts one by one, the cpu will need to do write-allocate
> on the cachelines, and thus has double the memory bandwidth needed than
> the existing case.
>
> (not sure if all cpus are smart enough to avoid write allocate for rep
> stosl, but most of the newer ones are)

30 out of 38 member variables are initialized to non-zero values, and
about 4 more are initialized by the LLDD. Another 4 get written in the
I/O return path, though those 4 are sprinkled throughout the structure.
So even though memset avoids the write-allocate, there is enough other
code touching the structure that the cache lines get brought into the
CPU anyway.

Since this is arch-independent code, I'm also trying to optimize not
just for x86-64 but for other archs such as ia64, which don't have the
same memset cache behavior. Our experiments with an I/O-intensive db
workload showed that removing the memset call gives a net performance
gain for that workload.

- Ken
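To make the trade-off concrete, here is a minimal user-space sketch of the two allocation strategies discussed above: clearing the whole command object with memset, versus clearing only the fields that nothing later in the I/O path initializes. The struct, its fields, and the helper names (get_cmd_memset, get_cmd_selective, prep_cmd, demo_equivalent) are hypothetical stand-ins, not the real scsi_cmnd, scsi_get_command, or scsi_prep_fn. The sketch only demonstrates the correctness condition (both strategies must produce the same command after setup, even from recycled memory); it says nothing about cache or write-allocate cost, which depends on the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily trimmed stand-in for struct scsi_cmnd;
 * field names are illustrative, not the kernel's. */
struct demo_cmnd {
    unsigned char cdb[16];     /* filled in by the prep path */
    unsigned int  cdb_len;     /* filled in by the prep path */
    unsigned int  retries;     /* filled in by the prep path */
    unsigned int  result;      /* filled in by the prep path */
    void         *spare_ptr;   /* not touched by the I/O path: must start zeroed */
    unsigned long spare_flags; /* not touched by the I/O path: must start zeroed */
};

/* Old style: clear the whole object at allocation time. */
static void get_cmd_memset(struct demo_cmnd *c)
{
    memset(c, 0, sizeof(*c));
}

/* Patch style: clear only the fields nobody initializes later. */
static void get_cmd_selective(struct demo_cmnd *c)
{
    c->spare_ptr = NULL;
    c->spare_flags = 0;
}

/* Hypothetical stand-in for the prep/init stages that set up the rest. */
static void prep_cmd(struct demo_cmnd *c, const unsigned char *cdb,
                     unsigned int len)
{
    assert(len <= sizeof(c->cdb));
    memset(c->cdb, 0, sizeof(c->cdb));
    memcpy(c->cdb, cdb, len);
    c->cdb_len = len;
    c->retries = 5;
    c->result = 0;
}

/* Field-by-field compare (memcmp on the whole struct would also compare
 * padding bytes, which the selective path deliberately leaves alone). */
static int cmds_equal(const struct demo_cmnd *a, const struct demo_cmnd *b)
{
    return memcmp(a->cdb, b->cdb, sizeof(a->cdb)) == 0 &&
           a->cdb_len == b->cdb_len && a->retries == b->retries &&
           a->result == b->result && a->spare_ptr == b->spare_ptr &&
           a->spare_flags == b->spare_flags;
}

/* Returns 1 if both strategies yield the same command when starting
 * from garbage-filled memory, as a recycled slab object would be. */
static int demo_equivalent(void)
{
    static const unsigned char read10[10] = { 0x28 }; /* READ(10) opcode */
    struct demo_cmnd a, b;

    memset(&a, 0xAA, sizeof(a));
    memset(&b, 0xAA, sizeof(b));

    get_cmd_memset(&a);
    prep_cmd(&a, read10, sizeof(read10));

    get_cmd_selective(&b);
    prep_cmd(&b, read10, sizeof(read10));

    return cmds_equal(&a, &b);
}
```

The selective version is only safe as long as every field is either written by some later stage or explicitly cleared in get_cmd_selective; if a new field is added to the struct and nobody initializes it, the two strategies silently diverge, which is the usual argument for keeping the blanket memset.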