From: Neil Horman [mailto:nhorman@xxxxxxxxxxxxx] > On Tue, Jul 01, 2014 at 12:34:26PM +0000, David Laight wrote: > > From: Neil Horman [mailto:nhorman@xxxxxxxxxxxxx] > > > On Mon, Jun 30, 2014 at 01:01:07PM +0000, David Laight wrote: > > > > Even if memset() is inlined (as on x86) using it to zero the union > > > > generates a memory word write of zero, followed by a write of the > > > > smaller field, and then a read of the word. > > > > As well as being a lot of instructions the sequence is unlikely to > > > > be optimised by the store-load forward hardware so will be slow. > > > > > > > > Instead allocate a field of the union that is the same size as the > > > > entire union and write a zero value to it. The compiler will then > > > > generate the required value in a register. > > > > > > > > Signed-off-by: David Laight <david.laight@xxxxxxxxxx> > > > > --- > > > > include/net/sctp/command.h | 6 +++--- > > > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/include/net/sctp/command.h b/include/net/sctp/command.h > > > > index 020eb95..5fcd1a7 100644 > > > > --- a/include/net/sctp/command.h > > > > +++ b/include/net/sctp/command.h > > > > @@ -118,6 +118,7 @@ typedef enum { > > > > #define SCTP_MAX_NUM_COMMANDS 18 > > > > > > > > typedef union { > > > > + void *zero_all; /* Set to NULL to clear the entire union */ > > > > __s32 i32; > > > > __u32 u32; > > > > __be32 be32; > > > > @@ -154,7 +155,7 @@ typedef union { > > > > static inline sctp_arg_t \ > > > > SCTP_## name (type arg) \ > > > > { sctp_arg_t retval;\ > > > > - memset(&retval, 0, sizeof(sctp_arg_t));\ > > > > + retval.zero_all = NULL;\ > > > > retval.elt = arg;\ > > > > return retval;\ > > > > } > > > > @@ -191,7 +192,7 @@ static inline sctp_arg_t SCTP_NOFORCE(void) > > > > static inline sctp_arg_t SCTP_NULL(void) > > > > { > > > > sctp_arg_t retval; > > > > - memset(&retval, 0, sizeof(sctp_arg_t)); > > > > + retval.zero_all = NULL; > > > > return retval; > > > > } > > > > > > > > @@ -212,7 +213,6 @@ typedef struct { > > > > */ > > > > static inline int sctp_init_cmd_seq(sctp_cmd_seq_t *seq) > > > > { > > > > - memset(seq, 0, sizeof(sctp_cmd_seq_t)); > > > > return 1; /* We always succeed. */ > > > > } > > > > > > > > -- > > > > 1.8.1.2 > > > > > > > I get the advantage here, but this seems a bit rickety to me, in that it relies > > > on the size of the union never being more than sizeof(void *) bytes long. I > > > understand thats not likely to happen, but its not the sort of thing we're > > > likely to notice if it does. > > > > > > That said, I'd almost make the argument that we don't need to do this zeroing at > > > all. A while back I fixed a few outlier cases where we accidentally read a > > > union member which we didn't set, so now we shouldn't have any type punning > > > going on here. It should be sufficient to just set the field you want, and > > > leave it at that. > > > > On a 'big endian' system reading a wrong-sized member won't give the correct > > value anyway. > > > > One benefit of zeroing the union is that diagnostic traces (added for debug) > > will give sane values regardless of the member printed. > > OTOH no such traces actually exist. > > > > With the patch, the generated code on amd64 is probably almost exactly > > the same as that without the zeroing. The same is probable true for most LE > > systems. > > I haven't looked at how gcc handles the union on BE systems - but it is > > likely to be a lot better if it isn't zeroed at all (or if all the fields > > are the size of the union). > > > > David > > > Is it worth considering making a slab cache out of this? We could amortize the > cost of the zeroing then by doing it per slab, and then we wouldn't have to > worry about making sure we always zero the largest element. Probably not, it us currently an on-stack data area. The cost of zeroing the entire structure is probably measureable however it is done. Remember, most of the time only a few entries are used - but the code paths are used for every rx chunk and every tx request (and probably elsewhere). I actually thought about changing the 'array of struct' into a 'struct of array' in order to reduce the on-stack memory footprint. (The struct has two fields, one is 8 bits, the other a union whose largest field is a pointer.) The real optimisation is to realise that it is pointless to defer (most of) the actions - it is only likely to lead to actions being performed on stale pointers. However simple inlining changes the order - which might have its own bugs. David -- To unsubscribe from this list: send the line "unsubscribe linux-sctp" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html