From: David Howells
> Sent: Friday, August 18, 2023 4:20 PM
>
> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > > Although I'm not sure the bit-fields really help.
> > > There are 8 bytes at the start of the structure, might as well
> > > use them :-)
> >
> > Actually I wrote the patch that way because it seems to improve code
> > generation.
> >
> > The bitfields are generally all set together as just plain one-time
> > constants at initialization time, and gcc sees that it's a full byte
> > write. And the reason 'data_source' is not a bitfield is that it's not
> > a constant at iov_iter init time (it's an argument to all the init
> > functions), so having that one as a separate byte at init time is good
> > for code generation when you don't need to mask bits or anything like
> > that.
> >
> > And once initialized, having things be dense and doing all the
> > compares with a bitwise 'and' instead of doing them as some value
> > compare again tends to generate good code.
>
> Actually... I said that switch(enum) seemed to generate suboptimal code...
> However, if the enum is renumbered such that the constants are in the same
> order as in the switch() it generates better code.

Hmmm.. the order of the switch labels really shouldn't matter.

The advantage of the if-chain is that you can optimise for
the most common case.

> So we want this order:
>
> 	enum iter_type {
> 		ITER_UBUF,
> 		ITER_IOVEC,
> 		ITER_BVEC,
> 		ITER_KVEC,
> 		ITER_XARRAY,
> 		ITER_DISCARD,
> 	};

Will gcc actually code this version without pessimising it?

	if (likely(type <= ITER_IOVEC)) {
		if (likely(type != ITER_IOVEC))
			iterate_ubuf();
		else
			iterate_iovec();
	} else if (likely(type <= ITER_KVEC)) {
		if (type == ITER_KVEC)
			iterate_kvec();
		else
			iterate_bvec();
	} else if (type == ITER_XARRAY) {
		iterate_xarray();
	} else {
		discard;
	}

But I bet you can't stop it replicating the compares.
(especially with the likely().)

That has two mis-predicted (are they ever right!)
branches in the common user-copy versions and three in the common kernel ones.

In some architectures you might get the default 'fall through' to the UBUF code
if the branches aren't predictable.
But I believe current x86 CPUs never do static prediction.
So you always lose :-)

...

> static inline bool user_backed_iter(const struct iov_iter *i)
> {
> 	return iter_is_ubuf(i) || iter_is_iovec(i);
> }
>
> which gcc just changes into something like a "CMP $1" and a "JA".

That makes sense...

> Comparing Linus's bit patch (+ is better) to renumbering the switch (- is
> better):
> ....
> iov_iter_init                            inc 0x27 -> 0x31 +0xa

Are you hitting the gcc bug that loads the constant from memory?

> I think there may be more savings to be made if I go and convert more of the
> functions to using switch().

Size isn't everything, the code needs to be optimised for the hot paths.

	David
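
For anyone who wants to poke at the if-chain outside the kernel, here is a minimal userspace sketch of the idea discussed above. It assumes the renumbered enum order, a gcc/clang-style __builtin_expect() for likely(), and two made-up helpers (user_backed() and dispatch(), neither is a kernel function). Instead of calling the iterate_*() routines it just records which one would run and how many source-level conditionals were evaluated, so the "two for user copies, three for kernel ones" count can be checked by hand. What the compiler actually emits may well differ.

```c
#include <stdbool.h>

/* Assumption: gcc/clang builtin; the kernel's likely() is similar in spirit. */
#ifndef likely
#define likely(x) __builtin_expect(!!(x), 1)
#endif

/* Renumbered so the user-backed types come first. */
enum iter_type {
	ITER_UBUF,
	ITER_IOVEC,
	ITER_BVEC,
	ITER_KVEC,
	ITER_XARRAY,
	ITER_DISCARD,
};

/* Hypothetical result type, only for this illustration. */
struct dispatch_result {
	enum iter_type handler;	/* which iterate_*() would have run */
	int branches;		/* source-level conditionals evaluated */
};

/* With UBUF == 0 and IOVEC == 1 this is a single range compare. */
static inline bool user_backed(enum iter_type type)
{
	return type <= ITER_IOVEC;
}

/* The if-chain, ordered so the expected-common cases are tested first. */
static struct dispatch_result dispatch(enum iter_type type)
{
	struct dispatch_result r = { ITER_DISCARD, 0 };

	r.branches++;
	if (likely(type <= ITER_IOVEC)) {
		r.branches++;
		r.handler = likely(type != ITER_IOVEC) ? ITER_UBUF : ITER_IOVEC;
		return r;
	}

	r.branches++;
	if (likely(type <= ITER_KVEC)) {
		r.branches++;
		r.handler = (type == ITER_KVEC) ? ITER_KVEC : ITER_BVEC;
		return r;
	}

	r.branches++;
	if (type == ITER_XARRAY)
		r.handler = ITER_XARRAY;
	return r;
}
```

Calling dispatch(ITER_UBUF) or dispatch(ITER_IOVEC) evaluates two conditionals, while ITER_BVEC and ITER_KVEC evaluate three. Whether each of those ends up as a real (and mispredicted) branch depends on the compiler and the CPU, which is the open question above.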