On 2018-11-05 22:48, Bart Van Assche wrote:
> On Mon, 2018-11-05 at 13:13 -0800, Andrew Morton wrote:
>> On Mon, 5 Nov 2018 12:40:00 -0800 Bart Van Assche <bvanassche@xxxxxxx> wrote:
>>
>>> This patch suppresses the following sparse warning:
>>>
>>> ./include/linux/slab.h:332:43: warning: dubious: x & !y
>>>
>>> ...
>>>
>>> --- a/include/linux/slab.h
>>> +++ b/include/linux/slab.h
>>> @@ -329,7 +329,7 @@ static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags)
>>>  	 * If an allocation is both __GFP_DMA and __GFP_RECLAIMABLE, return
>>>  	 * KMALLOC_DMA and effectively ignore __GFP_RECLAIMABLE
>>>  	 */
>>> -	return type_dma + (is_reclaimable & !is_dma) * KMALLOC_RECLAIM;
>>> +	return type_dma + is_reclaimable * !is_dma * KMALLOC_RECLAIM;
>>>  }
>>>
>>>  /*
>>
>> I suppose so.
>>
>> That function seems too clever for its own good :(. I wonder if these
>> branch-avoiding tricks are really worthwhile.
>
> From what I have seen in gcc disassembly it seems to me like gcc uses the
> cmov instruction to implement e.g. the ternary operator (?:). So I think none
> of the cleverness in kmalloc_type() is really necessary to avoid conditional
> branches. I think this function would become much more readable when using a
> switch statement or when rewriting it as follows (untested):
>
>  static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags)
>  {
> -	int is_dma = 0;
> -	int type_dma = 0;
> -	int is_reclaimable;
> -
> -#ifdef CONFIG_ZONE_DMA
> -	is_dma = !!(flags & __GFP_DMA);
> -	type_dma = is_dma * KMALLOC_DMA;
> -#endif
> -
> -	is_reclaimable = !!(flags & __GFP_RECLAIMABLE);
> -
>  	/*
>  	 * If an allocation is both __GFP_DMA and __GFP_RECLAIMABLE, return
>  	 * KMALLOC_DMA and effectively ignore __GFP_RECLAIMABLE
>  	 */
> -	return type_dma + (is_reclaimable & !is_dma) * KMALLOC_RECLAIM;
> +	static const enum kmalloc_cache_type flags_to_type[2][2] = {
> +		{ 0,           KMALLOC_RECLAIM },
> +		{ KMALLOC_DMA, KMALLOC_DMA },
> +	};
> +#ifdef CONFIG_ZONE_DMA
> +	bool is_dma = !!(flags & __GFP_DMA);
> +#endif
> +	bool is_reclaimable = !!(flags & __GFP_RECLAIMABLE);
> +
> +	return flags_to_type[is_dma][is_reclaimable];
>  }
>

Won't that pessimize the cases where gfp is a constant to actually do
the table lookup, and add 16 bytes to every translation unit?

Another option is to add a fake KMALLOC_DMA_RECLAIM so the
kmalloc_caches[] array has size 4, then assign the same dma
kmalloc_cache pointer to [2][i] and [3][i] (so that costs perhaps a
dozen pointers in .data), and then just compute kmalloc_type() as

  ((flags & __GFP_RECLAIMABLE) >> someshift) | ((flags & __GFP_DMA) >> someothershift).

Perhaps one could even shuffle the GFP flags so the two shifts are the
same.

Rasmus
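
For concreteness, here is a rough userspace sketch of the shift-based
variant (untested against the kernel). The bit values, the shift macros,
the kmalloc_type_sketch() name and the caches_sketch[] array are all made
up for illustration; the real __GFP_DMA and __GFP_RECLAIMABLE values live
in the kernel's gfp headers and may need different shifts.

```c
typedef unsigned int gfp_t;

/* Hypothetical bit positions, chosen only so that both shifts go right. */
#define SKETCH_GFP_DMA          0x04u   /* stand-in for __GFP_DMA */
#define SKETCH_GFP_RECLAIMABLE  0x10u   /* stand-in for __GFP_RECLAIMABLE */
#define SKETCH_DMA_SHIFT        1       /* moves 0x04 to bit 1 */
#define SKETCH_RECLAIM_SHIFT    4       /* moves 0x10 to bit 0 */

enum kmalloc_cache_type {
	KMALLOC_NORMAL = 0,
	KMALLOC_RECLAIM = 1,
	KMALLOC_DMA = 2,
	KMALLOC_DMA_RECLAIM = 3,	/* the "fake" type; aliases KMALLOC_DMA */
	NR_KMALLOC_TYPES
};

/* No branches and no lookup table: just two masks, two shifts and an OR. */
static inline enum kmalloc_cache_type kmalloc_type_sketch(gfp_t flags)
{
	return (enum kmalloc_cache_type)
		(((flags & SKETCH_GFP_RECLAIMABLE) >> SKETCH_RECLAIM_SHIFT) |
		 ((flags & SKETCH_GFP_DMA) >> SKETCH_DMA_SHIFT));
}

/*
 * Stand-in for one size class of kmalloc_caches[]: slots 2 and 3 hold the
 * same pointer, so DMA|RECLAIMABLE still ends up in the DMA cache.
 */
static int normal_cache, reclaim_cache, dma_cache;
static int *caches_sketch[NR_KMALLOC_TYPES] = {
	[KMALLOC_NORMAL]      = &normal_cache,
	[KMALLOC_RECLAIM]     = &reclaim_cache,
	[KMALLOC_DMA]         = &dma_cache,
	[KMALLOC_DMA_RECLAIM] = &dma_cache,
};
```

With constant flags the whole expression should fold at compile time, and
the "DMA wins over RECLAIMABLE" rule moves out of kmalloc_type() and into
how the array is populated.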