Re: [PATCHv5 04/12] x86/boot: Add infrastructure required for unaccepted memory support

Wander Lairson Costa <wander@xxxxxxxxxx> · Fri, 29 Apr 2022 07:58:14 -0300

On Mon, Apr 25, 2022 at 06:39:26AM +0300, Kirill A. Shutemov wrote:

[snip]

>  
> +static __always_inline void __set_bit(long nr, volatile unsigned long *addr)

Can't we update the existing set_bit function?

> +{
> +	asm volatile(__ASM_SIZE(bts) " %1,%0" : : "m" (*(volatile long *) addr),

Why do we need the cast here?

> +		     "Ir" (nr) : "memory");

Shouldn't we add "cc" to the clobber list?

> +}
> +
> +static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
> +{
> +	asm volatile(__ASM_SIZE(btr) " %1,%0" : : "m" (*(volatile long *) addr),
> +		     "Ir" (nr) : "memory");
> +}

Same comments of __set_bit apply here (except there is no clear_bit function)

[snip]

> +
> +static __always_inline unsigned long swab(const unsigned long y)
> +{
> +#if __BITS_PER_LONG == 64
> +	return __builtin_bswap32(y);
> +#else /* __BITS_PER_LONG == 32 */
> +	return __builtin_bswap64(y);

Suppose y = 0x11223344UL, then, the compiler to cast it to a 64 bits
value yielding 0x0000000011223344ULL, __builtin_bswap64 will then
return 0x4433221100000000, and the return value will be casted back
to 32 bits, so swapb will always return 0, won't it?

> +#endif
> +}
> +
> +unsigned long _find_next_bit(const unsigned long *addr1,
> +		const unsigned long *addr2, unsigned long nbits,

The addr2 name seems a bit misleading, it seems to act as some kind of mask,
is that right?

> +		unsigned long start, unsigned long invert, unsigned long le)
> +{
> +	unsigned long tmp, mask;
> +
> +	if (unlikely(start >= nbits))
> +		return nbits;
> +
> +	tmp = addr1[start / BITS_PER_LONG];
> +	if (addr2)
> +		tmp &= addr2[start / BITS_PER_LONG];
> +	tmp ^= invert;
> +
> +	/* Handle 1st word. */
> +	mask = BITMAP_FIRST_WORD_MASK(start);
> +	if (le)
> +		mask = swab(mask);
> +
> +	tmp &= mask;
> +
> +	start = round_down(start, BITS_PER_LONG);
> +
> +	while (!tmp) {
> +		start += BITS_PER_LONG;
> +		if (start >= nbits)
> +			return nbits;
> +
> +		tmp = addr1[start / BITS_PER_LONG];
> +		if (addr2)
> +			tmp &= addr2[start / BITS_PER_LONG];
> +		tmp ^= invert;
> +	}

Isn't better to divide start by BITS_PER_LONG in the beginning of the fuction,
and then multiply it by BITS_PER_LONG when necessary, saving the division operations
in the while loop?

[snip]