+ bitops-optimize-fns-for-improved-performance.patch added to mm-nonmm-unstable branch

The patch titled
     Subject: bitops: optimize fns() for improved performance
has been added to the -mm mm-nonmm-unstable branch.  Its filename is
     bitops-optimize-fns-for-improved-performance.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/bitops-optimize-fns-for-improved-performance.patch

This patch will later appear in the mm-nonmm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Kuan-Wei Chiu <visitorckw@xxxxxxxxx>
Subject: bitops: optimize fns() for improved performance
Date: Thu, 2 May 2024 17:24:43 +0800

The current fns() repeatedly uses __ffs() to find the index of the least
significant set bit and then clears that bit using __clear_bit().
Clearing the least significant set bit can be done more cheaply with
word &= word - 1 instead.

One __ffs() plus one __clear_bit() typically takes longer than a bitwise
AND and a subtraction.  To improve performance, the loop that clears the
least significant set bit is replaced with word &= word - 1, followed by
a single __ffs() to obtain the answer.  This reduces the number of
__ffs() calls from one per loop iteration to a single call at the end,
improving overall performance.
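
For illustration only (not part of the patch): a minimal userspace
sketch of the optimized function, using __builtin_ctzl() as a stand-in
for the kernel's __ffs() and assuming a 64-bit unsigned long.

#include <stdio.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Return the index of the n-th (0-based) set bit, or BITS_PER_LONG. */
static unsigned long fns(unsigned long word, unsigned int n)
{
	/* Strip the n lowest set bits; each AND clears exactly one. */
	while (word && n--)
		word &= word - 1;

	/* A single find-first-set now yields the answer. */
	return word ? __builtin_ctzl(word) : BITS_PER_LONG;
}

int main(void)
{
	unsigned long word = 0x16;	/* binary 10110: bits 1, 2, 4 set */

	printf("%lu\n", fns(word, 0));	/* 1 */
	printf("%lu\n", fns(word, 1));	/* 2 */
	printf("%lu\n", fns(word, 2));	/* 4 */
	printf("%lu\n", fns(word, 3));	/* 64: there is no 4th set bit */
	return 0;
}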

This change significantly accelerates fns() in the test_bitops
benchmark, speeding it up by approximately 7.6x.  It also improves
find_nth_bit() in the find_bit benchmark by approximately 26%.

Before:
test_bitops: fns:            58033164 ns
find_nth_bit:                  4254313 ns,  16525 iterations

After:
test_bitops: fns:             7637268 ns
find_nth_bit:                  3362863 ns,  16501 iterations
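
(The numbers above come from the kernel's test_bitops and find_bit
benchmarks.  For a rough userspace approximation of the same comparison
-- a sketch only; old_fns(), new_fns() and bench() are hypothetical
names, not kernel API -- the two variants can be timed side by side
with clock_gettime():)

#include <stdio.h>
#include <time.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Old approach: find-first-set, then clear that bit, every iteration. */
static unsigned long old_fns(unsigned long word, unsigned int n)
{
	unsigned int bit;

	while (word) {
		bit = __builtin_ctzl(word);
		if (n-- == 0)
			return bit;
		word &= ~(1UL << bit);	/* stand-in for __clear_bit() */
	}
	return BITS_PER_LONG;
}

/* New approach: AND/subtract loop, one find-first-set at the end. */
static unsigned long new_fns(unsigned long word, unsigned int n)
{
	while (word && n--)
		word &= word - 1;
	return word ? __builtin_ctzl(word) : BITS_PER_LONG;
}

static double bench(unsigned long (*fn)(unsigned long, unsigned int))
{
	struct timespec t0, t1;
	volatile unsigned long sink = 0;
	unsigned long w;
	unsigned int n;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (w = 1; w < 1000000; w++)
		for (n = 0; n < 8; n++)
			sink += fn(w, n);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	(void)sink;
	return (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
}

int main(void)
{
	printf("old_fns: %.0f ns\n", bench(old_fns));
	printf("new_fns: %.0f ns\n", bench(new_fns));
	return 0;
}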

Link: https://lkml.kernel.org/r/20240502092443.6845-3-visitorckw@xxxxxxxxx
Signed-off-by: Kuan-Wei Chiu <visitorckw@xxxxxxxxx>
Cc: Ching-Chun (Jim) Huang <jserv@xxxxxxxxxxxxxxxx>
Cc: Rasmus Villemoes <linux@xxxxxxxxxxxxxxxxxx>
Cc: Yury Norov <yury.norov@xxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 include/linux/bitops.h |   12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

--- a/include/linux/bitops.h~bitops-optimize-fns-for-improved-performance
+++ a/include/linux/bitops.h
@@ -254,16 +254,10 @@ static inline unsigned long __ffs64(u64
  */
 static inline unsigned long fns(unsigned long word, unsigned int n)
 {
-	unsigned int bit;
+	while (word && n--)
+		word &= word - 1;
 
-	while (word) {
-		bit = __ffs(word);
-		if (n-- == 0)
-			return bit;
-		__clear_bit(bit, &word);
-	}
-
-	return BITS_PER_LONG;
+	return word ? __ffs(word) : BITS_PER_LONG;
 }
 
 /**
_

Patches currently in -mm which might be from visitorckw@xxxxxxxxx are

lib-test_bitops-add-benchmark-test-for-fns.patch
bitops-optimize-fns-for-improved-performance.patch
