+ bitops-optimize-fns-for-improved-performance.patch added to mm-nonmm-unstable branch

The patch titled
     Subject: bitops: optimize fns() for improved performance
has been added to the -mm mm-nonmm-unstable branch.  Its filename is
     bitops-optimize-fns-for-improved-performance.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/bitops-optimize-fns-for-improved-performance.patch

This patch will later appear in the mm-nonmm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included in linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days.

------------------------------------------------------
From: Kuan-Wei Chiu <visitorckw@xxxxxxxxx>
Subject: bitops: optimize fns() for improved performance
Date: Fri, 26 Apr 2024 11:51:52 +0800

The current fns() repeatedly uses __ffs() to find the index of the least
significant set bit and then clears that bit using __clear_bit().
Clearing the least significant set bit can be done more cheaply with
word &= word - 1.

Typically, one __ffs() plus one __clear_bit() takes longer than one
subtraction plus one bitwise AND.  To improve performance, the loop now
clears the n lowest set bits with word &= word - 1 and then calls __ffs()
once to obtain the answer.  This reduces the number of __ffs() calls from
up to n + 1 to just one.
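
For illustration only, the two approaches can be compared in a userspace
sketch.  This is not the kernel code: ffs_nonzero() is a hypothetical
stand-in for __ffs() built on the GCC/Clang builtin __builtin_ctzl(),
BITS_PER_LONG is redefined locally, and fns_old(), fns_new() and
fns_selfcheck() are names made up for this example.

```c
#include <assert.h>
#include <limits.h>

/* Local redefinition for userspace; the kernel provides its own. */
#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical stand-in for __ffs(): index of the lowest set bit.
 * The result is undefined for word == 0, so callers must check first. */
static inline unsigned long ffs_nonzero(unsigned long word)
{
	return (unsigned long)__builtin_ctzl(word);
}

/* Old approach: one find-first-set and one bit clear per iteration. */
static unsigned long fns_old(unsigned long word, unsigned int n)
{
	while (word) {
		unsigned long bit = ffs_nonzero(word);

		if (n-- == 0)
			return bit;
		word &= ~(1UL << bit);	/* plays the role of __clear_bit() */
	}
	return BITS_PER_LONG;
}

/* New approach: drop the n lowest set bits, then find-first-set once. */
static unsigned long fns_new(unsigned long word, unsigned int n)
{
	unsigned int i;

	for (i = 0; word && i < n; i++)
		word &= word - 1;	/* clears the lowest set bit */

	return word ? ffs_nonzero(word) : BITS_PER_LONG;
}

/* Check that both versions agree for every n on one sample word. */
static void fns_selfcheck(void)
{
	unsigned long word = 0x16UL;	/* bits 1, 2 and 4 are set */
	unsigned int n;

	for (n = 0; n <= BITS_PER_LONG; n++)
		assert(fns_old(word, n) == fns_new(word, n));
}
```

With word = 0x16, both versions return 1, 2 and 4 for n = 0, 1 and 2,
and BITS_PER_LONG for any larger n.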

The following microbenchmark, run on my x86-64 machine, shows the time
(in microseconds) taken to call fns() on 1000000 test words generated by
get_random_u64(), for each value of n:

+-----+---------------+---------------+
|  n  |   time_old    |   time_new    |
+-----+---------------+---------------+
|  0  |     29194     |     25878     |
|  1  |     25510     |     25497     |
|  2  |     27836     |     25721     |
|  3  |     30140     |     25673     |
|  4  |     32569     |     25426     |
|  5  |     34792     |     25690     |
|  6  |     37117     |     25651     |
|  7  |     39742     |     25383     |
|  8  |     42360     |     25657     |
|  9  |     44672     |     25897     |
| 10  |     47237     |     25819     |
| 11  |     49884     |     26530     |
| 12  |     51864     |     26647     |
| 13  |     54265     |     28915     |
| 14  |     56440     |     28373     |
| 15  |     58839     |     28616     |
| 16  |     62383     |     29128     |
| 17  |     64257     |     30041     |
| 18  |     66805     |     29773     |
| 19  |     69368     |     33203     |
| 20  |     72942     |     33688     |
| 21  |     77006     |     34518     |
| 22  |     80926     |     34298     |
| 23  |     85723     |     35586     |
| 24  |     90324     |     36376     |
| 25  |     95992     |     37465     |
| 26  |    101101     |     37599     |
| 27  |    106520     |     37466     |
| 28  |    113287     |     38163     |
| 29  |    120552     |     38810     |
| 30  |    128040     |     39373     |
| 31  |    135624     |     40500     |
| 32  |    142580     |     40343     |
| 33  |    148915     |     40460     |
| 34  |    154005     |     41294     |
| 35  |    157996     |     41730     |
| 36  |    160806     |     41523     |
| 37  |    162975     |     42088     |
| 38  |    163426     |     41530     |
| 39  |    164872     |     41789     |
| 40  |    164477     |     42505     |
| 41  |    164758     |     41879     |
| 42  |    164182     |     41415     |
| 43  |    164842     |     42119     |
| 44  |    164881     |     42297     |
| 45  |    164870     |     42145     |
| 46  |    164673     |     42066     |
| 47  |    164616     |     42051     |
| 48  |    165055     |     41902     |
| 49  |    164847     |     41862     |
| 50  |    165171     |     41960     |
| 51  |    164851     |     42089     |
| 52  |    164763     |     41717     |
| 53  |    164635     |     42154     |
| 54  |    164757     |     41983     |
| 55  |    165095     |     41419     |
| 56  |    164641     |     42381     |
| 57  |    164601     |     41654     |
| 58  |    164864     |     41834     |
| 59  |    164594     |     41920     |
| 60  |    165207     |     42020     |
| 61  |    165056     |     41185     |
| 62  |    165160     |     41722     |
| 63  |    164923     |     41702     |
| 64  |    164777     |     41880     |
+-----+---------------+---------------+
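
The benchmark code itself is not part of this patch.  A rough userspace
sketch of a comparable measurement might look like the following, with
several assumptions: xorshift64() is a hypothetical stand-in for
get_random_u64(), timing uses POSIX clock_gettime(), random-number
generation is included in the timed region, and fns_loop() and
time_fns_us() are names invented here.  Absolute numbers from such a
harness are not comparable to the table above.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define NBITS ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical stand-in for get_random_u64(): a xorshift64 generator.
 * The state must be nonzero. */
static uint64_t xorshift64(uint64_t *state)
{
	uint64_t x = *state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	*state = x;
	return x;
}

/* Userspace copy of the new loop; __builtin_ctzl() replaces __ffs(). */
static unsigned long fns_loop(unsigned long word, unsigned int n)
{
	unsigned int i;

	for (i = 0; word && i < n; i++)
		word &= word - 1;

	return word ? (unsigned long)__builtin_ctzl(word) : NBITS;
}

/* Time `count` calls for a fixed n and return elapsed microseconds.
 * The sum of all results is stored in *sink so that the compiler
 * cannot discard the calls. */
static long long time_fns_us(unsigned int n, unsigned long count,
			     unsigned long *sink)
{
	struct timespec t0, t1;
	uint64_t state = 0x9E3779B97F4A7C15ULL;	/* fixed seed */
	unsigned long sum = 0, i;

	assert(sink != NULL);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < count; i++)
		sum += fns_loop((unsigned long)xorshift64(&state), n);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	*sink = sum;
	return (t1.tv_sec - t0.tv_sec) * 1000000LL +
	       (t1.tv_nsec - t0.tv_nsec) / 1000;
}
```

Calling time_fns_us(n, 1000000, &sink) for n from 0 to 64 would give one
column of a table in the same shape; a second column would need the same
harness around the old loop.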

Link: https://lkml.kernel.org/r/20240426035152.956702-1-visitorckw@xxxxxxxxx
Signed-off-by: Kuan-Wei Chiu <visitorckw@xxxxxxxxx>
Cc: Ching-Chun (Jim) Huang <jserv@xxxxxxxxxxxxxxxx>
Cc: Yury Norov <yury.norov@xxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 include/linux/bitops.h |   12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

--- a/include/linux/bitops.h~bitops-optimize-fns-for-improved-performance
+++ a/include/linux/bitops.h
@@ -254,16 +254,12 @@ static inline unsigned long __ffs64(u64
  */
 static inline unsigned long fns(unsigned long word, unsigned int n)
 {
-	unsigned int bit;
+	unsigned int i;
 
-	while (word) {
-		bit = __ffs(word);
-		if (n-- == 0)
-			return bit;
-		__clear_bit(bit, &word);
-	}
+	for (i = 0; word && i < n; i++)
+		word &= word - 1;
 
-	return BITS_PER_LONG;
+	return word ? __ffs(word) : BITS_PER_LONG;
 }
 
 /**
_

Patches currently in -mm which might be from visitorckw@xxxxxxxxx are

bitops-optimize-fns-for-improved-performance.patch




