Re: [GSoC][PATCH v2] Optimize ewah_bitmap.c for efficiency using trailing zeros for set bit iteration

Karthik Nayak <karthik.188@xxxxxxxxx> writes:

> Aryan Gupta <garyan447@xxxxxxxxx> writes:
>
> Hello,
>
>> Signed-off-by: Aryan Gupta <garyan447@xxxxxxxxx>
>> ---
>>
>> Thank you Vicent for the guidance. I am still not sure how
>> to do the performance measurement for this improvement. Any
>> guidance would be appreciated.
>>
>
> I guess there is some off-list discussion here. That along with the fact
> that the commit message is missing makes it really hard to understand
> how this is better than what was here already.
>
> The guidelines ('Documentation/SubmittingPatches') also state how to
> draft the commit message. This patch only seems to have a title, it is
> recommended to add a description as to why this change is being made.

Yes.

>> diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
>> index 8785cbc54a..1a75f50682 100644
>> --- a/ewah/ewah_bitmap.c
>> +++ b/ewah/ewah_bitmap.c
>> @@ -257,12 +257,15 @@ void ewah_each_bit(struct ewah_bitmap *self, void (*callback)(size_t, void*), vo
>>  		for (k = 0; k < rlw_get_literal_words(word); ++k) {
>>  			int c;
>>
>> -			/* todo: zero count optimization */
>> -			for (c = 0; c < BITS_IN_EWORD; ++c, ++pos) {
>> -				if ((self->buffer[pointer] & ((eword_t)1 << c)) != 0)
>> -					callback(pos, payload);
>> +			eword_t bitset = self->buffer[pointer];
>> +			while(bitset != 0) {
>> +				eword_t t = bitset & -bitset;
>> +				int r = __builtin_ctzl(bitset);
>> +				bitset ^= t;
>> +				callback(pos+r, payload);
>>  			}
>> -
>> +			
>> +			pos += BITS_IN_EWORD;
>>  			++pointer;
>>  		}
>>  	}
>
> The bit manipulation done here is slightly hard to comprehend, it would
> be nice if you could also add some comments as to what is being done
> here and why.
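
For readers following along, the loop in the hunk above relies on a
standard two's-complement idiom.  The sketch below is not code from the
patch: collect_set_bits() is a hypothetical helper, uint64_t stands in
for eword_t (assumed here to be 64 bits wide), and a plain loop stands
in for the count-trailing-zeros builtin so that the example builds
anywhere.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of git): store the positions of the
 * set bits of `word`, offset by `base`, into `out` (room for 64
 * entries) and return how many were stored.
 */
static size_t collect_set_bits(uint64_t word, size_t base, size_t *out)
{
	size_t n = 0;

	while (word != 0) {
		/*
		 * In two's complement, -word inverts every bit above the
		 * lowest set bit and keeps that bit, so the AND leaves
		 * only the lowest set bit.
		 */
		uint64_t lowest = word & -word;
		size_t index = 0;

		/* Portable stand-in for a count-trailing-zeros builtin. */
		while (((lowest >> index) & 1) == 0)
			index++;

		out[n++] = base + index;

		/* Clear the bit just reported; the loop ends at zero. */
		word ^= lowest;
	}
	return n;
}
```

The loop body runs once per set bit instead of once per bit position,
which is where any saving on sparse words would come from.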

In addition, this patch assumes that the __builtin_ctzl() function is
always available no matter what environment the code is built on,
which I am not sure is safe.  Quite honestly, I suspect that the
whole point of the "todo" is to seamlessly detect the presence of the
builtin support to count the trailing zero bits, use it only when it
is there, and give a fallback implementation when it does not exist.
The code itself to use the builtin is only 20% of that effort ;-)
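
To illustrate the kind of detection meant here, a minimal sketch
follows.  It is only one possible shape, not what git does: the names
HAVE_BUILTIN_CTZLL, ctz64_fallback() and ewah_ctz64() are made up for
this example, and the compiler check is an assumption (a real patch
would more likely hook into the build system).  It uses the "ll"
variant on the assumption that eword_t is 64 bits, because
__builtin_ctzl() takes an unsigned long, which is only 32 bits wide on
LLP64 platforms such as 64-bit Windows.

```c
#include <stdint.h>

/* Hypothetical feature macro; GCC and Clang both define __GNUC__. */
#if defined(__GNUC__) || defined(__clang__)
#define HAVE_BUILTIN_CTZLL 1
#endif

/*
 * Fallback: binary search for the lowest set bit.
 * The result is meaningless for x == 0; callers must not pass 0.
 */
static int ctz64_fallback(uint64_t x)
{
	int n = 0;

	if ((x & 0xFFFFFFFF) == 0) { n += 32; x >>= 32; }
	if ((x & 0xFFFF) == 0)     { n += 16; x >>= 16; }
	if ((x & 0xFF) == 0)       { n += 8;  x >>= 8; }
	if ((x & 0xF) == 0)        { n += 4;  x >>= 4; }
	if ((x & 0x3) == 0)        { n += 2;  x >>= 2; }
	if ((x & 0x1) == 0)        { n += 1; }
	return n;
}

/* Use the builtin when it was detected, the fallback otherwise. */
static int ewah_ctz64(uint64_t x)
{
#ifdef HAVE_BUILTIN_CTZLL
	return __builtin_ctzll(x);
#else
	return ctz64_fallback(x);
#endif
}
```

With a wrapper along these lines, the loop in ewah_each_bit() would
call the wrapper rather than the builtin directly, and builds without
the builtin would still compile.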

And of course, there is the benchmark, to show how much better the
performance gets for people with that function and, more importantly,
to show that the performance does not degrade for those who are
without.
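
As a starting point for such a measurement, a rough standalone sketch
is given below.  Everything in it is an assumption made for
illustration: the function names are hypothetical, the words are plain
uint64_t arrays rather than a real struct ewah_bitmap, and clock() only
gives coarse CPU time.  Numbers from git's own t/perf suite on real
repositories would be more representative.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef void (*bit_cb)(size_t pos, void *payload);
typedef void (*each_bit_fn)(const uint64_t *, size_t, bit_cb, void *);

/* Callback that folds positions into a checksum so work is not elided. */
static void sum_positions(size_t pos, void *payload)
{
	*(uint64_t *)payload += pos;
}

static int ctz64(uint64_t x)
{
#if defined(__GNUC__) || defined(__clang__)
	return __builtin_ctzll(x);
#else
	int n = 0;
	while (!(x & 1)) { x >>= 1; n++; }
	return n;
#endif
}

/* Shape of the existing loop: test every bit position. */
static void each_bit_naive(const uint64_t *words, size_t nr,
			   bit_cb cb, void *payload)
{
	size_t pos = 0, k;
	int c;

	for (k = 0; k < nr; k++)
		for (c = 0; c < 64; c++, pos++)
			if (words[k] & ((uint64_t)1 << c))
				cb(pos, payload);
}

/* Shape of the proposed loop: visit set bits only. */
static void each_bit_ctz(const uint64_t *words, size_t nr,
			 bit_cb cb, void *payload)
{
	size_t pos = 0, k;

	for (k = 0; k < nr; k++, pos += 64) {
		uint64_t bitset = words[k];

		while (bitset) {
			cb(pos + ctz64(bitset), payload);
			bitset &= bitset - 1;	/* clear lowest set bit */
		}
	}
}

/* CPU seconds for `rounds` passes; *sum receives the checksum. */
static double time_iteration(each_bit_fn fn, const uint64_t *words,
			     size_t nr, int rounds, uint64_t *sum)
{
	clock_t start = clock();
	int i;

	*sum = 0;
	for (i = 0; i < rounds; i++)
		fn(words, nr, sum_positions, sum);
	return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Running time_iteration() with both loops over the same input, at a few
different bit densities, and checking that the two checksums agree
would give a first impression of whether the change helps, and whether
the fallback path hurts.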

Thanks.



