Karthik Nayak <karthik.188@xxxxxxxxx> writes:

> Aryan Gupta <garyan447@xxxxxxxxx> writes:
>
> Hello,
>
>> Signed-off-by: Aryan Gupta <garyan447@xxxxxxxxx>
>> ---
>>
>> Thank you Vicent for the guidance. I am still not sure how
>> to do the performance measurement for this improvement. Any
>> guidance would be appreciated.
>>
>
> I guess there is some off-list discussion here. That along with the fact
> that the commit message is missing makes it really hard to understand
> how this is better than what was here already.
>
> The guidelines ('Documentation/SubmittingPatches') also state how to
> draft the commit message. This patch only seems to have a title, it is
> recommend to add a description as to why this change is being made.

Yes.

>> diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
>> index 8785cbc54a..1a75f50682 100644
>> --- a/ewah/ewah_bitmap.c
>> +++ b/ewah/ewah_bitmap.c
>> @@ -257,12 +257,15 @@ void ewah_each_bit(struct ewah_bitmap *self, void (*callback)(size_t, void*), vo
>>  		for (k = 0; k < rlw_get_literal_words(word); ++k) {
>>  			int c;
>>
>> -			/* todo: zero count optimization */
>> -			for (c = 0; c < BITS_IN_EWORD; ++c, ++pos) {
>> -				if ((self->buffer[pointer] & ((eword_t)1 << c)) != 0)
>> -					callback(pos, payload);
>> +			eword_t bitset = self->buffer[pointer];
>> +			while(bitset != 0) {
>> +				eword_t t = bitset & -bitset;
>> +				int r = __builtin_ctzl(bitset);
>> +				bitset ^= t;
>> +				callback(pos+r, payload);
>>  			}
>> -
>> +
>> +			pos += BITS_IN_EWORD;
>>  			++pointer;
>>  		}
>>  	}
>
> The bit manipulation done here is slightly hard to comprehend, it would
> be nice if you could also add some comments as to what is being done
> here and why.

In addition, this patch assumes that the __builtin_ctzl() function is
always available no matter what environment the code is built on,
which I am not sure is a safe assumption.

Quite honestly, I suspect that the whole of the "todo" is to
seamlessly detect the presence of the builtin support to count
trailing zero bits, use it only when it is there, and give a fallback
implementation when it does not exist. The code itself to use the
builtin is only 20% of that effort ;-)

And of course, there is the benchmark: to show how much better
performance gets for people with that function, and more importantly
to show that the performance does not degrade for those who are
without it.

Thanks.
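
For illustration only, the "detect the builtin, otherwise fall back" idea could be sketched roughly as below. This is not the submitted patch and not existing code in the tree. The names HAVE_BUILTIN_CTZLL, ewah_ctz(), each_bit_in_word(), struct bit_list and record_bit() are all hypothetical, and the sketch assumes that eword_t is a 64-bit unsigned type. Under that assumption it uses __builtin_ctzll() rather than __builtin_ctzl(), because unsigned long can be narrower than 64 bits on some platforms.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: eword_t is a 64-bit unsigned word. */
typedef uint64_t eword_t;

/* Hypothetical feature macro; a real build would likely probe for this. */
#if defined(__GNUC__) || defined(__clang__)
#define HAVE_BUILTIN_CTZLL 1
#endif

/* Count trailing zero bits; w must be nonzero. Hypothetical helper. */
static inline int ewah_ctz(eword_t w)
{
#ifdef HAVE_BUILTIN_CTZLL
	return __builtin_ctzll(w);
#else
	/* Portable fallback: shift until the lowest bit is set. */
	int n = 0;
	while (!(w & 1)) {
		w >>= 1;
		n++;
	}
	return n;
#endif
}

/* Invoke callback once per set bit in word, lowest bit first. */
static void each_bit_in_word(eword_t word, size_t pos,
			     void (*callback)(size_t, void *), void *payload)
{
	while (word) {
		callback(pos + ewah_ctz(word), payload);
		word &= word - 1; /* clear the lowest set bit */
	}
}

/* Hypothetical collector, used only to demonstrate the callback. */
struct bit_list {
	size_t pos[64];
	size_t nr;
};

static void record_bit(size_t pos, void *payload)
{
	struct bit_list *list = payload;
	list->pos[list->nr++] = pos;
}
```

With this shape, the loop body in ewah_each_bit() would only need to call the helper and then advance pos by BITS_IN_EWORD. A benchmark would still have to cover both branches of the #ifdef to show that the fallback is no slower than the original bit-by-bit loop.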