On Fri, May 25, 2012 at 7:43 PM, David Miller <davem@xxxxxxxxxxxxx> wrote:
>
> It CSE's them into the loop, but like I said a few days
> ago it doesn't CSE them into the find_zero() code block.

Ok, I think I have a solution.

This is *totally* untested, but it compiles on x86. And I think it's
"close to the right thing". It makes those constants explicit, so that
sharing them is easy when we have two different users, and actually
does that in fs/namei.c.

The interface is a bit odd, but the rules are:

 - has_zero -> prep_zero_mask -> create_zero_mask -> { zero_bytemask , find_zero }

where two masks that have been created by prep_zero_mask can be or'ed
together to create one "one or the other" mask.

So the has_zero -> prep_zero_mask boundary is due to your BE
efficiency issue (ie "has_zero()" goes inside the loop, and
"prep_zero_mask()" goes outside).

The prep_zero_mask -> create_zero_mask boundary is due to that "we can
combine multiple masks at this level" issue.

And the create_zero_mask -> { zero_bytemask , find_zero } boundary is
because we actually want to create a bytemask for some things, and
just find the zero for others.

Does this *work*? I don't know. But it generates code that looks
*roughly* sane.

Btw, this patch replaces the one you sent - the interdiff didn't look
sane (but it goes on top of the one I sent out earlier). And I just
put your BE-specific header into asm-generic after all.

Does this look sane to you?

                Linus
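
The chain described above can be sketched in userspace C. This is an illustration under stated assumptions, not the contents of patch.diff: it assumes a 64-bit word with a little-endian byte layout, the function signatures and the `wat` constant struct are guesses built around the names used in the mail, and `pack_le()` and `first_nul_or_slash()` are hypothetical helpers added only so the example can be exercised.

```c
#include <stdint.h>

/* Replicate one byte value across all eight bytes of a 64-bit word. */
#define REPEAT_BYTE(x) ((uint64_t)(x) * 0x0101010101010101ULL)

/* Assumed shape: the two constants made explicit so callers can share them. */
struct word_at_a_time {
    uint64_t one_bits, high_bits;
};

static const struct word_at_a_time wat = {
    REPEAT_BYTE(0x01), REPEAT_BYTE(0x80)
};

/* Cheap test that goes inside the loop: nonzero if any byte of 'a' is zero. */
static inline uint64_t has_zero(uint64_t a, uint64_t *bits,
                                const struct word_at_a_time *c)
{
    uint64_t mask = ((a - c->one_bits) & ~a) & c->high_bits;
    *bits = mask;
    return mask;
}

/* Runs once after the loop. In this little-endian sketch it is a no-op,
 * because only the lowest set bit of 'bits' is used later and that bit is
 * always exact. A big-endian variant would do its fix-up work here. */
static inline uint64_t prep_zero_mask(uint64_t a, uint64_t bits,
                                      const struct word_at_a_time *c)
{
    (void)a;
    (void)c;
    return bits;
}

/* Turn (possibly or'ed) prepared bits into 0xff for every byte that comes
 * before the first match. */
static inline uint64_t create_zero_mask(uint64_t bits)
{
    bits = (bits - 1) & ~bits;
    return bits >> 7;
}

/* In this sketch the created mask is directly usable as a bytemask. */
static inline uint64_t zero_bytemask(uint64_t mask)
{
    return mask;
}

/* Count the leading 0xff bytes, i.e. the index of the first match. */
static inline unsigned find_zero(uint64_t mask)
{
    unsigned n = 0;
    while (mask & 0xff) {
        n++;
        mask >>= 8;
    }
    return n;
}

/* Hypothetical helper: pack 8 bytes so that s[0] is the lowest byte,
 * independent of the host's endianness. */
static inline uint64_t pack_le(const char *s)
{
    uint64_t w = 0;
    for (int i = 0; i < 8; i++)
        w |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return w;
}

/* Hypothetical two-user example: index of the first NUL or '/' byte in the
 * word, or 8 if there is none. The two prepared masks are or'ed together
 * before create_zero_mask(). */
static inline unsigned first_nul_or_slash(uint64_t a)
{
    uint64_t b = a ^ REPEAT_BYTE('/');   /* '/' bytes become zero bytes */
    uint64_t adata, bdata;

    if (!(has_zero(a, &adata, &wat) | has_zero(b, &bdata, &wat)))
        return 8;

    adata = prep_zero_mask(a, adata, &wat);
    bdata = prep_zero_mask(b, bdata, &wat);
    return find_zero(create_zero_mask(adata | bdata));
}
```

With these definitions, `first_nul_or_slash(pack_le("usr/bin"))` stops at the '/' in byte 3. Or'ing the two masks is safe here because any false-positive bits that the borrow in `has_zero()` may produce sit above the first real match, and `create_zero_mask()` only looks at the lowest set bit.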
Attachment: patch.diff