On Sat, Jul 8, 2017 at 11:50 PM, Luc Van Oostenryck <luc.vanoostenryck@xxxxxxxxx> wrote: >> It looks very clean on the caller side. It perform extremely >> fast. All the internal variable will easy promote to register. > > But since all real use of the macros will call some functions > inside the loops all those nice registers will need to be pushed > on the stack. Yes, if the loop need to external functions. Those register will sync to memory. There is also a lot of loops that don't need to. I am looking at simple.c just because I have it open in my editor. I count 5 ptrlist loop without function calls, 1 with. Simple.c might not be typical because it is likely doing more small stuff. But it give an example the loop without function call is not rare at all. > > This sort of artificial benchmark doesn't mean anything, you > should know that. What you need to measure is how the stuff is I really don't like what you are suggesting. It is fair and straight compare. Artificial benchmark usually means pick the workload to make certain number look good and ignore the other workload which numbers that looks bad. I thing in this case, it is one sided. If you still think my test is artificial and put your ptrlist iteration in unfair situation. I encourage you come up with an artificial test which let your version of ptrlist iterator come out favorable in performance number. If we can't find such case, then it means it is universal not artificial. > really used to see the real effect on cache and generated code. > The we can talk about timing, slowdown & performance. > Have you done this and seen a significative difference? > (I did it, of course, and there wasn't anything significative, > all witihn the noise of the measurements). For the record, I did run your code with kernel source checking. "wasn't anything significative" by itself was not good enough to accept the patch. Such a big rewrite of such fundamental building block better have good enough reason for it. I don't see a lot of up side to justify it. I definitely see the down side in performance. > > Also, in November while I was fixing the problem with empty > blocks, Linus himself said about the ptrlist macros: > "Some of those inlines look large enough that I wonder > how much sense they make as inlines any more" > And indeed, having all those levels of looping inline/in macros > is quite questionnable regarding the quality of the code that can > be generated and impact on i-cache. Maybe he mean just take out the inline keyword and let those ptr_xxx function stay as function not inline function? Anyway, if there is more specific code we can discuss. I can take a look at what needs to be done as well. It doesn't have to be a big rewrite, especially the performance number is very hard for me to swallow. > It was Linus' remark and combined the issue and the complexity/subtility > which motivated me to see what else can be done with these lists. The thing is, put those variable in to struct members does not magically make it clean and less complex. I consider those change cosmetic. Different people have different preference. It is hard to make every one happy on how the code should look. I am reluctant to take *pure* cosmetic change without real functional impact. The performance number here is a deal breaker for me. There might be other ways to improve it. This is not it. Chris -- To unsubscribe from this list: send the line "unsubscribe linux-sparse" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html