On 03/15/2012 12:15 PM, Takuya Yoshikawa wrote:
> Avi Kivity <avi@xxxxxxxxxx> wrote:
>
> > > Although using "inline" like this does not look clean, we could see
> > > measurable performance improvements: get_dirty_log for 1GB dirty memory
> > > became faster by more than 10% on my test box.
> > >
> >
> > WOW. I'd have assumed the processor deals better with this; it should
> > be 100% predicted branches.
> >
> > But I won't argue with cold data.
>
> What I checked was:
>
> original    with-patch2    with-patch3
> 8.7ms       8.5ms          7.5ms

What are the per-call numbers?

> I assumed that without "inline" only __rmap_get_next() would be inlined
> into rmap_get_next() so did like this.
>
> I thought the improvement was just from removing one function call for
> each rmap_write_protect. Not sure if anything was changed with branch
> predictions.

What I mean is, modern CPUs effectively inline simple function calls by
predicting the call, the branches within the function, and the return, so
they don't have to stall their pipelines at any of these points.

But again, the numbers speak louder than speculation about CPU architecture.

-- 
error compiling committee.c: too many arguments to function