On Tue, Mar 25, 2014 at 12:38 PM, Peter A. Felvegi <petschy@xxxxxxxxxxxxxxxxxx> wrote:
>
> The reduced test case is at the end. It encodes data into a buffer in a loop
> with variable length encoding (not a working real encoding). For some
> reason, the write ptr is not kept in a register, but loaded/stored when
> used/updated. There is a potential function call in the loop, but there are
> __builtin_expect hints, so I think it would be possible to use a register
> for the ptr and store just before the call, and load it back right after the
> call. This would speed up the common code path: less code, less loads and
> stores. I measured around 20-30% more runtime, compared to a version where a
> pointer goes in and the updated ptr is returned. However, passing/returning
> the ptr has other issues, esp for a decoder, that would return the decoded
> value normally, not the ptr.

You marked the encode_noinline function as noinline, and encode can call
encode_noinline. The encode_noinline function could change any part of
global memory, and in particular could change the value of n->next. So
the loop has to reload that value, in case it was changed.

Ian
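
For illustration only, here is a minimal sketch of the "store just before the call, load it back right after" pattern described in the quoted message, written out by hand. It is not the reduced test case from the original post, which is not reproduced here. The struct enc type, the encode_all function and the 7-bit encoding are hypothetical; only the names encode_noinline and next come from the thread. The sketch assumes GCC or Clang for __attribute__((noinline)) and __builtin_expect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder state; `next` plays the role of n->next. */
struct enc {
    uint8_t *next;   /* write pointer into the output buffer */
};

/* Out-of-line slow path. The compiler cannot see into a noinline call,
 * so it must assume e->next may have changed when the call returns. */
__attribute__((noinline))
void encode_noinline(struct enc *e, uint32_t v)
{
    /* Placeholder encoding: 7 bits per byte, low bits first. */
    while (v >= 0x80) {
        *e->next++ = (uint8_t)(v | 0x80);
        v >>= 7;
    }
    *e->next++ = (uint8_t)v;
}

/* Caches the write pointer in a local whose address is never taken,
 * so the fast path does not have to touch e->next at all. */
void encode_all(struct enc *e, const uint32_t *vals, size_t n)
{
    assert(e != NULL && e->next != NULL);

    uint8_t *p = e->next;                 /* load once, before the loop */
    for (size_t i = 0; i < n; i++) {
        uint32_t v = vals[i];
        if (__builtin_expect(v < 0x80, 1)) {
            *p++ = (uint8_t)v;            /* common case: one byte */
        } else {
            e->next = p;                  /* store before the call */
            encode_noinline(e, v);
            p = e->next;                  /* reload after the call */
        }
    }
    e->next = p;                          /* write back when done */
}
```

Because p is a local that never escapes, the compiler is free to keep it in a register across the fast path, and the explicit store and reload around the call keep e->next consistent with whatever encode_noinline does to it. Whether this recovers the 20-30% mentioned above would have to be measured on the real code.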