> I do have dynamic memory allocations, but not that many. I use expression templates, which eliminate most temporaries.

This is a bit down a side track, but I'm pretty sure the "temporaries" you eliminated with expression templates would not have used the same memory pool as new/delete, even if you hadn't eliminated them. So you may have saved a lot of copying, and maybe some code execution in constructors/destructors, with that design, but there wasn't significant allocation overhead there to be saved. If you might be confused about which constructs do and don't share the memory pool with new/delete, looking for an excess of that in your design may still be worth more effort. (There is a rough sketch of what I mean at the end of this message.)

> The last time I profiled my testcases, virtually all of the runtime was spent in the "matrix"-vector products of the BiCGStab algorithm

Was that profiling done while the 30% slowdown (that you're currently trying to diagnose) was present? If not, maybe you should profile again.

If the time is where you indicated while the slowdown is present, then the likely cause is CPU cache misses. Those cache misses could be caused by reuse of the fragmented memory freed by that line in the library (vs. if those many fragments were not freed, the subsequent allocations would take a contiguous chunk of additional address space, which might be more cache friendly). So the indicated path forward would be to look at the allocation of the data used by the BiCGStab algorithm, and think about what might make those allocations more or less cache friendly. (See the second sketch at the end of this message for the kind of thing I mean.)

Sorry about my repeated quoting/formatting etc. problems in these posts. I haven't yet gotten used to the way AOL-Mail works against me. I read this mailing list, but don't post often, because if the question were solidly on-topic, I wouldn't know the answer.
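
Rough sketch of the expression-template point above. The Vec/VecSum names here are made up for illustration, not your library's types; the point is only that the "temporary" an expression produces is a small stack object holding references, so eliminating it never saved a new/delete allocation in the first place:

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Made-up minimal vector type (your library's types will differ).
    struct Vec {
        std::vector<double> data;                    // heap storage, via the new/delete pool
        explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
        double  operator[](std::size_t i) const { return data[i]; }
        double& operator[](std::size_t i)       { return data[i]; }
        std::size_t size() const { return data.size(); }
    };

    // The "temporary" an expression template produces: a tiny stack object that
    // only holds references to its operands -- it never allocates from the heap.
    struct VecSum {
        const Vec& a;
        const Vec& b;
        double operator[](std::size_t i) const { return a[i] + b[i]; }
        std::size_t size() const { return a.size(); }
    };

    inline VecSum operator+(const Vec& a, const Vec& b) { return VecSum{a, b}; }

    // Assignment evaluates the expression element by element; the only heap
    // allocation involved is the destination vector, which exists anyway.
    inline void assign(Vec& dst, const VecSum& expr) {
        for (std::size_t i = 0; i < expr.size(); ++i) dst[i] = expr[i];
    }

    int main() {
        Vec x(4, 1.0), y(4, 2.0), z(4);
        assign(z, x + y);                            // no heap traffic beyond x, y, z themselves
        std::cout << z[0] << "\n";                   // prints 3
    }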
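
And a sketch of what I mean by looking at the allocation of the BiCGStab data. Everything here is hypothetical (the names and the count of seven work vectors are my assumptions, not taken from your code), but it shows the difference between many separate allocations, which can land in the fragmented blocks that line in the library freed, and one contiguous block carved into views:

    #include <cstddef>
    #include <memory>
    #include <vector>

    // Option A: separate allocations. Each vector may land in whatever freed,
    // fragmented block the allocator hands back, scattering the working set
    // across the address space and making the mat-vec loops less cache friendly.
    std::vector<std::vector<double>> make_scattered(std::size_t n) {
        std::vector<std::vector<double>> work;
        for (int k = 0; k < 7; ++k) work.emplace_back(n);   // e.g. r, r0, p, v, s, t, x
        return work;
    }

    // Option B: one allocation carved into seven views. The work vectors sit
    // next to each other, so streaming through them touches a compact region.
    struct WorkVectors {
        std::unique_ptr<double[]> block;   // single contiguous allocation
        std::size_t n;
        double* vec(int k) { return block.get() + static_cast<std::size_t>(k) * n; }
    };

    WorkVectors make_contiguous(std::size_t n) {
        return WorkVectors{ std::make_unique<double[]>(7 * n), n };
    }

    int main() {
        auto scattered = make_scattered(100000);
        auto packed    = make_contiguous(100000);
        packed.vec(0)[0] = 1.0;        // touch the first work vector
        (void)scattered;               // keep the comparison object alive
    }

Whether something like option B actually helps will depend on your access pattern; comparing cache-miss counts (e.g. with perf stat or cachegrind) with and without the slowdown present would tell you more than I can guess from here.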