> If the time is where you indicated, while the slowdown is present, then the likely cause is CPU cache misses. Those
> cache misses could be caused by reuse of the fragmented memory freed by that line in the library (vs. if those many
> fragments were not freed, the subsequent allocations would take a contiguous chunk of additional address space,
> which might be more cache friendly).

I believe this diagnosis may explain what I'm observing. I profiled several testcases and, as before, most of the runtime is spent in the matrix-vector products. If I call gmsh::finalize, the matrix-vector products take up to 2 times longer than if I don't. Other parts of the program aren't significantly affected.

There are no allocations or deallocations in those matrix-vector products. The instructions involved should be approximately those that I pasted below. IDVecVec is a std::vector<std::vector<std::size_t>> containing the indices of each cell's neighbour cells.

It's still surprising to me that freeing memory in a shared library, when there is plenty of free RAM available (I forgot to mention that my testcases consume very little memory), affects the performance of totally unrelated code. Is there a remedy other than not calling gmsh::finalize?

The good thing is that I should be able to prepare a more or less reduced testcase for the Gmsh devs to test.

Thanks so much for your help!

        push    r15
        push    r14
        push    r13
        push    r12
        push    rbp
        push    rbx
        mov     rbp, QWORD PTR [rdi]
        test    rbp, rbp
        je      .L23
        mov     rax, QWORD PTR [rsi+24]
        mov     r14, QWORD PTR [rdi+8]
        mov     r11, QWORD PTR [rax]
        mov     rax, QWORD PTR [rsi+8]
        mov     r15, QWORD PTR [rsi+32]
        mov     r13, QWORD PTR [rax+8]
        mov     rax, QWORD PTR [rsi]
        mov     r10, QWORD PTR VF::TMalla<2ul>::IDVecVec[rip]
        mov     r12, QWORD PTR [rax+8]
        vmovsd  xmm3, QWORD PTR .LC1[rip]
        mov     rbx, rsi
        sal     rbp, 3
        xor     r9d, r9d
        vxorpd  xmm4, xmm4, xmm4
.L16:
        mov     rax, QWORD PTR [r11+8+r9*2]
        mov     rcx, QWORD PTR [r11+r9*2]
        mov     rdx, QWORD PTR [r10]
        lea     rdi, [rax+rcx*8]
        mov     rsi, QWORD PTR [r10+8]
        cmp     rax, rdi
        je      .L18
        cmp     rdx, rsi
        je      .L18
        mov     r8, QWORD PTR [r15+8]
        vmovsd  xmm0, xmm4, xmm4
.L14:
        mov     rcx, QWORD PTR [rdx]
        vmovsd  xmm5, QWORD PTR [rax]
        add     rdx, 8
        vfmadd231sd     xmm0, xmm5, QWORD PTR [r8+rcx*8]
        add     rax, 8
        cmp     rsi, rdx
        je      .L13
        cmp     rdi, rax
        jne     .L14
.L13:
        vmovsd  xmm1, QWORD PTR [r12+r9]
        vdivsd  xmm2, xmm3, QWORD PTR [rbx+16]
        vmulsd  xmm1, xmm1, QWORD PTR [r13+0+r9]
        add     r10, 24
        vfmadd132sd     xmm1, xmm0, xmm2
        vmovsd  QWORD PTR [r14+r9], xmm1
        add     r9, 8
        cmp     rbp, r9
        jne     .L16
.L23:
        pop     rbx
        pop     rbp
        pop     r12
        pop     r13
        pop     r14
        pop     r15
        ret
.L18:
        vmovsd  xmm0, xmm4, xmm4
        jmp     .L13
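
In the inner loop (.L14) each neighbour index is read from an inner vector of IDVecVec and then used to gather a value ([r8+rcx*8]). Every inner vector is a separate heap allocation, so where those allocations land may matter for cache behaviour. One thing that might be worth testing, purely as a sketch, is copying the nested vector into two contiguous arrays (a CSR-style layout), so that the indices of all cells sit back to back regardless of how fragmented the heap is. The names FlatNeighbours, flatten and neighbourProduct below are hypothetical and not part of my code or of Gmsh, and the product is simplified to the gather-and-accumulate part only. Whether this actually removes the slowdown is an assumption I have not verified.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CSR-style neighbour table (illustration only).
// The neighbours of cell i are idx[start[i]] .. idx[start[i + 1] - 1].
struct FlatNeighbours {
    std::vector<std::size_t> start;  // size = number of cells + 1
    std::vector<std::size_t> idx;    // all neighbour indices, back to back
};

// Copy a nested vector (like IDVecVec) into one contiguous index array.
FlatNeighbours flatten(const std::vector<std::vector<std::size_t>>& nested) {
    FlatNeighbours flat;
    std::size_t total = 0;
    for (const auto& row : nested) total += row.size();

    flat.start.reserve(nested.size() + 1);
    flat.idx.reserve(total);  // single allocation for all indices

    flat.start.push_back(0);
    for (const auto& row : nested) {
        flat.idx.insert(flat.idx.end(), row.begin(), row.end());
        flat.start.push_back(flat.idx.size());
    }
    return flat;
}

// Simplified product: y[i] = sum over k of coef[k] * x[idx[k]],
// with k running over the neighbour slots of cell i.
// coef is assumed to be stored in the same flattened order as idx.
std::vector<double> neighbourProduct(const FlatNeighbours& nb,
                                     const std::vector<double>& coef,
                                     const std::vector<double>& x) {
    std::vector<double> y(nb.start.size() - 1, 0.0);
    for (std::size_t i = 0; i + 1 < nb.start.size(); ++i) {
        double sum = 0.0;
        for (std::size_t k = nb.start[i]; k < nb.start[i + 1]; ++k) {
            sum += coef[k] * x[nb.idx[k]];
        }
        y[i] = sum;
    }
    return y;
}
```

If the flattened version runs at the same speed with and without gmsh::finalize, that would support the cache-miss explanation. If it does not, the scattered accesses into x (or the per-cell coefficient arrays) are probably the larger factor.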