On Wed, Jul 17, 2024 at 04:45:07PM -0400, Aristeu Rozanski wrote:
> On Tue, Jul 16, 2024 at 11:44:27AM -0400, Aristeu Rozanski wrote:
> > Taking a look on this.
>
> So it looks like to be a timing issue. While spreading some code to figure out
> exactly which exact sequence is causing the issue, it makes the error go away
> in the 'counters' test. More specifically one of the sequences:
>
>     /* touched, shared mmap */
>     map(SL_TEST, 1, MAP_SHARED);
>     touch(SL_TEST, 1, MAP_SHARED);
>     unmap(SL_TEST, 1, MAP_SHARED);
>
> fails because it's expecting:
>
>     HugePages_{Total,Free} = 1
>     HugePages_Surp = 0
>
> but gets:
>
>     HugePages_{Total,Free} = 2
>     HugePages_Surp = 1
>
> which seems caused by a surplus page taking too long to be freed, thus
> timing making difference here.
>
> I'm not sure as why it'd take longer with my patch applied but will keep
> digging.

It really does seem to be a matter of a small timing difference: even poking
at it with perf is enough to make the problem no longer reproducible.

I'll get in contact with the libhugetlbfs folks. The counters test may need
some logic so that, when surplus pages are still around, it waits a little to
give them a chance to be freed before checking the counters.

I believe we're good to go. Comments?

--
Aristeu
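
For illustration only, the kind of wait the counters test might need could
look roughly like the sketch below. This is not libhugetlbfs code: the helper
names read_meminfo_field() and wait_for_meminfo_field() are hypothetical, and
the retry count and delay are arbitrary assumptions. It simply polls a
meminfo-style file until a field such as HugePages_Surp reaches the expected
value or the retries run out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: read one numeric field (e.g. "HugePages_Surp") from a
 * meminfo-style file. Returns the value, or -1 if the file cannot be opened
 * or the field is not present.
 */
static long read_meminfo_field(const char *path, const char *field)
{
	char line[256];
	long value = -1;
	size_t len;
	FILE *f;

	assert(path != NULL && field != NULL);
	len = strlen(field);

	f = fopen(path, "r");
	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		/* Lines look like "HugePages_Surp:        0" */
		if (strncmp(line, field, len) == 0 && line[len] == ':') {
			if (sscanf(line + len + 1, "%ld", &value) != 1)
				value = -1;
			break;
		}
	}
	fclose(f);
	return value;
}

/*
 * Hypothetical helper: poll until 'field' equals 'expected', checking at most
 * 'max_tries' times and sleeping 'delay_ms' milliseconds between checks.
 * Returns 0 once the value matches, -1 if it never did.
 */
static int wait_for_meminfo_field(const char *path, const char *field,
				  long expected, int max_tries, long delay_ms)
{
	struct timespec delay = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
	int i;

	for (i = 0; i < max_tries; i++) {
		if (read_meminfo_field(path, field) == expected)
			return 0;
		nanosleep(&delay, NULL);
	}
	return -1;
}
```

The test could then call something like
wait_for_meminfo_field("/proc/meminfo", "HugePages_Surp", 0, 50, 10) right
before verifying the counters, and only report a failure if the surplus page
is still there after the wait.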