On Wed, Dec 08, 2021 at 08:35:34PM +0000, Pasha Tatashin wrote:
> It is hard to root cause _refcount problems, because they usually
> manifest after the damage has occurred. Yet, they can lead to
> catastrophic failures such as memory corruptions. There were a number
> of refcount related issues discovered recently [1], [2], [3].
>
> Improve debuggability by adding more checks that ensure that
> page->_refcount never turns negative (i.e. double free does not
> happen, or free after freeze, etc.).
>
> - Check for overflow and underflow right from the functions that
>   modify _refcount
> - Remove set_page_count(), so we do not unconditionally overwrite
>   _refcount with an unrestrained value
> - Trace return values in all functions that modify _refcount

You're adding a lot more atomic instructions with these patches.
Have you done any performance measurements with these patches applied
and debugging disabled?

I'm really not convinced it's worth closing one-instruction-wide
races of this kind when they are "shouldn't ever happen" situations.
If the debugging will catch the problem in 99.99% of cases and miss
0.01% without using atomic instructions, that seems like a better
set of tradeoffs than catching 100% of problems by using the atomic
instructions.
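To make that tradeoff concrete, here is a minimal userspace sketch using C11 atomics. It is not code from the patch series and not the kernel API: `count` stands in for page->_refcount, and the helpers `set_count_cheap()` and `set_count_exact()` are hypothetical. Both overwrite the count and report whether the old value was 0, as a debug check on a set_page_count() replacement might. The cheap variant can miss a bad value if another CPU writes the count in the window between its load and its store; the exact variant closes that window at the cost of an atomic read-modify-write.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Illustration only: 'count' stands in for page->_refcount. Both helpers
 * overwrite it with 'v' and report whether the old value was 0, which is
 * a hypothetical debug check, not the kernel's actual one.
 */

/*
 * Cheap variant: a relaxed load followed by a relaxed store. On x86 both
 * compile to plain MOVs, so no locked instruction is added. Another CPU
 * can modify 'count' between the load and the store, so on rare
 * occasions the check inspects a stale value and misses a bad one.
 */
static bool set_count_cheap(atomic_int *count, int v)
{
	int old = atomic_load_explicit(count, memory_order_relaxed);

	atomic_store_explicit(count, v, memory_order_relaxed);
	return old == 0;
}

/*
 * Exact variant: one atomic exchange returns precisely the value that
 * was overwritten, closing the window. On x86, XCHG with a memory
 * operand is implicitly locked, so it costs more than a plain store.
 */
static bool set_count_exact(atomic_int *count, int v)
{
	int old = atomic_exchange_explicit(count, v, memory_order_relaxed);

	return old == 0;
}
```

Run single-threaded, both helpers return the same answers; they differ only when a concurrent writer lands inside the cheap variant's load/store window, which is the rare miss the 99.99% figure above is willing to accept.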