On Wed, Dec 08, 2021 at 08:35:34PM +0000, Pasha Tatashin wrote:
> It is hard to root cause _refcount problems, because they usually manifest after the damage has occurred. Yet, they can lead to catastrophic failures such as memory corruption. There were a number of refcount related issues discovered recently [1], [2], [3].
>
> Improve debuggability by adding more checks that ensure that page->_refcount never turns negative (i.e. double free does not happen, or free after freeze etc).
>
> - Check for overflow and underflow right from the functions that modify _refcount
> - Remove set_page_count(), so we do not unconditionally overwrite _refcount with an unrestrained value
> - Trace return values in all functions that modify _refcount

You're doing a lot more atomic instructions with these patches. Have you done any performance measurements with these patches applied and debug disabled?

I'm really not convinced it's worth closing one-instruction-wide races of this kind when they are "shouldn't ever happen" situations. If the debugging will catch the problem in 99.99% of cases and miss 0.01% without using atomic instructions, that seems like a better set of tradeoffs than catching 100% of problems by using the atomic instructions.
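
To make the tradeoff concrete, here is a minimal userspace sketch, assuming C11 <stdatomic.h> rather than the kernel's atomic_t API. struct fake_page and all function names in it are made up for illustration and do not appear in the patches. It only contrasts checking the value returned by the modifying operation with checking via a separate read afterwards; it says nothing about the actual cost on any architecture.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only the refcount is modelled. */
struct fake_page {
	atomic_int refcount;
};

/*
 * Variant A: check the value returned by the atomic read-modify-write.
 * The check sees exactly the value this decrement produced, so an
 * underflow caused by this call is always reported.
 */
static bool ref_dec_check_exact(struct fake_page *p)
{
	int old = atomic_fetch_sub(&p->refcount, 1);

	return old - 1 < 0;	/* true means underflow detected */
}

/*
 * Variant B: decrement without using the result, then check with a
 * separate load. Another thread may change the count between the two
 * steps, so a rare underflow can go unnoticed.
 */
static bool ref_dec_check_racy(struct fake_page *p)
{
	atomic_fetch_sub_explicit(&p->refcount, 1, memory_order_relaxed);

	return atomic_load_explicit(&p->refcount, memory_order_relaxed) < 0;
}

/* Without concurrency, both variants report the same result. */
static void single_threaded_demo(void)
{
	struct fake_page a, b;

	atomic_init(&a.refcount, 1);
	atomic_init(&b.refcount, 1);

	assert(!ref_dec_check_exact(&a));	/* 1 -> 0, fine */
	assert(!ref_dec_check_racy(&b));	/* 1 -> 0, fine */
	assert(ref_dec_check_exact(&a));	/* 0 -> -1, double free */
	assert(ref_dec_check_racy(&b));		/* 0 -> -1, double free */
}
```

The two variants differ only when another CPU modifies the count between the decrement and the load in variant B. That window is the "one-instruction-wide race" in question.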