On Wed, Dec 8, 2021 at 4:05 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Wed, Dec 08, 2021 at 08:35:34PM +0000, Pasha Tatashin wrote:
> > It is hard to root cause _refcount problems, because they usually
> > manifest after the damage has occurred. Yet, they can lead to
> > catastrophic failures such memory corruptions. There were a number
> > of refcount related issues discovered recently [1], [2], [3].
> >
> > Improve debugability by adding more checks that ensure that
> > page->_refcount never turns negative (i.e. double free does not
> > happen, or free after freeze etc).
> >
> > - Check for overflow and underflow right from the functions that
> >   modify _refcount
> > - Remove set_page_count(), so we do not unconditionally overwrite
> >   _refcount with an unrestrained value
> > - Trace return values in all functions that modify _refcount
>

Hi Matthew,

Thank you for looking at this series.

> You're doing a lot more atomic instructions with these patches.

This is not exactly so. There are no *more* atomic instructions, only
different ones. For example, atomic_add() becomes atomic_fetch_add().

On x86:

atomic_add:
	lock add %eax,(%rsi)

atomic_fetch_add:
	lock xadd %eax,(%rsi)

On ARM64 with LSE atomics, I believe both compile to the same
instruction: STADD is an alias of LDADD that discards the returned value.

> Have you
> done any performance measurements with these patches applied and debug
> disabled?

Yes, I ran performance tests exactly as you describe, with
CONFIG_DEBUG_VM disabled and these patches applied. I tried hackbench,
unixbench, and a few other benchmarks, and did not see any performance
difference.

> I'm really not convinced it's worth closing
> one-instruction-wide races of this kind when they are "shouldn't ever
> happen" situations. If the debugging will catch the problem in 99.99%
> of cases and miss 0.01% without using atomic instructions, that seems
> like a better set of tradeoffs than catching 100% of problems by using
> the atomic instructions.
I think we should give up precise bug detection only if there is a
measurable performance impact. The problem is that a _refcount bug has
dire security consequences, as it may leak memory contents from one
process to another.

Thanks,
Pasha
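To illustrate the atomic_add() vs. atomic_fetch_add() point: a minimal userspace sketch, assuming C11 <stdatomic.h> instead of the kernel's atomic_t API. struct fake_page, ref_add_checked() and ref_sub_checked() are hypothetical names, not the functions from this series. The update is still a single atomic read-modify-write, but because the fetch variant returns the value it replaced, the overflow/underflow check runs on exactly the value that was modified, with no separate read that could race.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only the refcount matters here. */
struct fake_page {
	atomic_int refcount;	/* plays the role of page->_refcount */
};

/*
 * Add @nr (> 0) references. Returns false if the old value was already
 * negative (e.g. a freed page) or if the addition overflowed INT_MAX.
 * C11 atomics wrap silently on signed overflow, so the update itself is
 * well defined; the check only detects that it happened.
 */
static bool ref_add_checked(struct fake_page *page, int nr)
{
	assert(nr > 0);
	int old = atomic_fetch_add(&page->refcount, nr);

	return old >= 0 && old <= INT_MAX - nr;
}

/*
 * Drop @nr (> 0) references. Returns false if the count went negative,
 * which is what a double free looks like.
 */
static bool ref_sub_checked(struct fake_page *page, int nr)
{
	assert(nr > 0);
	int old = atomic_fetch_sub(&page->refcount, nr);

	return old >= nr;
}
```

Starting from a count of 1, one get followed by two puts succeeds, and a third put (a double free) returns false with the counter at -1. In the kernel, a failed check would presumably trigger a VM_BUG_ON-style report rather than returning a flag.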