On Wed, Jan 26, 2022 at 06:34:21PM +0000, Pasha Tatashin wrote:
> The problems with page->_refcount are hard to debug, because usually
> when they are detected, the damage has occurred a long time ago. Yet,
> the problems with invalid page refcount may be catastrophic and lead to
> memory corruptions.
>
> Reduce the scope of when the _refcount problems manifest themselves by
> adding checks for underflows and overflows into functions that modify
> _refcount.

If you're chasing a bug like this, presumably you turn on page
tracepoints.  So could we reduce the cost of this by putting the
VM_BUG_ON_PAGE parts into __page_ref_mod() et al?  Yes, we'd need to
change the arguments to those functions to pass in old & new, but that
should be a cheap change compared to embedding the VM_BUG_ON_PAGE.

>  static inline void page_ref_add(struct page *page, int nr)
>  {
> -	atomic_add(nr, &page->_refcount);
> +	int old_val = atomic_fetch_add(nr, &page->_refcount);
> +	int new_val = old_val + nr;
> +
> +	VM_BUG_ON_PAGE((unsigned int)new_val < (unsigned int)old_val, page);
>  	if (page_ref_tracepoint_active(page_ref_mod))
>  		__page_ref_mod(page, nr);
>  }
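
To make the suggestion concrete, here is a rough userspace sketch of the shape I have in mind, not a patch. Everything prefixed fake_ is a hypothetical stand-in invented for illustration (fake_page for struct page, fake_page_ref_mod() for __page_ref_mod(), a plain flag for page_ref_tracepoint_active(), and a counter where the kernel would use VM_BUG_ON_PAGE). It uses C11 atomics, whose atomic_fetch_add() takes its arguments in the opposite order from the kernel helper. The comparison is the one from the quoted hunk.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct page: only the refcount is modelled. */
struct fake_page {
	atomic_int _refcount;
};

/* Stand-in for page_ref_tracepoint_active(); just a flag in this sketch. */
static bool fake_tracepoint_active;

/* Counts bad transitions; the kernel would VM_BUG_ON_PAGE() instead. */
static int fake_bug_count;

/*
 * Stand-in for __page_ref_mod(), changed to take the old and new values
 * so that the sanity check lives in the out-of-line tracepoint path.
 */
static void fake_page_ref_mod(struct fake_page *page, unsigned int old_val,
			      unsigned int new_val)
{
	/* Same test as the quoted hunk: the unsigned value wrapped on add. */
	if (new_val < old_val) {
		fake_bug_count++;
		fprintf(stderr, "refcount wrapped: %u -> %u (page %p)\n",
			old_val, new_val, (void *)page);
	}
}

static inline void fake_page_ref_add(struct fake_page *page, int nr)
{
	/* C11 argument order: object first, then the addend. */
	int old_val = atomic_fetch_add(&page->_refcount, nr);

	/* The fast path pays only for the fetch and this branch. */
	if (fake_tracepoint_active)
		fake_page_ref_mod(page, (unsigned int)old_val,
				  (unsigned int)old_val + (unsigned int)nr);
}
```

The new value is computed in unsigned arithmetic because signed overflow is undefined in standard userspace C. The trade-off is the one implied above: with the tracepoint disabled, a bad transition goes unreported.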