On Tue, Dec 8, 2020 at 8:03 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote: > > On Tue, Dec 08, 2020 at 06:22:50PM -0800, Ira Weiny wrote: > > Right now we have a mixed bag. zero_user() [and it's variants, circa 2008] > > does a BUG_ON.[0] While the other ones do nothing; clear_highpage(), > > clear_user_highpage(), copy_user_highpage(), and copy_highpage(). > > Erm, those functions operate on the entire PAGE_SIZE. There's nothing > for them to check. > > > While continuing to audit the code I don't see any users who would violating > > the API with a simple conversion of the code. The calls which I have worked on > > [which is many at this point] all have checks in place which are well aware of > > page boundaries. > > Oh good, then this BUG_ON won't trigger. > > > Therefore, I tend to agree with Dan that if anything is to be done it should be > > a WARN_ON() which is only going to throw an error that something has probably > > been wrong all along and should be fixed but continue running as before. > > Silent data corruption is for ever. Are you absolutely sure nobody has > done: > > page = alloc_pages(GFP_HIGHUSER_MOVABLE, 3); > memcpy_to_page(page, PAGE_SIZE * 2, p, PAGE_SIZE * 2); > > because that will work fine if the pages come from ZONE_NORMAL and fail > miserably if they came from ZONE_HIGHMEM. ...and violently regress with the BUG_ON. The question to me is: which is more likely that any bad usages have been covered up by being limited to ZONE_NORMAL / 64-bit only, or that silent data corruption has been occurring with no ill effects? > > FWIW I think this is a 'bad BUG_ON' use because we are "checking something that > > we know we might be getting wrong".[1] And because, "BUG() is only good for > > something that never happens and that we really have no other option for".[2] > > BUG() is our only option here. Both limiting how much we copy or > copying the requested amount result in data corruption or leaking > information to a process that isn't supposed to see it. At a minimum I think this should be debated in a follow on patch to add assertion checking where there was none before. There is no evidence of a page being overrun in the audit Ira performed.