On Mon, 30 Jan 2023 16:11:58 +0000 Matthew Wilcox <willy@xxxxxxxxxxxxx>
> On Sat, Jan 28, 2023 at 10:49:31PM -0800, Hugh Dickins wrote:
> > I guess it will turn out not to be relevant to this particular syzbug,
> > but what do we expect an mbind() of just 0x1000 of a THP to do?
> >
> > It's a subject I've wrestled with unsuccessfully in the past: I found
> > myself arriving at one conclusion (split THP) in one place, and a contrary
> > conclusion (widen range) in another place, and never had time to work out
> > one unified answer.
> >
> > So I do wonder what pte replaces the migration entry when the bug here
> > is fixed: is it a pte pointing into the THP as before, in which case
> > what was the point of "migration"? is it a Copy-On-Bind page?
> > or has the whole THP been migrated?
>
> I have an Opinion!
>
> The important thing about THP (IMO) is the Transparency part.
> Applications don't need to do anything special to get memory managed
> in larger chunks, the only difference is in performance. That is, they
> get better performance if the kernel can do it, and thinks it worthwhile.
>
> The tradeoff with THP is that we treat all memory in this 2MB chunk the
> same way; we track its dirtiness and age as a single thing (position
> on LRU, etc). That assumes we're doing no harm, or less harm than we
> would be tracking each page independently.
>
> If userspace gives us a hint like "I want this range of memory on that
> node", that's a strong signal that *this* range of memory is considered
> by userspace to be a single unit. So my opinion is that userspace is
> letting us know that we previously made a bad decision and we should
> rectify it by splitting now.

Apart from MADV_HUGEPAGE, what do you need wrt tracking THP and splitting it?