On Mon, Jul 07, 2014 at 01:43:31PM -0700, Dave Hansen wrote:
> On 07/07/2014 01:21 PM, Naoya Horiguchi wrote:
> > On Mon, Jul 07, 2014 at 12:01:41PM -0700, Dave Hansen wrote:
> >> But, is this trying to do too many things at once? Do we have solid use
> >> cases spelled out for each of these modes? Have we thought out how they
> >> will be used in practice?
> >
> > tools/vm/page-types.c will be an in-kernel user after this base code is
> > accepted. The idea of doing fincore() thing comes up during the discussion
> > with Konstantin over file cache mode of this tool.
> > pfn and page flag are needed there, so I think it's one clear usecase.
>
> I'm going to take that as a no. :)

As for other use cases, database developers should have some demand for
physical addresses (especially the NUMA node?) or page flags (especially
the page reclaim or writeback related ones). But I'm not a database expert,
so I can't say how, sorry.

> The whole FINCORE_PGOFF vs. FINCORE_BMAP issue is something that will
> come up in practice. We just don't have the interfaces for an end user
> to pick which one they want to use.
>
> >> Is it really right to say this is going to be 8 bytes? Would we want it
> >> to share types with something else, like be an loff_t?
> >
> > Could you elaborate it more?
>
> We specify file offsets in other system calls, like the lseek family. I
> was just thinking that this type should match up with those calls since
> they are expressing the same data type with the same ranges and limitations.

The 2nd parameter is loff_t; do we already do this?

> >>> + * - FINCORE_PFN:
> >>> + * stores pfn, using 8 bytes.
> >>
> >> These are all an unprivileged operations from what I can tell. I know
> >> we're going to a lot of trouble to hide kernel addresses from being seen
> >> in userspace. This seems like it would be undesirable for the folks
> >> that care about not leaking kernel addresses, especially for
> >> unprivileged users.
> >>
> >> This would essentially tell userspace where in the kernel's address
> >> space some user-controlled data will be.
> >
> > OK, so this and FINCORE_PAGEFLAGS will be limited for privileged users.

Sorry, this statement of mine might be a bit short-sighted, and I'd like to
retract it. I think that some page flags and/or NUMA info should be useful
outside the debugging environment, and safe to expose to userspace.
So limiting unprivileged users to the bitmap mode is too strict.

> Then I'd just question their usefulness outside of a debugging
> environment, especially when you can get at them in other (more
> roundabout) ways in a debugging environment.
>
> This is really looking to me like two system calls. The bitmap-based
> one, and another more extensible one. I don't think there's any harm in
> having two system calls, especially when they're trying to glue together
> two disparate interfaces.

I think that if we separate the syscall into two, having one for privileged
users and one for unprivileged users might be fine (rather than a
bitmap-based one and an extensible one).

Thanks,
Naoya Horiguchi