On Wednesday 15 April 2009 00:12:09 Andrea Arcangeli wrote:
> On Tue, Apr 14, 2009 at 10:39:54PM +0900, KOSAKI Motohiro wrote:
> > I guess you dislike get_user_page_fast() grab pte_lock too, right?
>
> If get_user_page_fast is vetoed to run a set_bit on the already cache
> hot and exclusive struct page, I doubt taking a potentially cache
> cold, mm-wide or pmd-wide pte_lock is ok.

Yes, I'd *really* rather not. I actually implemented gup_fast in
response to problem reported with DB2 workload hitting the ptl (and not
the more obvious mmap_sem, although certainly they had some gain from
removing that cacheline as well).

gup_fast iirc is worth nearly 10% on a 4 socket x86 system with DB2.
That's the same order of magnitude as the speedups quoted to justify
the addition of hugepages, or O_DIRECT itself.

Andrea: I didn't veto that set_bit change of yours as such. I just
noted there could be more atomic operations. Actually I would welcome
more comparison between our two approaches, but they seem to be stuck
with Linus refusing (I think) to copy the page at fork() time :(