On 04/24/2018 11:28 PM, Christoph Hellwig wrote: > On Tue, Apr 24, 2018 at 10:27:21PM -0700, Eric Dumazet wrote: >> When adding tcp mmap() implementation, I forgot that socket lock >> had to be taken before current->mm->mmap_sem. syzbot eventually caught >> the bug. >> >> Since we can not lock the socket in tcp mmap() handler we have to >> split the operation in two phases. >> >> 1) mmap() on a tcp socket simply reserves VMA space, and nothing else. >> This operation does not involve any TCP locking. >> >> 2) setsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements >> the transfert of pages from skbs to one VMA. >> This operation only uses down_read(¤t->mm->mmap_sem) after >> holding TCP lock, thus solving the lockdep issue. >> >> This new implementation was suggested by Andy Lutomirski with great details. > > Thanks, this looks much more sensible to me. > Thanks Christoph Note the high cost of zap_page_range(), needed to avoid -EBUSY being returned from vm_insert_page() the second time TCP_ZEROCOPY_RECEIVE is used on one VMA. Ideally a vm_replace_page() would avoid this cost ? 6.51% tcp_mmap [kernel.kallsyms] [k] unmap_page_range 5.90% tcp_mmap [kernel.kallsyms] [k] vm_insert_page 4.85% tcp_mmap [kernel.kallsyms] [k] _raw_spin_lock 4.50% tcp_mmap [kernel.kallsyms] [k] mark_page_accessed 3.51% tcp_mmap [kernel.kallsyms] [k] page_remove_rmap 2.99% tcp_mmap [kernel.kallsyms] [k] page_add_file_rmap 2.53% tcp_mmap [kernel.kallsyms] [k] release_pages 2.38% tcp_mmap [kernel.kallsyms] [k] put_page 2.37% tcp_mmap [kernel.kallsyms] [k] smp_call_function_single 2.28% tcp_mmap [kernel.kallsyms] [k] __get_locked_pte 2.25% tcp_mmap [kernel.kallsyms] [k] do_tcp_setsockopt.isra.35 2.21% tcp_mmap [kernel.kallsyms] [k] page_clear_age