On Fri 08-11-13 16:28:15, Andiry Xu wrote:
> On Thu, Nov 7, 2013 at 2:45 PM, Andiry Xu <andiry@xxxxxxxxx> wrote:
> > On Thu, Nov 7, 2013 at 2:20 PM, Jan Kara <jack@xxxxxxx> wrote:
> >> On Thu 07-11-13 13:50:09, Andiry Xu wrote:
> >>> On Thu, Nov 7, 2013 at 1:07 PM, Jan Kara <jack@xxxxxxx> wrote:
> >>> > On Thu 07-11-13 12:14:13, Andiry Xu wrote:
> >>> >> On Wed, Nov 6, 2013 at 1:18 PM, Jan Kara <jack@xxxxxxx> wrote:
> >>> >> > On Tue 05-11-13 17:28:35, Andiry Xu wrote:
> >>> >> >> >> Do you know the reason why write() outperforms mmap() in some cases? I
> >>> >> >> >> know it's not related to the thread, but I would really appreciate it if you
> >>> >> >> >> could answer my question.
> >>> >> >> > Well, I'm not completely sure. mmap()ed memory always works on a page-by-page
> >>> >> >> > basis: you first access the page, it gets faulted in, and then you can further
> >>> >> >> > access it. So for small (sub-page-size) accesses this is a win because you
> >>> >> >> > don't have the overhead of a syscall and the fs write path. For accesses larger
> >>> >> >> > than the page size, the overhead of the syscall and some initial checks is well
> >>> >> >> > hidden by other things. I guess write() ends up being more efficient
> >>> >> >> > because the write path taken for each page is somewhat lighter than a full page
> >>> >> >> > fault. But you'd need to look into perf data to get some hard numbers on
> >>> >> >> > where the time is spent.
> >>> >> >> >
> >>> >> >>
> >>> >> >> Thanks for the reply. However, I had filled up the whole RAM disk
> >>> >> >> before doing the test, i.e. asked the brd driver to allocate all the
> >>> >> >> pages initially.
> >>> >> > Well, pages in ramdisk are always present, that's not an issue. But you
> >>> >> > will get a page fault to map a particular physical page into the process'
> >>> >> > virtual address space when you first access that virtual address in the
> >>> >> > mapping from the process.
> >>> >> > The cost of setting up this virtual->physical
> >>> >> > mapping is what I'm talking about.
> >>> >> >
> >>> >>
> >>> >> Yes, you are right, there are page faults observed with perf. I
> >>> >> misunderstood a page fault as copying pages between the backing store and
> >>> >> physical memory.
> >>> >>
> >>> >> > If you had a process which first mmaps the file and writes to all pages in
> >>> >> > the mapping and *then* measures the cost of another round of writing to the
> >>> >> > mapping, I would expect you to see speeds close to those of the memory bus.
> >>> >> >
> >>> >>
> >>> >> I've tried this as well. mmap() performance improves but is still not as
> >>> >> good as write().
> >>> >> I used perf report to compare the write() and mmap() applications. For the
> >>> >> write() version, the top of the perf report shows:
> >>> >> 33.33% __copy_user_nocache
> >>> >> 4.72% ext2_get_blocks
> >>> >> 4.42% mutex_unlock
> >>> >> 3.59% __find_get_block
> >>> >>
> >>> >> which looks reasonable.
> >>> >>
> >>> >> However, for the mmap() version, the perf report looks strange:
> >>> >> 94.98% libc-2.15.so [.] 0x000000000014698d
> >>> >> 2.25% page_fault
> >>> >> 0.18% handle_mm_fault
> >>> >>
> >>> >> I don't know what the first item is, but it took the majority of cycles.
> >>> > The first item means that it's some userspace code in libc. My guess
> >>> > would be that it's libc's memcpy() function (or whatever you use to write
> >>> > to the mmap). How do you access the mmap?
> >>> >
> >>>
> >>> Like this:
> >>>
> >>> fd = open(file_name, O_CREAT | O_RDWR | O_DIRECT, 0755);
> >>> dest = (char *)mmap(NULL, FILE_SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
> >>> for (i = 0; i < count; i++)
> >>> {
> >>>     memcpy(dest, src, request_size);
> >>>     dest += request_size;
> >>> }
> >> OK, maybe libc memcpy isn't very well optimized for your CPU? Not sure how
> >> to tune that though...
> >>
> >
> > Hmm, I will try some different kinds of memcpy to see if there is a
> > difference.
> > Just want to make sure I am not making some stupid mistake
> > before trying that.
> > Thanks a lot for your help!
> >
>
> Your advice does make a difference. I use an optimized version of memcpy
> and it does improve the mmap application performance: on a ramdisk
> with ext2 xip, the mmap() version now achieves 11GB/s of bandwidth,
> compared to 7GB/s for the POSIX write() version.
Good :).

> Now I wonder if they have a plan to update the memcpy() in libc...
You'd better ask on the glibc devel list... I've googled for a while for
whether memcpy() in glibc can somehow be tuned (for a particular instruction
set) but didn't find anything useful.

Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
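A minimal C sketch of the measurement Jan suggests in the thread: map the file and write every page once, so that pass takes the page faults that set up the virtual->physical mapping, then time a second pass over the same mapping and compare it with a pwrite() loop using the same request size. The function names, the 0644 mode and the use of pwrite() are illustrative assumptions, not the code Andiry used. O_DIRECT is left out because it governs read()/write()-style I/O on the descriptor rather than stores through a mapping, and the mapping is requested as PROT_READ | PROT_WRITE (the thread's snippet used PROT_WRITE alone).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Seconds elapsed since *start on the monotonic clock. */
static double elapsed_since(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (double)(now.tv_sec - start->tv_sec) +
           (double)(now.tv_nsec - start->tv_nsec) / 1e9;
}

/*
 * Hypothetical helper: copy src into an mmap()ed file in request_size
 * chunks, `passes` times over the same mapping, and return the duration
 * of the LAST pass in seconds (negative on error). With passes >= 2 the
 * first pass takes the page faults, so the last pass mostly measures
 * memcpy() into pages that are already mapped.
 */
double mmap_copy_seconds(const char *path, size_t file_size,
                         const char *src, size_t request_size, int passes)
{
    assert(request_size > 0 && passes > 0);
    int fd = open(path, O_CREAT | O_RDWR, 0644);
    if (fd < 0)
        return -1.0;
    /* Make sure the file is large enough to back the whole mapping. */
    if (ftruncate(fd, (off_t)file_size) != 0) {
        close(fd);
        return -1.0;
    }
    char *map = mmap(NULL, file_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1.0;
    }
    double last = 0.0;
    for (int p = 0; p < passes; p++) {
        struct timespec start;
        clock_gettime(CLOCK_MONOTONIC, &start);
        for (size_t off = 0; off + request_size <= file_size; off += request_size)
            memcpy(map + off, src, request_size);
        last = elapsed_since(&start);
    }
    munmap(map, file_size);
    close(fd);
    return last;
}

/* Hypothetical helper: the same copy pattern through pwrite(), for comparison. */
double write_copy_seconds(const char *path, size_t file_size,
                          const char *src, size_t request_size)
{
    assert(request_size > 0);
    int fd = open(path, O_CREAT | O_RDWR, 0644);
    if (fd < 0)
        return -1.0;
    struct timespec start;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t off = 0; off + request_size <= file_size; off += request_size) {
        if (pwrite(fd, src, request_size, (off_t)off) != (ssize_t)request_size) {
            close(fd);
            return -1.0;
        }
    }
    double seconds = elapsed_since(&start);
    close(fd);
    return seconds;
}
```

Calling mmap_copy_seconds() with passes = 1 and then passes = 2 on a file on the RAM disk separates the fault cost from the copy cost; if the second-pass mmap number still trails write_copy_seconds(), the thread's finding suggests looking at the memcpy() implementation next.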