Re: a question of mmap() of files into memory

"Peter Teoh" <htmldeveloper@xxxxxxxxx> · Tue, 25 Nov 2008 00:13:55 +0800

On Mon, Nov 24, 2008 at 12:17 PM, MinChan Kim <minchan.kim@xxxxxxxxx> wrote:
> On Mon, Nov 24, 2008 at 11:58 AM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
>> On Mon, Nov 24, 2008 at 7:29 AM, MinChan Kim <minchan.kim@xxxxxxxxx> wrote:
>>> I'm so late :)
>>>
>>> On Sun, Nov 23, 2008 at 4:17 PM, Wang Yu <wangyuict@xxxxxxxxx> wrote:
>>>>
>>>>
>>>> On Sun, Nov 23, 2008 at 2:23 AM, Rik van Riel <riel@xxxxxxxxxxx> wrote:
>>>>>
>>>>> Peter Teoh wrote:
>>>>>>
>>>>>> When a process mmap()s a section of a file into its own process memory,
>>>>>> the process memory will maintain a copy of the data in that section of
>>>>>> the file.
>>>>>
>>>>> No, it does not maintain a copy.
>>>>>
>>>>> It mmaps the page cache pages into its own address space.
>>>>
>>>>
>>>>    According to your explanation, the flow is: physical file (on disk) -->
>>>> page cache (in memory, but in kernel space) --> process memory (in memory,
>>>> but in user space). Is that right? I am not sure....
>>>>>
>>>
>>> Yes. When the kernel finds that the page is not mapped (and not yet in
>>> the page cache), it allocates a new page in the page cache, reads the
>>> on-disk data into that page, and then maps the page into the user-space
>>> address space. So there is never any duplication.
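A minimal user-space sketch of what is described here, assuming a readable regular file (the function name `count_byte_mmap` is hypothetical): the file is mapped read-only with `mmap()` and scanned through a plain pointer. No buffer is allocated in the process; the first access to each page faults, and the kernel maps the page-cache page into the address space.

```c
#define _DEFAULT_SOURCE   /* expose POSIX/BSD declarations even under strict -std modes */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count occurrences of byte `c` in the file at `path`
 * by mapping it read-only and scanning it through a pointer.
 * Returns -1 on error. No user-space copy of the data is made. */
long count_byte_mmap(const char *path, unsigned char c)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {            /* mmap() with length 0 fails, so handle it here */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                        /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;

    long n = 0;
    for (size_t i = 0; i < len; i++)  /* first touch of each page faults it in */
        if (p[i] == c)
            n++;

    munmap((void *)p, len);
    return n;
}
```

`MAP_SHARED` is chosen here because it maps the page-cache pages themselves; a private mapping would only diverge from the page cache once a page is written (copy-on-write).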
>>>
>>>>>> So... does duplicated buffering exist? (One copy in the kernel, the
>>>>>> page cache, and one in user space, the mmap()ed content of the file in
>>>>>> process memory.)
>>>>>
>>>>> No, there is no such double buffering.
>>>>
>>>>    But what is the difference? Why does Linux do it this way?
>>>
>>> With the read system call (except with O_DIRECT), the data is duplicated
>>> between the user buffer and the page cache.
>>> The read/write system calls expose the pages through a file interface, so
>>> you always need a user buffer. Think about it: if you want to read some
>>> data, first of all you need some space to put it in, which is the user
>>> buffer.
>>> The mmap system call exposes the pages as memory, so you can operate on
>>> the file with ordinary memory accesses and no user buffer, which means
>>> there is no duplication overhead.
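For contrast, a sketch of the read() path described above, under the same assumptions (the name `count_byte_read` and the 4096-byte buffer size are illustrative choices, not from the thread): every `read()` call copies data from the page cache into `buf`, a buffer owned by the process, which is the duplication being discussed.

```c
#define _DEFAULT_SOURCE   /* expose POSIX declarations even under strict -std modes */
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: count occurrences of byte `c` in the file at `path`
 * using read(). Each read() copies data from the page cache into `buf`.
 * Returns -1 on error. */
long count_byte_read(const char *path, unsigned char c)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    unsigned char buf[4096];          /* the user buffer the kernel copies into */
    long n = 0;
    ssize_t got;
    while ((got = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < got; i++)
            if (buf[i] == c)
                n++;

    close(fd);
    return got < 0 ? -1 : n;          /* got == 0 means end of file */
}
```

Here the data lives twice in memory while it is being processed: once in the page cache and once in `buf`.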
>>
>> I see. So you are saying that read() duplicates the buffer between
>> user space (the user buffer) and the kernel (the page cache), but for
>> mmap(), since there is no duplication, every access from the user
>> process will immediately trigger a context switch to read in the data
>> from ring 0, right?
>
> Yep. More exactly, the data is read in from the file by the page fault handler.
>
>> Since you normally read in blocks of data with read(), but access data
>> byte-wise through mmap()'s pointer, isn't read() much more efficient,
>> as it triggers far fewer context switches than mmap()'s way of
>> accessing kernel memory through a pointer?
>>
>> Performance-wise, will mmap() perform worse than read()?
>
> I am not sure which case is better.
> That's because the read operation has the overhead of a memory copy from
> kernel to user space, but it takes fewer page faults than mmap. But
> on-demand readahead or hint functions can improve performance in the case
> of sequential reads.
>
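Assuming the "hint function" mentioned above refers to something like `madvise()` (the address-range counterpart of `posix_fadvise()`, which applies to file descriptors used with read()), a sequential-access hint for a mapping might look like this sketch; `sum_bytes_sequential` is a hypothetical name. Per the madvise(2) man page, `MADV_SEQUENTIAL` means pages in the range can be aggressively read ahead and may be freed soon after they are accessed. It is advice only.

```c
#define _DEFAULT_SOURCE   /* madvise() and MADV_* need this on glibc */
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: sum all bytes of the file at `path` through an
 * mmap() pointer after declaring sequential access. Returns -1 on error. */
int64_t sum_bytes_sequential(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    if (len == 0) {                   /* nothing to map */
        close(fd);
        return 0;
    }

    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    /* Only a hint: if madvise() fails, the scan still works, just without it. */
    (void)madvise(p, len, MADV_SEQUENTIAL);

    int64_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];

    munmap(p, len);
    return sum;
}
```

Whether the hint measurably helps depends on file size, storage, and kernel version; the sketch only shows where such a hint would go.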

read() vs mmap(): if mmap() takes a page fault on every single byte of
memory access when the mmap() pointer is a byte pointer, then it would
seemingly be slower than read()....
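One way to check this empirically is to count faults around a byte-wise scan with `getrusage()`. The sketch below (function names are hypothetical) maps a file and reads it one byte at a time. On Linux a fault installs a mapping for a whole page (and fault-around may map several neighbouring page-cache pages at once), so later accesses to the same page do not fault again; the delta is therefore expected to be at most about one fault per page, not one per byte.

```c
#define _DEFAULT_SOURCE   /* getrusage() fields and MAP_* on glibc */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <unistd.h>

/* Page faults (minor + major) taken by this process so far, or -1. */
static long faults_so_far(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) < 0)
        return -1;
    return ru.ru_minflt + ru.ru_majflt;
}

/* Hypothetical helper: map the file at `path`, read every byte through a
 * byte pointer, and return how many page faults the scan caused (-1 on
 * error or for an empty file). `*bytes_out` receives the bytes touched.
 * The count is a process-wide delta, so unrelated faults add some noise. */
long faults_for_bytewise_scan(const char *path, size_t *bytes_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {   /* empty files cannot be mapped */
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    long before = faults_so_far();
    volatile unsigned char sink = 0;  /* keeps the loads from being optimized away */
    for (size_t i = 0; i < len; i++)
        sink ^= p[i];                 /* one byte-sized load per iteration */
    long after = faults_so_far();
    (void)sink;

    munmap((void *)p, len);
    if (before < 0 || after < 0)
        return -1;
    *bytes_out = len;
    return after - before;
}
```

Comparing the returned count with `bytes / sysconf(_SC_PAGESIZE)` shows the ratio of faults to pages on a given system.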

A possibly similar experience is described here:

http://lkml.org/lkml/2008/1/14/517

> Page fault vs. memory copy: which has more overhead?
> I think Rik can answer this question.
>

-- 
Regards,
Peter Teoh

Ernest Hemingway - "Never mistake motion for action."

--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
Please read the FAQ at http://kernelnewbies.org/FAQ