On Sun, Feb 01, 2004 at 15:52:01 -0800, Carl Spalletta wrote:
> In a recent O'Reilly Press book "Java NIO" by R. Hitchens, pages 9-13 deal with generic Unix
> file I/O; although the argument is garbled it is implied that _all_ file I/O type reads are
> accomplished through demand paging generated by the pagefault handler.
>
> I am pretty sure that this is not the case in Linux and have written the following outline
> to explore this.
>
> Caveat: we assume that memory pagesize and the fs block size are both 4K, and also that no
> system error occurs and that errno remains zero throughout.
>
> All regular file I/O in Linux 2.6.1 takes one of two alternatives: through the read/write family
> of syscalls or through direct memory operations in userspace on mmapped files. Both methods
> utilize the page cache.
>
> FIRST ALTERNATIVE: read() syscall
>
> The system receives a file descriptor, an offset, a count and a userspace buffer address. When
> the syscall returns it has copied n bytes to the buffer, where 0 <= n <= count. How many bytes
> are copied is dependent on the state of the pagecache, together with the blocking/nonblocking mode
> of the file.
>
> To start with, the syscall examines the page cache to see if any of the requested pages are there.
> So, if the read offset was 10,000 and the count 20,000 then the system tries to find the pages
> containing fs blocks 2 thru 7 - bytes 8192 through 32767. The amount of data found in the page
> cache interacts with the blocking/nonblocking mode of the file as follows:
>
> Blocking read:
>   No pages found: queue I/O and sleep on queue.
>   Fewer than all pages found:
>     Lock found pages in memory.
>     Queue I/O for remaining pages and sleep on queue.
>   All pages found:
>     Copy the request from the found pages to the buffer.
>     Return the count.
>
> Nonblocking read:
>   No pages found: return 0.
>   Fewer than all pages found:
>     The found pages contain some initial portion of the request:
>       Copy that initial portion from the found pages to the userspace buffer.
>       Return the number of bytes copied.
>     No initial portion found: return 0.
>   All pages found:
>     Copy the request from the found pages to the buffer.
>     Return the count.

There is no non-blocking read from disk! For a regular file, read() ignores
O_NONBLOCK and blocks until the data is available. There is only aio_read,
which is a different interface.

> Notwithstanding Hitchens's claim, there is nothing in the above that has to do with pagefaults
> except in case the pagetable(s) for the userspace buffer are marked 'not present'. Moreover no
> pagefaults can occur in kernel space except on kernel page allocations (..???...)

There are no page faults per se in the kernel. A page fault happens when a
page is accessed for which there is no valid entry in the page table, and for
the kernel's own memory that simply never happens. However, the mechanism for
loading pages is the same for read as for a page fault.

> SECOND ALTERNATIVE: operations on mmapped non-anonymous memory
>
> This is in some ways the opposite of the above. The syscall takes place entirely in kernel
> space while the memory operations in this alternative are nominally entirely in user space;
> moreover, the syscall method may have to deal with multiple pages but the memory mapped method
> deals with only a single page at a time.
>
>   Page is present: do memory ops in user space.
>   Page is not present: pagefault handler utilizes fs 'readpage' method. Resume in user space.
>
> So demand paging does indeed take place in the case of a mapped memory address with a page
> marked 'not present' - but not otherwise and most emphatically not in _every_ case of filesystem
> I/O.

-------------------------------------------------------------------------------
						 Jan 'Bulb' Hudec <bulb@ucw.cz>
--
Kernelnewbies: Help each other learn about the Linux kernel.
Archive:       http://mail.nl.linux.org/kernelnewbies/
FAQ:           http://kernelnewbies.org/faq/
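
For anyone who wants to poke at the two alternatives from user space, here is
a minimal Python sketch. It is not from the thread: the helper names
(page_span, read_via_syscall, read_via_mmap) are made up for illustration, and
the 4K page size is the assumption stated in the outline. It only shows the
offset-to-page arithmetic and that both paths hand back the same bytes; it
does not observe page faults or the page cache directly.

```python
import mmap
import os

# Assumption taken from the outline: page size == fs block size == 4K.
PAGE_SIZE = 4096


def page_span(offset, count, page_size=PAGE_SIZE):
    """Return (first_page, last_page, first_byte, last_byte) for a request.

    These are the pages the kernel has to find in (or bring into) the
    page cache to satisfy a read of `count` bytes starting at `offset`.
    """
    if count <= 0:
        raise ValueError("count must be positive")
    first_page = offset // page_size
    last_page = (offset + count - 1) // page_size
    first_byte = first_page * page_size
    last_byte = (last_page + 1) * page_size - 1
    return first_page, last_page, first_byte, last_byte


def read_via_syscall(path, offset, count):
    """First alternative: pread() copies from the page cache to our buffer."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, count, offset)
    finally:
        os.close(fd)


def read_via_mmap(path, offset, count):
    """Second alternative: map the file and access the memory directly.

    Touching a mapped page that is not present is what triggers the
    page fault handler; here the slice makes that access for us.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        with mmap.mmap(fd, size, access=mmap.ACCESS_READ) as mapping:
            return mapping[offset:offset + count]
    finally:
        os.close(fd)
```

With the numbers from the outline, page_span(10000, 20000) gives pages 2
through 7, i.e. bytes 8192 through 32767. On a regular file the two read
helpers return identical data; what differs is where a page cache miss gets
handled (inside the read syscall versus in the page fault handler).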