On Mon, Aug 31, 2009 at 08:50:25AM -0400, Theodore Tso wrote: > On Mon, Aug 31, 2009 at 06:03:14PM +0530, Aneesh Kumar K.V wrote: > > If the database is not being updated via a write(2), then even though > > the blocks are already allocated, we won't find buffer_heads attached to the page. > > > > ie, page_buffers(page) will be NULL > > > > The page_mkwrite -> write_begin path would be allocating the buffer_heads > > and attaching them to the page. So even in the above case we will be > > doing write_begin -> write_end. That is, it is similar to the (a) i wrote > > above. > > What about the case where they are being updated via llseek(2) and > write(2)? I'll grant that isn't as common these days (dbm used to do > it, but these days most people use berk_db, which does use mmap), but > it's not a totally unknown thing to do. Certainly any of the > e2fsprogs tools operating on a filesystem-image-in-a-file (which isn't > that uncommon if you are using KVM or some other virtualization > situation) uses llseek(2) and write(2). I'd have to check to see > whether KVM/qemu is using mmap(2) or write(2). > > If we think when we update-in-place already allocated blocks, it's > much more common to use mmap(2) than lseek(2)/write(2), then I can see > how avoiding taking a page_lock in ext4_page_mkwrite() might be the > right choice. On the other hand, if write(2) is more common, we'll be > starting and stopping a transaction handle, and going through a *much* > more complicated code path. > > The other question I have then is that there are multiple > write_begin/write_end functions that could be used, if we are going to > be dropping this check in ext4_page_mkwrite() and depending in > write_begin/write_end to do the right thing. (ext4_write_begin, > ext4_da_write_begin, ext4_ordered_write_end, > ext4_journalled_write_end, ext4_writeback_write_end). You did check > all of the possible code path combinations, to make sure they will do > the right thing? Both ext4_write_begin and ext4_da_write_begin use block_write_begin which calls __block_prepare_write which looks at the mapped flag of the buffer_head and call get_block if not mapped. Delayed alloc get_block does block reservation and returns a mapped buffer_head and non delayed alloc get_block does block allocation and returns a mapped buffer_head. So in both the case i guess we are ok -aneesh -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html