[PATCH 0/5] [RFC] Fix page_mkwrite for blocksize < pagesize

  Hi,

  below is a patch series that is my new approach to solving the problems with
page_mkwrite() when blocksize < pagesize. To refresh memory, the main issue is
as follows:

We'd like to use page_mkwrite() to allocate blocks under a page which is
becoming writably mmapped in some process address space. This allows a
filesystem to fail the page fault if there is not enough space available, the
user exceeds quota, or a similar problem happens, rather than silently
discarding data later when writepage() is called.

On filesystems where blocksize < pagesize the situation is more complicated,
though. Consider, for example, blocksize = 1024, pagesize = 4096, and a process
doing:
  ftruncate(fd, 0);
  pwrite(fd, buf, 1024, 0);
  map = mmap(NULL, 4096, PROT_WRITE, MAP_SHARED, fd, 0);
  map[0] = 'a';  ----> page_mkwrite() for index 0 is called
  ftruncate(fd, 10000); /* or even pwrite(fd, buf, 1, 10000) */
  fsync(fd); ----> writepage() for index 0 is called

At the moment page_mkwrite() is called, the filesystem can allocate only one
block for the page because i_size == 1024. Otherwise it would create blocks
beyond i_size, which is generally undesirable. But later, at writepage() time,
we would like to have blocks allocated for the whole page (and in principle we
have to allocate them because the user could have filled the page with data
after the second ftruncate()).
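
To make the arithmetic concrete, here is a minimal user-space sketch (not
kernel code and not part of the series). The helper name blocks_below_isize()
is hypothetical; it just counts how many blocks of a given page start below
i_size, which is what the filesystem could reasonably allocate at that moment.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): number of blocks of page `index`
 * that start below i_size, i.e. blocks that do not lie entirely beyond EOF.
 */
unsigned blocks_below_isize(uint64_t i_size, uint64_t index,
                            unsigned blocksize, unsigned pagesize)
{
    /* The model assumes a page is a whole number of blocks. */
    assert(blocksize > 0 && pagesize % blocksize == 0);

    uint64_t page_start = index * pagesize;
    if (i_size <= page_start)
        return 0;                       /* whole page is beyond EOF */

    uint64_t bytes = i_size - page_start;
    if (bytes > pagesize)
        bytes = pagesize;               /* page is fully below EOF */

    /* Round up: a partially covered block still needs to exist. */
    return (unsigned)((bytes + blocksize - 1) / blocksize);
}
```

With the numbers from the example, blocks_below_isize(1024, 0, 1024, 4096)
gives 1 at page_mkwrite() time, while blocks_below_isize(10000, 0, 1024, 4096)
gives 4 at writepage() time, so three blocks of page 0 were never reserved.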

This series is an attempt to fix the above issue. The idea is that after an
extending write or truncate we update i_size not under the page lock of the
page where i_size ends up, but under the page lock of the page where i_size
was originally. This also allows us to solve a POSIX compliance issue where we
could have exposed data written via mmap beyond i_size.
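
As a rough illustration of which page that is, the following user-space sketch
computes the page index holding the old i_size. The names MODEL_PAGE_SHIFT,
page_index_of() and isize_update_lock_index() are hypothetical, and the
handling of a page-aligned old i_size is an assumption of this model, not
something taken from the patches.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12     /* assumes 4096-byte pages, as in the example */

/* Index of the page containing byte offset `pos`. */
uint64_t page_index_of(uint64_t pos)
{
    return pos >> MODEL_PAGE_SHIFT;
}

/*
 * Hypothetical model: index of the page whose lock would cover the i_size
 * update when the file grows from old_size to new_size. Only old_size
 * matters; new_size is taken just to check that this is an extension.
 */
uint64_t isize_update_lock_index(uint64_t old_size, uint64_t new_size)
{
    assert(new_size >= old_size);
    (void)new_size;
    return page_index_of(old_size);
}
```

For the example above, isize_update_lock_index(1024, 10000) is 0 although the
new i_size lands in page 2, so the update is serialized against
page_mkwrite() and writepage() for index 0, the page whose blocks are affected.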

I see two disputable things with this approach:
1) set_page_dirty_buffers() and create_empty_buffers() now check i_size.
That's a bit ugly, although not marking buffers dirty beyond i_size makes a lot
of sense to me.

2) To fix the problem with non-zeros written via mmap beyond EOF and then
being exposed by truncate, I've added zeroing to a function doing all the work
when extending i_size (which is essentially the only place where we can reliably
do the work and avoid races with mmap). That's a good fit, but basically all
filesystems now have to extend i_size with this function, which I don't find
that pleasing. Does anybody have an idea how to avoid that conversion of every
filesystem, or how to make it less painful?
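
The zeroing in point 2) can be pictured with a small user-space model. This is
only a sketch under stated assumptions: zero_page_tail() and MODEL_PAGE_SIZE
are hypothetical names, the page is a plain buffer, and the real function in
the series additionally has to deal with locking and buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size for this model */

/*
 * Hypothetical model of the zeroing step: `page` is the in-memory copy of the
 * page that contains the old EOF. Everything from the old EOF to the end of
 * that page is cleared, so stale bytes written via mmap beyond EOF cannot
 * become visible once i_size grows. Returns the number of bytes zeroed.
 */
size_t zero_page_tail(unsigned char *page, size_t old_size)
{
    assert(page != NULL);

    size_t offset = old_size % MODEL_PAGE_SIZE;
    if (offset == 0)
        return 0;               /* old EOF is page aligned: no partial page */

    size_t len = MODEL_PAGE_SIZE - offset;
    memset(page + offset, 0, len);
    return len;
}
```

With old i_size == 1024, the call clears bytes 1024..4095 of page 0 and leaves
the first block untouched.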

Thanks for comments in advance.
									Honza
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html