Placeholders can span a range bigger than one page. The
placeholder is
inserted into the radix slot for the end of the range, and the
flags field in
the page struct is used to record the start of the range.
Not surprisingly, I like how this makes the interaction between
buffered and O_DIRECT a lot less magical. Nice work.
Some comments:
+void wait_on_placeholder_pages_range(struct address_space *mapping,
+ pgoff_t start, pgoff_t end)
...
+ while(start <= end) {
...
+ start = pages[i]->index + 1;
+ if (pages[i]->index > end)
+ goto done;
It looks like this might be confused by a end and final page->index
of ~0? Maybe it'd be safer to explicitly deal with this here instead
of relying on safe arguments.
+ }
+ if (need_resched()) {
+ read_unlock_irq(&mapping->tree_lock);
+ cond_resched();
+ read_lock_irq(&mapping->tree_lock);
+ }
We should introduce a cond_sched_lock_irq() to match cond_resched_lock
().
+void remove_placeholder_pages(struct address_space *mapping,
+ unsigned long start,
+ unsigned long end)
+{
+ struct page *page;
+ int ret;
+ int i;
+ unsigned long num;
+ struct page *pages[8];
+
+ write_lock_irq(&mapping->tree_lock);
+ while (start < end) {
...
+ write_unlock_irq(&mapping->tree_lock);
And call that cond_resched_lock_irq() in here too? Since we grabbing
the lock in here, presumably the caller won't mind if we grab and
release it a few times?
+static int insert_placeholder(struct address_space *mapping,
+ struct page *insert)
+{
spin_lock_assert(&mapping->tree_lock)?
+int find_or_insert_placeholders(struct address_space *mapping,
+ unsigned long start, unsigned long end,
+ gfp_t gfp_mask, int iowait)
+{
...
+ /*
+ * this gets complicated.
It sure does! Sprinkling a few comments above the main blocks in the
loop might be nice.
+ err = filemap_write_and_wait_range(mapping,
+ cur << PAGE_CACHE_SHIFT,
+ end << PAGE_CACHE_SHIFT);
This end is exclusive (>= end breaks) but filemap_write_and_wait_range
() -> wait_on_page_writeback_range() wants an argument that is
inclusive (> end breaks)?
+ if (PagePlaceHolder(page)) {
+ DEFINE_WAIT(wait);
+ wait_queue_head_t *wqh = page_waitqueue(page);
+ prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
+ read_unlock_irq(&mapping->tree_lock);
+ io_schedule();
+ finish_wait(wqh, &wait);
+ read_lock_irq(&mapping->tree_lock);
This pattern seems to be repeated quite a few times. Maybe a helper
would.. help? Maybe it's not worth it if it has to do silly little
pre-schedule tasks differently at each site.
- for (i = 0; i < ret; i++)
- page_cache_get(pages[i]);
+ while(i < ret) {
+ if (PagePlaceHolder(pages[i])) {
+ /* we can't return a place holder, shift it away */
+ if (i + 1 < ret) {
+ memcpy(&pages[i], &pages[i+1],
+ (ret - i - 1) * sizeof(struct page *));
+ }
+ ret--;
+ continue;
+ } else
+ page_cache_get(pages[i]);
+ i++;
It looks like this loop is implemented twice? How about a factoring
it out? Especially with the risk of off-by-ones in there. (Though I
don't see problems, it looks correct.)
More on the rest of the series after lunch :).
- z
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html