On Wed, Nov 01, 2006 at 11:08:04AM -0400, Chris Mason wrote: > All mutex and semaphore usage is removed from fs/direct-io.c. Callers > can ask for placeholder pages if they want help protecting against > races with buffered io. > > Signed-off-by: Chris Mason <chris.mason@xxxxxxxxxx> ..... > +static void unlock_page_range(struct dio *dio, unsigned long start, > + unsigned long nr) > +{ > + if (dio->lock_type != DIO_NO_LOCKING) { > + remove_placeholder_pages(dio->inode->i_mapping, dio->tmppages, > + &dio->fake, > + start, start + nr, > + ARRAY_SIZE(dio->tmppages)); > + } > +} > + > +static int lock_page_range(struct dio *dio, unsigned long start, > + unsigned long nr) > +{ > + struct address_space *mapping = dio->inode->i_mapping; > + struct page *fake = &dio->fake; > + unsigned long end = start + nr; > + > + if (dio->lock_type == DIO_NO_LOCKING) > + return 0; > + return find_or_insert_placeholders(mapping, dio->tmppages, start, end, > + ARRAY_SIZE(dio->tmppages), > + GFP_KERNEL, fake, > + dio->rw == READ); > +} I'm not sure that these two function are well named - lock_page_range() could be confused for an extenstion of lock_page() and they are very different things. Something like dio_lock_range() and dio_unlock_range() might be more appropriate... > + > + > /* > * Get another userspace page. Returns an ERR_PTR on error. Pages are > * buffered inside the dio so that we can call get_user_pages() against a > @@ -219,9 +255,9 @@ static void dio_complete(struct dio *dio > { > if (dio->end_io && dio->result) > dio->end_io(dio->iocb, offset, bytes, dio->map_bh.b_private); > - if (dio->lock_type == DIO_LOCKING) > - /* lockdep: non-owner release */ > - up_read_non_owner(&dio->inode->i_alloc_sem); > + unlock_page_range(dio, dio->fspages_start_off, > + dio->fspages_end_off - dio->fspages_start_off); > + dio->fspages_end_off = dio->fspages_start_off; > } dio->fspages_end_off and dio->fspages_start_off only seem to have relevance to the DIO_PLACEHOLDER case - can this be wrapped up inside dio_unlock_range()? > > /* > @@ -517,6 +553,7 @@ static int get_more_blocks(struct dio *d > unsigned long fs_count; /* Number of filesystem-sized blocks */ > unsigned long dio_count;/* Number of dio_block-sized blocks */ > unsigned long blkmask; > + unsigned long index; > int create; > > /* > @@ -544,7 +581,19 @@ static int get_more_blocks(struct dio *d > } else if (dio->lock_type == DIO_NO_LOCKING) { > create = 0; > } > - > + index = fs_startblk >> (PAGE_CACHE_SHIFT - > + dio->inode->i_blkbits); > + end = (dio->final_block_in_request >> dio->blkfactor) >> > + (PAGE_CACHE_SHIFT - dio->inode->i_blkbits); > + BUG_ON(index > end); > + while (index >= dio->fspages_end_off) { > + unsigned long nr; > + nr = min(end - index + 1, (unsigned long)DIO_PAGES); > + ret = lock_page_range(dio, dio->fspages_end_off, nr); > + if (ret) > + goto error; > + dio->fspages_end_off += nr; > + } Ditto for this - if we are not using placeholders, then none of this is needed and so could be wrapped in: if (dio->flags & DIO_PLACEHOLDER) { ret = dio_lock_range(dio, ...) if (ret) goto error; } > @@ -1221,49 +1257,29 @@ __blockdev_direct_IO(int rw, struct kioc > goto out; > } > } > - > dio = kmalloc(sizeof(*dio), GFP_KERNEL); > retval = -ENOMEM; > if (!dio) > goto out; > > + set_page_placeholder(&dio->fake); > + dio->fspages_start_off = offset >> PAGE_CACHE_SHIFT; > + dio->fspages_end_off = dio->fspages_start_off; > + if (flags & DIO_PLACEHOLDER) ? > /* > * For block device access DIO_NO_LOCKING is used, > * neither readers nor writers do any locking at all > * For regular files using DIO_LOCKING, > - * readers need to grab i_mutex and i_alloc_sem > - * writers need to grab i_alloc_sem only (i_mutex is already held) > + * No locks are taken > * For regular files using DIO_OWN_LOCKING, > * neither readers nor writers take any locks here > */ So nothing does any locking here - comment can be simplified/killed? Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html