On Fri, 24 Oct 2014 17:20:38 -0400 Matthew Wilcox <matthew.r.wilcox@xxxxxxxxx> wrote: > Use the generic AIO infrastructure instead of custom read and write > methods. In addition to giving us support for AIO, this adds the missing > locking between read() and truncate(). > > ... > > +/* > + * When ext4 encounters a hole, it returns without modifying the buffer_head > + * which means that we can't trust b_size. To cope with this, we set b_state > + * to 0 before calling get_block and, if any bit is set, we know we can trust > + * b_size. Unfortunate, really, since ext4 knows precisely how long a hole is > + * and would save us time calling get_block repeatedly. > + */ > +static bool buffer_size_valid(struct buffer_head *bh) > +{ > + return bh->b_state != 0; > +} Yitch. Is there a cleaner way of doing this? > +static ssize_t dax_io(int rw, struct inode *inode, struct iov_iter *iter, > + loff_t start, loff_t end, get_block_t get_block, > + struct buffer_head *bh) hm, some documentation would be nice. I expected "dax_io" to do IO, but this doesn't. Is it well named? > +{ > + ssize_t retval = 0; > + loff_t pos = start; > + loff_t max = start; > + loff_t bh_max = start; > + void *addr; > + bool hole = false; > + > + if (rw != WRITE) > + end = min(end, i_size_read(inode)); > + > + while (pos < end) { > + unsigned len; > + if (pos == max) { > + unsigned blkbits = inode->i_blkbits; > + sector_t block = pos >> blkbits; > + unsigned first = pos - (block << blkbits); > + long size; > + > + if (pos == bh_max) { > + bh->b_size = PAGE_ALIGN(end - pos); > + bh->b_state = 0; > + retval = get_block(inode, block, bh, > + rw == WRITE); > + if (retval) > + break; > + if (!buffer_size_valid(bh)) > + bh->b_size = 1 << blkbits; > + bh_max = pos - first + bh->b_size; > + } else { > + unsigned done = bh->b_size - > + (bh_max - (pos - first)); > + bh->b_blocknr += done >> blkbits; > + bh->b_size -= done; > + } > + > + hole = (rw != WRITE) && !buffer_written(bh); > + if (hole) { > + addr = NULL; > + size = bh->b_size - first; > + } else { > + retval = dax_get_addr(bh, &addr, blkbits); > + if (retval < 0) > + break; > + if (buffer_unwritten(bh) || buffer_new(bh)) > + dax_new_buf(addr, retval, first, pos, > + end); > + addr += first; > + size = retval - first; > + } > + max = min(pos + size, end); > + } > + > + if (rw == WRITE) > + len = copy_from_iter(addr, max - pos, iter); > + else if (!hole) > + len = copy_to_iter(addr, max - pos, iter); > + else > + len = iov_iter_zero(max - pos, iter); > + > + if (!len) > + break; > + > + pos += len; > + addr += len; > + } > + > + return (pos == start) ? retval : pos - start; > +} > + > +/** > + * dax_do_io - Perform I/O to a DAX file > + * @rw: READ to read or WRITE to write > + * @iocb: The control block for this I/O > + * @inode: The file which the I/O is directed at > + * @iter: The addresses to do I/O from or to > + * @pos: The file offset where the I/O starts > + * @get_block: The filesystem method used to translate file offsets to blocks > + * @end_io: A filesystem callback for I/O completion > + * @flags: See below > + * > + * This function uses the same locking scheme as do_blockdev_direct_IO: > + * If @flags has DIO_LOCKING set, we assume that the i_mutex is held by the > + * caller for writes. For reads, we take and release the i_mutex ourselves. > + * If DIO_LOCKING is not set, the filesystem takes care of its own locking. > + * As with do_blockdev_direct_IO(), we increment i_dio_count while the I/O > + * is in progress. It would be helpful here to explain *why* this code uses i_dio_count: what is trying to protect (against)? Oh, is that how it works ;) Perhaps a few BUG_ON(!mutex_is_locked(&inode->i_mutex)) would clarfiy and prevent mistakes. > + */ > +ssize_t dax_do_io(int rw, struct kiocb *iocb, struct inode *inode, > + struct iov_iter *iter, loff_t pos, > + get_block_t get_block, dio_iodone_t end_io, int flags) > > ... > -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html