On Fri, Nov 30, 2012 at 11:36:01AM -0500, Christoph Hellwig wrote: > On Fri, Nov 30, 2012 at 01:49:10PM +1100, Dave Chinner wrote: > > > Ugh. That's a big violation of how buffer-heads are supposed to work: > > > the block number is very much defined to be in multiples of b_size > > > (see for example "submit_bh()" that turns it into a sector number). > > > > > > But you're right. The direct-IO code really *is* violating that, and > > > knows that get_block() ends up being defined in i_blkbits regardless > > > of b_size. > > > > Same with mpage_readpages(), so it's not just direct IO that has > > this problem.... > > The mpage code may actually fall back to BHs. > > I have a version of the direct I/O code that uses the iomap_ops from the > multi-page write code that you originally started. It uses the new op > as primary interface for direct I/O and provides a helper for > filesystems that still use buffer heads internally. I'll try to dust it > off and send out a version for the current kernel. So it was based on this interface? (I went looking for this code on google a couple of days ago so I could point at it and say "we should be using an iomap structure, not buffer heads", but it looks like I never posted it to fsdevel or the xfs lists...) diff --git a/include/linux/fs.h b/include/linux/fs.h index 090f0ea..e247d62 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -522,6 +522,7 @@ enum positive_aop_returns { struct page; struct address_space; struct writeback_control; +struct iomap; struct iov_iter { const struct iovec *iov; @@ -614,6 +615,9 @@ struct address_space_operations { int (*is_partially_uptodate) (struct page *, read_descriptor_t *, unsigned long); int (*error_remove_page)(struct address_space *, struct page *); + + int (*iomap)(struct address_space *mapping, loff_t pos, + ssize_t length, struct iomap *iomap, int cmd); }; /* diff --git a/include/linux/iomap.h b/include/linux/iomap.h new file mode 100644 index 0000000..7708614 --- /dev/null +++ b/include/linux/iomap.h @@ -0,0 +1,45 @@ +#ifndef _IOMAP_H +#define _IOMAP_H + +/* ->iomap a_op command types */ +#define IOMAP_READ 0x01 /* read the current mapping starting at the + given position, trimmed to a maximum length. + FS's should use this to obtain and lock + resources within this range */ +#define IOMAP_RESERVE 0x02 /* reserve space for an allocation that spans + the given iomap */ +#define IOMAP_ALLOCATE 0x03 /* allocate space in a given iomap - must have + first been reserved */ +#define IOMAP_UNRESERVE 0x04 /* return unused reserved space for the given + iomap and used space. This will always be + called after a IOMAP_READ so as to allow the + FS to release held resources. */ + +/* types of block ranges for multipage write mappings. */ +#define IOMAP_HOLE 0x01 /* no blocks allocated, need allocation */ +#define IOMAP_DELALLOC 0x02 /* delayed allocation blocks */ +#define IOMAP_MAPPED 0x03 /* blocks allocated @blkno */ +#define IOMAP_UNWRITTEN 0x04 /* blocks allocated @blkno in unwritten state */ + +struct iomap { + sector_t blkno; /* first sector of mapping */ + loff_t offset; /* file offset of mapping, bytes */ + ssize_t length; /* length of mapping, bytes */ + int type; /* type of mapping */ + void *priv; /* fs private data associated with map */ +}; + +static inline bool +iomap_needs_allocation(struct iomap *iomap) +{ + return iomap->type == IOMAP_HOLE; +} + +/* multipage write interfaces use iomaps */ +typedef int (*mpw_actor_t)(struct address_space *mapping, void *src, + loff_t pos, ssize_t len, struct iomap *iomap); + +ssize_t multipage_write_segment(struct address_space *mapping, void *src, + loff_t pos, ssize_t length, mpw_actor_t actor); + +#endif /* _IOMAP_H */ Cheers, Dave. > > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > > -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html