On 10/30/20 9:51 AM, Naohiro Aota wrote:
For a zone append write, the device decides the location the data is
written to. Therefore we cannot ensure that two bios are written
consecutively on the device. In order to ensure that an ordered extent maps
to a contiguous region on disk, we need to maintain a "one bio == one
ordered extent" rule.
This commit implements the splitting of an ordered extent and extent map
on bio submission to adhere to the rule.
Signed-off-by: Naohiro Aota <naohiro.aota@xxxxxxx>
---
 fs/btrfs/inode.c        | 89 +++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/ordered-data.c | 76 +++++++++++++++++++++++++++++++++++
 fs/btrfs/ordered-data.h |  2 +
 3 files changed, 167 insertions(+)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 591ca539e444..6b2569dfc3bd 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2158,6 +2158,86 @@ static blk_status_t btrfs_submit_bio_start(void *private_data, struct bio *bio,
return btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0);
}
+int extract_ordered_extent(struct inode *inode, struct bio *bio,
+ loff_t file_offset)
+{
+ struct btrfs_ordered_extent *ordered;
+ struct extent_map *em = NULL, *em_new = NULL;
+ struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
+ u64 start = (u64)bio->bi_iter.bi_sector << SECTOR_SHIFT;
+ u64 len = bio->bi_iter.bi_size;
+ u64 end = start + len;
+ u64 ordered_end;
+ u64 pre, post;
+ int ret = 0;
+
+ ordered = btrfs_lookup_ordered_extent(BTRFS_I(inode), file_offset);
+ if (WARN_ON_ONCE(!ordered))
+ return -EIO;
+
+ /* no need to split */
+ if (ordered->disk_num_bytes == len)
+ goto out;
+
+ /* cannot split an ordered extent once it has been end_bio'd */
+ if (WARN_ON_ONCE(ordered->bytes_left != ordered->disk_num_bytes)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /* we cannot split a compressed ordered extent */
+ if (WARN_ON_ONCE(ordered->disk_num_bytes != ordered->num_bytes)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /* cannot split an ordered extent that is being waited on */
+ if (WARN_ON_ONCE(wq_has_sleeper(&ordered->wait))) {
+ ret = -EINVAL;
+ goto out;
+ }
This is bad. We could choose any moment to wait on an ordered extent, and then
this check trips and the split fails.
In fact I'm not a fan of any of this code. I assume we only know at
bio_add_zone_append_page time how much we'll be able to shove into a bio? Then
I think the best/cleanest approach here is going to be to add something like
what the compressed path does, an entire alternate way to allocate and submit
extents. It would look something like:
  ->lock pages
  ->reserve space
  loop until all pages are submitted
    ->build bio
    ->add ordered extent for the bio
  ->unlock pages
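
As a rough illustration of the flow above, here is a toy user-space model, not
btrfs code. Every name in it (toy_submit_range, toy_ordered_extent,
TOY_PAGE_SIZE, max_bio_bytes) is hypothetical, and max_bio_bytes merely stands
in for whatever limit the zone append path would impose on a single bio. The
only point is that each ordered extent is created with the size of the bio it
covers, so nothing has to be split afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* Hypothetical stand-in for an ordered extent: exactly one per bio. */
struct toy_ordered_extent {
	uint64_t file_offset;
	uint64_t len;
};

/*
 * Toy model of "build bio, then add an ordered extent for that bio".
 * Returns the number of ordered extents written to out[], or -1 if the
 * arguments are invalid or out[] is too small.
 */
int toy_submit_range(uint64_t file_offset, uint64_t len,
		     uint64_t max_bio_bytes,
		     struct toy_ordered_extent *out, size_t out_cap)
{
	uint64_t cur = file_offset;
	uint64_t remaining = len;
	size_t n = 0;

	if (max_bio_bytes < TOY_PAGE_SIZE || len % TOY_PAGE_SIZE != 0)
		return -1;

	/* ->lock pages, ->reserve space: no-ops in this model */

	while (remaining > 0) {
		uint64_t bio_len = 0;

		/* ->build bio: add whole pages until the limit is hit */
		while (bio_len + TOY_PAGE_SIZE <= max_bio_bytes &&
		       bio_len < remaining)
			bio_len += TOY_PAGE_SIZE;
		assert(bio_len > 0);

		/* ->add ordered extent for the bio, sized to match it */
		if (n == out_cap)
			return -1;
		out[n].file_offset = cur;
		out[n].len = bio_len;
		n++;

		cur += bio_len;
		remaining -= bio_len;
	}

	/* ->unlock pages: no-op in this model */
	return (int)n;
}
```

In this model, submitting 10 pages with a 4-page bio limit produces three
ordered extents of 4, 4 and 2 pages, each matching one bio.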
Then the ordered extents are their correct size and you don't have to worry
about arbitrary waiters on ordered extents screwing things up, and you don't
have to split ordered extents after the fact. Thanks,
Josef