[PATCH 3/3] dm-zoned: New device mapper target for zoned block devices

From: Damien Le Moal <damien.lemoal@xxxxxxxx>

dm-zoned exposes a zoned block device as a regular, fully randomly
writeable block device, hiding the sequential write constraints of
host-managed zoned block devices and mitigating potential performance
degradation of host-aware devices.

Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxxx>
Signed-off-by: Hannes Reinecke <hare@xxxxxxxx>
---
 Documentation/device-mapper/dm-zoned.txt |  147 +++
 drivers/md/Kconfig                       |   14 +
 drivers/md/Makefile                      |    2 +
 drivers/md/dm-zoned-io.c                 | 1186 ++++++++++++++++++
 drivers/md/dm-zoned-meta.c               | 1950 ++++++++++++++++++++++++++++++
 drivers/md/dm-zoned-reclaim.c            |  770 ++++++++++++
 drivers/md/dm-zoned.h                    |  687 +++++++++++
 7 files changed, 4756 insertions(+)
 create mode 100644 Documentation/device-mapper/dm-zoned.txt
 create mode 100644 drivers/md/dm-zoned-io.c
 create mode 100644 drivers/md/dm-zoned-meta.c
 create mode 100644 drivers/md/dm-zoned-reclaim.c
 create mode 100644 drivers/md/dm-zoned.h

diff --git a/Documentation/device-mapper/dm-zoned.txt b/Documentation/device-mapper/dm-zoned.txt
new file mode 100644
index 0000000..28595ef
--- /dev/null
+++ b/Documentation/device-mapper/dm-zoned.txt
@@ -0,0 +1,147 @@
+dm-zoned
+========
+
+The dm-zoned device mapper target provides transparent write access to
+zoned block devices (ZBC and ZAC compliant devices). It hides from the
+device user (a file system or an application doing raw block device
+accesses) the sequential write constraints of host-managed devices and
+can mitigate potential performance degradation of host-aware zoned
+devices.
+
+For a more detailed description of the zoned block device models and
+their constraints, see (for SCSI devices):
+
+http://www.t10.org/drafts.htm#ZBC_Family
+
+And (for ATA devices):
+
+http://www.t13.org/Documents/UploadedDocuments/docs2015/di537r05-Zoned_Device_ATA_Command_Set_ZAC.pdf
+
+
+Algorithm
+=========
+
+The zones of the device are separated into 3 sets (summarized in the
+sketch after this list):
+1) Metadata zones: these are randomly writeable zones used to store metadata.
+Randomly writeable zones may be conventional zones or sequential write
+preferred zones (host-aware devices only). These zones have a fixed mapping
+and must be available at the beginning of the device address space (from LBA 0).
+2) Buffer zones: these are randomly writeable zones used to temporarily
+buffer unaligned writes to data zones. Buffer zones may be conventional
+zones or sequential write preferred zones (host-aware devices only) and any
+random zone in the device address space can be used as a buffer zone (there
+is no constraint on the location of these zones).
+3) Data zones: all remaining zones. Most will likely be sequential zones,
+either sequential write required zones (host-managed devices) or sequential
+write preferred zones (host-aware devices). Conventional zones not used as
+metadata or buffer zones are part of the set of data zones. dm-zoned
+tries to efficiently allocate and map these zones to limit the performance
+impact of buffering random writes for chunks of the logical device that are
+being heavily randomly written.
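+
+As an illustration only (the names below do not appear in the driver
+sources), this classification can be summarized as:
+
+[[
+/* Illustrative sketch of the three zone sets managed by dm-zoned. */
+enum example_zone_set {
+	EXAMPLE_ZONE_META,	/* randomly writeable, fixed, from LBA 0 */
+	EXAMPLE_ZONE_BUFFER,	/* randomly writeable, buffers unaligned writes */
+	EXAMPLE_ZONE_DATA,	/* all remaining zones, mapped to logical chunks */
+};
+]]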
+
+dm-zoned exposes a logical device with a sector size of 4096 bytes, irrespective
+of the physical sector size of the backend device being used. This reduces
+the amount of metadata needed to manage valid blocks (blocks written)
+and the buffering of random writes. In more detail, the on-disk metadata format
+is as follows:
+1) Block 0 contains the super block, which describes the number of metadata
+blocks used, the number of buffer zones reserved, their position on disk and
+the data zones being buffered.
+2) Following block 0, a set of blocks describes the mapping of the logical
+chunks of the logical device to data zones (the size of a logical chunk is
+equal to the device zone size).
+3) A set of blocks stores bitmaps indicating the validity of blocks in the
+buffer zones and data zones. A valid block is a block that was written and
+not discarded. For a buffered data zone, a block can be valid in either the
+data zone or the buffer zone, but not in both.
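+
+As an illustration, the metadata layout described above can be sketched with
+the following structures. The field names are illustrative only, except for
+bzone_id and dzone_id; the actual on-disk format is defined in
+drivers/md/dm-zoned.h:
+
+[[
+#include <linux/types.h>
+
+/* Illustrative sketch only: one buffer zone mapping entry in the super block */
+struct example_dmz_bz_map {
+	__le32	bzone_id;	/* buffer zone ID */
+	__le32	dzone_id;	/* buffered data zone ID, or "unmapped" */
+};
+
+/* Illustrative sketch only: block 0, the super block */
+struct example_dmz_super {
+	__le32	nr_meta_blocks;		/* number of metadata blocks */
+	__le32	nr_buf_zones;		/* number of reserved buffer zones */
+	struct example_dmz_bz_map bz_map[];	/* buffer zone mapping table */
+};
+
+/*
+ * The blocks following the super block hold:
+ * - the chunk mapping table: one 32-bit data zone ID per logical chunk
+ *   (a chunk is as large as a zone), set to an "unmapped" value when the
+ *   chunk has never been written;
+ * - the block validity bitmaps: one bit per 4096-byte block of each buffer
+ *   and data zone, set when the block was written and not discarded.
+ */
+]]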
+
+For a logical chunk mapped to a conventional data zone, all write operations are
+processed by directly writing to the data zone. If the mapped zone is a sequential
+zone, the write operation is processed directly if and only if the write offset
+within the logical chunk equals the write pointer offset within the data zone
+(i.e. the write operation is aligned on the zone write pointer).
+
+Otherwise, write operations are processed indirectly using a buffer zone: a buffer
+zone is allocated and assigned to the data zone being accessed and the data is
+written to the buffer zone. The written blocks are then invalidated in the data
+zone and validated in the buffer zone.
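+
+The write path decision can be summarized with the following simplified sketch
+of dm_zoned_handle_write() in drivers/md/dm-zoned-io.c (error handling, the
+splitting of writes that straddle the write pointer and the optional align_wp
+handling are omitted):
+
+[[
+/* Simplified sketch only; see dm_zoned_handle_write() for the real logic. */
+static int example_write(struct dm_zoned_target *dzt,
+			 struct dm_zoned_zone *dzone, struct bio *bio,
+			 sector_t chunk_block, unsigned int nr_blocks)
+{
+	/* Conventional data zone: always write in place */
+	if (dm_zoned_zone_is_cmr(dzone))
+		return dm_zoned_handle_direct_write(dzt, dzone, bio,
+						    chunk_block, nr_blocks);
+
+	/* Sequential data zone, write aligned on the write pointer */
+	if (chunk_block == dzone->wp_block)
+		return dm_zoned_handle_direct_write(dzt, dzone, bio,
+						    chunk_block, nr_blocks);
+
+	/* Unaligned write: buffer it in the zone's buffer zone */
+	return dm_zoned_handle_buffered_write(dzt, dzone, bio,
+					      chunk_block, nr_blocks);
+}
+]]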
+
+Read operations are processed according to the block validity information provided
+by the bitmaps: valid blocks are read either from the data zone or, if the data
+zone is buffered, from the buffer zone assigned to the data zone.
+
+After some time, the limited number of buffer zones available may be exhausted and
+unaligned writes to unbuffered zones become impossible. To avoid such a situation,
+a reclaim process regularly scans used buffer zones and tries to "reclaim" them by
+sequentially rewriting the buffered data blocks, together with the valid blocks of
+the data zone being buffered, into a new data zone. This "merge" operation completes
+with the remapping of the chunk to the newly written data zone and the release of
+the buffer zone.
+
+The reclaim process also tries to detect data zones that are being heavily
+randomly written and, when possible, performs the merge operation into a
+conventional data zone.
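+
+A single reclaim ("merge") step can be sketched as follows. The example_*
+helper names are hypothetical; the actual implementation is in
+drivers/md/dm-zoned-reclaim.c:
+
+[[
+/* Hypothetical sketch only; helper names do not match dm-zoned-reclaim.c. */
+static int example_reclaim_one(struct dm_zoned_target *dzt,
+			       struct dm_zoned_zone *bzone)
+{
+	struct dm_zoned_zone *dzone = bzone->bzone;	/* buffered data zone */
+	struct dm_zoned_zone *new_zone;
+	int ret;
+
+	/*
+	 * Pick a free data zone, preferring a conventional zone if the
+	 * chunk is being heavily randomly written (hypothetical helper).
+	 */
+	new_zone = example_alloc_data_zone(dzt, dzone);
+	if (!new_zone)
+		return -EBUSY;
+
+	/*
+	 * Sequentially rewrite the valid blocks of the buffer zone and of
+	 * the data zone into the new data zone (hypothetical helper).
+	 */
+	ret = example_merge_valid_blocks(dzt, dzone, bzone, new_zone);
+	if (ret)
+		return ret;
+
+	/* Remap the chunk to the new data zone (hypothetical helper) */
+	example_remap_chunk(dzt, dzone, new_zone);
+
+	/* Release the buffer zone (real helper from dm-zoned-meta.c) */
+	dm_zoned_free_bzone(dzt, bzone);
+
+	return 0;
+}
+]]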
+
+Usage
+=====
+
+Parameters: <zoned device path> [Options]
+Options:
+	debug             : Enable debug messages
+	format            : Reset and format the device metadata. This will
+			    invalidate all blocks of the device and trigger
+			    a write pointer reset of all zones, causing the
+			    loss of all previously written data.
+	num_bzones=<num>  : If the format option is specified, change the
+			    default number of buffer zones from 64 to <num>.
+			    If <num> is too large and cannot be accommodated
+			    with the number of available random zones, the
+			    maximum possible number of buffer zones is used.
+	align_wp=<blocks> : Use write same command to move an SMR zone write
+			    pointer position to the offset of a write request,
+			    limiting the write same operation to at most
+			    <blocks> blocks. This can reduce the use of buffer
+			    zones, but can also significantly decrease the
+			    usable disk throughput. Set to 0 (default) to
+			    disable this feature. The maximum allowed is half
+			    the disk zone size.
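+
+For example, a target created with the following dmsetup table line (device
+path and option values are examples) formats the metadata and reserves 128
+buffer zones:
+
+	0 <device size in 512-byte sectors> dm-zoned /dev/sdf format num_bzones=128
+
+The example script below builds such a table line automatically.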
+
+Example scripts
+===============
+
+[[
+#!/bin/sh
+
+if [ $# -lt 1 ]; then
+	echo "Usage: $0 <Zoned device path> [Options]"
+	echo "Options:"
+	echo "    debug             : Enable debug messages"
+	echo "    format            : Reset and format the device metadata. This will"
+	echo "                        invalidate all blocks of the device and trigger"
+	echo "                        a write pointer reset of all zones, causing the"
+	echo "                        loss of all previously written data."
+	echo "    num_bzones=<num>  : If the format option is specified, change the"
+	echo "                        default number of buffer zones from 64 to <num>."
+	echo "                        If <num> is too large and cannot be accommodated"
+	echo "                        with the number of available random zones, the"
+	echo "                        maximum possible number of buffer zones is used."
+	echo "    align_wp=<blocks> : Use write same command to move an SMR zone write"
+	echo "                        pointer position to the offset of a write request,"
+	echo "                        limiting the write same operation to at most"
+	echo "                        <blocks> blocks. This can reduce the use of buffer"
+	echo "                        zones, but can also significantly decrease the"
+	echo "                        usable disk throughput. Set to 0 (default) to"
+	echo "                        disable this feature. The maximum allowed is half"
+	echo "                        the disk zone size."
+	exit 1
+fi
+
+dev="${1}"
+shift
+options="$@"
+
+modprobe dm-zoned
+
+echo "0 `blockdev --getsize ${dev}` dm-zoned ${dev} ${options}" | dmsetup create zoned-`basename ${dev}`
+]]
+
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 02a5345..4f31863 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -500,4 +500,18 @@ config DM_LOG_WRITES
 
 	  If unsure, say N.
 
+config DM_ZONED
+	tristate "Zoned block device cache write target support (EXPERIMENTAL)"
+	depends on BLK_DEV_DM && BLK_DEV_ZONED
+	default n
+	---help---
+	  This device-mapper target implements an on-disk caching layer for
+	  zoned block devices (ZBC and ZAC), thereby hiding the random write
+	  constraints of the backend device.
+
+	  To compile this code as a module, choose M here: the module will
+	  be called dm-zoned.
+
+	  If unsure, say N.
+
 endif # MD
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 52ba8dd..2d61be5 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -18,6 +18,7 @@ dm-era-y	+= dm-era-target.o
 dm-verity-y	+= dm-verity-target.o
 md-mod-y	+= md.o bitmap.o
 raid456-y	+= raid5.o raid5-cache.o
+dm-zoned-y      += dm-zoned-io.o dm-zoned-meta.o dm-zoned-reclaim.o
 
 # Note: link order is important.  All raid personalities
 # and must come before md.o, as they each initialise 
@@ -58,6 +59,7 @@ obj-$(CONFIG_DM_CACHE_SMQ)	+= dm-cache-smq.o
 obj-$(CONFIG_DM_CACHE_CLEANER)	+= dm-cache-cleaner.o
 obj-$(CONFIG_DM_ERA)		+= dm-era.o
 obj-$(CONFIG_DM_LOG_WRITES)	+= dm-log-writes.o
+obj-$(CONFIG_DM_ZONED)          += dm-zoned.o
 
 ifeq ($(CONFIG_DM_UEVENT),y)
 dm-mod-objs			+= dm-uevent.o
diff --git a/drivers/md/dm-zoned-io.c b/drivers/md/dm-zoned-io.c
new file mode 100644
index 0000000..347510a
--- /dev/null
+++ b/drivers/md/dm-zoned-io.c
@@ -0,0 +1,1186 @@
+/*
+ * (C) Copyright 2016 Western Digital.
+ *
+ * This software is distributed under the terms of the GNU Lesser General
+ * Public License version 2, or any later version, "as is," without technical
+ * support, and WITHOUT ANY WARRANTY, without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ * Author: Damien Le Moal <damien.lemoal@xxxxxxxx>
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/version.h>
+
+#include "dm-zoned.h"
+
+/**
+ * Target BIO completion.
+ */
+static inline void
+dm_zoned_bio_end(struct bio *bio, int err)
+{
+	struct dm_zoned_bioctx *bioctx
+		= dm_per_bio_data(bio, sizeof(struct dm_zoned_bioctx));
+
+	if (err)
+		bioctx->error = err;
+
+	if (atomic_dec_and_test(&bioctx->ref)) {
+		/* I/O Completed */
+		if (bioctx->dzone)
+			dm_zoned_put_dzone(bioctx->target, bioctx->dzone);
+		bio->bi_error = bioctx->error;
+		bio_endio(bio);
+	}
+}
+
+/**
+ * I/O request completion callback. This terminates
+ * the target BIO when there are no more references
+ * on the BIO context.
+ */
+static void
+dm_zoned_bio_end_io(struct bio *bio)
+{
+	struct dm_zoned_bioctx *bioctx = bio->bi_private;
+	struct dm_zoned_zone *dzone = bioctx->dzone;
+	int err = bio->bi_error;
+	unsigned long flags;
+
+	dm_zoned_lock_zone(dzone, flags);
+	dm_zoned_assert(dzone->zwork);
+	if (atomic_dec_and_test(&dzone->zwork->bio_count)) {
+		clear_bit_unlock(DM_ZONE_ACTIVE_BIO, &dzone->flags);
+		smp_mb__after_atomic();
+		wake_up_bit(&dzone->flags, DM_ZONE_ACTIVE_BIO);
+	}
+	dm_zoned_unlock_zone(dzone, flags);
+
+	dm_zoned_bio_end(bioctx->bio, err);
+
+	bio_put(bio);
+
+}
+
+/**
+ * Issue a request to process a BIO.
+ * Processing of the BIO may be partial.
+ */
+static int
+dm_zoned_submit_zone_bio(struct dm_zoned_target *dzt,
+			 struct dm_zoned_zone *zone,
+			 struct bio *dzt_bio,
+			 sector_t chunk_block,
+			 unsigned int nr_blocks)
+{
+	struct dm_zoned_bioctx *bioctx
+		= dm_per_bio_data(dzt_bio, sizeof(struct dm_zoned_bioctx));
+	unsigned int nr_sectors = dm_zoned_block_to_sector(nr_blocks);
+	unsigned int size = nr_sectors << SECTOR_SHIFT;
+	struct dm_zoned_zone *dzone = bioctx->dzone;
+	unsigned long flags;
+	struct bio *clone;
+
+	dm_zoned_dev_assert(dzt, size != 0);
+	dm_zoned_dev_assert(dzt, size <= dzt_bio->bi_iter.bi_size);
+
+	clone = bio_clone_fast(dzt_bio, GFP_NOIO, dzt->bio_set);
+	if (!clone)
+		return -ENOMEM;
+
+	/* Setup the clone */
+	clone->bi_bdev = dzt->zbd;
+	clone->bi_rw = dzt_bio->bi_rw;
+	clone->bi_iter.bi_sector = dm_zoned_zone_start_sector(zone)
+		+ dm_zoned_block_to_sector(chunk_block);
+	clone->bi_iter.bi_size = size;
+	clone->bi_end_io = dm_zoned_bio_end_io;
+	clone->bi_private = bioctx;
+
+	bio_advance(dzt_bio, size);
+
+	/* Submit the clone */
+	dm_zoned_lock_zone(dzone, flags);
+	if (atomic_inc_return(&dzone->zwork->bio_count) == 1)
+		set_bit(DM_ZONE_ACTIVE_BIO, &dzone->flags);
+	atomic_inc(&bioctx->ref);
+	dm_zoned_unlock_zone(dzone, flags);
+	generic_make_request(clone);
+
+	return 0;
+}
+
+/**
+ * Zero out blocks of a read BIO buffers.
+ */
+static void
+dm_zoned_handle_read_zero(struct dm_zoned_target *dzt,
+			  struct dm_zoned_zone *zone,
+			  struct bio *bio,
+			  sector_t chunk_block,
+			  unsigned int nr_blocks)
+{
+	unsigned int size = nr_blocks << DM_ZONED_BLOCK_SHIFT;
+
+#ifdef __DM_ZONED_DEBUG
+	if (zone)
+		dm_zoned_dev_debug(dzt, "=> ZERO READ chunk %zu -> zone %lu, block %zu, %u blocks\n",
+				 dm_zoned_bio_chunk(dzt, bio),
+				 zone->id,
+				 chunk_block,
+				 nr_blocks);
+	else
+		dm_zoned_dev_debug(dzt, "=> ZERO READ unmapped chunk %zu, block %zu, %u blocks\n",
+				 dm_zoned_bio_chunk(dzt, bio),
+				 chunk_block,
+				 nr_blocks);
+#endif
+
+	dm_zoned_dev_assert(dzt, size != 0);
+	dm_zoned_dev_assert(dzt, size <= bio->bi_iter.bi_size);
+	dm_zoned_dev_assert(dzt, bio_data_dir(bio) == READ);
+
+	/* Clear nr_blocks */
+	swap(bio->bi_iter.bi_size, size);
+	zero_fill_bio(bio);
+	swap(bio->bi_iter.bi_size, size);
+
+	bio_advance(bio, size);
+}
+
+/**
+ * Issue a read request or zero out blocks buffers
+ * to process an entire or part of a read BIO.
+ */
+static int
+dm_zoned_handle_read_bio(struct dm_zoned_target *dzt,
+			 struct dm_zoned_zone *zone,
+			 struct bio *bio,
+			 sector_t chunk_block,
+			 unsigned int nr_blocks)
+{
+
+	dm_zoned_dev_debug(dzt, "=> %s READ zone %lu, block %zu, %u blocks\n",
+			 (dm_zoned_zone_buf(zone) ? "BUF" : "SMR"),
+			 zone->id,
+			 chunk_block,
+			 nr_blocks);
+
+	if (!nr_blocks)
+		return -EIO;
+
+	/* Submit read */
+	return dm_zoned_submit_zone_bio(dzt, zone, bio, chunk_block, nr_blocks);
+}
+
+/**
+ * Process a read BIO.
+ */
+static int
+dm_zoned_handle_read(struct dm_zoned_target *dzt,
+		     struct dm_zoned_zone *zone,
+		     struct bio *bio)
+{
+	struct dm_zoned_zone *bzone;
+	sector_t chunk_block = dm_zoned_bio_chunk_block(dzt, bio);
+	unsigned int nr_blocks = dm_zoned_bio_blocks(bio);
+	sector_t end_block = chunk_block + nr_blocks;
+	int ret = -EIO;
+
+	/* Reads to unmapped chunks only need zeroing of the BIO buffer */
+	if (!zone) {
+		dm_zoned_handle_read_zero(dzt, NULL, bio, chunk_block, nr_blocks);
+		return 0;
+	}
+
+	/* If this is an empty SMR zone that is also not */
+	/* buffered, all its blocks are invalid.         */
+	bzone = zone->bzone;
+	if (!bzone && dm_zoned_zone_is_smr(zone) && dm_zoned_zone_empty(zone)) {
+		dm_zoned_handle_read_zero(dzt, zone, bio, chunk_block, nr_blocks);
+		return 0;
+	}
+
+	/* Check block validity to determine the read location  */
+	while (chunk_block < end_block) {
+
+		if (dm_zoned_zone_is_cmr(zone)
+		    || chunk_block < zone->wp_block) {
+			/* Test block validity in the data zone */
+			ret = dm_zoned_block_valid(dzt, zone, chunk_block);
+			if (ret < 0)
+				return ret;
+			if (ret > 0) {
+				/* Read data zone blocks */
+				nr_blocks = min_t(unsigned int, ret,
+						  end_block - chunk_block);
+				ret = dm_zoned_handle_read_bio(dzt, zone, bio,
+							       chunk_block,
+							       nr_blocks);
+				if (ret < 0)
+					return ret;
+				chunk_block += nr_blocks;
+				continue;
+			}
+		}
+
+		/* Check the buffer zone, if there is one */
+		if (bzone) {
+			ret = dm_zoned_block_valid(dzt, bzone, chunk_block);
+			if (ret < 0)
+				return ret;
+			if (ret > 0) {
+				/* Read buffer zone blocks */
+				nr_blocks = min_t(unsigned int, ret,
+						  end_block - chunk_block);
+				ret = dm_zoned_handle_read_bio(dzt, bzone, bio,
+							       chunk_block,
+							       nr_blocks);
+				if (ret < 0)
+					return ret;
+				chunk_block += nr_blocks;
+				continue;
+			}
+		}
+
+		/* No valid block: zeroout the block in the BIO */
+		dm_zoned_handle_read_zero(dzt, zone, bio, chunk_block, 1);
+		chunk_block++;
+
+	}
+
+	return 0;
+}
+
+/**
+ * Write blocks in the buffer zone of @zone.
+ * If no buffer zone is assigned yet, get one.
+ * Called with @zone write locked.
+ */
+static int
+dm_zoned_handle_buffered_write(struct dm_zoned_target *dzt,
+			       struct dm_zoned_zone *zone,
+			       struct bio *bio,
+			       sector_t chunk_block,
+			       unsigned int nr_blocks)
+{
+	struct dm_zoned_zone *bzone;
+	int ret;
+
+	/* Make sure we have a buffer zone */
+	bzone = dm_zoned_alloc_bzone(dzt, zone);
+	if (!bzone)
+		return -EBUSY;
+
+	dm_zoned_dev_debug(dzt, "=> WRITE BUF zone %lu, block %zu, %u blocks\n",
+			 bzone->id,
+			 chunk_block,
+			 nr_blocks);
+
+	/* Submit write */
+	ret = dm_zoned_submit_zone_bio(dzt, bzone, bio, chunk_block, nr_blocks);
+	if (ret)
+		return -EIO;
+
+	/* Stats */
+	zone->mtime = jiffies;
+	zone->wr_buf_blocks += nr_blocks;
+
+	/* Validate the blocks in the buffer zone */
+	/* and invalidate in the data zone.       */
+	ret = dm_zoned_validate_blocks(dzt, bzone, chunk_block, nr_blocks);
+	if (ret == 0 && chunk_block < zone->wp_block)
+		ret = dm_zoned_invalidate_blocks(dzt, zone,
+						 chunk_block, nr_blocks);
+
+	return ret;
+}
+
+/**
+ * Write blocks directly in a data zone, at the write pointer.
+ * If a buffer zone is assigned, invalidate the blocks written
+ * in place.
+ */
+static int
+dm_zoned_handle_direct_write(struct dm_zoned_target *dzt,
+			     struct dm_zoned_zone *zone,
+			     struct bio *bio,
+			     sector_t chunk_block,
+			     unsigned int nr_blocks)
+{
+	struct dm_zoned_zone *bzone = zone->bzone;
+	int ret;
+
+	dm_zoned_dev_debug(dzt, "=> WRITE %s zone %lu, block %zu, %u blocks\n",
+			 (dm_zoned_zone_is_cmr(zone) ? "CMR" : "SMR"),
+			 zone->id,
+			 chunk_block,
+			 nr_blocks);
+
+	/* Submit write */
+	ret = dm_zoned_submit_zone_bio(dzt, zone, bio, chunk_block, nr_blocks);
+	if (ret)
+		return -EIO;
+
+	if (dm_zoned_zone_is_smr(zone))
+		zone->wp_block += nr_blocks;
+
+	/* Stats */
+	zone->mtime = jiffies;
+	zone->wr_dir_blocks += nr_blocks;
+
+	/* Validate the blocks in the data zone */
+	/* and invalidate in the buffer zone.   */
+	ret = dm_zoned_validate_blocks(dzt, zone, chunk_block, nr_blocks);
+	if (ret == 0 && bzone) {
+		dm_zoned_dev_assert(dzt, dm_zoned_zone_is_smr(zone));
+		ret = dm_zoned_invalidate_blocks(dzt, bzone,
+						 chunk_block, nr_blocks);
+	}
+
+	return ret;
+}
+
+/**
+ * Determine if an unaligned write in an SMR zone can be aligned.
+ * If yes, advance the zone write pointer.
+ */
+static int
+dm_zoned_align_write(struct dm_zoned_target *dzt,
+		     struct dm_zoned_zone *dzone,
+		     sector_t chunk_block)
+{
+	sector_t hole_blocks;
+
+	if (!test_bit(DM_ZONED_ALIGN_WP, &dzt->flags))
+		return 0;
+
+	hole_blocks = chunk_block - dzone->wp_block;
+	if (dzone->bzone || hole_blocks > dzt->align_wp_max_blocks)
+		return 0;
+
+	return dm_zoned_advance_zone_wp(dzt, dzone, hole_blocks) == 0;
+}
+
+/**
+ * Process a write BIO.
+ */
+static int
+dm_zoned_handle_write(struct dm_zoned_target *dzt,
+		      struct dm_zoned_zone *dzone,
+		      struct bio *bio)
+{
+	unsigned int nr_blocks = dm_zoned_bio_blocks(bio);
+	sector_t chunk_block = dm_zoned_bio_chunk_block(dzt, bio);
+	int ret;
+
+	/* Writes to unmapped chunks happen    */
+	/* only if we ran out of data zones... */
+	if (!dzone) {
+		dm_zoned_dev_debug(dzt, "WRITE unmapped chunk %zu, block %zu, %u blocks\n",
+				 dm_zoned_bio_chunk(dzt, bio),
+				 chunk_block,
+				 nr_blocks);
+		return -ENOSPC;
+	}
+
+	dm_zoned_dev_debug(dzt, "WRITE chunk %zu -> zone %lu, block %zu, %u blocks (wp block %zu)\n",
+			 dm_zoned_bio_chunk(dzt, bio),
+			 dzone->id,
+			 chunk_block,
+			 nr_blocks,
+			 dzone->wp_block);
+
+	if (dm_zoned_zone_readonly(dzone)) {
+		dm_zoned_dev_error(dzt, "Write to readonly zone %lu\n",
+				 dzone->id);
+		return -EROFS;
+	}
+
+	/* Write in CMR zone ? */
+	if (dm_zoned_zone_is_cmr(dzone))
+		return dm_zoned_handle_direct_write(dzt, dzone, bio,
+						    chunk_block, nr_blocks);
+
+	/* Writing to an SMR zone: direct write the part of the BIO */
+	/* that aligns with the zone write pointer and buffer write */
+	/* what cannot, which may be the entire BIO.                */
+	if (chunk_block < dzone->wp_block) {
+		unsigned int wblocks = min(nr_blocks,
+			(unsigned int)(dzone->wp_block - chunk_block));
+		ret = dm_zoned_handle_buffered_write(dzt, dzone, bio,
+						     chunk_block, wblocks);
+		if (ret)
+			goto out;
+		nr_blocks -= wblocks;
+		chunk_block += wblocks;
+	}
+
+	if (nr_blocks) {
+		if (chunk_block == dzone->wp_block)
+			ret = dm_zoned_handle_direct_write(dzt, dzone, bio,
+							   chunk_block,
+							   nr_blocks);
+		else {
+			/*
+			 * Writing after the write pointer: try to align
+			 * the write if the zone is not already buffered.
+			 * If that fails, fallback to buffered write.
+			 */
+			if (dm_zoned_align_write(dzt, dzone, chunk_block)) {
+				ret = dm_zoned_handle_direct_write(dzt, dzone,
+								   bio,
+								   chunk_block,
+								   nr_blocks);
+				if (ret == 0)
+					goto out;
+			}
+			ret = dm_zoned_handle_buffered_write(dzt, dzone, bio,
+							     chunk_block,
+							     nr_blocks);
+		}
+	}
+
+out:
+	dm_zoned_validate_bzone(dzt, dzone);
+
+	return ret;
+}
+
+static int
+dm_zoned_handle_discard(struct dm_zoned_target *dzt,
+			struct dm_zoned_zone *zone,
+			struct bio *bio)
+{
+	struct dm_zoned_zone *bzone;
+	unsigned int nr_blocks = dm_zoned_bio_blocks(bio);
+	sector_t chunk_block = dm_zoned_bio_chunk_block(dzt, bio);
+	int ret;
+
+	/* For discard into unmapped chunks, there is nothing to do */
+	if (!zone) {
+		dm_zoned_dev_debug(dzt, "DISCARD unmapped chunk %zu, block %zu, %u blocks\n",
+				 dm_zoned_bio_chunk(dzt, bio),
+				 chunk_block,
+				 nr_blocks);
+		return 0;
+	}
+
+	dm_zoned_dev_debug(dzt, "DISCARD chunk %zu -> zone %lu, block %zu, %u blocks\n",
+			 dm_zoned_bio_chunk(dzt, bio),
+			 zone->id,
+			 chunk_block,
+			 nr_blocks);
+
+	if (dm_zoned_zone_readonly(zone)) {
+		dm_zoned_dev_error(dzt, "Discard in readonly zone %lu\n",
+				 zone->id);
+		return -EROFS;
+	}
+
+	/* Wait for all ongoing write I/Os to complete */
+	dm_zoned_wait_for_stable_zone(zone);
+
+	/* Invalidate blocks in the data zone. If a */
+	/* buffer zone is assigned, do the same.    */
+	/* The data zone write pointer may be reset */
+	bzone = zone->bzone;
+	if (bzone) {
+		ret = dm_zoned_invalidate_blocks(dzt, bzone,
+						 chunk_block, nr_blocks);
+		if (ret)
+			goto out;
+	}
+
+	/* If this is an empty SMR zone, there is nothing to do */
+	if (!dm_zoned_zone_is_smr(zone) ||
+	    !dm_zoned_zone_empty(zone))
+		ret = dm_zoned_invalidate_blocks(dzt, zone,
+						 chunk_block, nr_blocks);
+
+out:
+	dm_zoned_validate_bzone(dzt, zone);
+	dm_zoned_validate_dzone(dzt, zone);
+
+	return ret;
+}
+
+/**
+ * Process a BIO targeting a data zone.
+ */
+static void
+dm_zoned_handle_zone_bio(struct dm_zoned_target *dzt,
+			 struct dm_zoned_zone *dzone,
+			 struct bio *bio)
+{
+	int ret;
+
+	/* Process the BIO */
+	if (bio_data_dir(bio) == READ)
+		ret = dm_zoned_handle_read(dzt, dzone, bio);
+	else if (bio->bi_rw & REQ_DISCARD)
+		ret = dm_zoned_handle_discard(dzt, dzone, bio);
+	else if (bio->bi_rw & REQ_WRITE)
+		ret = dm_zoned_handle_write(dzt, dzone, bio);
+	else {
+		dm_zoned_dev_error(dzt, "Unknown BIO type 0x%lx\n",
+				 bio->bi_rw);
+		ret = -EIO;
+	}
+
+	if (ret != -EBUSY)
+		dm_zoned_bio_end(bio, ret);
+
+	return;
+}
+
+/**
+ * Zone I/O work function.
+ */
+void
+dm_zoned_zone_work(struct work_struct *work)
+{
+	struct dm_zoned_zwork *zwork =
+		container_of(work, struct dm_zoned_zwork, work);
+	struct dm_zoned_zone *dzone = zwork->dzone;
+	struct dm_zoned_target *dzt = zwork->target;
+	int n = DM_ZONE_WORK_MAX_BIO;
+	unsigned long flags;
+	struct bio *bio;
+
+	dm_zoned_lock_zone(dzone, flags);
+
+	dm_zoned_dev_assert(dzt, dzone->zwork == zwork);
+
+	while (n && bio_list_peek(&zwork->bio_list)) {
+
+		/* Process the first BIO in the list */
+		bio = bio_list_pop(&zwork->bio_list);
+		dm_zoned_unlock_zone(dzone, flags);
+
+		dm_zoned_handle_zone_bio(dzt, dzone, bio);
+
+		dm_zoned_lock_zone(dzone, flags);
+		if (test_bit(DM_ZONE_ACTIVE_WAIT, &dzone->flags)) {
+			bio_list_add_head(&zwork->bio_list, bio);
+			break;
+		}
+
+		n--;
+
+	}
+
+	dm_zoned_run_dzone(dzt, dzone);
+
+	dm_zoned_unlock_zone(dzone, flags);
+
+	dm_zoned_put_dzone(dzt, dzone);
+}
+
+/**
+ * Process a flush request. Device mapper core
+ * ensures that no other I/O is in flight. So just
+ * propagate the flush to the backend and sync metadata.
+ */
+static void
+dm_zoned_handle_flush(struct dm_zoned_target *dzt,
+		      struct bio *bio)
+{
+
+	dm_zoned_dev_debug(dzt, "FLUSH (%d active zones, %d wait active zones)\n",
+			 atomic_read(&dzt->dz_nr_active),
+			 atomic_read(&dzt->dz_nr_active_wait));
+
+	dm_zoned_bio_end(bio, dm_zoned_flush(dzt));
+}
+
+/**
+ * Flush work.
+ */
+static void
+dm_zoned_flush_work(struct work_struct *work)
+{
+	struct dm_zoned_target *dzt =
+		container_of(work, struct dm_zoned_target, flush_work);
+	struct bio *bio;
+	unsigned long flags;
+
+	spin_lock_irqsave(&dzt->flush_lock, flags);
+	while ((bio = bio_list_pop(&dzt->flush_list))) {
+		spin_unlock_irqrestore(&dzt->flush_lock, flags);
+		dm_zoned_handle_flush(dzt, bio);
+		spin_lock_irqsave(&dzt->flush_lock, flags);
+	}
+	spin_unlock_irqrestore(&dzt->flush_lock, flags);
+}
+
+/*
+ * Process a new BIO.
+ * Return values:
+ *  DM_MAPIO_SUBMITTED : The target has submitted the bio request.
+ *  DM_MAPIO_REMAPPED  : Bio request is remapped, device mapper should submit bio.
+ *  DM_MAPIO_REQUEUE   : Request that the BIO be submitted again.
+ */
+static int
+dm_zoned_map(struct dm_target *ti,
+	     struct bio *bio)
+{
+	struct dm_zoned_target *dzt = ti->private;
+	struct dm_zoned_bioctx *bioctx
+		= dm_per_bio_data(bio, sizeof(struct dm_zoned_bioctx));
+	unsigned int nr_sectors = dm_zoned_bio_sectors(bio);
+	struct dm_zoned_zone *dzone;
+	sector_t chunk_sector;
+	unsigned long flags;
+
+	bio->bi_bdev = dzt->zbd;
+	if (!nr_sectors && !(bio->bi_rw & REQ_FLUSH))
+		return DM_MAPIO_REMAPPED;
+
+	/* The BIO should be block aligned */
+	if ((nr_sectors & DM_ZONED_BLOCK_SECTORS_MASK) ||
+	    (dm_zoned_bio_sector(bio) & DM_ZONED_BLOCK_SECTORS_MASK)) {
+		dm_zoned_dev_error(dzt, "Unaligned BIO sector %zu, len %u\n",
+				 dm_zoned_bio_sector(bio),
+				 nr_sectors);
+		return -EIO;
+	}
+
+	dzt->last_bio_time = jiffies;
+
+	/* Initialize the IO context */
+	bioctx->target = dzt;
+	bioctx->dzone = NULL;
+	bioctx->bio = bio;
+	atomic_set(&bioctx->ref, 1);
+	bioctx->error = 0;
+
+	/* Set the BIO pending in the flush list */
+	if (bio->bi_rw & REQ_FLUSH) {
+		spin_lock_irqsave(&dzt->flush_lock, flags);
+		bio_list_add(&dzt->flush_list, bio);
+		spin_unlock_irqrestore(&dzt->flush_lock, flags);
+		queue_work(dzt->flush_wq, &dzt->flush_work);
+		return DM_MAPIO_SUBMITTED;
+	}
+
+	/* Split zone BIOs to fit entirely into a zone */
+	chunk_sector = dm_zoned_bio_chunk_sector(dzt, bio);
+	if (chunk_sector + nr_sectors > dzt->zone_nr_sectors)
+		dm_accept_partial_bio(bio, dzt->zone_nr_sectors - chunk_sector);
+
+	dm_zoned_dev_debug(dzt, "BIO sector %zu, len %u -> chunk %zu\n",
+			 dm_zoned_bio_sector(bio),
+			 dm_zoned_bio_sectors(bio),
+			 dm_zoned_bio_chunk(dzt, bio));
+
+	/* Get the zone mapping the chunk the BIO belongs to. */
+	/* If the chunk is unmapped, process the BIO directly */
+	/* without going through the zone work.               */
+	dzone = dm_zoned_bio_map(dzt, bio);
+	if (IS_ERR(dzone))
+		return PTR_ERR(dzone);
+	if (!dzone)
+		dm_zoned_handle_zone_bio(dzt, NULL, bio);
+
+	return DM_MAPIO_SUBMITTED;
+}
+
+/**
+ * Parse dmsetup arguments.
+ */
+static int
+dm_zoned_parse_args(struct dm_target *ti,
+		    struct dm_arg_set *as,
+		    struct dm_zoned_target_config *conf)
+{
+	const char *arg;
+	int ret = 0;
+
+	/* Check arguments */
+	if (as->argc < 1) {
+		ti->error = "No target device specified";
+		return -EINVAL;
+	}
+
+	/* Set defaults */
+	conf->dev_path = (char *) dm_shift_arg(as);
+	conf->format = 0;
+	conf->nr_buf_zones = DM_ZONED_NR_BZONES;
+	conf->align_wp = DM_ZONED_ALIGN_WP_MAX_BLOCK;
+	conf->debug = 0;
+
+	while (as->argc) {
+
+		arg = dm_shift_arg(as);
+
+		if (strcmp(arg, "debug") == 0) {
+#ifdef __DM_ZONED_DEBUG
+			dm_zoned_info("Debug messages enabled\n");
+			conf->debug = 1;
+#else
+			dm_zoned_info("Debug message support not enabled: ignoring option \"debug\"\n");
+#endif
+			continue;
+		}
+
+		if (strcmp(arg, "format") == 0) {
+			conf->format = 1;
+			continue;
+		}
+
+		if (strncmp(arg, "num_bzones=", 11) == 0) {
+			if (kstrtoul(arg + 11, 0, &conf->nr_buf_zones) < 0) {
+				ti->error = "Invalid number of buffer zones";
+				break;
+			}
+			continue;
+		}
+
+		if (strncmp(arg, "align_wp=", 9) == 0) {
+			if (kstrtoul(arg + 9, 0, &conf->align_wp) < 0) {
+				ti->error = "Invalid number of blocks";
+				break;
+			}
+			continue;
+		}
+
+		ti->error = "Unknown argument";
+		return -EINVAL;
+
+	}
+
+	return ret;
+
+}
+
+/**
+ * Setup target.
+ */
+static int
+dm_zoned_ctr(struct dm_target *ti,
+	     unsigned int argc,
+	     char **argv)
+{
+	struct dm_zoned_target_config conf;
+	struct dm_zoned_target *dzt;
+	struct dm_arg_set as;
+	char wq_name[32];
+	int ret;
+
+	/* Parse arguments */
+	as.argc = argc;
+	as.argv = argv;
+	ret = dm_zoned_parse_args(ti, &as, &conf);
+	if (ret)
+		return ret;
+
+	dm_zoned_info("Initializing device %s\n", conf.dev_path);
+
+	/* Allocate and initialize the target descriptor */
+	dzt = kzalloc(sizeof(struct dm_zoned_target), GFP_KERNEL);
+	if (!dzt) {
+		ti->error = "Allocate target descriptor failed";
+		return -ENOMEM;
+	}
+	dm_zoned_account_mem(dzt, sizeof(struct dm_zoned_target));
+
+	/* Get the target device */
+	ret = dm_get_device(ti, conf.dev_path, dm_table_get_mode(ti->table),
+			    &dzt->ddev);
+	if (ret != 0) {
+		ti->error = "Get target device failed";
+		goto err;
+	}
+
+	dzt->zbd = dzt->ddev->bdev;
+	dzt->zbd_capacity = i_size_read(dzt->zbd->bd_inode) >> SECTOR_SHIFT;
+	if (ti->begin ||
+	    (ti->len != dzt->zbd_capacity)) {
+		ti->error = "Partial mapping not supported";
+		ret = -EINVAL;
+		goto err;
+	}
+
+	(void)bdevname(dzt->zbd, dzt->zbd_name);
+	dzt->zbdq = bdev_get_queue(dzt->zbd);
+	dzt->zbd_metablk_shift = DM_ZONED_BLOCK_SHIFT -
+		dzt->zbd->bd_inode->i_sb->s_blocksize_bits;
+	if (conf.debug)
+		set_bit(DM_ZONED_DEBUG, &dzt->flags);
+
+	mutex_init(&dzt->map_lock);
+	INIT_LIST_HEAD(&dzt->bz_lru_list);
+	INIT_LIST_HEAD(&dzt->bz_free_list);
+	INIT_LIST_HEAD(&dzt->bz_wait_list);
+	INIT_LIST_HEAD(&dzt->dz_unmap_smr_list);
+	INIT_LIST_HEAD(&dzt->dz_unmap_cmr_list);
+	INIT_LIST_HEAD(&dzt->dz_map_cmr_list);
+	INIT_LIST_HEAD(&dzt->dz_empty_list);
+	atomic_set(&dzt->dz_nr_active, 0);
+	atomic_set(&dzt->dz_nr_active_wait, 0);
+
+	dm_zoned_dev_info(dzt, "Initializing device %s\n",
+			dzt->zbd_name);
+
+	ret = dm_zoned_init_meta(dzt, &conf);
+	if (ret != 0) {
+		ti->error = "Metadata initialization failed";
+		goto err;
+	}
+
+	/* Set target (no write same support) */
+	ti->private = dzt;
+	ti->max_io_len = dzt->zone_nr_sectors;
+	ti->num_flush_bios = 1;
+	ti->num_discard_bios = 1;
+	ti->num_write_same_bios = 0;
+	ti->per_io_data_size = sizeof(struct dm_zoned_bioctx);
+	ti->flush_supported = true;
+	ti->discards_supported = true;
+	ti->split_discard_bios = true;
+	ti->discard_zeroes_data_unsupported = true;
+	ti->len = dzt->zone_nr_sectors * dzt->nr_data_zones;
+
+	if (conf.align_wp) {
+		set_bit(DM_ZONED_ALIGN_WP, &dzt->flags);
+		dzt->align_wp_max_blocks = min_t(unsigned int, conf.align_wp,
+						 dzt->zone_nr_blocks >> 1);
+	}
+
+	/* BIO set */
+	dzt->bio_set = bioset_create(DM_ZONED_MIN_BIOS, 0);
+	if (!dzt->bio_set) {
+		ti->error = "Create BIO set failed";
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	/* Zone I/O work queue */
+	snprintf(wq_name, sizeof(wq_name), "dm_zoned_zwq_%s", dzt->zbd_name);
+	dzt->zone_wq = create_workqueue(wq_name);
+	if (!dzt->zone_wq) {
+		ti->error = "Create zone workqueue failed";
+		ret = -ENOMEM;
+		goto err;
+	}
+	dm_zoned_dev_info(dzt, "Allowing at most %d zone workers\n",
+			  min_t(int, dzt->nr_buf_zones * 2, DM_ZONE_WORK_MAX));
+	workqueue_set_max_active(dzt->zone_wq,
+				 min_t(int, dzt->nr_buf_zones * 2,
+				       DM_ZONE_WORK_MAX));
+
+	/* Flush work */
+	spin_lock_init(&dzt->flush_lock);
+	bio_list_init(&dzt->flush_list);
+	INIT_WORK(&dzt->flush_work, dm_zoned_flush_work);
+	snprintf(wq_name, sizeof(wq_name), "dm_zoned_fwq_%s", dzt->zbd_name);
+	dzt->flush_wq = create_singlethread_workqueue(wq_name);
+	if (!dzt->flush_wq) {
+		ti->error = "Create flush workqueue failed";
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	/* Buffer zones reclaim work */
+	dzt->reclaim_client = dm_io_client_create();
+	if (IS_ERR(dzt->reclaim_client)) {
+		ti->error = "Create GC I/O client failed";
+		ret = PTR_ERR(dzt->reclaim_client);
+		dzt->reclaim_client = NULL;
+		goto err;
+	}
+	INIT_DELAYED_WORK(&dzt->reclaim_work, dm_zoned_reclaim_work);
+	snprintf(wq_name, sizeof(wq_name), "dm_zoned_rwq_%s", dzt->zbd_name);
+	dzt->reclaim_wq = create_singlethread_workqueue(wq_name);
+	if (!dzt->reclaim_wq) {
+		ti->error = "Create reclaim workqueue failed";
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	snprintf(wq_name, sizeof(wq_name), "dm_zoned_rzwq_%s", dzt->zbd_name);
+	dzt->reclaim_zwq = create_workqueue(wq_name);
+	if (!dzt->reclaim_zwq) {
+		ti->error = "Create reclaim zone workqueue failed";
+		ret = -ENOMEM;
+		goto err;
+	}
+	workqueue_set_max_active(dzt->reclaim_zwq,
+				 DM_ZONED_RECLAIM_MAX_WORKERS);
+
+	dm_zoned_dev_info(dzt,
+			  "Target device: %zu 512-byte logical sectors (%zu blocks)\n",
+			  ti->len,
+			  dm_zoned_sector_to_block(ti->len));
+
+	dzt->last_bio_time = jiffies;
+	dm_zoned_trigger_reclaim(dzt);
+
+	return 0;
+
+err:
+
+	if (dzt->ddev) {
+		if (dzt->reclaim_wq)
+			destroy_workqueue(dzt->reclaim_wq);
+		if (dzt->reclaim_client)
+			dm_io_client_destroy(dzt->reclaim_client);
+		if (dzt->flush_wq)
+			destroy_workqueue(dzt->flush_wq);
+		if (dzt->zone_wq)
+			destroy_workqueue(dzt->zone_wq);
+		if (dzt->bio_set)
+			bioset_free(dzt->bio_set);
+		dm_zoned_cleanup_meta(dzt);
+		dm_put_device(ti, dzt->ddev);
+	}
+
+	kfree(dzt);
+
+	return ret;
+
+}
+
+/**
+ * Cleanup target.
+ */
+static void
+dm_zoned_dtr(struct dm_target *ti)
+{
+	struct dm_zoned_target *dzt = ti->private;
+
+	dm_zoned_dev_info(dzt, "Removing target device\n");
+
+	dm_zoned_flush(dzt);
+
+	flush_workqueue(dzt->zone_wq);
+	destroy_workqueue(dzt->zone_wq);
+
+	flush_workqueue(dzt->reclaim_zwq);
+	cancel_delayed_work_sync(&dzt->reclaim_work);
+	destroy_workqueue(dzt->reclaim_zwq);
+	destroy_workqueue(dzt->reclaim_wq);
+	dm_io_client_destroy(dzt->reclaim_client);
+
+	flush_workqueue(dzt->flush_wq);
+	destroy_workqueue(dzt->flush_wq);
+
+	bioset_free(dzt->bio_set);
+
+	dm_zoned_cleanup_meta(dzt);
+
+	dm_put_device(ti, dzt->ddev);
+
+	kfree(dzt);
+}
+
+/**
+ * Setup target request queue limits.
+ */
+static void
+dm_zoned_io_hints(struct dm_target *ti,
+		  struct queue_limits *limits)
+{
+	struct dm_zoned_target *dzt = ti->private;
+	unsigned int chunk_sectors = dzt->zone_nr_sectors;
+
+	BUG_ON(!is_power_of_2(chunk_sectors));
+
+	/* Align to zone size */
+	limits->chunk_sectors = chunk_sectors;
+	limits->max_sectors = chunk_sectors;
+
+	blk_limits_io_min(limits, DM_ZONED_BLOCK_SIZE);
+	blk_limits_io_opt(limits, DM_ZONED_BLOCK_SIZE);
+
+	limits->logical_block_size = DM_ZONED_BLOCK_SIZE;
+	limits->physical_block_size = DM_ZONED_BLOCK_SIZE;
+
+	limits->discard_alignment = DM_ZONED_BLOCK_SIZE;
+	limits->discard_granularity = DM_ZONED_BLOCK_SIZE;
+	limits->max_discard_sectors = chunk_sectors;
+	limits->max_hw_discard_sectors = chunk_sectors;
+	limits->discard_zeroes_data = true;
+
+}
+
+/**
+ * Pass on ioctl to the backend device.
+ */
+static int
+dm_zoned_prepare_ioctl(struct dm_target *ti,
+		       struct block_device **bdev,
+		       fmode_t *mode)
+{
+	struct dm_zoned_target *dzt = ti->private;
+
+	*bdev = dzt->zbd;
+
+	return 0;
+}
+
+/**
+ * Stop reclaim before suspend.
+ */
+static void
+dm_zoned_presuspend(struct dm_target *ti)
+{
+	struct dm_zoned_target *dzt = ti->private;
+
+	dm_zoned_dev_debug(dzt, "Pre-suspend\n");
+
+	/* Enter suspend state */
+	set_bit(DM_ZONED_SUSPENDED, &dzt->flags);
+	smp_mb__after_atomic();
+
+	/* Stop reclaim */
+	cancel_delayed_work_sync(&dzt->reclaim_work);
+}
+
+/**
+ * Restart reclaim if suspend failed.
+ */
+static void
+dm_zoned_presuspend_undo(struct dm_target *ti)
+{
+	struct dm_zoned_target *dzt = ti->private;
+
+	dm_zoned_dev_debug(dzt, "Pre-suspend undo\n");
+
+	/* Clear suspend state */
+	clear_bit_unlock(DM_ZONED_SUSPENDED, &dzt->flags);
+	smp_mb__after_atomic();
+
+	/* Restart reclaim */
+	mod_delayed_work(dzt->reclaim_wq, &dzt->reclaim_work, 0);
+}
+
+/**
+ * Stop works and flush on suspend.
+ */
+static void
+dm_zoned_postsuspend(struct dm_target *ti)
+{
+	struct dm_zoned_target *dzt = ti->private;
+
+	dm_zoned_dev_debug(dzt, "Post-suspend\n");
+
+	/* Stop works and flush */
+	flush_workqueue(dzt->zone_wq);
+	flush_workqueue(dzt->flush_wq);
+
+	dm_zoned_flush(dzt);
+}
+
+/**
+ * Refresh zone information before resuming.
+ */
+static int
+dm_zoned_preresume(struct dm_target *ti)
+{
+	struct dm_zoned_target *dzt = ti->private;
+
+	if (!test_bit(DM_ZONED_SUSPENDED, &dzt->flags))
+		return 0;
+
+	dm_zoned_dev_debug(dzt, "Pre-resume\n");
+
+	/* Refresh zone information */
+	return dm_zoned_resume_meta(dzt);
+}
+
+/**
+ * Resume.
+ */
+static void
+dm_zoned_resume(struct dm_target *ti)
+{
+	struct dm_zoned_target *dzt = ti->private;
+
+	if (!test_bit(DM_ZONED_SUSPENDED, &dzt->flags))
+		return;
+
+	dm_zoned_dev_debug(dzt, "Resume\n");
+
+	/* Clear suspend state */
+	clear_bit_unlock(DM_ZONED_SUSPENDED, &dzt->flags);
+	smp_mb__after_atomic();
+
+	/* Restart reclaim */
+	mod_delayed_work(dzt->reclaim_wq, &dzt->reclaim_work, 0);
+
+}
+
+static int
+dm_zoned_iterate_devices(struct dm_target *ti,
+			 iterate_devices_callout_fn fn,
+			 void *data)
+{
+	struct dm_zoned_target *dzt = ti->private;
+
+	return fn(ti, dzt->ddev, dzt->nr_meta_zones * dzt->zone_nr_sectors,
+		  ti->len, data);
+}
+
+/**
+ * Module definition.
+ */
+static struct target_type dm_zoned_type = {
+	.name		 = "dm-zoned",
+	.version	 = {1, 0, 0},
+	.module		 = THIS_MODULE,
+	.ctr		 = dm_zoned_ctr,
+	.dtr		 = dm_zoned_dtr,
+	.map		 = dm_zoned_map,
+	.io_hints	 = dm_zoned_io_hints,
+	.prepare_ioctl	 = dm_zoned_prepare_ioctl,
+	.presuspend	 = dm_zoned_presuspend,
+	.presuspend_undo = dm_zoned_presuspend_undo,
+	.postsuspend	 = dm_zoned_postsuspend,
+	.preresume	 = dm_zoned_preresume,
+	.resume		 = dm_zoned_resume,
+	.iterate_devices = dm_zoned_iterate_devices,
+};
+
+struct kmem_cache *dm_zoned_zone_cache;
+
+static int __init dm_zoned_init(void)
+{
+	int ret;
+
+	dm_zoned_info("Version %d.%d, (C) Western Digital\n",
+		    DM_ZONED_VER_MAJ,
+		    DM_ZONED_VER_MIN);
+
+	dm_zoned_zone_cache = KMEM_CACHE(dm_zoned_zone, 0);
+	if (!dm_zoned_zone_cache)
+		return -ENOMEM;
+
+	ret = dm_register_target(&dm_zoned_type);
+	if (ret != 0) {
+		dm_zoned_error("Register dm-zoned target failed %d\n", ret);
+		kmem_cache_destroy(dm_zoned_zone_cache);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void __exit dm_zoned_exit(void)
+{
+	dm_unregister_target(&dm_zoned_type);
+	kmem_cache_destroy(dm_zoned_zone_cache);
+}
+
+module_init(dm_zoned_init);
+module_exit(dm_zoned_exit);
+
+MODULE_DESCRIPTION(DM_NAME " target for ZBC/ZAC devices (host-managed and host-aware)");
+MODULE_AUTHOR("Damien Le Moal <damien.lemoal@xxxxxxxx>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/md/dm-zoned-meta.c b/drivers/md/dm-zoned-meta.c
new file mode 100644
index 0000000..b9e5161
--- /dev/null
+++ b/drivers/md/dm-zoned-meta.c
@@ -0,0 +1,1950 @@
+/*
+ * (C) Copyright 2016 Western Digital.
+ *
+ * This software is distributed under the terms of the GNU Lesser General
+ * Public License version 2, or any later version, "as is," without technical
+ * support, and WITHOUT ANY WARRANTY, without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ * Author: Damien Le Moal <damien.lemoal@xxxxxxxx>
+ */
+
+#include <linux/module.h>
+#include <linux/version.h>
+#include <linux/slab.h>
+
+#include "dm-zoned.h"
+
+/**
+ * Free zones descriptors.
+ */
+static void
+dm_zoned_drop_zones(struct dm_zoned_target *dzt)
+{
+	struct blk_zone *blkz;
+	sector_t sector = 0;
+
+	/* Free the zone descriptors attached to the device zones */
+	while (sector < dzt->zbd_capacity) {
+		blkz = blk_lookup_zone(dzt->zbdq, sector);
+		if (!blkz)
+			break;
+		if (blkz->private_data) {
+			kmem_cache_free(dm_zoned_zone_cache,
+					blkz->private_data);
+			blkz->private_data = NULL;
+		}
+		sector = blkz->start + blkz->len;
+	}
+}
+
+/**
+ * Allocate and initialize zone descriptors
+ * using the zone information from disk.
+ */
+static int
+dm_zoned_init_zones(struct dm_zoned_target *dzt)
+{
+	struct dm_zoned_zone *zone, *last_meta_zone = NULL;
+	struct blk_zone *blkz;
+	sector_t sector = 0;
+	int ret = -ENXIO;
+
+	/* Allocate and initialize zone descriptors */
+	while (sector < dzt->zbd_capacity) {
+
+		blkz = blk_lookup_zone(dzt->zbdq, sector);
+		if (!blkz) {
+			dm_zoned_dev_error(dzt,
+				"Unable to get zone at sector %zu\n",
+				sector);
+			goto out;
+		}
+
+		zone = kmem_cache_alloc(dm_zoned_zone_cache, GFP_KERNEL);
+		if (!zone) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		dm_zoned_account_mem(dzt, sizeof(struct dm_zoned_zone));
+
+		/* Assume at this stage that all zones are unmapped */
+		/* data zones. This will be corrected later using   */
+		/* the buffer and data zone mapping tables.         */
+		blkz->private_data = zone;
+		INIT_LIST_HEAD(&zone->link);
+		INIT_LIST_HEAD(&zone->elink);
+		zone->id = dzt->nr_zones;
+		zone->blkz = blkz;
+		zone->flags = DM_ZONE_DATA;
+		zone->zwork = NULL;
+		zone->map = DM_ZONED_MAP_UNMAPPED;
+		zone->bzone = NULL;
+
+		if (!dzt->nr_zones)
+			dzt->zone_nr_sectors = blkz->len;
+
+		if (dm_zoned_zone_is_smr(zone)) {
+			zone->wp_block = dm_zoned_sector_to_block(blkz->wp)
+				- dm_zoned_zone_start_block(zone);
+			list_add_tail(&zone->link, &dzt->dz_unmap_smr_list);
+			dzt->nr_smr_zones++;
+		} else {
+			zone->wp_block = 0;
+			list_add_tail(&zone->link, &dzt->dz_unmap_cmr_list);
+			dzt->nr_cmr_zones++;
+		}
+
+		dm_zoned_zone_reset_stats(zone);
+
+		if (dm_zoned_zone_is_rnd(zone)) {
+			dzt->nr_rnd_zones++;
+			if ((!last_meta_zone) ||
+			    dm_zoned_zone_next_sector(last_meta_zone) ==
+			    sector) {
+				dzt->nr_meta_zones++;
+				last_meta_zone = zone;
+			}
+		}
+
+		dzt->nr_zones++;
+		sector = dm_zoned_zone_next_sector(zone);
+
+	}
+
+	if (!dzt->nr_zones) {
+		dm_zoned_dev_error(dzt, "No zones information\n");
+		goto out;
+	}
+
+	if (!dzt->nr_rnd_zones) {
+		dm_zoned_dev_error(dzt, "No randomly writable zones found\n");
+		goto out;
+	}
+
+	if (!dzt->nr_meta_zones) {
+		dm_zoned_dev_error(dzt, "No metadata zones found\n");
+		goto out;
+	}
+
+	/* Temporary? This could be made to work for any zone size... */
+	if (!is_power_of_2(dzt->zone_nr_sectors)) {
+		dm_zoned_dev_error(dzt,
+			"Sectors per zone %zu is not a power of 2\n",
+			dzt->zone_nr_sectors);
+		goto out;
+	}
+
+	dzt->zone_nr_sectors_shift = ilog2(dzt->zone_nr_sectors);
+	dzt->zone_nr_sectors_mask = dzt->zone_nr_sectors - 1;
+
+	dzt->zone_nr_blocks = dm_zoned_sector_to_block(dzt->zone_nr_sectors);
+	dzt->zone_nr_blocks_shift = ilog2(dzt->zone_nr_blocks);
+	dzt->zone_nr_blocks_mask = dzt->zone_nr_blocks - 1;
+
+	dzt->zone_bitmap_size = dzt->zone_nr_blocks >> 3;
+	dzt->zone_nr_bitmap_blocks = dzt->zone_bitmap_size >>
+		DM_ZONED_BLOCK_SHIFT;
+
+	ret = 0;
+
+out:
+
+	if (ret != 0)
+		dm_zoned_drop_zones(dzt);
+
+	return ret;
+}
+
+/**
+ * Check zone information after a resume.
+ */
+static int
+dm_zoned_check_zones(struct dm_zoned_target *dzt)
+{
+	struct dm_zoned_zone *zone;
+	struct blk_zone *blkz;
+	sector_t sector = 0;
+	sector_t wp_block;
+
+	/* Check all zone descriptors */
+	while (sector < dzt->zbd_capacity) {
+
+		blkz = blk_lookup_zone(dzt->zbdq, sector);
+		if (!blkz) {
+			dm_zoned_dev_error(dzt,
+				"Unable to get zone at sector %zu\n", sector);
+			return -EIO;
+		}
+
+		zone = blkz->private_data;
+		if (!zone) {
+			dm_zoned_dev_error(dzt,
+				"Lost private data of zone at sector %zu\n",
+				sector);
+			return -EIO;
+		}
+
+		if (zone->blkz != blkz) {
+			dm_zoned_dev_error(dzt,
+				"Inconsistent private data of zone at sector %zu\n",
+				sector);
+			return -EIO;
+		}
+
+		wp_block = dm_zoned_sector_to_block(blkz->wp) -
+			dm_zoned_zone_start_block(zone);
+		if (!dm_zoned_zone_is_smr(zone))
+			zone->wp_block = 0;
+		else if (zone->wp_block != wp_block) {
+			dm_zoned_dev_error(dzt,
+				"Zone %lu: Inconsistent write pointer position (%zu / %zu)\n",
+				zone->id, zone->wp_block, wp_block);
+			zone->wp_block = wp_block;
+			dm_zoned_invalidate_blocks(dzt, zone, zone->wp_block,
+				dzt->zone_nr_blocks - zone->wp_block);
+			dm_zoned_validate_dzone(dzt, zone);
+		}
+
+		sector = dm_zoned_zone_next_sector(zone);
+
+	}
+
+	return 0;
+}
+
+/**
+ * Lookup a zone containing the specified sector.
+ */
+static inline struct dm_zoned_zone *
+dm_zoned_lookup_zone(struct dm_zoned_target *dzt,
+		     sector_t sector)
+{
+	struct blk_zone *blkz = blk_lookup_zone(dzt->zbdq, sector);
+
+	return blkz ? blkz->private_data : NULL;
+}
+
+/**
+ * Lookup a zone using a zone ID.
+ */
+static inline struct dm_zoned_zone *
+dm_zoned_lookup_zone_by_id(struct dm_zoned_target *dzt,
+			   unsigned int zone_id)
+{
+	return dm_zoned_lookup_zone(dzt, (sector_t)zone_id <<
+				    dzt->zone_nr_sectors_shift);
+}
+
+/**
+ * Set a zone write pointer.
+ */
+int
+dm_zoned_advance_zone_wp(struct dm_zoned_target *dzt,
+			 struct dm_zoned_zone *zone,
+			 sector_t nr_blocks)
+{
+	int ret;
+
+	if (!dm_zoned_zone_is_smr(zone) ||
+	    zone->wp_block + nr_blocks > dm_zoned_zone_next_block(zone))
+		return -EIO;
+
+	/* Zeroout the space between the write */
+	/* pointer and the requested position. */
+	ret = blkdev_issue_zeroout(dzt->zbd,
+		dm_zoned_block_to_sector(dm_zoned_zone_start_block(zone) +
+					 zone->wp_block),
+		dm_zoned_block_to_sector(nr_blocks), GFP_KERNEL, false);
+	if (ret) {
+		dm_zoned_dev_error(dzt,
+			"Advance zone %lu wp block %zu by %zu blocks failed %d\n",
+			zone->id, zone->wp_block, nr_blocks, ret);
+		return ret;
+	}
+
+	zone->wp_block += nr_blocks;
+
+	return 0;
+}
+
+/**
+ * Reset a zone write pointer.
+ */
+int
+dm_zoned_reset_zone_wp(struct dm_zoned_target *dzt,
+		       struct dm_zoned_zone *zone)
+{
+	int ret;
+
+	/* Ignore offline zones, read only zones, */
+	/* CMR zones and empty SMR zones.         */
+	if (dm_zoned_zone_offline(zone)
+	    || dm_zoned_zone_readonly(zone)
+	    || dm_zoned_zone_is_cmr(zone)
+	    || dm_zoned_zone_empty(zone))
+		return 0;
+
+	/* Discard the zone */
+	ret = blkdev_issue_discard(dzt->zbd,
+				   dm_zoned_zone_start_sector(zone),
+				   dm_zoned_zone_sectors(zone),
+				   GFP_KERNEL, 0);
+	if (ret) {
+		dm_zoned_dev_error(dzt, "Reset zone %lu failed %d\n",
+				 zone->id, ret);
+		return ret;
+	}
+
+	/* Rewind */
+	zone->wp_block = 0;
+
+	return 0;
+}
+
+/**
+ * Reset all zones write pointer.
+ */
+static int
+dm_zoned_reset_zones(struct dm_zoned_target *dzt)
+{
+	struct dm_zoned_zone *zone;
+	sector_t sector = 0;
+	int ret = 0;
+
+	dm_zoned_dev_debug(dzt, "Resetting all zones\n");
+
+	while ((zone = dm_zoned_lookup_zone(dzt, sector))) {
+		ret = dm_zoned_reset_zone_wp(dzt, zone);
+		if (ret)
+			return ret;
+		sector = dm_zoned_zone_next_sector(zone);
+	}
+
+	return 0;
+}
+
+/**
+ * Get from cache or read from disk a metadata block.
+ */
+static struct buffer_head *
+dm_zoned_get_meta(struct dm_zoned_target *dzt,
+		  sector_t block)
+{
+	struct buffer_head *bh;
+
+	/* Get block */
+	bh = __bread(dzt->zbd,
+		     block << dzt->zbd_metablk_shift,
+		     DM_ZONED_BLOCK_SIZE);
+	if (!bh) {
+		dm_zoned_dev_error(dzt, "Read block %zu failed\n",
+				 block);
+		return ERR_PTR(-EIO);
+	}
+
+	return bh;
+}
+
+/**
+ * Mark a metadata block dirty.
+ */
+static inline void
+dm_zoned_dirty_meta(struct dm_zoned_target *dzt,
+		    struct buffer_head *bh)
+{
+	mark_buffer_dirty_inode(bh, dzt->zbd->bd_inode);
+}
+
+/**
+ * Zero fill a metadata block.
+ */
+static int
+dm_zoned_zero_meta(struct dm_zoned_target *dzt,
+		   sector_t block)
+{
+	struct buffer_head *bh = dm_zoned_get_meta(dzt, block);
+
+	if (IS_ERR(bh))
+		return PTR_ERR(bh);
+
+	memset(bh->b_data, 0, DM_ZONED_BLOCK_SIZE);
+	dm_zoned_dirty_meta(dzt, bh);
+	__brelse(bh);
+
+	return 0;
+}
+
+/**
+ * Flush dirty meta-data.
+ */
+int
+dm_zoned_flush(struct dm_zoned_target *dzt)
+{
+	int ret;
+
+	/* Sync meta-data */
+	ret = sync_mapping_buffers(dzt->zbd->bd_inode->i_mapping);
+	if (ret) {
+		dm_zoned_dev_error(dzt, "Sync metadata failed %d\n", ret);
+		return ret;
+	}
+
+	/* Flush drive cache (this will also sync data) */
+	return blkdev_issue_flush(dzt->zbd, GFP_KERNEL, NULL);
+}
+
+/**
+ * Format buffer zone mapping.
+ */
+static int
+dm_zoned_format_bzone_mapping(struct dm_zoned_target *dzt)
+{
+	struct dm_zoned_super *sb =
+		(struct dm_zoned_super *) dzt->sb_bh->b_data;
+	struct dm_zoned_zone *zone;
+	int z, b = 0;
+
+	/* Set buffer zones mapping entries */
+	dzt->bz_map = sb->bz_map;
+	for (z = dzt->nr_meta_zones;
+	     (z < dzt->nr_zones) && (b < dzt->nr_buf_zones); z++) {
+		zone = dm_zoned_lookup_zone_by_id(dzt, z);
+		if (!zone)
+			return -ENXIO;
+		if (dm_zoned_zone_is_rnd(zone)) {
+			dzt->bz_map[b].bzone_id = cpu_to_le32(zone->id);
+			dzt->bz_map[b].dzone_id =
+				cpu_to_le32(DM_ZONED_MAP_UNMAPPED);
+			b++;
+		}
+	}
+
+	if (b < dzt->nr_buf_zones) {
+		dm_zoned_dev_error(dzt,
+			"Broken format: %d/%u buffer zones set\n",
+			b, dzt->nr_buf_zones);
+		return -ENXIO;
+	}
+
+	return 0;
+}
+
+/**
+ * Initialize buffer zone mapping.
+ */
+static int
+dm_zoned_load_bzone_mapping(struct dm_zoned_target *dzt)
+{
+	struct dm_zoned_super *sb =
+		(struct dm_zoned_super *) dzt->sb_bh->b_data;
+	struct dm_zoned_zone *bzone, *dzone;
+	unsigned long bzone_id, dzone_id;
+	int i, b = 0;
+
+	/* Process buffer zones mapping entries */
+	dzt->bz_map = sb->bz_map;
+	for (i = 0; i < dzt->nr_buf_zones; i++) {
+
+		bzone_id = le32_to_cpu(dzt->bz_map[i].bzone_id);
+		if (!bzone_id || bzone_id >= dzt->nr_zones) {
+			dm_zoned_dev_error(dzt,
+				"Invalid buffer zone %lu in mapping table entry %d\n",
+				bzone_id, i);
+			return -ENXIO;
+		}
+
+		bzone = dm_zoned_lookup_zone_by_id(dzt, bzone_id);
+		if (!bzone) {
+			dm_zoned_dev_error(dzt, "Buffer zone %lu not found\n",
+					   bzone_id);
+			return -ENXIO;
+		}
+
+		/* Fix the zone type */
+		bzone->flags = DM_ZONE_BUF;
+		list_del_init(&bzone->link);
+		bzone->map = i;
+
+		dzone_id = le32_to_cpu(dzt->bz_map[i].dzone_id);
+		if (dzone_id != DM_ZONED_MAP_UNMAPPED) {
+			if (dzone_id >= dzt->nr_zones) {
+				dm_zoned_dev_error(dzt,
+					"Invalid data zone %lu in mapping table entry %d\n",
+					dzone_id, i);
+				return -ENXIO;
+			}
+			dzone = dm_zoned_lookup_zone_by_id(dzt, dzone_id);
+			if (!dzone) {
+				dm_zoned_dev_error(dzt,
+					"Data zone %lu not found\n", dzone_id);
+				return -ENXIO;
+			}
+		} else
+			dzone = NULL;
+
+		if (dzone) {
+			dm_zoned_dev_debug(dzt,
+				"Zone %lu is buffering zone %lu\n",
+				bzone->id, dzone->id);
+			dzone->bzone = bzone;
+			bzone->bzone = dzone;
+			list_add_tail(&bzone->link, &dzt->bz_lru_list);
+		} else {
+			list_add_tail(&bzone->link, &dzt->bz_free_list);
+			atomic_inc(&dzt->bz_nr_free);
+		}
+
+		b++;
+
+	}
+
+	if (b != dzt->nr_buf_zones) {
+		dm_zoned_dev_error(dzt,
+			"Invalid buffer zone mapping (%d / %u valid entries)\n",
+			b, dzt->nr_buf_zones);
+		return -ENXIO;
+	}
+
+	dzt->bz_nr_free_low = dzt->nr_buf_zones * DM_ZONED_NR_BZONES_LOW / 100;
+	if (dzt->bz_nr_free_low < DM_ZONED_NR_BZONES_LOW_MIN)
+		dzt->bz_nr_free_low = DM_ZONED_NR_BZONES_LOW_MIN;
+
+	return 0;
+}
+
+/**
+ * Set a buffer zone mapping.
+ */
+static void
+dm_zoned_set_bzone_mapping(struct dm_zoned_target *dzt,
+			   struct dm_zoned_zone *bzone,
+			   unsigned int dzone_id)
+{
+	struct dm_zoned_bz_map *bz_map = &dzt->bz_map[bzone->map];
+
+	dm_zoned_dev_assert(dzt, le32_to_cpu(bz_map->bzone_id) == bzone->id);
+
+	lock_buffer(dzt->sb_bh);
+	bz_map->dzone_id = cpu_to_le32(dzone_id);
+	dm_zoned_dirty_meta(dzt, dzt->sb_bh);
+	unlock_buffer(dzt->sb_bh);
+}
+
+/**
+ * Change a buffer zone mapping.
+ */
+static void
+dm_zoned_change_bzone_mapping(struct dm_zoned_target *dzt,
+			      struct dm_zoned_zone *bzone,
+			      struct dm_zoned_zone *new_bzone,
+			      unsigned int dzone_id)
+{
+	struct dm_zoned_bz_map *bz_map = &dzt->bz_map[bzone->map];
+
+	new_bzone->map = bzone->map;
+	bzone->map = DM_ZONED_MAP_UNMAPPED;
+
+	lock_buffer(dzt->sb_bh);
+	bz_map->bzone_id = cpu_to_le32(new_bzone->id);
+	bz_map->dzone_id = cpu_to_le32(dzone_id);
+	dm_zoned_dirty_meta(dzt, dzt->sb_bh);
+	unlock_buffer(dzt->sb_bh);
+}
+
+/**
+ * Get an unused buffer zone and associate it
+ * with @zone.
+ */
+struct dm_zoned_zone *
+dm_zoned_alloc_bzone(struct dm_zoned_target *dzt,
+		     struct dm_zoned_zone *dzone)
+{
+	struct dm_zoned_zone *bzone;
+
+	dm_zoned_map_lock(dzt);
+
+	/* If the data zone already has a buffer */
+	/* zone assigned, keep using it.         */
+	dm_zoned_dev_assert(dzt, dm_zoned_zone_data(dzone));
+	bzone = dzone->bzone;
+	if (bzone)
+		goto out;
+
+	/* If there is no free buffer zone, put the zone to wait */
+	if (!atomic_read(&dzt->bz_nr_free)) {
+		unsigned long flags;
+		dm_zoned_lock_zone(dzone, flags);
+		dm_zoned_dev_assert(dzt, test_bit(DM_ZONE_ACTIVE,
+						  &dzone->flags));
+		dm_zoned_dev_assert(dzt, dzone->zwork);
+		if (!test_and_set_bit(DM_ZONE_ACTIVE_WAIT, &dzone->flags)) {
+			list_add_tail(&dzone->zwork->link, &dzt->bz_wait_list);
+			atomic_inc(&dzt->dz_nr_active_wait);
+		}
+		dm_zoned_unlock_zone(dzone, flags);
+		dm_zoned_trigger_reclaim(dzt);
+		goto out;
+	}
+
+	/* Otherwise, get a free buffer zone */
+	bzone = list_first_entry(&dzt->bz_free_list,
+				 struct dm_zoned_zone, link);
+	list_del_init(&bzone->link);
+	list_add_tail(&bzone->link, &dzt->bz_lru_list);
+	atomic_dec(&dzt->bz_nr_free);
+	dm_zoned_schedule_reclaim(dzt, DM_ZONED_RECLAIM_PERIOD);
+
+	/* Assign the buffer zone to the data zone */
+	bzone->bzone = dzone;
+	dm_zoned_set_bzone_mapping(dzt, bzone, dzone->id);
+
+	dzone->bzone = bzone;
+	smp_mb__before_atomic();
+	set_bit(DM_ZONE_BUFFERED, &dzone->flags);
+	smp_mb__after_atomic();
+
+	dm_zoned_dev_debug(dzt, "Buffer zone %lu assigned to zone %lu\n",
+			   bzone->id, dzone->id);
+
+out:
+
+	dm_zoned_map_unlock(dzt);
+
+	return bzone;
+}
+
+/**
+ * Wake up buffer zone waiter.
+ */
+static void
+dm_zoned_wake_bzone_waiter(struct dm_zoned_target *dzt)
+{
+	struct dm_zoned_zwork *zwork;
+	struct dm_zoned_zone *dzone;
+	unsigned long flags;
+
+	if (list_empty(&dzt->bz_wait_list))
+		return;
+
+	/* Wake up the first buffer waiting zone */
+	zwork = list_first_entry(&dzt->bz_wait_list,
+				 struct dm_zoned_zwork, link);
+	list_del_init(&zwork->link);
+	dzone = zwork->dzone;
+	dm_zoned_lock_zone(dzone, flags);
+	clear_bit_unlock(DM_ZONE_ACTIVE_WAIT, &dzone->flags);
+	atomic_dec(&dzt->dz_nr_active_wait);
+	smp_mb__after_atomic();
+	dm_zoned_run_dzone(dzt, dzone);
+	dm_zoned_unlock_zone(dzone, flags);
+}
+
+/**
+ * Unmap and free the buffer zone of a data zone.
+ */
+void
+dm_zoned_free_bzone(struct dm_zoned_target *dzt,
+		    struct dm_zoned_zone *bzone)
+{
+	struct dm_zoned_zone *dzone = bzone->bzone;
+
+	dm_zoned_map_lock(dzt);
+
+	dm_zoned_dev_assert(dzt, dm_zoned_zone_buf(bzone));
+	dm_zoned_dev_assert(dzt, dzone);
+	dm_zoned_dev_assert(dzt, dm_zoned_zone_data(dzone));
+
+	/* Return the buffer zone into the free list */
+	smp_mb__before_atomic();
+	clear_bit(DM_ZONE_DIRTY, &bzone->flags);
+	clear_bit(DM_ZONE_BUFFERED, &dzone->flags);
+	smp_mb__after_atomic();
+
+	bzone->bzone = NULL;
+
+	dzone->bzone = NULL;
+	dzone->wr_buf_blocks = 0;
+
+	list_del_init(&bzone->link);
+	list_add_tail(&bzone->link, &dzt->bz_free_list);
+	atomic_inc(&dzt->bz_nr_free);
+	dm_zoned_set_bzone_mapping(dzt, bzone, DM_ZONED_MAP_UNMAPPED);
+	dm_zoned_wake_bzone_waiter(dzt);
+
+	dm_zoned_dev_debug(dzt, "Freed buffer zone %lu\n", bzone->id);
+
+	dm_zoned_map_unlock(dzt);
+}
+
+/**
+ * After a write or a discard, the buffer zone of
+ * a data zone may become entirely invalid and can be freed.
+ * Check this here.
+ */
+void
+dm_zoned_validate_bzone(struct dm_zoned_target *dzt,
+			struct dm_zoned_zone *dzone)
+{
+	struct dm_zoned_zone *bzone = dzone->bzone;
+
+	dm_zoned_dev_assert(dzt, dm_zoned_zone_data(dzone));
+	dm_zoned_dev_assert(dzt, test_bit(DM_ZONE_ACTIVE, &dzone->flags));
+
+	if (!bzone || !test_and_clear_bit(DM_ZONE_DIRTY, &bzone->flags))
+		return;
+
+	/* If all blocks are invalid, free it */
+	if (dm_zoned_zone_weight(dzt, bzone) == 0) {
+		dm_zoned_free_bzone(dzt, bzone);
+		return;
+	}
+
+	/* LRU update the list of buffered data zones */
+	dm_zoned_map_lock(dzt);
+	list_del_init(&bzone->link);
+	list_add_tail(&bzone->link, &dzt->bz_lru_list);
+	dm_zoned_map_unlock(dzt);
+}
+
+/**
+ * Format data zone mapping.
+ */
+static int
+dm_zoned_format_dzone_mapping(struct dm_zoned_target *dzt)
+{
+	struct buffer_head *map_bh;
+	unsigned int *map;
+	int i, j;
+
+	/* Zero fill the data zone mapping table */
+	for (i = 0; i < dzt->nr_map_blocks; i++) {
+		map_bh = dm_zoned_get_meta(dzt, i + 1);
+		if (IS_ERR(map_bh))
+			return PTR_ERR(map_bh);
+		map = (unsigned int *) map_bh->b_data;
+		lock_buffer(map_bh);
+		for (j = 0; j < DM_ZONED_MAP_ENTRIES_PER_BLOCK; j++)
+			map[j] = cpu_to_le32(DM_ZONED_MAP_UNMAPPED);
+		dm_zoned_dirty_meta(dzt, map_bh);
+		unlock_buffer(map_bh);
+		__brelse(map_bh);
+	}
+
+	return 0;
+}
+
+/**
+ * Cleanup resources used for the data zone mapping table.
+ */
+static void
+dm_zoned_cleanup_dzone_mapping(struct dm_zoned_target *dzt)
+{
+	int i;
+
+	/* Cleanup zone mapping resources */
+	if (!dzt->dz_map_bh)
+		return;
+
+	for (i = 0; i < dzt->nr_map_blocks; i++)
+		brelse(dzt->dz_map_bh[i]);
+
+	kfree(dzt->dz_map_bh);
+}
+
+/**
+ * Initialize data zone mapping.
+ */
+static int
+dm_zoned_load_dzone_mapping(struct dm_zoned_target *dzt)
+{
+	struct dm_zoned_zone *zone;
+	struct buffer_head *map_bh;
+	unsigned int *map;
+	unsigned long dzone_id;
+	int i, j, chunk = 0;
+	int ret = 0;
+
+	/* Data zone mapping table blocks array */
+	dzt->dz_map_bh = kzalloc(sizeof(struct buffer_head *) *
+				 dzt->nr_map_blocks, GFP_KERNEL);
+	if (!dzt->dz_map_bh)
+		return -ENOMEM;
+	dm_zoned_account_mem(dzt, sizeof(struct buffer_head *) *
+			     dzt->nr_map_blocks);
+
+	/* Get data zone mapping blocks and initialize zone mapping */
+	for (i = 0; i < dzt->nr_map_blocks; i++) {
+
+		/* Get mapping block */
+		map_bh = dm_zoned_get_meta(dzt, i + 1);
+		if (IS_ERR(map_bh)) {
+			ret = PTR_ERR(map_bh);
+			goto out;
+		}
+		dzt->dz_map_bh[i] = map_bh;
+		dm_zoned_account_mem(dzt, DM_ZONED_BLOCK_SIZE);
+
+		/* Process entries */
+		map = (unsigned int *) map_bh->b_data;
+		for (j = 0; j < DM_ZONED_MAP_ENTRIES_PER_BLOCK &&
+			    chunk < dzt->nr_data_zones; j++) {
+			dzone_id = le32_to_cpu(map[j]);
+			if (dzone_id != DM_ZONED_MAP_UNMAPPED) {
+				zone = dm_zoned_lookup_zone_by_id(dzt,
+								  dzone_id);
+				if (!zone) {
+					dm_zoned_dev_error(dzt,
+						"Mapping entry %d: zone %lu not found\n",
+						chunk, dzone_id);
+					map[j] = DM_ZONED_MAP_UNMAPPED;
+					dm_zoned_dirty_meta(dzt, map_bh);
+				} else {
+					zone->map = chunk;
+					dzt->dz_nr_unmap--;
+					list_del_init(&zone->link);
+					if (dm_zoned_zone_is_cmr(zone))
+						list_add_tail(&zone->link,
+							&dzt->dz_map_cmr_list);
+				}
+			}
+			chunk++;
+		}
+
+	}
+
+out:
+	if (ret)
+		dm_zoned_cleanup_dzone_mapping(dzt);
+
+	return ret;
+}
+
+/**
+ * Set the data zone mapping entry for a chunk of the logical disk.
+ */
+static void
+dm_zoned_set_dzone_mapping(struct dm_zoned_target *dzt,
+			   unsigned int chunk,
+			   unsigned int dzone_id)
+{
+	struct buffer_head *map_bh =
+		dzt->dz_map_bh[chunk >> DM_ZONED_MAP_ENTRIES_SHIFT];
+	unsigned int *map = (unsigned int *) map_bh->b_data;
+
+	lock_buffer(map_bh);
+	map[chunk & DM_ZONED_MAP_ENTRIES_MASK] = cpu_to_le32(dzone_id);
+	dm_zoned_dirty_meta(dzt, map_bh);
+	unlock_buffer(map_bh);
+}
+
+/**
+ * Get the data zone mapping of a chunk of the logical disk.
+ */
+static unsigned int
+dm_zoned_get_dzone_mapping(struct dm_zoned_target *dzt,
+			   unsigned int chunk)
+{
+	struct buffer_head *map_bh =
+		dzt->dz_map_bh[chunk >> DM_ZONED_MAP_ENTRIES_SHIFT];
+	unsigned int *map = (unsigned int *) map_bh->b_data;
+
+	return le32_to_cpu(map[chunk & DM_ZONED_MAP_ENTRIES_MASK]);
+}
+
+/**
+ * Get an unmapped data zone and map it to chunk.
+ * This must be called with the mapping lock held.
+ */
+struct dm_zoned_zone *
+dm_zoned_alloc_dzone(struct dm_zoned_target *dzt,
+		     unsigned int chunk,
+		     unsigned int type_hint)
+{
+	struct dm_zoned_zone *dzone = NULL;
+
+again:
+
+	/* Get an unmapped data zone: if asked to, try to get */
+	/* an unmapped randomly writable zone. Otherwise,     */
+	/* get a sequential zone.                              */
+	switch (type_hint) {
+	case DM_DZONE_CMR:
+		dzone = list_first_entry_or_null(&dzt->dz_unmap_cmr_list,
+						 struct dm_zoned_zone, link);
+		if (dzone)
+			break;
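+		/* Fall through: no free CMR zone, allocate an SMR zone */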
+	case DM_DZONE_SMR:
+	default:
+		dzone = list_first_entry_or_null(&dzt->dz_unmap_smr_list,
+						 struct dm_zoned_zone, link);
+		if (dzone)
+			break;
+		dzone = list_first_entry_or_null(&dzt->dz_unmap_cmr_list,
+						 struct dm_zoned_zone, link);
+		break;
+	}
+
+	if (dzone) {
+		list_del_init(&dzone->link);
+		dzt->dz_nr_unmap--;
+		if (dm_zoned_zone_offline(dzone)) {
+			dm_zoned_dev_error(dzt, "Ignoring offline dzone %lu\n",
+					   dzone->id);
+			goto again;
+		}
+
+		dm_zoned_dev_debug(dzt, "Allocated %s dzone %lu\n",
+				 dm_zoned_zone_is_cmr(dzone) ? "CMR" : "SMR",
+				 dzone->id);
+
+		/* Set the zone chunk mapping */
+		if (chunk != DM_ZONED_MAP_UNMAPPED) {
+			dm_zoned_set_dzone_mapping(dzt, chunk, dzone->id);
+			dzone->map = chunk;
+			if (dm_zoned_zone_is_cmr(dzone))
+				list_add_tail(&dzone->link,
+					      &dzt->dz_map_cmr_list);
+		}
+
+	}
+
+	return dzone;
+}
+
+/**
+ * Unmap and free a chunk data zone.
+ * This must be called with the mapping lock held.
+ */
+void
+dm_zoned_free_dzone(struct dm_zoned_target *dzt,
+		    struct dm_zoned_zone *dzone)
+{
+
+	dm_zoned_dev_assert(dzt, dm_zoned_zone_data(dzone));
+	dm_zoned_dev_assert(dzt, !test_bit(DM_ZONE_BUFFERED, &dzone->flags));
+
+	/* Reset the zone */
+	dm_zoned_wait_for_stable_zone(dzone);
+	dm_zoned_reset_zone_wp(dzt, dzone);
+	dm_zoned_zone_reset_stats(dzone);
+
+	dm_zoned_map_lock(dzt);
+
+	/* Clear the zone chunk mapping */
+	if (dzone->map != DM_ZONED_MAP_UNMAPPED) {
+		dm_zoned_set_dzone_mapping(dzt, dzone->map,
+					   DM_ZONED_MAP_UNMAPPED);
+		dzone->map = DM_ZONED_MAP_UNMAPPED;
+	}
+
+	/* If the zone was already marked as empty after */
+	/* a discard, remove it from the empty list.     */
+	if (test_and_clear_bit(DM_ZONE_EMPTY, &dzone->flags))
+		list_del_init(&dzone->elink);
+
+	/* Return the zone to the unmap list */
+	smp_mb__before_atomic();
+	clear_bit(DM_ZONE_DIRTY, &dzone->flags);
+	smp_mb__after_atomic();
+	if (dm_zoned_zone_is_cmr(dzone)) {
+		list_del_init(&dzone->link);
+		list_add_tail(&dzone->link, &dzt->dz_unmap_cmr_list);
+	} else
+		list_add_tail(&dzone->link, &dzt->dz_unmap_smr_list);
+	dzt->dz_nr_unmap++;
+
+	dm_zoned_dev_debug(dzt, "Freed data zone %lu\n", dzone->id);
+
+	dm_zoned_map_unlock(dzt);
+}
+
+/**
+ * After a failed write or a discard, a data zone may become
+ * entirely invalid and can be freed. Check this here.
+ */
+void
+dm_zoned_validate_dzone(struct dm_zoned_target *dzt,
+			struct dm_zoned_zone *dzone)
+{
+	int dweight;
+
+	dm_zoned_dev_assert(dzt, dm_zoned_zone_data(dzone));
+
+	if (dzone->bzone ||
+	    !test_and_clear_bit(DM_ZONE_DIRTY, &dzone->flags))
+		return;
+
+	dweight = dm_zoned_zone_weight(dzt, dzone);
+	dm_zoned_map_lock(dzt);
+	if (dweight == 0 &&
+	    !test_and_set_bit_lock(DM_ZONE_EMPTY, &dzone->flags)) {
+		list_add_tail(&dzone->elink, &dzt->dz_empty_list);
+		dm_zoned_schedule_reclaim(dzt, DM_ZONED_RECLAIM_PERIOD);
+	}
+	dm_zoned_map_unlock(dzt);
+}
+
+/**
+ * Change the mapping of the chunk served by @from_dzone
+ * to @to_dzone (used by GC). This implies that @from_dzone
+ * is invalidated, unmapped and freed.
+ */
+void
+dm_zoned_remap_dzone(struct dm_zoned_target *dzt,
+		     struct dm_zoned_zone *from_dzone,
+		     struct dm_zoned_zone *to_dzone)
+{
+	unsigned int chunk = from_dzone->map;
+
+	dm_zoned_map_lock(dzt);
+
+	dm_zoned_dev_assert(dzt, dm_zoned_zone_data(from_dzone));
+	dm_zoned_dev_assert(dzt, chunk != DM_ZONED_MAP_UNMAPPED);
+	dm_zoned_dev_assert(dzt, dm_zoned_zone_data(to_dzone));
+	dm_zoned_dev_assert(dzt, to_dzone->map == DM_ZONED_MAP_UNMAPPED);
+
+	from_dzone->map = DM_ZONED_MAP_UNMAPPED;
+	if (dm_zoned_zone_is_cmr(from_dzone))
+		list_del_init(&from_dzone->link);
+
+	dm_zoned_set_dzone_mapping(dzt, chunk, to_dzone->id);
+	to_dzone->map = chunk;
+	if (dm_zoned_zone_is_cmr(to_dzone))
+		list_add_tail(&to_dzone->link, &dzt->dz_map_cmr_list);
+
+	dm_zoned_map_unlock(dzt);
+}
+
+/**
+ * Change the type of @bzone to data zone and map it
+ * to the chunk being mapped by its current data zone.
+ * In the buffer zone mapping table, replace @bzone
+ * with @new_bzone.
+ */
+void
+dm_zoned_remap_bzone(struct dm_zoned_target *dzt,
+		     struct dm_zoned_zone *bzone,
+		     struct dm_zoned_zone *new_bzone)
+{
+	struct dm_zoned_zone *dzone = bzone->bzone;
+	unsigned int chunk = dzone->map;
+
+	dm_zoned_map_lock(dzt);
+
+	dm_zoned_dev_assert(dzt, dm_zoned_zone_buf(bzone));
+	dm_zoned_dev_assert(dzt, dm_zoned_zone_data(new_bzone));
+	dm_zoned_dev_assert(dzt, chunk != DM_ZONED_MAP_UNMAPPED);
+	dm_zoned_dev_assert(dzt, new_bzone->map == DM_ZONED_MAP_UNMAPPED);
+
+	/* Cleanup dzone */
+	smp_mb__before_atomic();
+	clear_bit(DM_ZONE_BUFFERED, &dzone->flags);
+	smp_mb__after_atomic();
+	dzone->bzone = NULL;
+	dzone->map = DM_ZONED_MAP_UNMAPPED;
+
+	/* new_bzone becomes a free buffer zone */
+	new_bzone->flags = DM_ZONE_BUF;
+	smp_mb__before_atomic();
+	set_bit(DM_ZONE_RECLAIM, &new_bzone->flags);
+	smp_mb__after_atomic();
+	dm_zoned_change_bzone_mapping(dzt, bzone, new_bzone,
+				    DM_ZONED_MAP_UNMAPPED);
+	list_add_tail(&new_bzone->link, &dzt->bz_free_list);
+	atomic_inc(&dzt->bz_nr_free);
+	dm_zoned_wake_bzone_waiter(dzt);
+
+	/* bzone becomes a mapped data zone */
+	bzone->bzone = NULL;
+	list_del_init(&bzone->link);
+	bzone->flags = DM_ZONE_DATA;
+	smp_mb__before_atomic();
+	set_bit(DM_ZONE_DIRTY, &bzone->flags);
+	set_bit(DM_ZONE_RECLAIM, &bzone->flags);
+	smp_mb__after_atomic();
+	bzone->map = chunk;
+	dm_zoned_set_dzone_mapping(dzt, chunk, bzone->id);
+	list_add_tail(&bzone->link, &dzt->dz_map_cmr_list);
+
+	dm_zoned_map_unlock(dzt);
+}
+
+/**
+ * Get the data zone mapping the chunk of the BIO.
+ * There may be no mapping.
+ */
+struct dm_zoned_zone *
+dm_zoned_bio_map(struct dm_zoned_target *dzt,
+		 struct bio *bio)
+{
+	struct dm_zoned_bioctx *bioctx =
+		dm_per_bio_data(bio, sizeof(struct dm_zoned_bioctx));
+	struct dm_zoned_zwork *zwork;
+	struct dm_zoned_zone *dzone;
+	unsigned long flags;
+	unsigned int dzone_id;
+	unsigned int chunk;
+
+	/* Get a work to activate the mapping zone if needed. */
+	zwork = kmalloc(sizeof(struct dm_zoned_zwork), GFP_KERNEL);
+	if (unlikely(!zwork))
+		return ERR_PTR(-ENOMEM);
+
+again:
+	dzone = NULL;
+	dm_zoned_map_lock(dzt);
+
+	chunk = bio->bi_iter.bi_sector >> dzt->zone_nr_sectors_shift;
+	dzone_id = dm_zoned_get_dzone_mapping(dzt, chunk);
+
+	/* For writes to unmapped chunks, try */
+	/* to allocate an unused data zone.   */
+	if (dzone_id != DM_ZONED_MAP_UNMAPPED)
+		dzone = dm_zoned_lookup_zone_by_id(dzt, dzone_id);
+	else if ((bio->bi_rw & REQ_WRITE) &&
+		 (!(bio->bi_rw & REQ_DISCARD)))
+		dzone = dm_zoned_alloc_dzone(dzt, chunk, DM_DZONE_ANY);
+
+	if (!dzone)
+		/* No mapping: no work needed */
+		goto out;
+
+	dm_zoned_lock_zone(dzone, flags);
+
+	/* If the zone is being reclaimed, wait for reclaim to complete */
+	if (test_bit(DM_ZONE_RECLAIM, &dzone->flags)) {
+		dm_zoned_dev_debug(dzt, "Wait for zone %lu reclaim (%lx)\n",
+				 dzone->id,
+				 dzone->flags);
+		dm_zoned_unlock_zone(dzone, flags);
+		dm_zoned_map_unlock(dzt);
+		wait_on_bit_io(&dzone->flags, DM_ZONE_RECLAIM,
+			       TASK_UNINTERRUPTIBLE);
+		goto again;
+	}
+
+	if (test_and_clear_bit(DM_ZONE_EMPTY, &dzone->flags))
+		list_del_init(&dzone->elink);
+
+	/* Got the mapping zone: set it active */
+	if (!test_and_set_bit(DM_ZONE_ACTIVE, &dzone->flags)) {
+		INIT_WORK(&zwork->work, dm_zoned_zone_work);
+		zwork->target = dzt;
+		zwork->dzone = dzone;
+		INIT_LIST_HEAD(&zwork->link);
+		atomic_set(&zwork->ref, 0);
+		bio_list_init(&zwork->bio_list);
+		atomic_set(&zwork->bio_count, 0);
+		dzone->zwork = zwork;
+		atomic_inc(&dzt->dz_nr_active);
+	} else {
+		kfree(zwork);
+		zwork = dzone->zwork;
+		dm_zoned_dev_assert(dzt, zwork);
+	}
+
+	bioctx->dzone = dzone;
+	atomic_inc(&zwork->ref);
+	bio_list_add(&zwork->bio_list, bio);
+
+	dm_zoned_run_dzone(dzt, dzone);
+	zwork = NULL;
+
+	dm_zoned_unlock_zone(dzone, flags);
+
+out:
+	dm_zoned_map_unlock(dzt);
+
+	kfree(zwork);
+
+	return dzone;
+}
+
+/**
+ * If needed and possible, queue an active zone work.
+ */
+void
+dm_zoned_run_dzone(struct dm_zoned_target *dzt,
+		   struct dm_zoned_zone *dzone)
+{
+	struct dm_zoned_zwork *zwork = dzone->zwork;
+
+	dm_zoned_dev_assert(dzt, test_bit(DM_ZONE_ACTIVE, &dzone->flags));
+	dm_zoned_dev_assert(dzt, zwork != NULL);
+	dm_zoned_dev_assert(dzt, atomic_read(&zwork->ref) > 0);
+
+	if (bio_list_peek(&zwork->bio_list) &&
+	    !test_bit(DM_ZONE_ACTIVE_WAIT, &dzone->flags)) {
+		if (queue_work(dzt->zone_wq, &zwork->work))
+			atomic_inc(&zwork->ref);
+	}
+}
+
+/**
+ * Release an active data zone: the last put will
+ * deactivate the zone and free its work struct.
+ */
+void
+dm_zoned_put_dzone(struct dm_zoned_target *dzt,
+		   struct dm_zoned_zone *dzone)
+{
+	struct dm_zoned_zwork *zwork = dzone->zwork;
+	unsigned long flags;
+
+	dm_zoned_dev_assert(dzt, test_bit(DM_ZONE_ACTIVE, &dzone->flags));
+	dm_zoned_dev_assert(dzt, zwork != NULL);
+	dm_zoned_dev_assert(dzt, atomic_read(&zwork->ref) > 0);
+
+	dm_zoned_lock_zone(dzone, flags);
+
+	if (atomic_dec_and_test(&zwork->ref)) {
+		kfree(zwork);
+		dzone->zwork = NULL;
+		clear_bit_unlock(DM_ZONE_ACTIVE, &dzone->flags);
+		smp_mb__after_atomic();
+		atomic_dec(&dzt->dz_nr_active);
+		wake_up_bit(&dzone->flags, DM_ZONE_ACTIVE);
+	}
+
+	dm_zoned_unlock_zone(dzone, flags);
+}
+
+/**
+ * Determine metadata format.
+ */
+static int
+dm_zoned_format(struct dm_zoned_target *dzt,
+		struct dm_zoned_target_config *conf)
+{
+	unsigned int nr_meta_blocks, nr_meta_zones = 1;
+	unsigned int nr_buf_zones, nr_data_zones;
+	unsigned int nr_bitmap_blocks, nr_map_blocks;
+
+	dm_zoned_dev_info(dzt, "Formatting device with %lu buffer zones\n",
+			conf->nr_buf_zones);
+
+	if (conf->nr_buf_zones < DM_ZONED_NR_BZONES_MIN) {
+		conf->nr_buf_zones = DM_ZONED_NR_BZONES_MIN;
+		dm_zoned_dev_info(dzt,
+			"    Number of buffer zones too low: using %lu\n",
+			conf->nr_buf_zones);
+	}
+
+	if (conf->nr_buf_zones > DM_ZONED_NR_BZONES_MAX) {
+		conf->nr_buf_zones = DM_ZONED_NR_BZONES_MAX;
+		dm_zoned_dev_info(dzt,
+			"    Number of buffer zones too large: using %lu\n",
+			conf->nr_buf_zones);
+	}
+
+	nr_buf_zones = conf->nr_buf_zones;
+
+again:
+
+	nr_data_zones = dzt->nr_zones - nr_buf_zones - nr_meta_zones;
+	nr_map_blocks = nr_data_zones >> DM_ZONED_MAP_ENTRIES_SHIFT;
+	if (nr_data_zones & DM_ZONED_MAP_ENTRIES_MASK)
+		nr_map_blocks++;
+	nr_bitmap_blocks = (dzt->nr_zones - nr_meta_zones) *
+		dzt->zone_nr_bitmap_blocks;
+	nr_meta_blocks = 1 + nr_map_blocks + nr_bitmap_blocks;
+	nr_meta_zones = (nr_meta_blocks + dzt->zone_nr_blocks_mask) >>
+		dzt->zone_nr_blocks_shift;
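+	/*
+	 * Illustrative sizing (example figures only): a disk with 32768
+	 * zones of 65536 blocks (2 bitmap blocks per zone) and 32 buffer
+	 * zones needs 32 mapping blocks and 65534 bitmap blocks, so the
+	 * 65567 metadata blocks fit in 2 metadata zones.
+	 */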
+
+	if (nr_meta_zones > dzt->nr_meta_zones) {
+		dm_zoned_dev_error(dzt,
+			"Insufficient random write space for metadata (need %u zones, have %u)\n",
+			nr_meta_zones, dzt->nr_meta_zones);
+		return -ENXIO;
+	}
+
+	if ((nr_meta_zones + nr_buf_zones) > dzt->nr_rnd_zones) {
+		nr_buf_zones = dzt->nr_rnd_zones - nr_meta_zones;
+		dm_zoned_dev_info(dzt,
+			"Insufficient random zones: retrying with %u buffer zones\n",
+			nr_buf_zones);
+		goto again;
+	}
+
+	/* Fixup everything */
+	dzt->nr_meta_zones = nr_meta_zones;
+	dzt->nr_buf_zones = nr_buf_zones;
+	dzt->nr_data_zones = dzt->nr_zones - nr_buf_zones - nr_meta_zones;
+	dzt->nr_map_blocks = dzt->nr_data_zones >> DM_ZONED_MAP_ENTRIES_SHIFT;
+	if (dzt->nr_data_zones & DM_ZONED_MAP_ENTRIES_MASK)
+		dzt->nr_map_blocks++;
+	dzt->nr_bitmap_blocks = (dzt->nr_buf_zones + dzt->nr_data_zones) *
+		dzt->zone_nr_bitmap_blocks;
+	dzt->bitmap_block = 1 + dzt->nr_map_blocks;
+
+	return 0;
+}
+
+/**
+ * Format the target device metadata.
+ */
+static int
+dm_zoned_format_meta(struct dm_zoned_target *dzt,
+		     struct dm_zoned_target_config *conf)
+{
+	struct dm_zoned_super *sb;
+	int b, ret;
+
+	/* Reset all zones */
+	ret = dm_zoned_reset_zones(dzt);
+	if (ret)
+		return ret;
+
+	/* Initialize the super block data */
+	ret = dm_zoned_format(dzt, conf);
+	if (ret)
+		return ret;
+
+	/* Format buffer zones mapping */
+	ret = dm_zoned_format_bzone_mapping(dzt);
+	if (ret)
+		return ret;
+
+	/* Format data zones mapping */
+	ret = dm_zoned_format_dzone_mapping(dzt);
+	if (ret)
+		return ret;
+
+	/* Clear bitmaps */
+	for (b = 0; b < dzt->nr_bitmap_blocks; b++) {
+		ret = dm_zoned_zero_meta(dzt, dzt->bitmap_block + b);
+		if (ret)
+			return ret;
+	}
+
+	/* Finally, write super block */
+	sb = (struct dm_zoned_super *) dzt->sb_bh->b_data;
+	lock_buffer(dzt->sb_bh);
+	sb->magic = cpu_to_le32(DM_ZONED_MAGIC);
+	sb->version = cpu_to_le32(DM_ZONED_META_VER);
+	sb->nr_map_blocks = cpu_to_le32(dzt->nr_map_blocks);
+	sb->nr_bitmap_blocks = cpu_to_le32(dzt->nr_bitmap_blocks);
+	sb->nr_buf_zones = cpu_to_le32(dzt->nr_buf_zones);
+	sb->nr_data_zones = cpu_to_le32(dzt->nr_data_zones);
+	dm_zoned_dirty_meta(dzt, dzt->sb_bh);
+	unlock_buffer(dzt->sb_bh);
+
+	return dm_zoned_flush(dzt);
+}
+
+/**
+ * Count zones in a list.
+ */
+static int
+dm_zoned_zone_count(struct list_head *list)
+{
+	struct dm_zoned_zone *zone;
+	int n = 0;
+
+	list_for_each_entry(zone, list, link) {
+		n++;
+	}
+
+	return n;
+}
+
+/**
+ * Shuffle the data zone list: file systems tend to distribute
+ * accesses across the disk to achieve stable performance over
+ * time. Allocating and mapping these spread accesses to
+ * contiguous data zones in LBA order would achieve the opposite
+ * result (fast accesses initially, slower ones later). Prevent
+ * this by shuffling the initially LBA-ordered list of SMR data
+ * zones, interleaving its first and second halves.
+ * Shuffling: the LBA ordered zone list 0,1,2,3,4,5,6,7 [...] is
+ * reorganized as 0,4,1,5,2,6,3,7 [...]
+ */
+static void
+dm_zoned_shuffle_dzones(struct dm_zoned_target *dzt)
+{
+	struct dm_zoned_zone *dzone;
+	struct list_head tmp1;
+	struct list_head tmp2;
+	int n = 0;
+
+	INIT_LIST_HEAD(&tmp1);
+	INIT_LIST_HEAD(&tmp2);
+
+	while (!list_empty(&dzt->dz_unmap_smr_list) &&
+	      n < dzt->nr_smr_data_zones / 2) {
+		dzone = list_first_entry(&dzt->dz_unmap_smr_list,
+					 struct dm_zoned_zone, link);
+		list_del_init(&dzone->link);
+		list_add_tail(&dzone->link, &tmp1);
+		n++;
+	}
+	while (!list_empty(&dzt->dz_unmap_smr_list)) {
+		dzone = list_first_entry(&dzt->dz_unmap_smr_list,
+					 struct dm_zoned_zone, link);
+		list_del_init(&dzone->link);
+		list_add_tail(&dzone->link, &tmp2);
+	}
+	while (!list_empty(&tmp1) && !list_empty(&tmp2)) {
+		dzone = list_first_entry_or_null(&tmp1,
+						 struct dm_zoned_zone, link);
+		if (dzone) {
+			list_del_init(&dzone->link);
+			list_add_tail(&dzone->link, &dzt->dz_unmap_smr_list);
+		}
+		dzone = list_first_entry_or_null(&tmp2,
+						 struct dm_zoned_zone, link);
+		if (dzone) {
+			list_del_init(&dzone->link);
+			list_add_tail(&dzone->link, &dzt->dz_unmap_smr_list);
+		}
+	}
+}
+
+/**
+ * Load metadata from disk.
+ */
+static int
+dm_zoned_load_meta(struct dm_zoned_target *dzt)
+{
+	struct dm_zoned_super *sb =
+		(struct dm_zoned_super *) dzt->sb_bh->b_data;
+	struct dm_zoned_zone *zone;
+	int i, ret;
+
+	/* Check super block */
+	if (le32_to_cpu(sb->magic) != DM_ZONED_MAGIC) {
+		dm_zoned_dev_error(dzt, "Invalid meta magic "
+				   "(need 0x%08x, got 0x%08x)\n",
+				   DM_ZONED_MAGIC, le32_to_cpu(sb->magic));
+		return -ENXIO;
+	}
+	if (le32_to_cpu(sb->version) != DM_ZONED_META_VER) {
+		dm_zoned_dev_error(dzt, "Invalid meta version "
+				   "(need %d, got %d)\n",
+				   DM_ZONED_META_VER, le32_to_cpu(sb->version));
+		return -ENXIO;
+	}
+
+	dzt->nr_buf_zones = le32_to_cpu(sb->nr_buf_zones);
+	dzt->nr_data_zones = le32_to_cpu(sb->nr_data_zones);
+	if ((dzt->nr_buf_zones + dzt->nr_data_zones) > dzt->nr_zones) {
+		dm_zoned_dev_error(dzt, "Invalid format: %u buffer zones "
+				   "+ %u data zones > %u zones\n",
+				   dzt->nr_buf_zones,
+				   dzt->nr_data_zones,
+				   dzt->nr_zones);
+		return -ENXIO;
+	}
+	dzt->nr_meta_zones = dzt->nr_zones -
+		(dzt->nr_buf_zones + dzt->nr_data_zones);
+	dzt->nr_map_blocks = le32_to_cpu(sb->nr_map_blocks);
+	dzt->nr_bitmap_blocks = le32_to_cpu(sb->nr_bitmap_blocks);
+	dzt->bitmap_block = dzt->nr_map_blocks + 1;
+	dzt->dz_nr_unmap = dzt->nr_data_zones;
+
+	/* Load the buffer zones mapping table */
+	ret = dm_zoned_load_bzone_mapping(dzt);
+	if (ret) {
+		dm_zoned_dev_error(dzt, "Load buffer zone mapping failed %d\n",
+				 ret);
+		return ret;
+	}
+
+	/* Load the data zone mapping table */
+	ret = dm_zoned_load_dzone_mapping(dzt);
+	if (ret) {
+		dm_zoned_dev_error(dzt, "Load data zone mapping failed %d\n",
+				 ret);
+		return ret;
+	}
+
+	/* The first nr_meta_zones are still marked */
+	/* as unmapped data zones: fix this         */
+	for (i = 0; i < dzt->nr_meta_zones; i++) {
+		zone = dm_zoned_lookup_zone_by_id(dzt, i);
+		if (!zone) {
+			dm_zoned_dev_error(dzt, "Meta zone %d not found\n", i);
+			return -ENXIO;
+		}
+		zone->flags = DM_ZONE_META;
+		list_del_init(&zone->link);
+	}
+	dzt->nr_cmr_data_zones = dm_zoned_zone_count(&dzt->dz_map_cmr_list) +
+		dm_zoned_zone_count(&dzt->dz_unmap_cmr_list);
+	dzt->nr_smr_data_zones = dzt->nr_data_zones - dzt->nr_cmr_data_zones;
+
+	dm_zoned_shuffle_dzones(dzt);
+
+	dm_zoned_dev_info(dzt, "Backend device:\n");
+	dm_zoned_dev_info(dzt,
+		"    %zu 512-byte logical sectors\n",
+		(sector_t)dzt->nr_zones << dzt->zone_nr_sectors_shift);
+	dm_zoned_dev_info(dzt,
+		"    %u zones of %zu 512-byte logical sectors\n",
+		dzt->nr_zones, dzt->zone_nr_sectors);
+	dm_zoned_dev_info(dzt,
+		"    %u CMR zones, %u SMR zones (%u random write zones)\n",
+		dzt->nr_cmr_zones,
+		dzt->nr_smr_zones,
+		dzt->nr_rnd_zones);
+	dm_zoned_dev_info(dzt,
+		"    %u metadata zones\n", dzt->nr_meta_zones);
+	dm_zoned_dev_info(dzt,
+		"    %u buffer zones (%d free zones, %u low threshold)\n",
+		dzt->nr_buf_zones,  atomic_read(&dzt->bz_nr_free),
+		dzt->bz_nr_free_low);
+	dm_zoned_dev_info(dzt,
+		"    %u data zones (%u SMR zones, %u CMR zones), %u unmapped zones\n",
+		dzt->nr_data_zones, dzt->nr_smr_data_zones,
+		dzt->nr_cmr_data_zones, dzt->dz_nr_unmap);
+
+#ifdef __DM_ZONED_DEBUG
+	dm_zoned_dev_info(dzt, "Format:\n");
+	dm_zoned_dev_info(dzt,
+		"        %u data zone mapping blocks from block 1\n",
+		dzt->nr_map_blocks);
+	dm_zoned_dev_info(dzt,
+		"        %u bitmap blocks from block %zu (%u blocks per zone)\n",
+		dzt->nr_bitmap_blocks, dzt->bitmap_block,
+		dzt->zone_nr_bitmap_blocks);
+	dm_zoned_dev_info(dzt,
+		"Using %zu KiB of memory\n", dzt->used_mem >> 10);
+#endif
+
+	return 0;
+}
+
+/**
+ * Initialize the target metadata.
+ */
+int
+dm_zoned_init_meta(struct dm_zoned_target *dzt,
+		   struct dm_zoned_target_config *conf)
+{
+	int ret;
+
+	/* Flush the target device */
+	blkdev_issue_flush(dzt->zbd, GFP_NOFS, NULL);
+
+	/* Initialize zone descriptors */
+	ret = dm_zoned_init_zones(dzt);
+	if (ret)
+		goto out;
+
+	/* Get super block */
+	dzt->sb_bh = dm_zoned_get_meta(dzt, 0);
+	if (IS_ERR(dzt->sb_bh)) {
+		ret = PTR_ERR(dzt->sb_bh);
+		dzt->sb_bh = NULL;
+		dm_zoned_dev_error(dzt, "Read super block failed %d\n", ret);
+		goto out;
+	}
+	dm_zoned_account_mem(dzt, DM_ZONED_BLOCK_SIZE);
+
+	/* If asked to reformat */
+	if (conf->format) {
+		ret = dm_zoned_format_meta(dzt, conf);
+		if (ret)
+			goto out;
+	}
+
+	/* Load meta-data */
+	ret = dm_zoned_load_meta(dzt);
+	if (ret)
+		goto out;
+
+out:
+	if (ret)
+		dm_zoned_cleanup_meta(dzt);
+
+	return ret;
+}
+
+/**
+ * Check metadata on resume.
+ */
+int
+dm_zoned_resume_meta(struct dm_zoned_target *dzt)
+{
+	return dm_zoned_check_zones(dzt);
+}
+
+/**
+ * Cleanup the target metadata resources.
+ */
+void
+dm_zoned_cleanup_meta(struct dm_zoned_target *dzt)
+{
+
+	dm_zoned_cleanup_dzone_mapping(dzt);
+	brelse(dzt->sb_bh);
+	dm_zoned_drop_zones(dzt);
+}
+
+/**
+ * Set @nr_bits bits in @bitmap starting from @bit.
+ * Return the number of bits changed from 0 to 1.
+ */
+static unsigned int
+dm_zoned_set_bits(unsigned long *bitmap,
+		  unsigned int bit,
+		  unsigned int nr_bits)
+{
+	unsigned long *addr;
+	unsigned int end = bit + nr_bits;
+	unsigned int n = 0;
+
+	while (bit < end) {
+
+		if (((bit & (BITS_PER_LONG - 1)) == 0) &&
+		    ((end - bit) >= BITS_PER_LONG)) {
+			/* Try to set the whole word at once */
+			addr = bitmap + BIT_WORD(bit);
+			if (*addr == 0) {
+				*addr = ULONG_MAX;
+				n += BITS_PER_LONG;
+				bit += BITS_PER_LONG;
+				continue;
+			}
+		}
+
+		if (!test_and_set_bit(bit, bitmap))
+			n++;
+		bit++;
+	}
+
+	return n;
+
+}
+
+/**
+ * Get the bitmap block storing the bit for @chunk_block
+ * in @zone.
+ */
+static struct buffer_head *
+dm_zoned_get_bitmap(struct dm_zoned_target *dzt,
+		    struct dm_zoned_zone *zone,
+		    sector_t chunk_block)
+{
+	sector_t bitmap_block = dzt->bitmap_block
+		+ ((sector_t)(zone->id - dzt->nr_meta_zones)
+		   * dzt->zone_nr_bitmap_blocks)
+		+ (chunk_block >> DM_ZONED_BLOCK_SHIFT_BITS);
+
+	return dm_zoned_get_meta(dzt, bitmap_block);
+}
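+
+/*
+ * Illustration of the bitmap block calculation above (example figures
+ * only): with 65536-block zones (2 bitmap blocks per zone) and 2
+ * metadata zones, the bit for block 40000 of zone 100 lives in block
+ * bitmap_block + (100 - 2) * 2 + (40000 >> 15) = bitmap_block + 197.
+ */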
+
+/**
+ * Validate (set bit) all the blocks in
+ * the range [@chunk_block..@chunk_block+@nr_blocks-1].
+ */
+int
+dm_zoned_validate_blocks(struct dm_zoned_target *dzt,
+			 struct dm_zoned_zone *zone,
+			 sector_t chunk_block,
+			 unsigned int nr_blocks)
+{
+	unsigned int count, bit, nr_bits;
+	struct buffer_head *bh;
+
+	dm_zoned_dev_debug(dzt, "=> VALIDATE zone %lu, block %zu, %u blocks\n",
+			 zone->id,
+			 chunk_block,
+			 nr_blocks);
+
+	dm_zoned_dev_assert(dzt, !dm_zoned_zone_meta(zone));
+	dm_zoned_dev_assert(dzt,
+			    (chunk_block + nr_blocks) <= dzt->zone_nr_blocks);
+
+	while (nr_blocks) {
+
+		/* Get bitmap block */
+		bh = dm_zoned_get_bitmap(dzt, zone, chunk_block);
+		if (IS_ERR(bh))
+			return PTR_ERR(bh);
+
+		/* Set bits */
+		bit = chunk_block & DM_ZONED_BLOCK_MASK_BITS;
+		nr_bits = min(nr_blocks, DM_ZONED_BLOCK_SIZE_BITS - bit);
+
+		lock_buffer(bh);
+		count = dm_zoned_set_bits((unsigned long *) bh->b_data,
+					bit, nr_bits);
+		if (count) {
+			dm_zoned_dirty_meta(dzt, bh);
+			set_bit(DM_ZONE_DIRTY, &zone->flags);
+		}
+		unlock_buffer(bh);
+		__brelse(bh);
+
+		nr_blocks -= nr_bits;
+		chunk_block += nr_bits;
+
+	}
+
+	return 0;
+}
+
+/**
+ * Clear @nr_bits bits in @bitmap starting from @bit.
+ * Return the number of bits changed from 1 to 0.
+ */
+static int
+dm_zoned_clear_bits(unsigned long *bitmap,
+		    int bit,
+		    int nr_bits)
+{
+	unsigned long *addr;
+	int end = bit + nr_bits;
+	int n = 0;
+
+	while (bit < end) {
+
+		if (((bit & (BITS_PER_LONG - 1)) == 0) &&
+		    ((end - bit) >= BITS_PER_LONG)) {
+			/* Try to clear whole word at once */
+			addr = bitmap + BIT_WORD(bit);
+			if (*addr == ULONG_MAX) {
+				*addr = 0;
+				n += BITS_PER_LONG;
+				bit += BITS_PER_LONG;
+				continue;
+			}
+		}
+
+		if (test_and_clear_bit(bit, bitmap))
+			n++;
+		bit++;
+	}
+
+	return n;
+
+}
+
+/**
+ * Invalidate (clear bit) all the blocks in
+ * the range [@chunk_block..@chunk_block+@nr_blocks-1].
+ */
+int
+dm_zoned_invalidate_blocks(struct dm_zoned_target *dzt,
+			   struct dm_zoned_zone *zone,
+			   sector_t chunk_block,
+			   unsigned int nr_blocks)
+{
+	unsigned int count, bit, nr_bits;
+	struct buffer_head *bh;
+
+	dm_zoned_dev_debug(dzt, "INVALIDATE zone %lu, block %zu, %u blocks\n",
+			 zone->id,
+			 chunk_block,
+			 nr_blocks);
+
+	dm_zoned_dev_assert(dzt, !dm_zoned_zone_meta(zone));
+	dm_zoned_dev_assert(dzt,
+			    (chunk_block + nr_blocks) <= dzt->zone_nr_blocks);
+
+	while (nr_blocks) {
+
+		/* Get bitmap block */
+		bh = dm_zoned_get_bitmap(dzt, zone, chunk_block);
+		if (IS_ERR(bh))
+			return PTR_ERR(bh);
+
+		/* Clear bits */
+		bit = chunk_block & DM_ZONED_BLOCK_MASK_BITS;
+		nr_bits = min(nr_blocks, DM_ZONED_BLOCK_SIZE_BITS - bit);
+
+		lock_buffer(bh);
+		count = dm_zoned_clear_bits((unsigned long *) bh->b_data,
+					  bit, nr_bits);
+		if (count) {
+			dm_zoned_dirty_meta(dzt, bh);
+			set_bit(DM_ZONE_DIRTY, &zone->flags);
+		}
+		unlock_buffer(bh);
+		__brelse(bh);
+
+		nr_blocks -= nr_bits;
+		chunk_block += nr_bits;
+
+	}
+
+	return 0;
+}
+
+/**
+ * Get a block bit value.
+ */
+static int
+dm_zoned_test_block(struct dm_zoned_target *dzt,
+		    struct dm_zoned_zone *zone,
+		    sector_t chunk_block)
+{
+	struct buffer_head *bh;
+	int ret;
+
+	/* Get bitmap block */
+	bh = dm_zoned_get_bitmap(dzt, zone, chunk_block);
+	if (IS_ERR(bh))
+		return PTR_ERR(bh);
+
+	/* Get offset */
+	ret = test_bit(chunk_block & DM_ZONED_BLOCK_MASK_BITS,
+		       (unsigned long *) bh->b_data) != 0;
+
+	__brelse(bh);
+
+	return ret;
+}
+
+/**
+ * Return the offset from @chunk_block to the first block
+ * with a bit value set to @set. Search at most @nr_blocks
+ * blocks from @chunk_block.
+ */
+static int
+dm_zoned_offset_to_block(struct dm_zoned_target *dzt,
+			 struct dm_zoned_zone *zone,
+			 sector_t chunk_block,
+			 unsigned int nr_blocks,
+			 int set)
+{
+	struct buffer_head *bh;
+	unsigned int bit, set_bit, nr_bits;
+	unsigned long *bitmap;
+	int n = 0;
+
+	while (nr_blocks) {
+
+		/* Get bitmap block */
+		bh = dm_zoned_get_bitmap(dzt, zone, chunk_block);
+		if (IS_ERR(bh))
+			return PTR_ERR(bh);
+
+		/* Get offset */
+		bitmap = (unsigned long *) bh->b_data;
+		bit = chunk_block & DM_ZONED_BLOCK_MASK_BITS;
+		nr_bits = min(nr_blocks, DM_ZONED_BLOCK_SIZE_BITS - bit);
+		if (set)
+			set_bit = find_next_bit(bitmap,
+				DM_ZONED_BLOCK_SIZE_BITS, bit);
+		else
+			set_bit = find_next_zero_bit(bitmap,
+				DM_ZONED_BLOCK_SIZE_BITS, bit);
+		__brelse(bh);
+
+		n += set_bit - bit;
+		if (set_bit < DM_ZONED_BLOCK_SIZE_BITS)
+			break;
+
+		nr_blocks -= nr_bits;
+		chunk_block += nr_bits;
+
+	}
+
+	return n;
+}
+
+/**
+ * Test if @chunk_block is valid. If it is, return the number
+ * of consecutive valid blocks starting from @chunk_block.
+ */
+int
+dm_zoned_block_valid(struct dm_zoned_target *dzt,
+		     struct dm_zoned_zone *zone,
+		     sector_t chunk_block)
+{
+	int valid;
+
+	dm_zoned_dev_assert(dzt, !dm_zoned_zone_meta(zone));
+	dm_zoned_dev_assert(dzt, chunk_block < dzt->zone_nr_blocks);
+
+	/* Test block */
+	valid = dm_zoned_test_block(dzt, zone, chunk_block);
+	if (valid <= 0)
+		return valid;
+
+	/* The block is valid: get the number of valid blocks from block */
+	return dm_zoned_offset_to_block(dzt, zone, chunk_block,
+				      dzt->zone_nr_blocks - chunk_block,
+				      0);
+}
+
+/**
+ * Count the number of bits set starting from @bit
+ * up to @bit + @nr_bits - 1.
+ */
+static int
+dm_zoned_count_bits(void *bitmap,
+		    int bit,
+		    int nr_bits)
+{
+	unsigned long *addr;
+	int end = bit + nr_bits;
+	int n = 0;
+
+	while (bit < end) {
+
+		if (((bit & (BITS_PER_LONG - 1)) == 0) &&
+		    ((end - bit) >= BITS_PER_LONG)) {
+			addr = (unsigned long *)bitmap + BIT_WORD(bit);
+			if (*addr == ULONG_MAX) {
+				n += BITS_PER_LONG;
+				bit += BITS_PER_LONG;
+				continue;
+			}
+		}
+
+		if (test_bit(bit, bitmap))
+			n++;
+		bit++;
+	}
+
+	return n;
+
+}
+
+/**
+ * Return the number of valid blocks in the range
+ * of blocks [@chunk_block..@chunk_block+@nr_blocks-1].
+ */
+int
+dm_zoned_valid_blocks(struct dm_zoned_target *dzt,
+		      struct dm_zoned_zone *zone,
+		      sector_t chunk_block,
+		      unsigned int nr_blocks)
+{
+	struct buffer_head *bh;
+	unsigned int bit, nr_bits;
+	void *bitmap;
+	int n = 0;
+
+	dm_zoned_dev_assert(dzt, !dm_zoned_zone_meta(zone));
+	dm_zoned_dev_assert(dzt,
+			    (chunk_block + nr_blocks) <= dzt->zone_nr_blocks);
+
+	while (nr_blocks) {
+
+		/* Get bitmap block */
+		bh = dm_zoned_get_bitmap(dzt, zone, chunk_block);
+		if (IS_ERR(bh))
+			return PTR_ERR(bh);
+
+		/* Count bits in this block */
+		bitmap = bh->b_data;
+		bit = chunk_block & DM_ZONED_BLOCK_MASK_BITS;
+		nr_bits = min(nr_blocks, DM_ZONED_BLOCK_SIZE_BITS - bit);
+		n += dm_zoned_count_bits(bitmap, bit, nr_bits);
+
+		__brelse(bh);
+
+		nr_blocks -= nr_bits;
+		chunk_block += nr_bits;
+
+	}
+
+	return n;
+}
diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
new file mode 100644
index 0000000..3b6cfa5
--- /dev/null
+++ b/drivers/md/dm-zoned-reclaim.c
@@ -0,0 +1,770 @@
+/*
+ * (C) Copyright 2016 Western Digital.
+ *
+ * This software is distributed under the terms of the GNU Lesser General
+ * Public License version 2, or any later version, "as is," without technical
+ * support, and WITHOUT ANY WARRANTY, without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ * Author: Damien Le Moal <damien.lemoal@xxxxxxxx>
+ */
+
+#include <linux/module.h>
+#include <linux/version.h>
+#include <linux/slab.h>
+
+#include "dm-zoned.h"
+
+/**
+ * Free a page list.
+ */
+static void
+dm_zoned_reclaim_free_page_list(struct dm_zoned_target *dzt,
+				struct page_list *pl,
+				unsigned int nr_blocks)
+{
+	unsigned int nr_pages;
+	int i;
+
+	nr_pages = ((nr_blocks << DM_ZONED_BLOCK_SHIFT) +
+		    PAGE_SIZE - 1) >> PAGE_SHIFT;
+	for (i = 0; i < nr_pages; i++) {
+		if (pl[i].page)
+			put_page(pl[i].page);
+	}
+	kfree(pl);
+}
+
+/**
+ * Allocate a page list.
+ */
+static struct page_list *
+dm_zoned_reclaim_alloc_page_list(struct dm_zoned_target *dzt,
+				 unsigned int nr_blocks)
+{
+	struct page_list *pl;
+	unsigned int nr_pages;
+	int i;
+
+	/* Get a page list */
+	nr_pages = ((nr_blocks << DM_ZONED_BLOCK_SHIFT) +
+		    PAGE_SIZE - 1) >> PAGE_SHIFT;
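+	/* Note: with 4KB pages this is one page per 4KB block; larger */
+	/* page sizes pack several blocks into each page.              */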
+	pl = kzalloc(sizeof(struct page_list) * nr_pages, GFP_KERNEL);
+	if (!pl)
+		return NULL;
+
+	/* Get pages */
+	for (i = 0; i < nr_pages; i++) {
+		pl[i].page = alloc_page(GFP_KERNEL);
+		if (!pl[i].page) {
+			dm_zoned_reclaim_free_page_list(dzt, pl, i);
+			return NULL;
+		}
+		if (i > 0)
+			pl[i - 1].next = &pl[i];
+	}
+
+	return pl;
+}
+
+/**
+ * Read blocks.
+ */
+static int
+dm_zoned_reclaim_read(struct dm_zoned_target *dzt,
+		      struct dm_zoned_zone *zone,
+		      sector_t chunk_block,
+		      unsigned int nr_blocks,
+		      struct page_list *pl)
+{
+	struct dm_io_request ioreq;
+	struct dm_io_region ioreg;
+	int ret;
+
+	dm_zoned_dev_debug(dzt, "Reclaim: Read %s zone %lu, "
+			   "block %zu, %u blocks\n",
+			   dm_zoned_zone_is_cmr(zone) ? "CMR" : "SMR",
+			   zone->id, chunk_block, nr_blocks);
+
+	/* Setup I/O request and region */
+	ioreq.bi_rw = READ;
+	ioreq.mem.type = DM_IO_PAGE_LIST;
+	ioreq.mem.offset = 0;
+	ioreq.mem.ptr.pl = pl;
+	ioreq.notify.fn = NULL;
+	ioreq.notify.context = NULL;
+	ioreq.client = dzt->reclaim_client;
+	ioreg.bdev = dzt->zbd;
+	ioreg.sector = dm_zoned_block_to_sector(dm_zoned_zone_start_block(zone)
+						+ chunk_block);
+	ioreg.count = dm_zoned_block_to_sector(nr_blocks);
+
+	/* Do read */
+	ret = dm_io(&ioreq, 1, &ioreg, NULL);
+	if (ret) {
+		dm_zoned_dev_error(dzt, "Reclaim: Read %s zone %lu, "
+				   "block %zu, %u blocks failed %d\n",
+				   dm_zoned_zone_is_cmr(zone) ? "CMR" : "SMR",
+				   zone->id, chunk_block, nr_blocks, ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+/**
+ * Write blocks.
+ */
+static int
+dm_zoned_reclaim_write(struct dm_zoned_target *dzt,
+		       struct dm_zoned_zone *zone,
+		       sector_t chunk_block,
+		       unsigned int nr_blocks,
+		       struct page_list *pl)
+{
+	struct dm_io_request ioreq;
+	struct dm_io_region ioreg;
+	int ret;
+
+	dm_zoned_dev_debug(dzt, "Reclaim: Write %s zone %lu, block %zu, %u blocks\n",
+			 dm_zoned_zone_is_cmr(zone) ? "CMR" : "SMR",
+			 zone->id,
+			 chunk_block,
+			 nr_blocks);
+
+	/* Fill holes between writes */
+	if (dm_zoned_zone_is_smr(zone) && chunk_block > zone->wp_block) {
+		ret = dm_zoned_advance_zone_wp(dzt, zone, chunk_block - zone->wp_block);
+		if (ret)
+			return ret;
+	}
+
+	/* Setup I/O request and region */
+	ioreq.bi_rw = REQ_WRITE;
+	ioreq.mem.type = DM_IO_PAGE_LIST;
+	ioreq.mem.offset = 0;
+	ioreq.mem.ptr.pl = pl;
+	ioreq.notify.fn = NULL;
+	ioreq.notify.context = NULL;
+	ioreq.client = dzt->reclaim_client;
+	ioreg.bdev = dzt->zbd;
+	ioreg.sector = dm_zoned_block_to_sector(dm_zoned_zone_start_block(zone) + chunk_block);
+	ioreg.count = dm_zoned_block_to_sector(nr_blocks);
+
+	/* Do write */
+	ret = dm_io(&ioreq, 1, &ioreg, NULL);
+	if (ret) {
+		dm_zoned_dev_error(dzt, "Reclaim: Write %s zone %lu, block %zu, %u blocks failed %d\n",
+				 dm_zoned_zone_is_cmr(zone) ? "CMR" : "SMR",
+				 zone->id,
+				 chunk_block,
+				 nr_blocks,
+				 ret);
+		return ret;
+	}
+
+	if (dm_zoned_zone_is_smr(zone))
+		zone->wp_block += nr_blocks;
+
+	return 0;
+}
+
+/**
+ * Copy blocks between zones.
+ */
+static int
+dm_zoned_reclaim_copy(struct dm_zoned_target *dzt,
+		      struct dm_zoned_zone *from_zone,
+		      struct dm_zoned_zone *to_zone,
+		      sector_t chunk_block,
+		      unsigned int nr_blocks)
+{
+	struct page_list *pl;
+	sector_t block = chunk_block;
+	unsigned int blocks = nr_blocks;
+	unsigned int count, max_count;
+	int ret;
+
+	/* Get a page list */
+	max_count = min_t(unsigned int, nr_blocks, DM_ZONED_RECLAIM_MAX_BLOCKS);
+	pl = dm_zoned_reclaim_alloc_page_list(dzt, max_count);
+	if (!pl) {
+		dm_zoned_dev_error(dzt, "Reclaim: Allocate %u pages failed\n",
+				 max_count);
+		return -ENOMEM;
+	}
+
+	while (blocks) {
+
+		/* Read blocks */
+		count = min_t(unsigned int, blocks, max_count);
+		ret = dm_zoned_reclaim_read(dzt, from_zone, block, count, pl);
+		if (ret)
+			goto out;
+
+		/* Write blocks */
+		ret = dm_zoned_reclaim_write(dzt, to_zone, block, count, pl);
+		if (ret)
+			goto out;
+
+		block += count;
+		blocks -= count;
+
+	}
+
+	/* Validate written blocks */
+	ret = dm_zoned_validate_blocks(dzt, to_zone, chunk_block, nr_blocks);
+
+out:
+	dm_zoned_reclaim_free_page_list(dzt, pl, max_count);
+
+	return ret;
+}
+
+/**
+ * Try to lock a zone for reclaim. Return 1 if the zone was locked,
+ * 0 if it is active or already being reclaimed.
+ */
+static inline int
+dm_zoned_reclaim_lock(struct dm_zoned_target *dzt,
+		      struct dm_zoned_zone *dzone)
+{
+	unsigned long flags;
+	int ret = 0;
+
+	/* Skip active zones */
+	dm_zoned_lock_zone(dzone, flags);
+	if (!test_bit(DM_ZONE_ACTIVE, &dzone->flags)
+	    && !test_and_set_bit(DM_ZONE_RECLAIM, &dzone->flags))
+		ret = 1;
+	dm_zoned_unlock_zone(dzone, flags);
+
+	return ret;
+}
+
+/**
+ * Clear a zone reclaim flag.
+ */
+static inline void
+dm_zoned_reclaim_unlock(struct dm_zoned_target *dzt,
+			struct dm_zoned_zone *dzone)
+{
+	unsigned long flags;
+
+	dm_zoned_lock_zone(dzone, flags);
+	clear_bit_unlock(DM_ZONE_RECLAIM, &dzone->flags);
+	smp_mb__after_atomic();
+	wake_up_bit(&dzone->flags, DM_ZONE_RECLAIM);
+	dm_zoned_unlock_zone(dzone, flags);
+}
+
+/**
+ * Write valid blocks of @dzone into its buffer zone
+ * and swap the buffer zone with @wzone.
+ */
+static void
+dm_zoned_reclaim_remap_buffer(struct dm_zoned_target *dzt,
+			      struct dm_zoned_zone *dzone,
+			      struct dm_zoned_zone *wzone)
+{
+	struct dm_zoned_zone *bzone = dzone->bzone;
+	struct dm_zoned_zone *rzone;
+	unsigned int nr_blocks;
+	sector_t chunk_block = 0;
+	int ret = 0;
+
+	dm_zoned_dev_debug(dzt, "Reclaim: Remap bzone %lu as dzone "
+			   "(new bzone %lu, %s dzone %lu)\n",
+			   bzone->id,
+			   wzone->id,
+			   dm_zoned_zone_is_cmr(dzone) ? "CMR" : "SMR",
+			   dzone->id);
+
+	while (chunk_block < dzt->zone_nr_blocks) {
+
+		/* Test block validity in the data zone */
+		rzone = dzone;
+		if (chunk_block < dzone->wp_block) {
+			ret = dm_zoned_block_valid(dzt, dzone, chunk_block);
+			if (ret < 0)
+				break;
+		}
+		if (!ret) {
+			chunk_block++;
+			continue;
+		}
+
+		/* Copy and validate blocks */
+		nr_blocks = ret;
+		ret = dm_zoned_reclaim_copy(dzt, dzone, bzone, chunk_block, nr_blocks);
+		if (ret)
+			break;
+
+		chunk_block += nr_blocks;
+
+	}
+
+	if (ret) {
+		/* Free the target data zone */
+		dm_zoned_invalidate_zone(dzt, wzone);
+		dm_zoned_free_dzone(dzt, wzone);
+		goto out;
+	}
+
+	/* Remap bzone to dzone chunk and set wzone as a buffer zone */
+	dm_zoned_reclaim_lock(dzt, bzone);
+	dm_zoned_remap_bzone(dzt, bzone, wzone);
+
+	/* Invalidate all blocks in the data zone and free it */
+	dm_zoned_invalidate_zone(dzt, dzone);
+	dm_zoned_free_dzone(dzt, dzone);
+
+out:
+	dm_zoned_reclaim_unlock(dzt, bzone);
+	dm_zoned_reclaim_unlock(dzt, wzone);
+}
+
+/**
+ * Merge valid blocks of @dzone and of its buffer zone into @wzone.
+ */
+static void
+dm_zoned_reclaim_merge_buffer(struct dm_zoned_target *dzt,
+			      struct dm_zoned_zone *dzone,
+			      struct dm_zoned_zone *wzone)
+{
+	struct dm_zoned_zone *bzone = dzone->bzone;
+	struct dm_zoned_zone *rzone;
+	unsigned int nr_blocks;
+	sector_t chunk_block = 0;
+	int ret = 0;
+
+	dm_zoned_dev_debug(dzt, "Reclaim: Merge zones %lu and %lu into %s dzone %lu\n",
+			   bzone->id,
+			   dzone->id,
+			   dm_zoned_zone_is_cmr(wzone) ? "CMR" : "SMR",
+			   wzone->id);
+
+	while (chunk_block < dzt->zone_nr_blocks) {
+
+		/* Test block validity in the data zone */
+		rzone = dzone;
+		if (chunk_block < dzone->wp_block) {
+			ret = dm_zoned_block_valid(dzt, dzone, chunk_block);
+			if (ret < 0)
+				break;
+		}
+		if (!ret) {
+			/* Check the buffer zone */
+			rzone = bzone;
+			ret = dm_zoned_block_valid(dzt, bzone, chunk_block);
+			if (ret < 0)
+				break;
+			if (!ret) {
+				chunk_block++;
+				continue;
+			}
+		}
+
+		/* Copy and validate blocks */
+		nr_blocks = ret;
+		ret = dm_zoned_reclaim_copy(dzt, rzone, wzone, chunk_block, nr_blocks);
+		if (ret)
+			break;
+
+		chunk_block += nr_blocks;
+
+	}
+
+	if (ret) {
+		/* Free the target data zone */
+		dm_zoned_invalidate_zone(dzt, wzone);
+		dm_zoned_free_dzone(dzt, wzone);
+		goto out;
+	}
+
+	/* Invalidate all blocks of the buffer zone and free it */
+	dm_zoned_invalidate_zone(dzt, bzone);
+	dm_zoned_free_bzone(dzt, bzone);
+
+	/* Finally, remap dzone to wzone */
+	dm_zoned_remap_dzone(dzt, dzone, wzone);
+	dm_zoned_invalidate_zone(dzt, dzone);
+	dm_zoned_free_dzone(dzt, dzone);
+
+out:
+	dm_zoned_reclaim_unlock(dzt, wzone);
+}
+
+/**
+ * Move valid blocks of the buffer zone into the data zone.
+ */
+static void
+dm_zoned_reclaim_flush_buffer(struct dm_zoned_target *dzt,
+			      struct dm_zoned_zone *dzone)
+{
+	struct dm_zoned_zone *bzone = dzone->bzone;
+	unsigned int nr_blocks;
+	sector_t chunk_block = 0;
+	int ret = 0;
+
+	dm_zoned_dev_debug(dzt, "Reclaim: Flush buffer zone %lu into %s dzone %lu\n",
+			   bzone->id,
+			   dm_zoned_zone_is_cmr(dzone) ? "CMR" : "SMR",
+			   dzone->id);
+
+	/* The data zone may contain written but now invalid blocks */
+	/* (e.g. after a discard following writes), so reset its    */
+	/* write pointer before writing the buffer zone blocks.     */
+	dm_zoned_reset_zone_wp(dzt, dzone);
+
+	while (chunk_block < dzt->zone_nr_blocks) {
+
+		/* Test block validity */
+		ret = dm_zoned_block_valid(dzt, bzone, chunk_block);
+		if (ret < 0)
+			break;
+		if (!ret) {
+			chunk_block++;
+			continue;
+		}
+
+		/* Copy and validate blocks */
+		nr_blocks = ret;
+		ret = dm_zoned_reclaim_copy(dzt, bzone, dzone, chunk_block, nr_blocks);
+		if (ret)
+			break;
+
+		chunk_block += nr_blocks;
+
+	}
+
+	if (ret) {
+		/* Cleanup the data zone */
+		dm_zoned_invalidate_zone(dzt, dzone);
+		dm_zoned_reset_zone_wp(dzt, dzone);
+		return;
+	}
+
+	/* Invalidate all blocks of the buffer zone and free it */
+	dm_zoned_invalidate_zone(dzt, bzone);
+	dm_zoned_free_bzone(dzt, bzone);
+}
+
+/**
+ * Free empty data zone and buffer zone.
+ */
+static void
+dm_zoned_reclaim_empty(struct dm_zoned_target *dzt,
+		       struct dm_zoned_zone *dzone)
+{
+
+	dm_zoned_dev_debug(dzt, "Reclaim: Chunk %zu, free empty dzone %lu\n",
+			   dzone->map,
+			   dzone->id);
+
+	if (dzone->bzone)
+		dm_zoned_free_bzone(dzt, dzone->bzone);
+	dm_zoned_free_dzone(dzt, dzone);
+}
+
+/**
+ * Choose a reclaim zone target for merging/flushing a buffer zone.
+ */
+static struct dm_zoned_zone *
+dm_zoned_reclaim_target(struct dm_zoned_target *dzt,
+			struct dm_zoned_zone *dzone)
+{
+	struct dm_zoned_zone *wzone;
+	unsigned int blocks = dzone->wr_dir_blocks + dzone->wr_buf_blocks;
+	int type = DM_DZONE_ANY;
+
+	dm_zoned_dev_debug(dzt, "Reclaim: Zone %lu, %lu%% buffered blocks\n",
+			   dzone->id,
+			   (blocks ? dzone->wr_buf_blocks * 100 / blocks : 0));
+
+	/* 75% or more buffered (random) writes -> use a CMR zone */
+	if (!dzone->wr_dir_blocks
+	    || (blocks &&
+		(dzone->wr_buf_blocks * 100 / blocks) >= 75))
+		type = DM_DZONE_CMR;
+
+	/* Get a data zone for merging */
+	dm_zoned_map_lock(dzt);
+	wzone = dm_zoned_alloc_dzone(dzt, DM_ZONED_MAP_UNMAPPED, type);
+	if (wzone) {
+		/*
+		 * Once the merge zone is remapped, it may be
+		 * accessed right away. Mark it as being reclaimed
+		 * so that the source data zone can be properly
+		 * cleaned up before any access.
+		 */
+		dm_zoned_reclaim_lock(dzt, wzone);
+	}
+	dm_zoned_map_unlock(dzt);
+
+	dm_zoned_zone_reset_stats(dzone);
+
+	return wzone;
+}
+
+/**
+ * Reclaim the buffer zone @bzone and its associated data zone.
+ */
+static void
+dm_zoned_reclaim_bzone(struct dm_zoned_target *dzt,
+		       struct dm_zoned_zone *bzone)
+{
+	struct dm_zoned_zone *dzone = bzone->bzone;
+	struct dm_zoned_zone *wzone;
+	int bweight, dweight;
+
+	/* Paranoia checks */
+	dm_zoned_dev_assert(dzt, dzone != NULL);
+	dm_zoned_dev_assert(dzt, dzone->bzone == bzone);
+	dm_zoned_dev_assert(dzt, !test_bit(DM_ZONE_ACTIVE, &dzone->flags));
+
+	dweight = dm_zoned_zone_weight(dzt, dzone);
+	bweight = dm_zoned_zone_weight(dzt, bzone);
+	dm_zoned_dev_debug(dzt, "Reclaim: Chunk %zu, dzone %lu (weight %d), "
+			   "bzone %lu (weight %d)\n",
+			   dzone->map, dzone->id, dweight,
+			   bzone->id, bweight);
+
+	/* If everything is invalid, free the zones */
+	if (!dweight && !bweight) {
+		dm_zoned_reclaim_empty(dzt, dzone);
+		goto out;
+	}
+
+	/* If all valid blocks are in the buffer zone, */
+	/* move them directly into the data zone.      */
+	if (!dweight) {
+		dm_zoned_reclaim_flush_buffer(dzt, dzone);
+		goto out;
+	}
+
+	/* Buffer zone and data zone need to be merged into a new data zone */
+	wzone = dm_zoned_reclaim_target(dzt, dzone);
+	if (!wzone) {
+		dm_zoned_dev_error(dzt, "Reclaim: No target zone available "
+				   "for merge reclaim\n");
+		goto out;
+	}
+
+	/* If the target zone is CMR, write the valid blocks of the data   */
+	/* zone into the buffer zone and swap the buffer zone with the new */
+	/* data zone. But do this only if it is less costly (fewer blocks  */
+	/* to move) than a regular merge.                                  */
+	if (dm_zoned_zone_is_cmr(wzone) && bweight > dweight) {
+		dm_zoned_reclaim_remap_buffer(dzt, dzone, wzone);
+		goto out;
+	}
+
+	/* Otherwise, merge the valid blocks of the buffer zone and data  */
+	/* zone into a newly allocated SMR data zone. On success, the new */
+	/* data zone is remapped to the chunk of the original data zone.  */
+	dm_zoned_reclaim_merge_buffer(dzt, dzone, wzone);
+
+out:
+	dm_zoned_reclaim_unlock(dzt, dzone);
+}
+
+/**
+ * Reclaim buffer zone work.
+ */
+static void
+dm_zoned_reclaim_bzone_work(struct work_struct *work)
+{
+	struct dm_zoned_reclaim_zwork *rzwork = container_of(work,
+					struct dm_zoned_reclaim_zwork, work);
+	struct dm_zoned_target *dzt = rzwork->target;
+
+	dm_zoned_reclaim_bzone(dzt, rzwork->bzone);
+
+	kfree(rzwork);
+}
+
+/**
+ * Select a buffer zone candidate for reclaim.
+ */
+static struct dm_zoned_zone *
+dm_zoned_reclaim_bzone_candidate(struct dm_zoned_target *dzt)
+{
+	struct dm_zoned_zone *bzone;
+
+	/* Search for a buffer zone candidate to reclaim */
+	dm_zoned_map_lock(dzt);
+
+	if (list_empty(&dzt->bz_lru_list))
+		goto out;
+
+	bzone = list_first_entry(&dzt->bz_lru_list, struct dm_zoned_zone, link);
+	while (bzone) {
+		if (dm_zoned_reclaim_lock(dzt, bzone->bzone)) {
+			dm_zoned_map_unlock(dzt);
+			return bzone;
+		}
+		if (list_is_last(&bzone->link, &dzt->bz_lru_list))
+			break;
+		bzone = list_next_entry(bzone, link);
+	}
+
+out:
+	dm_zoned_map_unlock(dzt);
+
+	return NULL;
+
+}
+
+/**
+ * Start reclaim workers.
+ */
+static int
+dm_zoned_reclaim_bzones(struct dm_zoned_target *dzt)
+{
+	struct dm_zoned_zone *bzone = NULL;
+	struct dm_zoned_reclaim_zwork *rzwork;
+	unsigned int max_workers = 0, nr_free;
+	unsigned long start;
+	int n = 0;
+
+	/* Try to reclaim if there are used buffer zones and the disk */
+	/* is idle, or if the number of free buffer zones is low.     */
+	nr_free = atomic_read(&dzt->bz_nr_free);
+	if (nr_free < dzt->bz_nr_free_low)
+		max_workers = dzt->bz_nr_free_low - nr_free;
+	else if (atomic_read(&dzt->dz_nr_active_wait))
+		max_workers = atomic_read(&dzt->dz_nr_active_wait);
+	else if (dm_zoned_idle(dzt))
+		max_workers = 1;
+	max_workers = min(max_workers, (unsigned int)DM_ZONED_RECLAIM_MAX_WORKERS);
+
+	start = jiffies;
+	while (n < max_workers) {
+
+		bzone = dm_zoned_reclaim_bzone_candidate(dzt);
+		if (!bzone)
+			break;
+
+		if (max_workers == 1) {
+			/* Do it in this context */
+			dm_zoned_reclaim_bzone(dzt, bzone);
+		} else {
+			/* Start a zone reclaim work */
+			rzwork = kmalloc(sizeof(struct dm_zoned_reclaim_zwork), GFP_KERNEL);
+			if (unlikely(!rzwork))
+				break;
+			INIT_WORK(&rzwork->work, dm_zoned_reclaim_bzone_work);
+			rzwork->target = dzt;
+			rzwork->bzone = bzone;
+			queue_work(dzt->reclaim_zwq, &rzwork->work);
+		}
+
+		n++;
+
+	}
+
+	if (n) {
+		flush_workqueue(dzt->reclaim_zwq);
+		dm_zoned_flush(dzt);
+		dm_zoned_dev_debug(dzt, "Reclaim: %d bzones reclaimed in %u msecs\n",
+				   n,
+				   jiffies_to_msecs(jiffies - start));
+	}
+
+	return n;
+}
+
+/**
+ * Reclaim unbuffered data zones marked as empty.
+ */
+static int
+dm_zoned_reclaim_dzones(struct dm_zoned_target *dzt)
+{
+	struct dm_zoned_zone *dz, *dzone;
+	int ret;
+
+	dm_zoned_map_lock(dzt);
+
+	/* If the disk is not idle, only reclaim empty CMR zones */
+	while (!list_empty(&dzt->dz_empty_list)) {
+
+		/* Search for a candidate to reclaim */
+		dzone = NULL;
+		list_for_each_entry(dz, &dzt->dz_empty_list, elink) {
+			if (!dm_zoned_idle(dzt) && !dm_zoned_zone_is_cmr(dz))
+				continue;
+			dzone = dz;
+			break;
+		}
+
+		if (!dzone || !dm_zoned_reclaim_lock(dzt, dzone))
+			break;
+
+		clear_bit_unlock(DM_ZONE_EMPTY, &dzone->flags);
+		smp_mb__after_atomic();
+		list_del_init(&dzone->elink);
+
+		dm_zoned_map_unlock(dzt);
+
+		if (dm_zoned_zone_weight(dzt, dzone) == 0)
+			dm_zoned_reclaim_empty(dzt, dzone);
+		dm_zoned_reclaim_unlock(dzt, dzone);
+
+		dm_zoned_map_lock(dzt);
+
+	}
+
+	ret = !list_empty(&dzt->dz_empty_list);
+
+	dm_zoned_map_unlock(dzt);
+
+	return ret;
+}
+
+/**
+ * Buffer zone reclaim work.
+ */
+void
+dm_zoned_reclaim_work(struct work_struct *work)
+{
+	struct dm_zoned_target *dzt = container_of(work,
+		struct dm_zoned_target, reclaim_work.work);
+	int have_empty_dzones;
+	int reclaimed_bzones;
+	unsigned long delay;
+
+	/* Try to reclaim buffer zones */
+	set_bit(DM_ZONED_RECLAIM_ACTIVE, &dzt->flags);
+	smp_mb__after_atomic();
+
+	dm_zoned_dev_debug(dzt, "Reclaim: %u/%u free bzones, disk %s, %d active zones (%d waiting)\n",
+			   atomic_read(&dzt->bz_nr_free),
+			   dzt->nr_buf_zones,
+			   (dm_zoned_idle(dzt) ? "idle" : "busy"),
+			   atomic_read(&dzt->dz_nr_active),
+			   atomic_read(&dzt->dz_nr_active_wait));
+
+	/* Reclaim empty data zones */
+	have_empty_dzones = dm_zoned_reclaim_dzones(dzt);
+
+	/* Reclaim buffer zones */
+	reclaimed_bzones = dm_zoned_reclaim_bzones(dzt);
+
+	if (atomic_read(&dzt->bz_nr_free) < dzt->nr_buf_zones ||
+	    have_empty_dzones) {
+		if (dm_zoned_idle(dzt)) {
+			delay = 0;
+		} else if (atomic_read(&dzt->dz_nr_active_wait) ||
+			 (atomic_read(&dzt->bz_nr_free) < dzt->bz_nr_free_low)) {
+			if (reclaimed_bzones)
+				delay = 0;
+			else
+				delay = HZ / 2;
+		} else
+			delay = DM_ZONED_RECLAIM_PERIOD;
+		dm_zoned_schedule_reclaim(dzt, delay);
+	}
+
+	clear_bit_unlock(DM_ZONED_RECLAIM_ACTIVE, &dzt->flags);
+	smp_mb__after_atomic();
+}
+
diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
new file mode 100644
index 0000000..ea0ee92
--- /dev/null
+++ b/drivers/md/dm-zoned.h
@@ -0,0 +1,687 @@
+/*
+ * (C) Copyright 2016 Western Digital.
+ *
+ * This software is distributed under the terms of the GNU Lesser General
+ * Public License version 2, or any later version, "as is," without technical
+ * support, and WITHOUT ANY WARRANTY, without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ * Author: Damien Le Moal <damien.lemoal@xxxxxxxx>
+ */
+#include <linux/types.h>
+#include <linux/blkdev.h>
+#include <linux/device-mapper.h>
+#include <linux/dm-io.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/rwsem.h>
+#include <linux/mutex.h>
+#include <linux/workqueue.h>
+#include <linux/buffer_head.h>
+
+/**
+ * Define __DM_ZONED_DEBUG to enable debug message support.
+ */
+#undef __DM_ZONED_DEBUG
+
+/**
+ * Version.
+ */
+#define DM_ZONED_VER_MAJ			0
+#define DM_ZONED_VER_MIN			1
+
+/**
+ * Zone type (high 4 bits of zone flags).
+ */
+#define DM_ZONE_META		0x10000000
+#define DM_ZONE_BUF		0x20000000
+#define DM_ZONE_DATA		0x30000000
+#define DM_ZONE_TYPE_MASK	0xF0000000
+
+/**
+ * Zone flags.
+ */
+enum {
+	DM_ZONE_ACTIVE,
+	DM_ZONE_ACTIVE_BIO,
+	DM_ZONE_ACTIVE_WAIT,
+	DM_ZONE_BUFFERED,
+	DM_ZONE_DIRTY,
+	DM_ZONE_EMPTY,
+	DM_ZONE_RECLAIM,
+};
+
+/**
+ * The dm-zoned logical device uses 4KB blocks.
+ */
+#define DM_ZONED_BLOCK_SHIFT		12
+#define DM_ZONED_BLOCK_SIZE		(1 << DM_ZONED_BLOCK_SHIFT)
+#define DM_ZONED_BLOCK_MASK		(DM_ZONED_BLOCK_SIZE - 1)
+
+#define DM_ZONED_BLOCK_SHIFT_BITS	(DM_ZONED_BLOCK_SHIFT + 3)
+#define DM_ZONED_BLOCK_SIZE_BITS	(DM_ZONED_BLOCK_SIZE << 3)
+#define DM_ZONED_BLOCK_MASK_BITS	(DM_ZONED_BLOCK_SIZE_BITS - 1)
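+/* A 4KB bitmap block holds 32768 bits and so tracks 32768 blocks (128MB). */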
+
+#define DM_ZONED_BLOCK_SECTORS		(DM_ZONED_BLOCK_SIZE >> SECTOR_SHIFT)
+#define DM_ZONED_BLOCK_SECTORS_MASK	(DM_ZONED_BLOCK_SECTORS - 1)
+
+#define dm_zoned_block_to_sector(b) \
+	((b) << (DM_ZONED_BLOCK_SHIFT - SECTOR_SHIFT))
+#define dm_zoned_sector_to_block(s) \
+	((s) >> (DM_ZONED_BLOCK_SHIFT - SECTOR_SHIFT))
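+/* With 512-byte sectors, both conversions reduce to a 3-bit shift. */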
+
+#define DM_ZONED_MIN_BIOS		128
+
+/**
+ * On-disk super block (sector 0 of the target device).
+ */
+#define DM_ZONED_MAGIC	((((unsigned int)('D')) << 24) | \
+			 (((unsigned int)('S')) << 16) | \
+			 (((unsigned int)('M')) <<  8) | \
+			 ((unsigned int)('R')))
+#define DM_ZONED_META_VER		1
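+/* The magic value is 0x44534D52, i.e. the ASCII characters "DSMR". */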
+
+/**
+ * On disk metadata:
+ *    - Block 0 stores the super block.
+ *    - From block 1, nr_map_blocks blocks of data zone mapping entries
+ *    - From block nr_map_blocks+1, nr_bitmap_blocks blocks of zone
+ *      block bitmap.
+ */
+
+/**
+ * Buffer zone mapping entry: each entry identifies a zone of the
+ * backend device used as a buffer zone (bzone_id) and the ID of
+ * the data zone being buffered (dzone_id).
+ * For unused buffer zones, the data zone ID is set to 0.
+ */
+struct dm_zoned_bz_map {
+	__le32			bzone_id;			/*    4 */
+	__le32			dzone_id;			/*    8 */
+};
+
+#define DM_ZONED_NR_BZONES		32
+#define DM_ZONED_NR_BZONES_MIN		4
+#define DM_ZONED_NR_BZONES_LOW		25
+#define DM_ZONED_NR_BZONES_LOW_MIN	2
+
+/**
+ * Buffer zone mapping entries are stored in the super block.
+ * At most DM_ZONED_NR_BZONES_MAX entries fit after the 128 bytes
+ * of fixed fields: (4096 - 128) / 8 = 496.
+ */
+#define DM_ZONED_NR_BZONES_MAX	((4096 - 128) / sizeof(struct dm_zoned_bz_map))
+
+struct dm_zoned_super {
+
+	__le32			magic;				/*    4 */
+	__le32			version;			/*    8 */
+
+	__le32			nr_buf_zones;			/*   12 */
+	__le32			nr_data_zones;			/*   16 */
+	__le32			nr_map_blocks;			/*   20 */
+	__le32			nr_bitmap_blocks;		/*   24 */
+
+	u8			reserved[104];			/*  128 */
+
+	struct dm_zoned_bz_map	bz_map[DM_ZONED_NR_BZONES_MAX];	/* 4096 */
+
+};
+
+/**
+ * Zone mapping table metadata.
+ */
+#define DM_ZONED_MAP_ENTRIES_PER_BLOCK	(DM_ZONED_BLOCK_SIZE / sizeof(u32))
+#define DM_ZONED_MAP_ENTRIES_SHIFT	(ilog2(DM_ZONED_MAP_ENTRIES_PER_BLOCK))
+#define DM_ZONED_MAP_ENTRIES_MASK	(DM_ZONED_MAP_ENTRIES_PER_BLOCK - 1)
+#define DM_ZONED_MAP_UNMAPPED		UINT_MAX
+
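+/*
+ * Example: each 4 KB mapping block holds 1024 __le32 entries
+ * (DM_ZONED_MAP_ENTRIES_PER_BLOCK), so the mapping entry of logical
+ * chunk c lives in mapping block (c >> DM_ZONED_MAP_ENTRIES_SHIFT) at
+ * offset (c & DM_ZONED_MAP_ENTRIES_MASK). Chunk 2500, for instance,
+ * is entry 452 of mapping block 2 (2500 = 2 * 1024 + 452); an entry
+ * equal to DM_ZONED_MAP_UNMAPPED presumably marks a chunk with no
+ * data zone assigned.
+ */
+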
+#define DM_ZONE_WORK_MAX	128
+#define DM_ZONE_WORK_MAX_BIO	64
+
+struct dm_zoned_target;
+struct dm_zoned_zone;
+
+/**
+ * Zone work descriptor: this exists only
+ * for active zones.
+ */
+struct dm_zoned_zwork {
+	struct work_struct	work;
+
+	struct dm_zoned_target	*target;
+	struct dm_zoned_zone	*dzone;
+
+	struct list_head	link;
+
+	/* ref counts the number of BIOs pending   */
+	/* and executing, as well as the queueing  */
+	/* status of the work_struct.              */
+	atomic_t		ref;
+	atomic_t		bio_count;
+	struct bio_list		bio_list;
+
+};
+
+/**
+ * Zone descriptor.
+ */
+struct dm_zoned_zone {
+	struct list_head	link;
+	struct list_head	elink;
+	struct blk_zone		*blkz;
+	struct dm_zoned_zwork	*zwork;
+	unsigned long		flags;
+	unsigned long		id;
+
+	/* For data zones, pointer to a write buffer zone (may be NULL)    */
+	/* For write buffer zones, pointer to the data zone being buffered */
+	struct dm_zoned_zone	*bzone;
+
+	/* For data zones: the logical chunk mapped, which */
+	/* is also the index of the entry for the zone in  */
+	/* the data zone mapping table.                    */
+	/* For buffer zones: the index of the entry for    */
+	/* the zone in the buffer zone mapping table       */
+	/* stored in the super block.                      */
+	sector_t		map;
+
+	/* The position of the zone write pointer,  */
+	/* relative to the first block of the zone. */
+	sector_t		wp_block;
+
+	/* Stats (to determine access pattern for reclaim) */
+	unsigned long		mtime;
+	unsigned long		wr_dir_blocks;
+	unsigned long		wr_buf_blocks;
+
+};
+
+extern struct kmem_cache *dm_zoned_zone_cache;
+
+#define dm_zoned_lock_zone(zone, flags) \
+	spin_lock_irqsave(&(zone)->blkz->lock, flags)
+#define dm_zoned_unlock_zone(zone, flags) \
+	spin_unlock_irqrestore(&(zone)->blkz->lock, flags)
+#define dm_zoned_zone_is_cmr(z) \
+	blk_zone_is_cmr((z)->blkz)
+#define dm_zoned_zone_is_smr(z) \
+	blk_zone_is_smr((z)->blkz)
+#define dm_zoned_zone_is_seqreq(z) \
+	((z)->blkz->type == BLK_ZONE_TYPE_SEQWRITE_REQ)
+#define dm_zoned_zone_is_seqpref(z) \
+	((z)->blkz->type == BLK_ZONE_TYPE_SEQWRITE_PREF)
+#define dm_zoned_zone_is_seq(z) \
+	(dm_zoned_zone_is_seqreq(z) || dm_zoned_zone_is_seqpref(z))
+#define dm_zoned_zone_is_rnd(z) \
+	(dm_zoned_zone_is_cmr(z) || dm_zoned_zone_is_seqpref(z))
+
+#define dm_zoned_zone_offline(z) \
+	((z)->blkz->state == BLK_ZONE_OFFLINE)
+#define dm_zoned_zone_readonly(z) \
+	((z)->blkz->state == BLK_ZONE_READONLY)
+
+#define dm_zoned_zone_start_sector(z) \
+	((z)->blkz->start)
+#define dm_zoned_zone_sectors(z) \
+	((z)->blkz->len)
+#define dm_zoned_zone_next_sector(z) \
+	(dm_zoned_zone_start_sector(z) + dm_zoned_zone_sectors(z))
+#define dm_zoned_zone_start_block(z) \
+	dm_zoned_sector_to_block(dm_zoned_zone_start_sector(z))
+#define dm_zoned_zone_next_block(z) \
+	dm_zoned_sector_to_block(dm_zoned_zone_next_sector(z))
+#define dm_zoned_zone_empty(z) \
+	((z)->wp_block == dm_zoned_zone_start_block(z))
+
+#define dm_zoned_chunk_sector(dzt, s) \
+	((s) & (dzt)->zone_nr_sectors_mask)
+#define dm_zoned_chunk_block(dzt, b) \
+	((b) & (dzt)->zone_nr_blocks_mask)
+
+#define dm_zoned_zone_type(z) \
+	((z)->flags & DM_ZONE_TYPE_MASK)
+#define dm_zoned_zone_meta(z) \
+	(dm_zoned_zone_type(z) == DM_ZONE_META)
+#define dm_zoned_zone_buf(z) \
+	(dm_zoned_zone_type(z) == DM_ZONE_BUF)
+#define dm_zoned_zone_data(z) \
+	(dm_zoned_zone_type(z) == DM_ZONE_DATA)
+
+#define dm_zoned_bio_sector(bio) \
+	((bio)->bi_iter.bi_sector)
+#define dm_zoned_bio_chunk_sector(dzt, bio) \
+	dm_zoned_chunk_sector((dzt), dm_zoned_bio_sector(bio))
+#define dm_zoned_bio_sectors(bio) \
+	bio_sectors(bio)
+#define dm_zoned_bio_block(bio) \
+	dm_zoned_sector_to_block(dm_zoned_bio_sector(bio))
+#define dm_zoned_bio_blocks(bio) \
+	dm_zoned_sector_to_block(dm_zoned_bio_sectors(bio))
+#define dm_zoned_bio_chunk(dzt, bio) \
+	(dm_zoned_bio_sector(bio) >> (dzt)->zone_nr_sectors_shift)
+#define dm_zoned_bio_chunk_block(dzt, bio) \
+	dm_zoned_chunk_block((dzt), dm_zoned_bio_block(bio))
+
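+/*
+ * Worked example, assuming 256 MiB zones (zone_nr_sectors = 524288,
+ * zone_nr_sectors_shift = 19, zone_nr_blocks = 65536): a BIO starting
+ * at sector 1048584 targets logical chunk
+ * dm_zoned_bio_chunk() = 1048584 >> 19 = 2, and block
+ * dm_zoned_bio_chunk_block() = (1048584 >> 3) & 65535 = 1
+ * within that chunk.
+ */
+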
+/**
+ * Reset a zone's statistics.
+ */
+static inline void
+dm_zoned_zone_reset_stats(struct dm_zoned_zone *zone)
+{
+	zone->mtime = 0;
+	zone->wr_dir_blocks = 0;
+	zone->wr_buf_blocks = 0;
+}
+
+/**
+ * For buffer zone reclaim.
+ */
+#define DM_ZONED_RECLAIM_PERIOD_SECS	1UL /* Reclaim check period (seconds) */
+#define DM_ZONED_RECLAIM_PERIOD		(DM_ZONED_RECLAIM_PERIOD_SECS * HZ)
+#define DM_ZONED_RECLAIM_MAX_BLOCKS	1024 /* Max 4 KB blocks per reclaim I/O */
+#define DM_ZONED_RECLAIM_MAX_WORKERS	4 /* Maximum number of buffer zone reclaim work items */
+
+struct dm_zoned_reclaim_zwork {
+	struct work_struct	work;
+	struct dm_zoned_target	*target;
+	struct dm_zoned_zone	*bzone;
+};
+
+/**
+ * Default maximum number of blocks written with WRITE SAME
+ * to align an SMR zone write pointer
+ * (0 disables write pointer alignment).
+ */
+#define DM_ZONED_ALIGN_WP_MAX_BLOCK	0
+
+/**
+ * Target flags.
+ */
+enum {
+	DM_ZONED_DEBUG,
+	DM_ZONED_ALIGN_WP,
+	DM_ZONED_SUSPENDED,
+	DM_ZONED_RECLAIM_ACTIVE,
+};
+
+/**
+ * Target descriptor.
+ */
+struct dm_zoned_target {
+	struct dm_dev		*ddev;
+
+	/* Target zoned device information */
+	char			zbd_name[BDEVNAME_SIZE];
+	struct block_device	*zbd;
+	sector_t		zbd_capacity;
+	struct request_queue	*zbdq;
+	unsigned int		zbd_metablk_shift;
+	unsigned long		flags;
+	struct buffer_head	*sb_bh;
+
+	unsigned int		nr_zones;
+	unsigned int		nr_cmr_zones;
+	unsigned int		nr_smr_zones;
+	unsigned int		nr_rnd_zones;
+	unsigned int		nr_meta_zones;
+	unsigned int		nr_buf_zones;
+	unsigned int		nr_data_zones;
+	unsigned int		nr_cmr_data_zones;
+	unsigned int		nr_smr_data_zones;
+
+#ifdef __DM_ZONED_DEBUG
+	size_t			used_mem;
+#endif
+
+	sector_t		zone_nr_sectors;
+	unsigned int		zone_nr_sectors_shift;
+	sector_t		zone_nr_sectors_mask;
+
+	sector_t		zone_nr_blocks;
+	sector_t		zone_nr_blocks_shift;
+	sector_t		zone_nr_blocks_mask;
+	sector_t		zone_bitmap_size;
+	unsigned int		zone_nr_bitmap_blocks;
+
+	/* Zone mapping management lock */
+	struct mutex		map_lock;
+
+	/* Zone bitmaps */
+	sector_t		bitmap_block;
+	unsigned int		nr_bitmap_blocks;
+
+	/* Buffer zones */
+	struct dm_zoned_bz_map	*bz_map;
+	atomic_t		bz_nr_free;
+	unsigned int		bz_nr_free_low;
+	struct list_head	bz_free_list;
+	struct list_head	bz_lru_list;
+	struct list_head	bz_wait_list;
+
+	/* Data zones */
+	unsigned int		nr_map_blocks;
+	unsigned int		align_wp_max_blocks;
+	struct buffer_head	**dz_map_bh;
+	atomic_t		dz_nr_active;
+	atomic_t		dz_nr_active_wait;
+	unsigned int		dz_nr_unmap;
+	struct list_head	dz_unmap_cmr_list;
+	struct list_head	dz_map_cmr_list;
+	struct list_head	dz_unmap_smr_list;
+	struct list_head	dz_empty_list;
+
+	/* Internal I/Os */
+	struct bio_set		*bio_set;
+	struct workqueue_struct *zone_wq;
+	unsigned long		last_bio_time;
+
+	/* For flush */
+	spinlock_t		flush_lock;
+	struct bio_list		flush_list;
+	struct work_struct	flush_work;
+	struct workqueue_struct *flush_wq;
+
+	/* For reclaim */
+	struct dm_io_client	*reclaim_client;
+	struct delayed_work	reclaim_work;
+	struct workqueue_struct *reclaim_wq;
+	struct workqueue_struct *reclaim_zwq;
+
+};
+
+#define dm_zoned_map_lock(dzt)		mutex_lock(&(dzt)->map_lock)
+#define dm_zoned_map_unlock(dzt)	mutex_unlock(&(dzt)->map_lock)
+
+/**
+ * Number of seconds without any BIO before
+ * the device is considered idle.
+ */
+#define DM_ZONED_IDLE_SECS		2UL
+
+/**
+ * Test if the target device is idle.
+ */
+static inline int
+dm_zoned_idle(struct dm_zoned_target *dzt)
+{
+	return atomic_read(&(dzt)->dz_nr_active) == 0 &&
+		time_is_before_jiffies(dzt->last_bio_time
+				       + DM_ZONED_IDLE_SECS * HZ);
+}
+
+/**
+ * Target config passed as dmsetup arguments.
+ */
+struct dm_zoned_target_config {
+	char			*dev_path;
+	int			debug;
+	int			format;
+	unsigned long		align_wp;
+	unsigned long		nr_buf_zones;
+};
+
+/**
+ * Zone BIO context.
+ */
+struct dm_zoned_bioctx {
+	struct dm_zoned_target	*target;
+	struct dm_zoned_zone	*dzone;
+	struct bio		*bio;
+	atomic_t		ref;
+	int			error;
+};
+
+#define dm_zoned_info(format, args...)			\
+	printk(KERN_INFO "dm-zoned: " format, ## args)
+
+#define dm_zoned_dev_info(dzt, format, args...)	\
+	dm_zoned_info("(%s) " format,			\
+		      (dzt)->zbd_name, ## args)
+
+#define dm_zoned_error(format, args...)			\
+	printk(KERN_ERR	"dm-zoned: " format, ## args)
+
+#define dm_zoned_dev_error(dzt, format, args...)	\
+	dm_zoned_error("(%s) " format,			\
+		       (dzt)->zbd_name, ## args)
+
+#define dm_zoned_warning(format, args...)		\
+	printk(KERN_ALERT				\
+	       "dm-zoned: " format, ## args)
+
+#define dm_zoned_dev_warning(dzt, format, args...)	\
+	dm_zoned_warning("(%s) " format,		\
+			 (dzt)->zbd_name, ## args)
+
+#define dm_zoned_dump_stack()				\
+	do {						\
+		dm_zoned_warning("Start stack dump\n");	\
+		dump_stack();				\
+		dm_zoned_warning("End stack dump\n");	\
+	} while (0)
+
+#define dm_zoned_oops(format, args...)			\
+	do {						\
+		dm_zoned_warning(format, ## args);	\
+		dm_zoned_dump_stack();			\
+		BUG();					\
+	} while (0)
+
+#define dm_zoned_dev_oops(dzt, format, args...)			\
+	do {							\
+		dm_zoned_dev_warning(dzt, format, ## args);	\
+		dm_zoned_dump_stack();				\
+		BUG();						\
+	} while (0)
+
+#define dm_zoned_assert_cond(cond)	(unlikely(!(cond)))
+#define dm_zoned_assert(cond)					\
+	do {							\
+		if (dm_zoned_assert_cond(cond)) {		\
+			dm_zoned_oops("(%s/%d) "		\
+				      "Condition %s failed\n",	\
+				      __func__, __LINE__,	\
+				      # cond);			\
+		}						\
+	} while (0)
+
+#define dm_zoned_dev_assert(dzt, cond)					\
+	do {								\
+		if (dm_zoned_assert_cond(cond)) {			\
+			dm_zoned_dev_oops(dzt, "(%s/%d) "		\
+					  "Condition %s failed\n",	\
+					  __func__, __LINE__,		\
+					  # cond);			\
+		}							\
+	} while (0)
+
+#ifdef __DM_ZONED_DEBUG
+
+#define dm_zoned_dev_debug(dzt, format, args...)		\
+	do {							\
+		if (test_bit(DM_ZONED_DEBUG, &(dzt)->flags)) {	\
+			printk(KERN_INFO			\
+			       "dm-zoned: (%s) " format,	\
+			       (dzt)->zbd_name,	## args);	\
+		}						\
+	} while (0)
+
+
+#else
+
+#define dm_zoned_dev_debug(dzt, format, args...) \
+	do { } while (0)
+
+#endif /* __DM_ZONED_DEBUG */
+
+extern int
+dm_zoned_init_meta(struct dm_zoned_target *dzt,
+		   struct dm_zoned_target_config *conf);
+
+extern int
+dm_zoned_resume_meta(struct dm_zoned_target *dzt);
+
+extern void
+dm_zoned_cleanup_meta(struct dm_zoned_target *dzt);
+
+extern int
+dm_zoned_flush(struct dm_zoned_target *dzt);
+
+extern int
+dm_zoned_advance_zone_wp(struct dm_zoned_target *dzt,
+			 struct dm_zoned_zone *zone,
+			 sector_t nr_blocks);
+
+extern int
+dm_zoned_reset_zone_wp(struct dm_zoned_target *dzt,
+		       struct dm_zoned_zone *zone);
+
+extern struct dm_zoned_zone *
+dm_zoned_alloc_bzone(struct dm_zoned_target *dzt,
+		     struct dm_zoned_zone *dzone);
+
+extern void
+dm_zoned_free_bzone(struct dm_zoned_target *dzt,
+		    struct dm_zoned_zone *bzone);
+
+extern void
+dm_zoned_validate_bzone(struct dm_zoned_target *dzt,
+			struct dm_zoned_zone *dzone);
+
+/**
+ * Data zone allocation type hint.
+ */
+enum {
+	DM_DZONE_ANY,
+	DM_DZONE_SMR,
+	DM_DZONE_CMR
+};
+
+extern struct dm_zoned_zone *
+dm_zoned_alloc_dzone(struct dm_zoned_target *dzt,
+		     unsigned int chunk,
+		     unsigned int type_hint);
+
+extern void
+dm_zoned_free_dzone(struct dm_zoned_target *dzt,
+		    struct dm_zoned_zone *dzone);
+
+extern void
+dm_zoned_validate_dzone(struct dm_zoned_target *dzt,
+			struct dm_zoned_zone *dzone);
+
+extern void
+dm_zoned_remap_dzone(struct dm_zoned_target *dzt,
+		     struct dm_zoned_zone *from_dzone,
+		     struct dm_zoned_zone *to_dzone);
+
+extern void
+dm_zoned_remap_bzone(struct dm_zoned_target *dzt,
+		     struct dm_zoned_zone *bzone,
+		     struct dm_zoned_zone *new_bzone);
+
+extern struct dm_zoned_zone *
+dm_zoned_bio_map(struct dm_zoned_target *dzt,
+		 struct bio *bio);
+
+extern void
+dm_zoned_run_dzone(struct dm_zoned_target *dzt,
+		   struct dm_zoned_zone *dzone);
+
+extern void
+dm_zoned_put_dzone(struct dm_zoned_target *dzt,
+		   struct dm_zoned_zone *dzone);
+
+extern int
+dm_zoned_validate_blocks(struct dm_zoned_target *dzt,
+			 struct dm_zoned_zone *zone,
+			 sector_t chunk_block,
+			 unsigned int nr_blocks);
+
+extern int
+dm_zoned_invalidate_blocks(struct dm_zoned_target *dzt,
+			   struct dm_zoned_zone *zone,
+			   sector_t chunk_block,
+			   unsigned int nr_blocks);
+
+static inline int
+dm_zoned_invalidate_zone(struct dm_zoned_target *dzt,
+			 struct dm_zoned_zone *zone)
+{
+	return dm_zoned_invalidate_blocks(dzt, zone,
+					0, dzt->zone_nr_blocks);
+}
+
+extern int
+dm_zoned_block_valid(struct dm_zoned_target *dzt,
+		     struct dm_zoned_zone *zone,
+		     sector_t chunk_block);
+
+extern int
+dm_zoned_valid_blocks(struct dm_zoned_target *dzt,
+		      struct dm_zoned_zone *zone,
+		      sector_t chunk_block,
+		      unsigned int nr_blocks);
+
+static inline int
+dm_zoned_zone_weight(struct dm_zoned_target *dzt,
+		     struct dm_zoned_zone *zone)
+{
+	if (dm_zoned_zone_is_seqreq(zone)) {
+		if (dm_zoned_zone_empty(zone))
+			return 0;
+		return dm_zoned_valid_blocks(dzt, zone,
+				   0, zone->wp_block);
+	}
+
+	return dm_zoned_valid_blocks(dzt, zone,
+				   0, dzt->zone_nr_blocks);
+}
+
+/**
+ * Wait for a zone's write BIOs to complete.
+ */
+static inline void
+dm_zoned_wait_for_stable_zone(struct dm_zoned_zone *zone)
+{
+	if (test_bit(DM_ZONE_ACTIVE_BIO, &zone->flags))
+		wait_on_bit_io(&zone->flags, DM_ZONE_ACTIVE_BIO,
+			       TASK_UNINTERRUPTIBLE);
+}
+
+extern void
+dm_zoned_zone_work(struct work_struct *work);
+
+extern void
+dm_zoned_reclaim_work(struct work_struct *work);
+
+/**
+ * Schedule reclaim (delay in jiffies).
+ */
+static inline void
+dm_zoned_schedule_reclaim(struct dm_zoned_target *dzt,
+			  unsigned long delay)
+{
+	mod_delayed_work(dzt->reclaim_wq, &dzt->reclaim_work, delay);
+}
+
+/**
+ * Trigger reclaim.
+ */
+static inline void
+dm_zoned_trigger_reclaim(struct dm_zoned_target *dzt)
+{
+	dm_zoned_schedule_reclaim(dzt, 0);
+}
+
+#ifdef __DM_ZONED_DEBUG
+static inline void
+dm_zoned_account_mem(struct dm_zoned_target *dzt,
+		     size_t bytes)
+{
+	dzt->used_mem += bytes;
+}
+#else
+#define dm_zoned_account_mem(dzt, bytes) do { } while (0)
+#endif
-- 
1.8.5.6

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



