Re: [RFC] Draft Linux kernel interfaces for SMR/ZBC drives

Based on the comments raised on the list, here is a revised version of
the proposed ZBC kernel interface.

Changes from the last version:

1)  Aligned the ZBC_FLAG values with the ZBC specification to simplify
	implementations
2)  Aligned the free_sectors_criteria values (mostly) with the ZBC
	specification
3)  Clarified the behaviour of blkdev_query_zones()
4)  Added an ioctl interface to expose this functionality to userspace
5)  Removed the proposed simplified data variant

Please let me know what you think!

						- Ted


/*
 * Note: this structure is 24 bytes.  Using 256 MB zones, an 8TB drive
 * will have 32,768 zones.   That means if we tried to use a contiguous
 * array we would need to allocate 768k of contiguous, non-swappable
 * kernel memory.  (Boo, hiss.) 
 *
 * This is large enough that it would be painful to hang an array off the
 * block_device structure.  So we will define a function
 * blkdev_query_zones() to selectively return information for some
 * number of zones.
 *
 * It is anticipated that the block device driver will store this
 * information in a compressed form, and that z_checkpoint_offset will
 * not be dynamically tracked.  That is, the checkpoint offset, if
 * non-zero, indicates that the drive suffered a power fail event, and
 * the file system or userspace process may need to implement recovery
 * procedures.  Once the file system or userspace process writes to an
 * SMR band, the checkpoint offset will be cleared and future queries
 * for the SMR band will return the checkpoint offset == write_ptr.
 */
struct zone_status {
       sector_t	z_start;
       __u32	z_length;
       __u32	z_write_ptr_offset;  /* offset from z_start */
       __u32	z_checkpoint_offset; /* offset from z_start */
       __u32	z_flags;	     /* full, ro, offline, reset_requested */
};
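
As a quick sanity check of the sizing argument in the comment above, here
is a minimal userspace sketch.  It assumes sector_t is 64 bits wide
(uint64_t stands in for it) and uses fixed-width types in place of __u32.
The helpers zone_count() and zone_array_bytes() are hypothetical names
used only for this illustration; they are not part of the proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: sector_t is 64 bits wide; uint64_t stands in for it here. */
typedef uint64_t sector_t;

/* Userspace copy of the proposed structure, using fixed-width types. */
struct zone_status {
	sector_t z_start;
	uint32_t z_length;
	uint32_t z_write_ptr_offset;
	uint32_t z_checkpoint_offset;
	uint32_t z_flags;
};

/* 8 bytes for z_start plus four 4-byte fields, with no padding. */
_Static_assert(sizeof(struct zone_status) == 24,
	       "zone_status is expected to be 24 bytes");

/* Number of zones on a drive, given drive and zone sizes in bytes. */
uint64_t zone_count(uint64_t drive_bytes, uint64_t zone_bytes)
{
	assert(zone_bytes != 0);
	return drive_bytes / zone_bytes;
}

/* Memory needed to hold one zone_status entry per zone. */
uint64_t zone_array_bytes(uint64_t nr_zones)
{
	return nr_zones * sizeof(struct zone_status);
}
```

With binary units (8 TiB drive, 256 MiB zones) this gives 32,768 zones
and 786,432 bytes, which is the 768k quoted in the comment.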

#define Z_FLAG_RESET_REQUESTED	0x0001
#define Z_FLAGS_OFFLINE		0x0002
#define Z_FLAGS_RO		0x0004
#define Z_FLAGS_FULL		0x0008

#define Z_FLAG_TYPE_MASK	0x0F00
#define Z_FLAG_TYPE_CONVENTIONAL 0x0100
#define Z_FLAG_TYPE_SEQUENTIAL	0x0200


/*
 * Query the block_device bdev for information about the zones
 * starting at start_sector that match the criteria specified by
 * free_sectors_criteria.  Zone status information for at most
 * max_zones will be placed into the memory array ret_zones (which is
 * allocated by the caller, not by the blkdev_query_zones function),
 * in ascending LBA order.  The return value will be a kernel error
 * code if negative, or the number of zones actually returned if
 * non-negative.
 *
 * If free_sectors_criteria is positive, then return zones that have
 * at least that many sectors available to be written.  If it is zero,
 * then match all zones.  If free_sectors_criteria is negative, then
 * return the zones that match the following criteria:
 *
 *	-1     Match all full zones
 *	-2     Match all open zones
 *		  (the zone has at least one written sector and is not full)
 *	-3     Match all free zones
 *		  (the zone has no written sectors)
 *	-4     Match all read-only zones
 *	-5     Match all offline zones
 *	-6     Match all zones where the write ptr != the checkpoint ptr
 *
 * The negative values are taken from Table 4 of 14-010r1, with the
 * exception of -6, which is not in the draft spec --- but IMHO should
 * be :-) It is anticipated, though, that the kernel will keep this
 * info in memory and so will handle matching zones which meet
 * these criteria itself, without needing to issue a ZBC command for
 * each call to blkdev_query_zones().
 */
extern int blkdev_query_zones(struct block_device *bdev,
			      sector_t start_sector,
			      int free_sectors_criteria,
			      int max_zones,
			      struct zone_status *ret_zones);
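
The matching rules described above can be modelled in userspace roughly
as follows.  This is only a sketch under stated assumptions, not the
proposed kernel implementation: model_query_zones() and zone_matches()
are hypothetical names, the zone table is a plain sorted array, "starting
at start_sector" is taken to include the zone that contains
start_sector, and criteria below -6 are rejected with -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t;	/* assumption: 64-bit sector_t */

struct zone_status {
	sector_t z_start;
	uint32_t z_length;
	uint32_t z_write_ptr_offset;
	uint32_t z_checkpoint_offset;
	uint32_t z_flags;
};

/* Flag values copied from the draft above. */
#define Z_FLAGS_OFFLINE	0x0002
#define Z_FLAGS_RO	0x0004
#define Z_FLAGS_FULL	0x0008

/* Does zone z satisfy free_sectors_criteria c? */
static int zone_matches(const struct zone_status *z, int c)
{
	if (c > 0)	/* at least c sectors left to write */
		return z->z_length - z->z_write_ptr_offset >= (uint32_t)c;
	switch (c) {
	case 0:  return 1;
	case -1: return (z->z_flags & Z_FLAGS_FULL) != 0;
	case -2: return z->z_write_ptr_offset > 0 &&
			!(z->z_flags & Z_FLAGS_FULL);
	case -3: return z->z_write_ptr_offset == 0;
	case -4: return (z->z_flags & Z_FLAGS_RO) != 0;
	case -5: return (z->z_flags & Z_FLAGS_OFFLINE) != 0;
	case -6: return z->z_write_ptr_offset != z->z_checkpoint_offset;
	}
	return 0;
}

/*
 * Model of blkdev_query_zones() over an in-memory table "all" holding
 * nr_all zones sorted by z_start.  Returns the number of zones copied
 * to ret_zones, or a negative error code.
 */
int model_query_zones(const struct zone_status *all, int nr_all,
		      sector_t start_sector, int criteria,
		      int max_zones, struct zone_status *ret_zones)
{
	int i, found = 0;

	assert(all != NULL || nr_all == 0);
	if (criteria < -6 || max_zones < 0 || (max_zones > 0 && !ret_zones))
		return -EINVAL;
	for (i = 0; i < nr_all && found < max_zones; i++) {
		/* Skip zones that end at or before start_sector. */
		if (all[i].z_start + all[i].z_length <= start_sector)
			continue;
		if (zone_matches(&all[i], criteria))
			ret_zones[found++] = all[i];
	}
	return found;
}
```

A caller wanting every zone would loop, passing the end of the last
returned zone as the next start_sector, until fewer than max_zones
entries come back.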

/*
 * Reset the write pointer for a sequential write zone.
 *
 * Returns -EINVAL if the start_sector is not the beginning of a
 * sequential write zone.
 */
extern int blkdev_reset_zone_ptr(struct block_device *bdev,
				 sector_t start_sector);
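
A userspace model of the reset semantics might look like the sketch
below.  The -EINVAL cases follow the comment above; everything else is an
assumption made for illustration: model_reset_zone_ptr() is a
hypothetical name, and clearing the checkpoint offset and the full and
reset_requested flags on reset is a guess at reasonable behaviour, not
something the draft specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t;	/* assumption: 64-bit sector_t */

struct zone_status {
	sector_t z_start;
	uint32_t z_length;
	uint32_t z_write_ptr_offset;
	uint32_t z_checkpoint_offset;
	uint32_t z_flags;
};

/* Flag values copied from the draft above. */
#define Z_FLAG_RESET_REQUESTED	0x0001
#define Z_FLAGS_FULL		0x0008
#define Z_FLAG_TYPE_MASK	0x0F00
#define Z_FLAG_TYPE_SEQUENTIAL	0x0200

/*
 * Reset the write pointer of the sequential zone starting exactly at
 * start_sector.  Returns 0 on success and -EINVAL if start_sector is
 * not the first sector of a sequential write zone.
 */
int model_reset_zone_ptr(struct zone_status *zones, int nr_zones,
			 sector_t start_sector)
{
	int i;

	assert(zones != NULL || nr_zones == 0);
	for (i = 0; i < nr_zones; i++) {
		struct zone_status *z = &zones[i];

		if (z->z_start != start_sector)
			continue;
		if ((z->z_flags & Z_FLAG_TYPE_MASK) != Z_FLAG_TYPE_SEQUENTIAL)
			return -EINVAL;	/* conventional zone */
		z->z_write_ptr_offset = 0;
		/* Assumption: a reset also clears these. */
		z->z_checkpoint_offset = 0;
		z->z_flags &= ~(uint32_t)(Z_FLAGS_FULL | Z_FLAG_RESET_REQUESTED);
		return 0;
	}
	return -EINVAL;	/* start_sector is not the start of any zone */
}
```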


/* ioctl interface */

ZBCQUERY
	u64 starting_lba	/* IN */
	u32 criteria		/* IN */
	u32 *num_zones		/* IN/OUT */
	struct zone_status *ptr	/* OUT */

ZBCRESETZONE
	u64 starting_lba
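
For illustration, a userspace caller of ZBCQUERY could be sketched as
below.  Everything concrete here is an assumption: the struct name
zbc_query_args, the request number, and the wrapper zbc_query() are
hypothetical, since the draft only lists the fields.  Negative criteria
are passed as their two's-complement u32 representation, which is one
possible reading of the u32 criteria field.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Userspace copy of the proposed structure (64-bit z_start assumed). */
struct zone_status {
	uint64_t z_start;
	uint32_t z_length;
	uint32_t z_write_ptr_offset;
	uint32_t z_checkpoint_offset;
	uint32_t z_flags;
};

/* Hypothetical argument block mirroring the ZBCQUERY field list. */
struct zbc_query_args {
	uint64_t starting_lba;		/* IN */
	uint32_t criteria;		/* IN */
	uint32_t *num_zones;		/* IN/OUT */
	struct zone_status *ptr;	/* OUT */
};

/* Hypothetical request number; ZBCQUERY is only proposed in this RFC. */
#define ZBCQUERY _IOWR('Z', 0x80, struct zbc_query_args)

/*
 * Ask for up to max_zones zones starting at starting_lba.  Returns the
 * number of zones filled in, or a negative errno value on failure.
 */
int zbc_query(int fd, uint64_t starting_lba, int criteria,
	      struct zone_status *zones, uint32_t max_zones)
{
	uint32_t nr = max_zones;	/* IN: capacity, OUT: zones returned */
	struct zbc_query_args args = {
		.starting_lba = starting_lba,
		.criteria = (uint32_t)criteria,
		.num_zones = &nr,
		.ptr = zones,
	};

	assert(zones != NULL || max_zones == 0);
	if (ioctl(fd, ZBCQUERY, &args) < 0)
		return -errno;
	return (int)nr;
}
```

On a kernel without this interface the call simply fails with an error
from ioctl(), so the wrapper is safe to experiment with.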

