Re: [RFC] Draft Linux kernel interfaces for SMR/ZBC drives

On Feb 11, 2014, at 11:43 AM, Theodore Ts'o <tytso@xxxxxxx> wrote:
> Based on the comments raised on the list, here is a revised version of
> the proposed ZBC kernel interface.
> 
> Changes from the last version:
> 
> 1)  Aligned the ZBC_FLAG values with the ZBC specification to
> 	simplify implementations
> 2)  Mostly aligned the free_sectors_criteria values with the ZBC
> 	specification
> 3)  Clarified the behaviour of blkdev_query_zones()
> 4)  Added an ioctl interface to expose this functionality to userspace
> 5)  Removed the proposed simplified data variant
> 
> Please let me know what you think!

Should ZBCRESETZONE take a length or number of zones to reset?

Cheers, Andreas

> /*
> * Note: this structure is 24 bytes.  Using 256 MB zones, an 8TB drive
> * will have 32,768 zones.   That means if we tried to use a contiguous
> * array we would need to allocate 768k of contiguous, non-swappable
> * kernel memory.  (Boo, hiss.) 
> *
> * This is large enough that it would be painful to hang an array off the
> * block_device structure.  So we will define a function
> * blkdev_query_zones() to selectively return information for some
> * number of zones.
> *
> * It is anticipated that the block device driver will store this
> * information in a compressed form, and that z_checkpoint_offset will
> * not be dynamically tracked.  That is, the checkpoint offset, if
> * non-zero, indicates that the drive suffered a power fail event, and
> * the file system or userspace process may need to implement recovery
> * procedures.  Once the file system or userspace process writes to an
> * SMR band, the checkpoint offset will be cleared and future queries
> * for the SMR band will return the checkpoint offset == write_ptr.
> */
> struct zone_status {
>       sector_t	z_start;
>       __u32	z_length;
>       __u32	z_write_ptr_offset;  /* offset */
>       __u32	z_checkpoint_offset; /* offset */
>       __u32	z_flags;	     /* full, ro, offline, reset_requested */
> };
> 
> #define Z_FLAG_RESET_REQUESTED	0x0001
> #define Z_FLAGS_OFFLINE		0x0002
> #define Z_FLAGS_RO		0x0004
> #define Z_FLAGS_FULL		0x0008
> 
> #define Z_FLAG_TYPE_MASK	0x0F00
> #define Z_FLAG_TYPE_CONVENTIONAL 0x0100
> #define Z_FLAG_TYPE_SEQUENTIAL	0x0200
> 
> 
> /*
>  * Query the block_device bdev for information about the zones
>  * starting at start_sector that match the criteria specified by
>  * free_sectors_criteria.  Zone status information for at most
>  * max_zones will be placed into the memory array ret_zones (which is
>  * allocated by the caller, not by the blkdev_query_zones function),
>  * in ascending LBA order.  The return value will be a kernel error
>  * code if negative, or the number of zones actually returned if
>  * non-negative.
>  *
>  * If free_sectors_criteria is positive, then return zones that have
>  * at least that many sectors available to be written.  If it is zero,
>  * then match all zones.  If free_sectors_criteria is negative, then
>  * return the zones that match the following criteria:
>  *
>  *	-1     Match all full zones
>  *	-2     Match all open zones
>  *		(the zone has at least one written sector and is not full)
>  *	-3     Match all free zones
>  *		(the zone has no written sectors)
>  *	-4     Match all read-only zones
>  *	-5     Match all offline zones
>  *	-6     Match all zones where the write ptr != the checkpoint ptr
>  *
>  * The negative values are taken from Table 4 of 14-010r1, with the
>  * exception of -6, which is not in the draft spec --- but IMHO should
>  * be :-) It is anticipated, though, that the kernel will keep this
>  * info in memory and so will handle matching zones which meet
>  * these criteria itself, without needing to issue a ZBC command for
>  * each call to blkdev_query_zones().
>  */
> extern int blkdev_query_zones(struct block_device *bdev,
> 			      sector_t start_sector,
> 			      int free_sectors_criteria,
> 			      int max_zones,
> 			      struct zone_status *ret_zones);
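
To make the matching rules concrete, here is a rough userspace model of how I read the criteria.  It is only a sketch under several assumptions: the zone array is already in memory and sorted by LBA, a zone that contains start_sector counts as "starting at" it, conventional zones are not special-cased, and the ZC_* names and the query_zones()/zone_matches() helpers are invented for this example.

```c
#include <errno.h>
#include <stdint.h>

typedef uint64_t sector_t;	/* userspace stand-in for the kernel type */

struct zone_status {
	sector_t z_start;
	uint32_t z_length;
	uint32_t z_write_ptr_offset;
	uint32_t z_checkpoint_offset;
	uint32_t z_flags;
};

#define Z_FLAGS_OFFLINE	0x0002
#define Z_FLAGS_RO	0x0004
#define Z_FLAGS_FULL	0x0008

/* Hypothetical names for the negative criteria values in the draft. */
enum {
	ZC_FULL = -1,
	ZC_OPEN = -2,
	ZC_FREE = -3,
	ZC_RO = -4,
	ZC_OFFLINE = -5,
	ZC_CHECKPOINT = -6,
};

/* Return non-zero if zone z satisfies free_sectors_criteria. */
static int zone_matches(const struct zone_status *z, int criteria)
{
	int full = (z->z_flags & Z_FLAGS_FULL) != 0;
	uint32_t free_sectors = full ? 0 : z->z_length - z->z_write_ptr_offset;

	if (criteria > 0)	/* at least this many writable sectors */
		return free_sectors >= (uint32_t)criteria;

	switch (criteria) {
	case 0:
		return 1;	/* match all zones */
	case ZC_FULL:
		return full;
	case ZC_OPEN:
		return z->z_write_ptr_offset > 0 && !full;
	case ZC_FREE:
		return z->z_write_ptr_offset == 0;
	case ZC_RO:
		return (z->z_flags & Z_FLAGS_RO) != 0;
	case ZC_OFFLINE:
		return (z->z_flags & Z_FLAGS_OFFLINE) != 0;
	case ZC_CHECKPOINT:
		return z->z_write_ptr_offset != z->z_checkpoint_offset;
	default:
		return 0;	/* unknown criteria match nothing */
	}
}

/*
 * Copy up to max_zones matching zones into ret_zones, in ascending LBA
 * order.  Returns the number of zones copied, or a negative errno.
 */
static int query_zones(const struct zone_status *zones, int nr_zones,
		       sector_t start_sector, int criteria,
		       int max_zones, struct zone_status *ret_zones)
{
	int i, n = 0;

	if (max_zones < 0 || !ret_zones)
		return -EINVAL;

	for (i = 0; i < nr_zones && n < max_zones; i++) {
		/* Skip zones that end at or before start_sector. */
		if (zones[i].z_start + zones[i].z_length <= start_sector)
			continue;
		if (zone_matches(&zones[i], criteria))
			ret_zones[n++] = zones[i];
	}
	return n;
}
```

Whether a positive criteria value should ever match a read-only or offline zone is not stated in the draft; the sketch does not filter those out.
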
> 
> /*
>  * Reset the write pointer for a sequential write zone.
>  *
>  * Returns -EINVAL if the start_sector is not the beginning of a
>  * sequential write zone.
>  */
> extern int blkdev_reset_zone_ptr(struct block_device *bdev,
> 				 sector_t start_sector);
> 
> 
> /* ioctl interface */
> 
> ZBCQUERY
> 	u64 starting_lba	/* IN */
> 	u32 criteria		/* IN */
> 	u32 *num_zones		/* IN/OUT */
> 	struct zone_status *ptr	/* OUT */
> 
> ZBCRESETZONE
> 	u64 starting_lba
> 
> 

