On 10/24/19 6:24 AM, Douglas Gilbert wrote: > On 2019-10-18 6:25 a.m., Hannes Reinecke wrote: >> On 10/8/19 9:50 AM, Douglas Gilbert wrote: >>> The introduced request state machine is not wired in so that >>> the size of one of the following patches is reduced. Bit >>> operation defines for the request and file descriptor level >>> are also introduced. Minor rework og sg_rd_append() function. >>> >>> Signed-off-by: Douglas Gilbert <dgilbert@xxxxxxxxxxxx> >>> --- >>> drivers/scsi/sg.c | 237 ++++++++++++++++++++++++++++++++++------------ >>> 1 file changed, 175 insertions(+), 62 deletions(-) >>> >>> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c >>> index 5b560f720993..4e6f6fb2a54e 100644 >>> --- a/drivers/scsi/sg.c >>> +++ b/drivers/scsi/sg.c >>> @@ -75,7 +75,43 @@ static char *sg_version_date = "20190606"; >>> #define uptr64(val) ((void __user *)(uintptr_t)(val)) >>> #define cuptr64(val) ((const void __user *)(uintptr_t)(val)) >>> +/* Following enum contains the states of sg_request::rq_st */ >>> +enum sg_rq_state { >>> + SG_RS_INACTIVE = 0, /* request not in use (e.g. on fl) */ >>> + SG_RS_INFLIGHT, /* active: cmd/req issued, no response >>> yet */ >>> + SG_RS_AWAIT_RD, /* response received, awaiting read */ >>> + SG_RS_DONE_RD, /* read is ongoing or done */ >>> + SG_RS_BUSY, /* temporary state should rarely be seen */ >>> +}; >>> + >>> +#define SG_TIME_UNIT_MS 0 /* milliseconds */ >>> +#define SG_DEF_TIME_UNIT SG_TIME_UNIT_MS >>> #define SG_DEFAULT_TIMEOUT mult_frac(SG_DEFAULT_TIMEOUT_USER, HZ, >>> USER_HZ) >>> +#define SG_FD_Q_AT_HEAD 0 >>> +#define SG_DEFAULT_Q_AT SG_FD_Q_AT_HEAD /* for backward >>> compatibility */ >>> +#define SG_FL_MMAP_DIRECT (SG_FLAG_MMAP_IO | SG_FLAG_DIRECT_IO) >>> + >>> +/* Only take lower 4 bits of driver byte, all host byte and sense >>> byte */ >>> +#define SG_ML_RESULT_MSK 0x0fff00ff /* mid-level's 32 bit result >>> value */ >>> + >>> +#define SG_PACK_ID_WILDCARD (-1) >>> + >>> +#define SG_ADD_RQ_MAX_RETRIES 40 /* to stop infinite _trylock(s) */ >>> + >>> +/* Bit positions (flags) for sg_request::frq_bm bitmask follow */ >>> +#define SG_FRQ_IS_ORPHAN 1 /* owner of request gone */ >>> +#define SG_FRQ_SYNC_INVOC 2 /* synchronous (blocking) >>> invocation */ >>> +#define SG_FRQ_DIO_IN_USE 3 /* false->indirect_IO,mmap; 1->dio */ >>> +#define SG_FRQ_NO_US_XFER 4 /* no user space transfer of data */ >>> +#define SG_FRQ_DEACT_ORPHAN 7 /* not keeping orphan so >>> de-activate */ >>> +#define SG_FRQ_BLK_PUT_REQ 9 /* set when blk_put_request() >>> called */ >>> + >>> +/* Bit positions (flags) for sg_fd::ffd_bm bitmask follow */ >>> +#define SG_FFD_FORCE_PACKID 0 /* receive only given >>> pack_id/tag */ >>> +#define SG_FFD_CMD_Q 1 /* clear: only 1 active req per fd */ >>> +#define SG_FFD_KEEP_ORPHAN 2 /* policy for this fd */ >>> +#define SG_FFD_MMAP_CALLED 3 /* mmap(2) system call made on fd */ >>> +#define SG_FFD_Q_AT_TAIL 5 /* set: queue reqs at tail of blk q */ >>> /* Bit positions (flags) for sg_device::fdev_bm bitmask follow */ >>> #define SG_FDEV_EXCLUDE 0 /* have fd open with O_EXCL */ >>> @@ -83,12 +119,11 @@ static char *sg_version_date = "20190606"; >>> #define SG_FDEV_LOG_SENSE 2 /* set by ioctl(SG_SET_DEBUG) */ >>> int sg_big_buff = SG_DEF_RESERVED_SIZE; >>> -/* N.B. This variable is readable and writeable via >>> - /proc/scsi/sg/def_reserved_size . Each time sg_open() is called a >>> buffer >>> - of this size (or less if there is not enough memory) will be >>> reserved >>> - for use by this file descriptor. [Deprecated usage: this variable >>> is also >>> - readable via /proc/sys/kernel/sg-big-buff if the sg driver is >>> built into >>> - the kernel (i.e. it is not a module).] */ >>> +/* >>> + * This variable is accessible via /proc/scsi/sg/def_reserved_size . >>> Each >>> + * time sg_open() is called a sg_request of this size (or less if >>> there is >>> + * not enough memory) will be reserved for use by this file descriptor. >>> + */ >>> static int def_reserved_size = -1; /* picks up init parameter */ >>> static int sg_allow_dio = SG_ALLOW_DIO_DEF; >>> @@ -132,6 +167,7 @@ struct sg_request { /* SG_MAX_QUEUE requests >>> outstanding per file */ >>> char sg_io_owned; /* 1 -> packet belongs to SG_IO */ >>> /* done protected by rq_list_lock */ >>> char done; /* 0->before bh, 1->before read, 2->read */ >>> + atomic_t rq_st; /* request state, holds a enum >>> sg_rq_state */ >>> struct request *rq; /* released in sg_rq_end_io(), bio kept */ >>> struct bio *bio; /* kept until this req -->SG_RS_INACTIVE */ >>> struct execute_work ew_orph; /* harvest orphan request */ >>> @@ -208,10 +244,15 @@ static void sg_unlink_reserve(struct sg_fd >>> *sfp, struct sg_request *srp); >>> static struct sg_fd *sg_add_sfp(struct sg_device *sdp); >>> static void sg_remove_sfp(struct kref *); >>> static struct sg_request *sg_add_request(struct sg_fd *sfp); >>> -static int sg_deact_request(struct sg_fd *sfp, struct sg_request *srp); >>> +static void sg_deact_request(struct sg_fd *sfp, struct sg_request >>> *srp); >>> static struct sg_device *sg_get_dev(int dev); >>> static void sg_device_destroy(struct kref *kref); >>> static void sg_calc_sgat_param(struct sg_device *sdp); >>> +static const char *sg_rq_st_str(enum sg_rq_state rq_st, bool long_str); >>> +static void sg_rep_rq_state_fail(struct sg_fd *sfp, >>> + enum sg_rq_state exp_old_st, >>> + enum sg_rq_state want_st, >>> + enum sg_rq_state act_old_st); >>> #define SZ_SG_HEADER ((int)sizeof(struct sg_header)) /* v1 and >>> v2 header */ >>> #define SZ_SG_IO_HDR ((int)sizeof(struct sg_io_hdr)) /* v3 >>> header */ >>> @@ -219,6 +260,8 @@ static void sg_calc_sgat_param(struct sg_device >>> *sdp); >>> #define SG_IS_DETACHING(sdp) test_bit(SG_FDEV_DETACHING, >>> (sdp)->fdev_bm) >>> #define SG_HAVE_EXCLUDE(sdp) test_bit(SG_FDEV_EXCLUDE, (sdp)->fdev_bm) >>> +#define SG_RS_ACTIVE(srp) (atomic_read(&(srp)->rq_st) != >>> SG_RS_INACTIVE) >>> +#define SG_RS_AWAIT_READ(srp) (atomic_read(&(srp)->rq_st) == >>> SG_RS_AWAIT_RD) >>> /* >>> * Kernel needs to be built with CONFIG_SCSI_LOGGING to see log >>> messages. >>> @@ -382,15 +425,6 @@ sg_open(struct inode *inode, struct file *filp) >>> res = sg_allow_if_err_recovery(sdp, non_block); >>> if (res) >>> goto error_out; >>> - /* scsi_block_when_processing_errors() may block so bypass >>> - * check if O_NONBLOCK. Permits SCSI commands to be issued >>> - * during error recovery. Tread carefully. */ >>> - if (!((op_flags & O_NONBLOCK) || >>> - scsi_block_when_processing_errors(sdp->device))) { >>> - res = -ENXIO; >>> - /* we are in error recovery for this device */ >>> - goto error_out; >>> - } >>> mutex_lock(&sdp->open_rel_lock); >>> if (op_flags & O_NONBLOCK) { >>> @@ -494,12 +528,12 @@ sg_write(struct file *filp, const char __user >>> *p, size_t count, loff_t *ppos) >>> struct sg_device *sdp; >>> struct sg_fd *sfp; >>> struct sg_request *srp; >>> + u8 cmnd[SG_MAX_CDB_SIZE]; >>> struct sg_header ov2hdr; >>> struct sg_io_hdr v3hdr; >>> struct sg_header *ohp = &ov2hdr; >>> struct sg_io_hdr *h3p = &v3hdr; >>> struct sg_comm_wr_t cwr; >>> - u8 cmnd[SG_MAX_CDB_SIZE]; >>> res = sg_check_file_access(filp, __func__); >>> if (res) >>> @@ -748,11 +782,25 @@ sg_common_write(struct sg_fd *sfp, struct >>> sg_comm_wr_t *cwrp) >>> return 0; >>> } >>> +static inline int >>> +sg_rstate_chg(struct sg_request *srp, enum sg_rq_state old_st, >>> + enum sg_rq_state new_st) >>> +{ >>> + enum sg_rq_state act_old_st = (enum sg_rq_state) >>> + atomic_cmpxchg(&srp->rq_st, old_st, new_st); >>> + >>> + if (act_old_st == old_st) >>> + return 0; /* implies new_st --> srp->rq_st */ >>> + else if (IS_ENABLED(CONFIG_SCSI_LOGGING)) >>> + sg_rep_rq_state_fail(srp->parentfp, old_st, new_st, >>> + act_old_st); >>> + return -EPROTOTYPE; >>> +} >>> >> -EPROTOTYPE? >> Now there someone has read POSIX ... why not simply -EINVAL as one would >> expect? > > I expect EINVAL from very shallow errors, like sanity checks by > ioctl(SG_IOSUBMIT) or write(2) before they issue a command/request to the > lower levels. > > This however is a lot more interesting. It is potentially two readers in > a race, and if the race is close, the loser gets this error. Depending on > the context, the user will either see this error, or EBUSY. There is an > inherent race in the read/receive side of all AIO designs, as far as I > can determine. Such a race is advised against in my documentation, but > if a user, accidentally or otherwise, generates that race, then IMO > the driver needs to handle it. By "handle it" I don't mean trying to help > such users, I mean to protect the driver and the kernel behind it. > > A later patch (46/62 currently) adds a 28 entry table of errno values > with their meaning if returned by this driver: > .... > EPROTOTYPE atomic state change failed unexpectedly > .... > Ah, right. Fine. Reviewed-by: Hannes Reinecke <hare@xxxxxxx> Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare@xxxxxxx +49 911 74053 688 SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 247165 (AG München), GF: Felix Imendörffer