Hi On Fri, Jun 20, 2014 at 4:49 PM, Benjamin Tissoires <benjamin.tissoires@xxxxxxxxx> wrote: > Hi David, > > On Fri, Jun 20, 2014 at 10:05 AM, David Herrmann <dh.herrmann@xxxxxxxxx> wrote: >> When we introduced the slotted MT ABS extensions, we didn't take care to >> make SYN_DROPPED recoverable. Imagine a client recevies a SYN_DROPPED and >> syncs its current state via EVIOCGABS. It has to call this ioctl for each >> and every ABS code separately. Besides being horribly slow, this series >> of ioctl-calls is not atomic. The kernel might queue new ABS events while >> the client fetches the data. >> >> Now for normal ABS codes this is negligible as ABS values provide absolute >> data. That is, there is usually no need to resync ABS codes as we don't >> need previous values to interpret the next ABS code. Furthermore, most ABS >> codes are also sent pretty frequently so a refresh is usually useless. >> >> However, with the introduction of slotted ABS axes we added a relative >> component: ABS_MT_SLOT. If a client syncs all ABS codes via EVIOCGABS >> while the kernel has ABS-events and an ABS_MT_SLOT event queued, the >> client will think any read ABS-event is for the retrieved SLOT, however, >> this is not true as all events until the next ABS_MT_SLOT event are for >> the previously active slot: >> >> Kernel queue is: { ABS_DROPPED, >> ABS_MT_POSITION_X(slot: 0), >> ABS_MT_SLOT(slot: 1), >> ABS_MT_POSITION_X(slot: 1) } >> Client reads ABS_DROPPED from queue. >> Client syncs all ABS values: >> As part of that, client syncs ABS_MT_SLOT, which is 1 in the current >> view of the kernel. >> Client reads ABS_MT_POSITION_X and attributes it to slot 1 instead of >> slot 0, as the slot-value is not explicit. >> >> This is just a simple example how the relative information provided by the >> ABS_MT_SLOT axis can be problematic to clients. >> >> Now there are many ways to fix this: >> * Make ABS_MT_SLOT a per-evdev-client attribute. On each >> EVIOCGABS(ABS_MT_SLOT) we can add fake ABS_MT_SLOT events to the queue. >> => Ugly and overkill >> * Flush all ABS events when clients read ABS_MT_SLOT. >> => Ugly hack and client might loose important ABS_MT_* events >> * Provide atomic EVIOCGABS API. >> => Sounds good! >> >> This patch introduces EVIOCGABSRANGE. Unlike EVIOCGABS, this ioctl only >> fetches ABS values, rather than the whole "struct input_absinfo" set. >> However, the new ioctl can fetch a range of ABS axes atomically and will >> flush matching events from the client's receive queue. Moreover, you can >> fetch all axes for *all* slots with a single call. >> >> This way, a client can simply run EVIOCGABSRANGE(0, ABS_CNT) and it will >> receive a consistent view of the whole ABS state, while the kernel flushes >> the receive-buffer for a consistent view. >> While most clients probably only need >> EVIOCGABSRANGE(ABS_MT_SLOT, ABS_MT_TOOL_y - ABS_MT_SLOT + 1), the ioctl >> allows to receive an arbitrary range of axes. >> >> Signed-off-by: David Herrmann <dh.herrmann@xxxxxxxxx> >> --- > > Thanks for this. This sounds very promising. I did not went into the > tiny details of the code, but I already have some comments. See below. Thanks for the comments! >> drivers/input/evdev.c | 180 ++++++++++++++++++++++++++++++++++++++++++++- >> include/uapi/linux/input.h | 42 +++++++++++ >> 2 files changed, 218 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c >> index 6386882..7a25a7a 100644 >> --- a/drivers/input/evdev.c >> +++ b/drivers/input/evdev.c >> @@ -175,8 +175,9 @@ static bool __evdev_is_filtered(struct evdev_client *client, >> return mask && !test_bit(code, mask); >> } >> >> -/* flush queued events of type @type, caller must hold client->buffer_lock */ >> -static void __evdev_flush_queue(struct evdev_client *client, unsigned int type) >> +/* flush queued, matching events, caller must hold client->buffer_lock */ >> +static void __evdev_flush_queue(struct evdev_client *client, unsigned int type, >> + unsigned int code_first, unsigned int code_last) >> { >> unsigned int i, head, num; >> unsigned int mask = client->bufsize - 1; >> @@ -195,7 +196,9 @@ static void __evdev_flush_queue(struct evdev_client *client, unsigned int type) >> ev = &client->buffer[i]; >> is_report = ev->type == EV_SYN && ev->code == SYN_REPORT; >> >> - if (ev->type == type) { >> + if (ev->type == type && >> + ev->code >= code_first && >> + ev->code <= code_last) { >> /* drop matched entry */ >> continue; >> } else if (is_report && !num) { >> @@ -786,6 +789,172 @@ static int handle_eviocgbit(struct input_dev *dev, >> return bits_to_user(bits, len, size, p, compat_mode); >> } >> >> +static inline void free_absrange(s32 **pages, size_t page_cnt) >> +{ >> + if (page_cnt > 1) { >> + while (page_cnt > 0) { >> + if (!pages[--page_cnt]) >> + break; >> + __free_page(virt_to_page(pages[page_cnt])); >> + } >> + kfree(pages); >> + } else if (page_cnt == 1) { >> + kfree(pages); >> + } >> +} >> + >> +static inline s32 *absrange_ptr(s32 **pages, size_t page_cnt, size_t slots, >> + size_t i_code, size_t j_slot) >> +{ >> + size_t idx, off; >> + >> + idx = (i_code * slots + j_slot) / (PAGE_SIZE / sizeof(s32)); >> + off = (i_code * slots + j_slot) % (PAGE_SIZE / sizeof(s32)); >> + >> + if (page_cnt == 1) >> + return &((s32*)pages)[off]; >> + else >> + return &pages[idx][off]; >> +} >> + >> +static inline ssize_t fetch_absrange(struct evdev_client *client, >> + struct input_dev *dev, size_t start, >> + size_t count, size_t slots, s32 ***out) >> +{ >> + size_t size, page_cnt, i, j; >> + unsigned long flags; >> + s32 **pages; >> + >> + /* >> + * Fetch data atomically from the device and flush buffers. We need to >> + * allocate a temporary buffer as copy_to_user() is not allowed while >> + * holding spinlocks. However, to-be-copied data might be huge and >> + * high-order allocations should be avoided. Therefore, do the >> + * page-allocation manually. >> + */ >> + >> + BUILD_BUG_ON(PAGE_SIZE % sizeof(s32) != 0); >> + >> + size = sizeof(s32) * count * slots; >> + page_cnt = DIV_ROUND_UP(size, PAGE_SIZE); >> + if (page_cnt < 1) { >> + return 0; >> + } else if (page_cnt == 1) { >> + pages = kzalloc(size, GFP_TEMPORARY); >> + if (!pages) >> + return -ENOMEM; >> + } else { >> + pages = kzalloc(sizeof(*pages) * page_cnt, GFP_TEMPORARY); >> + if (!pages) >> + return -ENOMEM; >> + >> + for (i = 0; i < page_cnt; ++i) { >> + pages[i] = (void*)get_zeroed_page(GFP_TEMPORARY); >> + if (!pages[i]) { >> + free_absrange(pages, page_cnt); >> + return -ENOMEM; >> + } >> + } >> + } >> + >> + spin_lock_irqsave(&dev->event_lock, flags); >> + spin_lock(&client->buffer_lock); >> + >> + for (i = 0; i < count; ++i) { >> + __u16 code; >> + bool is_mt; >> + >> + code = start + i; >> + is_mt = input_is_mt_value(code); >> + if (is_mt && !dev->mt) >> + continue; >> + >> + for (j = 0; j < slots; ++j) { >> + __s32 v; >> + >> + if (is_mt) >> + v = input_mt_get_value(&dev->mt->slots[j], >> + code); >> + else >> + v = dev->absinfo[code].value; >> + >> + *absrange_ptr(pages, page_cnt, slots, i, j) = v; >> + >> + if (!is_mt) >> + break; >> + } >> + } >> + >> + spin_unlock(&client->buffer_lock); >> + __evdev_flush_queue(client, EV_ABS, start, start + count - 1); >> + spin_unlock_irqrestore(&dev->event_lock, flags); >> + >> + *out = pages; >> + return page_cnt; >> +} >> + >> +static int evdev_handle_get_absrange(struct evdev_client *client, >> + struct input_dev *dev, >> + struct input_absrange __user *p) >> +{ >> + size_t slots, code, count, i, j; >> + struct input_absrange absbuf; >> + s32 **vals = NULL; >> + ssize_t val_cnt; >> + s32 __user *b; >> + int retval; >> + >> + if (!dev->absinfo) >> + return -EINVAL; >> + if (copy_from_user(&absbuf, p, sizeof(absbuf))) >> + return -EFAULT; >> + >> + slots = min_t(size_t, dev->mt ? dev->mt->num_slots : 1, absbuf.slots); >> + code = min_t(size_t, absbuf.code, ABS_CNT); >> + count = min_t(size_t, absbuf.count, ABS_CNT); >> + >> + /* first fetch data atomically from device */ >> + >> + if (code + count > ABS_CNT) >> + count = ABS_CNT - code; >> + >> + if (!slots || !count) { >> + val_cnt = 0; >> + } else { >> + val_cnt = fetch_absrange(client, dev, code, count, >> + slots, &vals); >> + if (val_cnt < 0) >> + return val_cnt; >> + } >> + >> + /* now copy data to user-space */ >> + >> + b = (void __user*)(unsigned long)absbuf.buffer; > > What if the user space buffer is not long enough, or if the pointer is > invalid? Is there any means from the kernel to guarantee that we are > not writing in a restricted memory area or in a buffer not owned by > the process? The buffer is given as pointer in absbuf.buffer, the size of the buffer is given as absbuf.slots * absbuf.count. This is the same as for the read() syscall, where we get a pointer plus the size. I use put_user() to write data into that buffer, so in case it's not valid user-memory, this will fail with -EFAULT. This is no different from other calls that return data, apart from splitting the size into two ints "slots" and "count". So I cannot follow what you mean? Note that "put_user()" is equivalent to "copy_to_user()", maybe you missed that part? >> + for (i = 0; i < absbuf.count; ++i) { >> + for (j = 0; j < absbuf.slots; ++j, ++b) { >> + s32 v; >> + >> + if (i >= count || j >= slots) >> + v = 0; >> + else >> + v = *absrange_ptr(vals, val_cnt, slots, i, j); >> + >> + if (put_user(v, b)) { >> + retval = -EFAULT; >> + goto out; >> + } >> + } >> + } >> + > > Shouldn't we also call free_absrange(vals, val_cnt); before returning? I do. There's a fall-through to "out:", so we always free the buffer. Otherwise, I would have called it "error:" :) >> + retval = 0; > > Not sure it matters a lot, but returning the size of what has been > written is more common. This would make sense if the buffer is not > long enough or if it is too big. This would always be "absbuf.slots * absbuf.count". I don't think there's much gain in returning a constant size, is there? Thanks! David > >> + >> +out: >> + free_absrange(vals, val_cnt); >> + if (retval < 0) >> + evdev_queue_syn_dropped(client); >> + return retval; >> +} >> + >> static int evdev_handle_get_keycode(struct input_dev *dev, void __user *p) >> { >> struct input_keymap_entry ke = { >> @@ -889,7 +1058,7 @@ static int evdev_handle_get_val(struct evdev_client *client, >> >> spin_unlock(&dev->event_lock); >> >> - __evdev_flush_queue(client, type); >> + __evdev_flush_queue(client, type, 0, UINT_MAX); >> >> spin_unlock_irq(&client->buffer_lock); >> >> @@ -1006,6 +1175,9 @@ static long evdev_do_ioctl(struct file *file, unsigned int cmd, >> else >> return evdev_revoke(evdev, client, file); >> >> + case EVIOCGABSRANGE: >> + return evdev_handle_get_absrange(client, dev, p); >> + >> case EVIOCGMASK: >> if (copy_from_user(&mask, p, sizeof(mask))) >> return -EFAULT; >> diff --git a/include/uapi/linux/input.h b/include/uapi/linux/input.h >> index f6ace0e..32a6443 100644 >> --- a/include/uapi/linux/input.h >> +++ b/include/uapi/linux/input.h >> @@ -210,6 +210,48 @@ struct input_mask { >> */ >> #define EVIOCSMASK _IOW('E', 0x93, struct input_mask) /* Set event-masks */ >> >> +struct input_absrange { >> + __u16 slots; >> + __u16 code; >> + __u32 count; >> + __u64 buffer; >> +}; >> + >> +/** >> + * EVIOCGABSRANGE - Fetch range of ABS values >> + * >> + * This fetches the current values of a range of ABS codes atomically. The range >> + * of codes to fetch and the buffer-types are passed as "struct input_absrange", >> + * which has the following fields: >> + * slots: Number of MT slots to fetch data for. >> + * code: First ABS axis to query. >> + * count: Number of ABS axes to query starting at @code. >> + * buffer: Pointer to a receive buffer where to store the fetched ABS >> + * values. This buffer must be an array of __s32 with at least >> + * (@slots * @code) elements. The buffer is interpreted as two >> + * dimensional __s32 array, declared as: __s32[slots][codes] >> + * >> + * Compared to EVIOCGABS this ioctl allows to retrieve a range of ABS codes >> + * atomically regarding any concurrent buffer modifications. Furthermore, any >> + * pending events for codes that were retrived via this call are flushed from >> + * the client's receive buffer. But unlike EVIOCGABS, this ioctl only returns >> + * the current value of an axis, rather than the whole "struct input_absinfo" >> + * set. All fields of "struct input_absinfo" except for the value are constant, >> + * though. >> + * >> + * The kernel's current view of the ABS axes is copied into the provided buffer. >> + * If an ABS axis is not enabled on the device, its value will be zero. Also, if >> + * an axis is not a slotted MT-axis, values for all but the first slot will be >> + * 0. If @slots is greater than the actual number of slots provided by the >> + * device, values for all slots higher than that will be 0. >> + * >> + * This call may fail with -EINVAL if the kernel doesn't support this call or >> + * the arguments are invalid, with -ENODEV if access was revoked, -ENOMEM if the >> + * kernel couldn't allocate temporary buffers for data-copy or -EFAULT if the >> + * passed pointer was invalid. >> + */ >> +#define EVIOCGABSRANGE _IOR('E', 0x94, struct input_absrange) >> + >> #define EVIOCSCLOCKID _IOW('E', 0xa0, int) /* Set clockid to be used for timestamps */ >> >> /* >> -- >> 2.0.0 >> -- To unsubscribe from this list: send the line "unsubscribe linux-input" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html