Re: [PATCH v3 1/2] io_uring: change the poll events to be 32-bits

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/16/20 1:20 PM, Pavel Begunkov wrote:
> On 16/06/2020 21:45, Jens Axboe wrote:
>> On 6/16/20 7:58 AM, Jens Axboe wrote:
>>> On 6/15/20 9:04 PM, Jiufei Xue wrote:
>>>>
>>>>
>>>> On 2020/6/15 下午11:09, Jens Axboe wrote:
>>>>> On 6/14/20 8:49 PM, Jiufei Xue wrote:
>>>>>> Hi Jens,
>>>>>>
>>>>>> On 2020/6/13 上午12:48, Jens Axboe wrote:
>>>>>>> On 6/12/20 8:58 AM, Jens Axboe wrote:
>>>>>>>> On 6/11/20 8:30 PM, Jiufei Xue wrote:
>>>>>>>>> poll events should be 32-bits to cover EPOLLEXCLUSIVE.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Jiufei Xue <jiufei.xue@xxxxxxxxxxxxxxxxx>
>>>>>>>>> ---
>>>>>>>>>  fs/io_uring.c                 | 4 ++--
>>>>>>>>>  include/uapi/linux/io_uring.h | 2 +-
>>>>>>>>>  tools/io_uring/liburing.h     | 2 +-
>>>>>>>>>  3 files changed, 4 insertions(+), 4 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>>>>>> index 47790a2..6250227 100644
>>>>>>>>> --- a/fs/io_uring.c
>>>>>>>>> +++ b/fs/io_uring.c
>>>>>>>>> @@ -4602,7 +4602,7 @@ static void io_poll_queue_proc(struct file *file, struct wait_queue_head *head,
>>>>>>>>>  static int io_poll_add_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>>>>>  {
>>>>>>>>>  	struct io_poll_iocb *poll = &req->poll;
>>>>>>>>> -	u16 events;
>>>>>>>>> +	u32 events;
>>>>>>>>>  
>>>>>>>>>  	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>>>>>>>>  		return -EINVAL;
>>>>>>>>> @@ -8196,7 +8196,7 @@ static int __init io_uring_init(void)
>>>>>>>>>  	BUILD_BUG_SQE_ELEM(28, /* compat */   int, rw_flags);
>>>>>>>>>  	BUILD_BUG_SQE_ELEM(28, /* compat */ __u32, rw_flags);
>>>>>>>>>  	BUILD_BUG_SQE_ELEM(28, __u32,  fsync_flags);
>>>>>>>>> -	BUILD_BUG_SQE_ELEM(28, __u16,  poll_events);
>>>>>>>>> +	BUILD_BUG_SQE_ELEM(28, __u32,  poll_events);
>>>>>>>>>  	BUILD_BUG_SQE_ELEM(28, __u32,  sync_range_flags);
>>>>>>>>>  	BUILD_BUG_SQE_ELEM(28, __u32,  msg_flags);
>>>>>>>>>  	BUILD_BUG_SQE_ELEM(28, __u32,  timeout_flags);
>>>>>>>>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>>>>>>>>> index 92c2269..afc7edd 100644
>>>>>>>>> --- a/include/uapi/linux/io_uring.h
>>>>>>>>> +++ b/include/uapi/linux/io_uring.h
>>>>>>>>> @@ -31,7 +31,7 @@ struct io_uring_sqe {
>>>>>>>>>  	union {
>>>>>>>>>  		__kernel_rwf_t	rw_flags;
>>>>>>>>>  		__u32		fsync_flags;
>>>>>>>>> -		__u16		poll_events;
>>>>>>>>> +		__u32		poll_events;
>>>>>>>>>  		__u32		sync_range_flags;
>>>>>>>>>  		__u32		msg_flags;
>>>>>>>>>  		__u32		timeout_flags;
>>>>>>>>
>>>>>>>> We obviously have the space in there as most other flag members are 32-bits, but
>>>>>>>> I'd want to double check if we're not changing the ABI here. Is this always
>>>>>>>> going to be safe, on any platform, regardless of endianess etc?
>>>>>>>
>>>>>>> Double checked, and as I feared, we can't safely do this. We'll have to
>>>>>>> do something like the below, grabbing an unused bit of the poll mask
>>>>>>> space and if that's set, then store the fact that EPOLLEXCLUSIVE is set.
>>>>>>> So probably best to turn this just into one patch, since it doesn't make
>>>>>>> a lot of sense to do it as a prep patch at that point.
>>>>>>>
>>>>>> Yes, Agree about that. But I also fear that if the unused bit is used
>>>>>> in the feature, it will bring unexpected behavior.
>>>>>
>>>>> Yeah, it's certainly not the prettiest and could potentially be fragile.
>>>>> I'm open to suggestions, we need some way of signaling that the 32-bit
>>>>> variant of the poll_events should be used. We could potentially make
>>>>> this work by doing explicit layout for big endian vs little endian, that
>>>>> might be prettier and wouldn't suffer from the "grab some random bit"
>>>>> issue.
>>>>>
>>>> Thank you for your suggestion, I will think about it.
>>>>
>>>>>>> This does have the benefit of not growing io_poll_iocb. With your patch,
>>>>>>> it'd go beyond a cacheline, and hence bump the size of the entire
>>>>>>> io_iocb as well, which would be very unfortunate.
>>>>>>>
>>>>>> events in io_poll_iocb is 32-bits already, so why it will bump the
>>>>>> size of the io_iocb structure with my patch? 
>>>>>
>>>>> It's not 32-bits already, it's a __poll_t type which is 16-bits only.
>>>>>
>>>> Yes, it is a __poll_t type, but I found that __poll_t type is 32-bits
>>>> with the definition below:
>>>>
>>>> typedef unsigned __bitwise __poll_t;
>>>>
>>>> And I also investigate it with crash:
>>>> crash> io_poll_iocb -ox
>>>> struct io_poll_iocb {
>>>>    [0x0] struct file *file;
>>>>          union {
>>>>    [0x8]     struct wait_queue_head *head;
>>>>    [0x8]     u64 addr;
>>>>          };
>>>>   [0x10] __poll_t events;
>>>>   [0x14] bool done;
>>>>   [0x15] bool canceled;
>>>>   [0x18] struct wait_queue_entry wait;
>>>> }
>>>
>>> Yeah you're right, not sure why I figured it was 16-bits. But just
>>> checking on my default build:
>>>
>>> axboe@x1 ~/gi/linux-block (block-5.8)> pahole -C io_poll_iocb fs/io_uring.o
>>> struct io_poll_iocb {
>>> 	struct file *              file;                 /*     0     8 */
>>> 	union {
>>> 		struct wait_queue_head * head;           /*     8     8 */
>>> 		u64                addr;                 /*     8     8 */
>>> 	};                                               /*     8     8 */
>>> 	__poll_t                   events;               /*    16     4 */
>>> 	bool                       done;                 /*    20     1 */
>>> 	bool                       canceled;             /*    21     1 */
>>>
>>> 	/* XXX 2 bytes hole, try to pack */
>>>
>>> 	struct wait_queue_entry wait;                    /*    24    40 */
>>>
>>> 	/* size: 64, cachelines: 1, members: 6 */
>>> 	/* sum members: 62, holes: 1, sum holes: 2 */
>>> };
>>>
>>> and it's definitely 64-bytes in total size (as it should be), and the
>>> 'events' is indeed 32-bits as well.
>>>
>>> So at least that's good, we don't need anything extra in there. If we
>>> can solve the endian issue, then it should be trivial to use the full
>>> 32-bits for the flags in the sqe.
>>
>> To get back to something that can move us forward, something like the
>> below I _think_ will work. Then applications just use poll32_events and
>> we don't have to handle this in any special way. Care to base on that
>> and re-send the change?
>>
>>
>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>> index 92c22699a5a7..5b3f6bd59437 100644
>> --- a/include/uapi/linux/io_uring.h
>> +++ b/include/uapi/linux/io_uring.h
>> @@ -31,7 +31,16 @@ struct io_uring_sqe {
>>  	union {
>>  		__kernel_rwf_t	rw_flags;
>>  		__u32		fsync_flags;
>> -		__u16		poll_events;
>> +		struct {
>> +#ifdef __BIG_ENDIAN_BITFIELD
>> +			__u16	__unused_poll;
>> +			__u16	poll_events;
>> +#else
>> +			__u16	poll_events;
>> +			__u16	__unused_poll;
>> +#endif
>> +		};
>> +		__u32		poll32_events;
>>  		__u32		sync_range_flags;
>>  		__u32		msg_flags;
>>  		__u32		timeout_flags;
>>
> 
> That changes layout for big endian, isn't it? It's not ABI compatible.

Right, see followup message, mid-air collision on that one.

-- 
Jens Axboe




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux