On 5/30/20 7:36 AM, Xiaoguang Wang wrote:
> hi,
>
>> On 28/05/2020 12:15, Xiaoguang Wang wrote:
>>> If requests can be submitted and completed inline, we don't need to
>>> initialize the whole io_wq_work in io_init_req(), which is an
>>> expensive operation. Add a new 'REQ_F_WORK_INITIALIZED' flag to
>>> control whether io_wq_work is initialized.
>>
>> It looks nicer. Especially if you'd add a helper as Jens suggested.
> Sure, I'll add a helper in V4, thanks.
>
>>
>> The other thing: even though I hate treating a subset of the fields
>> differently from the others, I don't like the ->creds tossing either.
>>
>> Did you consider trying to use only ->work.creds without adding
>> req->creds? Like in the untested incremental below. init_io_work()
>> there is misleading, it should somehow be played around better.
> But if we don't add a new req->creds, I think there are some potential
> risks. In the current io_uring mainline code, look at io_kiocb's
> memory layout:
>
> crash> struct -o io_kiocb
> struct io_kiocb {
>        union {
>   [0]      struct file *file;
>   [0]      struct io_rw rw;
>   [0]      struct io_poll_iocb poll;
>   [0]      struct io_accept accept;
>   [0]      struct io_sync sync;
>   [0]      struct io_cancel cancel;
>   [0]      struct io_timeout timeout;
>   [0]      struct io_connect connect;
>   [0]      struct io_sr_msg sr_msg;
>   [0]      struct io_open open;
>   [0]      struct io_close close;
>   [0]      struct io_files_update files_update;
>   [0]      struct io_fadvise fadvise;
>   [0]      struct io_madvise madvise;
>   [0]      struct io_epoll epoll;
>   [0]      struct io_splice splice;
>   [0]      struct io_provide_buf pbuf;
>        };
>  [64]  struct io_async_ctx *io;
>  [72]  int cflags;
>  [76]  u8 opcode;
>  [78]  u16 buf_index;
>  [80]  struct io_ring_ctx *ctx;
>  [88]  struct list_head list;
> [104]  unsigned int flags;
> [108]  refcount_t refs;
> [112]  struct task_struct *task;
> [120]  unsigned long fsize;
> [128]  u64 user_data;
> [136]  u32 result;
> [140]  u32 sequence;
> [144]  struct list_head link_list;
> [160]  struct list_head inflight_entry;
> [176]  struct percpu_ref *fixed_file_refs;
>        union {
>            struct {
> [184]          struct callback_head task_work;
> [200]          struct hlist_node hash_node;
> [216]          struct async_poll *apoll;
>            };
> [184]      struct io_wq_work work;
>        };
> }
> SIZE: 240
>
> struct io_wq_work {
>   [0]  struct io_wq_work_node list;
>   [8]  void (*func)(struct io_wq_work **);
>  [16]  struct files_struct *files;
>  [24]  struct mm_struct *mm;
>  [32]  const struct cred *creds;
>  [40]  struct fs_struct *fs;
>  [48]  unsigned int flags;
>  [52]  pid_t task_pid;
> }
> SIZE: 56
>
> The risk mainly comes from the union:
>
>     union {
>         /*
>          * Only commands that never go async can use the below fields,
>          * obviously. Right now only IORING_OP_POLL_ADD uses them, and
>          * async armed poll handlers for regular commands. The latter
>          * restore the work, if needed.
>          */
>         struct {
>             struct callback_head task_work;
>             struct hlist_node hash_node;
>             struct async_poll *apoll;
>         };
>         struct io_wq_work work;
>     };
>
> 1. apoll and creds are at the same memory offset; in the 'async armed
> poll handlers' case, apoll will be used, which means creds will be
> overwritten.
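To make the overlap concrete: per the dump above, apoll sits at offset
216 in io_kiocb, and work.creds sits at 184 + 32 = 216, so the two
alias exactly. Below is a standalone sketch with stand-in types (not
the kernel structs) that reproduces those offsets and checks the
aliasing:

    /* Standalone sketch, not kernel code: stand-in types reproducing
     * the layout above, to show that apoll aliases work.creds.
     * Offsets assume a 64-bit build, matching the crash dump. */
    #include <assert.h>
    #include <stddef.h>

    struct sketch_work {                /* models struct io_wq_work */
        void *list;                     /* io_wq_work_node   [0]  */
        void (*func)(void);             /* func              [8]  */
        void *files;                    /* files             [16] */
        void *mm;                       /* mm                [24] */
        const void *creds;              /* creds             [32] */
        void *fs;                       /* fs                [40] */
        unsigned int flags;             /* flags             [48] */
        int task_pid;                   /* task_pid          [52] */
    };

    struct sketch_tail {                /* models the io_kiocb tail union */
        union {
            struct {
                void *tw_next, *tw_func;   /* callback_head, 16 bytes */
                void *hn_next, *hn_pprev;  /* hlist_node, 16 bytes    */
                void *apoll;               /* [216] in io_kiocb       */
            };
            struct sketch_work work;       /* [184] in io_kiocb       */
        };
    };

    int main(void)
    {
        /* 216 - 184 == 32 == offsetof(io_wq_work, creds): exact overlap */
        assert(offsetof(struct sketch_tail, apoll) ==
               offsetof(struct sketch_tail, work.creds));
        return 0;
    }

So with the real types, a write to apoll in the async-poll path lands
on whatever creds reference was stored in work.creds.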
> In the patch "io_uring: avoid unnecessary io_wq_work copy for fast
> poll feature", I use REQ_F_WORK_INITIALIZED to control whether to do
> the io_wq_work restore, so your code below will break:
>
>     static inline void io_req_work_drop_env(struct io_kiocb *req)
>     {
>         /* always init'ed, put before REQ_F_WORK_INITIALIZED check */
>         if (req->work.creds) {
>         ^^^^^^^^^^^^^^^^^^^^^^
>         Here req->work.creds will be invalid, or I'd still need to use
>         some space to record the original req->work.creds and restore
>         it.
>
>             put_cred(req->work.creds);
>             req->work.creds = NULL;
>         }
>         if (!(req->flags & REQ_F_WORK_INITIALIZED))
>             return;
>
> 2. For the IORING_OP_POLL_ADD case, the current mainline code uses
> task_work and hash_node, 32 bytes in total, which means io_wq_work's
> members list, func, files and mm would be overwritten, but creds is
> left untouched, so it's safe for now. But if some new member is ever
> added to
>
>     struct {
>         struct callback_head task_work;
>         struct hlist_node hash_node;
>         struct async_poll *apoll;
>     };
>
> say callback_head grows a new member, this check will break again.
>
> 3. IMO, io_wq_work just describes the execution environment needed by
> requests that will be punted to io-wq; from a software design
> standpoint, requests that are submitted and completed inline should
> not touch this struct at all. And the current io_kiocb is 240 bytes;
> a new pointer makes it 248 bytes, which still fits in 4 cache lines
> with 64-byte cache lines.

I'd say let's do the extra creds member for now. It's safer. We can
always look into shrinking down the size and re-using the io_wq_work
part. I'd rather get this prepped up and ready for 5.8.

Are you sending out a v4 today?

-- 
Jens Axboe
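For reference, a standalone model of the direction settled on above
(all names illustrative, not the actual patch): keep a dedicated
req->creds that lives outside the union, leave req->work untouched for
inline requests, and only initialize ->work and move the creds
reference over when the request is punted to io-wq.

    /* Illustrative-only model, not kernel code. */
    #include <stdio.h>
    #include <string.h>

    #define REQ_F_WORK_INITIALIZED  (1U << 0)

    struct cred { int id; };

    struct wq_work {
        const struct cred *creds;
        /* files, mm, fs, ... elided */
    };

    struct req {
        unsigned int flags;
        const struct cred *creds;   /* dedicated field, never aliased */
        struct wq_work work;        /* only valid once the flag is set */
    };

    static void req_init_async(struct req *req)
    {
        if (req->flags & REQ_F_WORK_INITIALIZED)
            return;
        memset(&req->work, 0, sizeof(req->work));
        req->flags |= REQ_F_WORK_INITIALIZED;
    }

    /* Called only when the request cannot complete inline. */
    static void prep_async_work(struct req *req)
    {
        req_init_async(req);
        if (req->creds) {
            req->work.creds = req->creds;
            req->creds = NULL;
        }
    }

    int main(void)
    {
        struct cred c = { .id = 42 };
        struct req req = { .creds = &c };  /* inline path: work untouched */

        prep_async_work(&req);             /* punt: now work is valid */
        printf("work.creds id = %d\n", req.work.creds->id);
        return 0;
    }

Because the dedicated field sits outside the union, the async-poll
path can scribble over work without losing the creds reference.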