Hi,

Networked workloads are intensive on the poll arming side, as most receive operations are triggered asynchronously by poll. For that kind of poll triggering, req->apoll is allocated dynamically and serves as the poll entry. This means that poll->events and poll->head are not part of the io_kiocb cachelines, and hence often not hot in the completion path.

When profiling these workloads, io_poll_check_events() shows up hotter than it should, precisely because it has to pull in that separate cacheline.

Cache the state in the io_kiocb itself instead, which avoids pulling in unnecessary data in the poll task_work path. This reduces overhead by about 3-4%.

-- Jens Axboe