On 1/11/24 6:12 PM, Xiaobing Li wrote:
> On 1/10/24 16:15 AM, Jens Axboe wrote:
>> On 1/10/24 2:05 AM, Xiaobing Li wrote:
>>> On 1/5/24 04:02 AM, Pavel Begunkov wrote:
>>>> On 1/3/24 05:49, Xiaobing Li wrote:
>>>>> On 12/30/23 9:27 AM, Pavel Begunkov wrote:
>>>>>> Why it uses jiffies instead of some task run time?
>>>>>> Consequently, why it's fine to account irq time and other
>>>>>> preemption? (hint, it's not)
>>>>>>
>>>>>> Why it can't be done with userspace and/or bpf? Why
>>>>>> can't it be estimated by checking and tracking
>>>>>> IORING_SQ_NEED_WAKEUP in userspace?
>>>>>>
>>>>>> What's the use case in particular? Considering that
>>>>>> one of the previous revisions was uapi-less, something
>>>>>> is really fishy here. Again, it's a procfs file nobody
>>>>>> but a few would want to parse to use the feature.
>>>>>>
>>>>>> Why it just keeps aggregating stats for the whole
>>>>>> life time of the ring? If the workload changes,
>>>>>> that would either totally screw the stats or would make
>>>>>> it too inert to be useful. That's especially relevant
>>>>>> for long running (days) processes. There should be a
>>>>>> way to reset it so it starts counting anew.
>>>>>
>>>>> Hi, Jens and Pavel,
>>>>> I carefully read the questions you raised.
>>>>> First of all, as to why I use jiffies to statistics time, it
>>>>> is because I have done some performance tests and found that
>>>>> using jiffies has a relatively smaller loss of performance
>>>>> than using task run time. Of course, using task run time is
>>>>
>>>> How does taking a measure for task runtime looks like? I expect it to
>>>> be a simple read of a variable inside task_struct, maybe with READ_ONCE,
>>>> in which case the overhead shouldn't be realistically measurable. Does
>>>> it need locking?
>>>
>>> The task runtime I am talking about is similar to this:
>>> start = get_system_time(current);
>>> do_io_part();
>>> sq->total_time += get_system_time(current) - start;
>>
>> Not sure what get_system_time() is, don't see that anywhere.
>>
>>> Currently, it is not possible to obtain the execution time of a piece of
>>> code by a simple read of a variable inside task_struct.
>>> Or do you have any good ideas?
>>
>> I must be missing something, because it seems like all you need is to
>> read task->stime? You could possible even make do with just logging busy
>> loop time, as getrusage(RUSAGE_THREAD, &stat) from userspace would then
>> give you the total time.
>>
>> stat.ru_stime would then be the total time, the thread ran, and
>> 1 - (above_busy_stime / stat.ru_stime) would give you the time the
>> percentage of time the thread ran and did useful work (eg not busy
>> looping.
>
> getrusage can indeed get the total time of the thread, but this
> introduces an extra function call, which is relatively more
> complicated than defining a variable. In fact, recording the total
> time of the loop and the time of processing the IO part can achieve
> our observation purpose. Recording only two variables will have less
> impact on the existing performance, so why not choose a simpler and
> effective method.

I'm not opposed to exposing both of them, it does make the API simpler.
If we can call it an API... I think the main point was using task->stime
for it rather than jiffies etc.

-- 
Jens Axboe
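
For illustration only, here is a minimal userspace sketch of the
getrusage() estimate quoted above. It is not code from this thread.
It assumes the busy-loop system time is made available to userspace
in some form; busy_stime_sec and both helper names are hypothetical.
Note that RUSAGE_THREAD reports on the calling thread, so for a
separate SQPOLL kernel thread the total would have to be obtained
some other way. The sketch only shows the call shape and the
1 - (busy / total) arithmetic.

```c
#define _GNU_SOURCE             /* needed for RUSAGE_THREAD on Linux */
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a struct timeval to seconds as a double. */
double tv_to_sec(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/*
 * Fraction of system time that was not spent busy looping:
 * 1 - (busy / total). Returns 0.0 when no system time has been
 * accounted yet, to avoid dividing by zero.
 */
double useful_fraction(double busy_stime_sec, double total_stime_sec)
{
    if (total_stime_sec <= 0.0)
        return 0.0;
    return 1.0 - busy_stime_sec / total_stime_sec;
}

/*
 * Read the calling thread's total system time with getrusage() and
 * combine it with a busy-loop time supplied by the caller
 * (hypothetical input, assumed to come from a kernel-side statistic).
 * Returns 0 on success, -1 if getrusage() fails.
 */
int thread_useful_fraction(double busy_stime_sec, double *out)
{
    struct rusage stat;

    assert(out != NULL);
    if (getrusage(RUSAGE_THREAD, &stat) != 0)
        return -1;

    *out = useful_fraction(busy_stime_sec, tv_to_sec(stat.ru_stime));
    return 0;
}
```

With, say, 1 second of busy looping out of 4 seconds of total system
time, useful_fraction(1.0, 4.0) gives 0.75.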