On Wed, Apr 19, 2023 at 04:15:50PM -0300, Marcelo Tosatti wrote:
> On Wed, Apr 19, 2023 at 06:47:30PM +0200, Vlastimil Babka wrote:
> > On 4/19/23 13:29, Marcelo Tosatti wrote:
> > > On Wed, Apr 19, 2023 at 08:14:09AM -0300, Marcelo Tosatti wrote:
> > >> This was tried before:
> > >> https://lore.kernel.org/lkml/20220127173037.318440631@fedora.localdomain/
> > >>
> > >> My conclusion from that discussion (and work) is that a special system
> > >> call:
> > >>
> > >> 1) Does not allow the benefits to be widely applied (only modified
> > >> applications will benefit). It is also not portable across operating
> > >> systems.
> > >>
> > >> Removing the vmstat_work interruption benefits HPC workloads,
> > >> for example (in fact, it benefits any kind of application,
> > >> since the interruption causes cache misses).
> > >>
> > >> 2) Increases the system call cost for applications that would use
> > >> the interface.
> > >>
> > >> So avoiding the vmstat_update interruption, without userspace
> > >> knowledge or modifications, is a better solution than modifying
> > >> userspace.
> > >
> > > Another important point is this: if an application dirties
> > > its own per-CPU vmstat cache while performing a system call,
> > > and a vmstat sync event is triggered on a different CPU, you'd have to
> > > either:
> > >
> > > 1) Wait for that CPU to return to userspace and sync its stats
> > > (infeasible), or
> > >
> > > 2) Queue work to execute on that CPU (undesirable, as that causes
> > > an interruption).
> >
> > So you're saying the application might do a syscall from the isolcpu, so
> > IIUC it cannot expect any latency guarantees at that very moment,
>
> Why not? cyclictest uses nanosleep, and it's the main tool for measuring
> latency.
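For illustration, here is a simplified sketch of the kind of loop cyclictest runs. It is not cyclictest's actual source, and the helper names are hypothetical. Each iteration sleeps until an absolute CLOCK_MONOTONIC time with clock_nanosleep() and records how late the wakeup was, so every latency sample spans a system call:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Convert a timespec to nanoseconds. */
static int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/* Advance a timespec by interval_ns, keeping tv_nsec in [0, 1e9). */
static void ts_add_ns(struct timespec *ts, int64_t interval_ns)
{
	int64_t ns = ts->tv_nsec + interval_ns;

	ts->tv_sec += ns / NSEC_PER_SEC;
	ts->tv_nsec = ns % NSEC_PER_SEC;
}

/*
 * Hypothetical helper: wake up every interval_ns on absolute
 * CLOCK_MONOTONIC deadlines and return the worst wakeup latency
 * seen, in nanoseconds. Each iteration enters the kernel.
 */
static int64_t max_wakeup_latency_ns(int loops, int64_t interval_ns)
{
	struct timespec next, now;
	int64_t max_lat = 0;

	assert(loops > 0 && interval_ns > 0);
	clock_gettime(CLOCK_MONOTONIC, &next);
	for (int i = 0; i < loops; i++) {
		ts_add_ns(&next, interval_ns);
		/* clock_nanosleep() returns the error number; retry on EINTR. */
		while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
				       &next, NULL) == EINTR)
			;
		clock_gettime(CLOCK_MONOTONIC, &now);

		/* How late did we wake up relative to the requested time? */
		int64_t lat = ts_to_ns(&now) - ts_to_ns(&next);
		if (lat > max_lat)
			max_lat = lat;
	}
	return max_lat;
}
```

If a thread running this loop is pinned to an isolated CPU, any work that runs on that CPU around the timer expiry (a kworker flushing vmstats, an IPI handler) shows up directly as added wakeup latency.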
>
> > but then
> > it immediately starts expecting them again after returning to userspace,
>
> No, the expectation more generally is this:
>
> For certain types of applications (for example PLC software or
> RAN processing), upon occurrence of an event, a certain task must
> be completed within a maximum amount of time (the deadline).
>
> One way to express this requirement is with a pair of numbers,
> deadline time and execution time, where:
>
> * deadline time: length of time between the event and the deadline.
> * execution time: length of time it takes to process the event
>   on a particular hardware platform (uninterrupted).
>
> The particular values depend on the use case. When the realtime
> application executes in a virtualized guest, an interruption that
> must be serviced in the host causes the following sequence of events:
>
> 1) VM-exit
> 2) execution of the IPI (and function call), or a switch to a kworker
>    thread to execute some work item
> 3) VM-entry
>
> This adds in excess of 50us of latency as observed by cyclictest
> (which violates the latency requirement of a vRAN application with
> a 1ms TTI, for example).
>
> > and
> > a single interruption for a one-time flush after the syscall would be too
> > intrusive?
>
> Generally, if you can't complete the task (which involves executing a
> number of instructions) before the deadline, then it's a problem.
>
> One-time flush? You mean switching between:
>
> rt-app -> kworker (to execute vmstat_update flush) -> rt-app
>
> My measurement, which probably had vmstat_update code/data in cache, took 7us.
> If the code has to be brought in from memory, it takes even longer.
>
> > (elsewhere in the thread you described an RT app initialization that may
> > generate vmstats to flush and then enter the userspace loop; again, would a
> > single interruption soon after entering the loop be so critical?)
>
> 1) It depends on the application.
> For the use case above, where an interruption shorter than 50us is
> desired, yes, it is critical.
>
> 2) The interruptions can come from different sources:
>
> Time
> 0    rt-app executing instruction 1
> 1    rt-app executing instruction 2
> 2    scheduler switches between rt-app and kworker
> 3    kworker runs vmstat_work
> 4    scheduler switches between kworker and rt-app
> 5    rt-app executing instruction 3
> 6    IPI to handle a KVM request
> 7    fill in your preferred IPI handler
>
> So the argument "a single interruption might not cause your deadline
> to be exceeded" fails, because the times spent handling the different
> interruptions add up.
>
> Does that make sense?

Ping? (Just want to double-check that the reasoning above makes sense.)
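To make the "interruptions add up" argument concrete, a minimal sketch of the deadline time / execution time model quoted above. The function name is hypothetical, and treating the 1ms TTI as the deadline and 50us as a round figure for a host-serviced interruption are illustrative assumptions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: a task meets its deadline only if its
 * uninterrupted execution time plus the time spent handling every
 * interruption that hits it fits within the deadline time.
 * All values are in microseconds.
 */
static bool meets_deadline(unsigned long deadline_us, unsigned long exec_us,
			   const unsigned long *interruptions_us, size_t n)
{
	unsigned long total = exec_us;

	assert(interruptions_us != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += interruptions_us[i];	/* costs accumulate */
	return total <= deadline_us;
}
```

With a hypothetical 945us execution time against a 1000us deadline, either the 7us kworker round trip or a 50us host-serviced interruption alone still fits (952us and 995us), but the two together take 1002us and miss the deadline.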