On 1/24/20 1:40 PM, Waiman Long wrote:
> On 1/24/20 1:19 PM, Alex Kogan wrote:
>>
>>> On Jan 24, 2020, at 11:46 AM, Waiman Long <longman@xxxxxxxxxx> wrote:
>>>
>>> On 1/24/20 11:29 AM, Alex Kogan wrote:
>>>>
>>>>> On Jan 24, 2020, at 10:19 AM, Waiman Long <longman@xxxxxxxxxx> wrote:
>>>>>
>>>>> On 1/24/20 9:42 AM, Waiman Long wrote:
>>>>>> On 1/24/20 2:52 AM, Peter Zijlstra wrote:
>>>>>>> On Thu, Jan 23, 2020 at 04:33:54PM -0500, Alex Kogan wrote:
>>>>>>>> Let me put this question to you. What do you think the number
>>>>>>>> should be?
>>>>>>> I think it would be very good to keep the inter-node latency
>>>>>>> below 1ms.
>>>>>> It is hard to guarantee that, given that lock hold times can vary
>>>>>> quite a lot depending on the workload. What we can control is just
>>>>>> how many later lock waiters can jump ahead of a given waiter.
>>>> I totally agree. I do not think you can guarantee that latency even
>>>> today. With the existing spinlock, you join the queue and wait for as
>>>> long as it takes for each and every thread in front of you to execute
>>>> its critical section.
>>>>
>>>>>>> But to realize that we need data on the lock hold times,
>>>>>>> specifically for the heavily contended locks that make CNA worth
>>>>>>> it in the first place.
>>>>>>>
>>>>>>> I don't see that data, so I don't see how we can argue about this,
>>>>>>> let alone call something reasonable.
>>>>>>>
>>>>>> In essence, the CNA lock improves throughput on NUMA machines at
>>>>>> the expense of increased worst-case latency. If low latency is
>>>>>> important, it should be disabled. If CONFIG_PREEMPT_RT is on,
>>>>>> CONFIG_NUMA_AWARE_SPINLOCKS should be off.
>>>>>
>>>>> Actually, what we are worrying about is the additional latency that
>>>>> can be added to important tasks or execution contexts that are
>>>>> waiting for a lock. Maybe we can make the CNA lock behave somewhat
>>>>> like qrwlock in that requests from interrupt context are given
>>>>> priority. We could add a priority flag in the CNA node. If the flag
>>>>> is set, we will never put it into the secondary queue. In fact, we
>>>>> can transfer control next to it even if it is not on the same node.
>>>>> We may also set the priority flag if it is an RT task that is trying
>>>>> to acquire the lock.
>>>> I think this is possible, and in fact, we have been thinking along
>>>> those lines about ways to better support RT tasks with CNA. However,
>>>> this will _probably_ require changes to the API and will _certainly_
>>>> complicate the code quite a bit.
>>>
>>> What you need to do is to modify cna_init_node() to check the current
>>> locking context and set the priority flag accordingly.
>>>
>> Is there a lightweight way to identify such a "prioritized" thread?
>
> You can use the in_task() macro in include/linux/preempt.h. This is
> just a percpu preempt_count read and test. If in_task() is false, we
> are in {soft|hard}irq or NMI context. If it is true, you can check the
> rt_task() macro to see if it is an RT task. That requires access to
> the current task structure, so it may cost a little bit more if you
> want to handle RT tasks the same way.

We may not need to do that for softirq context. If that is the case, you
can use in_irq(), which checks for hardirq and NMI only.

Peter, what is your thought on that?

Cheers,
Longman
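[Editor's note: for concreteness, a rough sketch of the flag-setting
Longman describes, using the context-test macros named in the thread.
The struct cna_node layout shown here and its "priority" field are
hypothetical illustrations, not code from the posted patch series.]

    #include <linux/preempt.h>
    #include <linux/sched.h>
    #include <linux/sched/rt.h>
    #include <linux/topology.h>

    /*
     * Hypothetical CNA queue node. Only numa_node is implied by the
     * discussion; the priority flag is the proposed addition.
     */
    struct cna_node {
    	struct mcs_spinlock	mcs;		/* embedded MCS node, must be first */
    	int			numa_node;	/* NUMA node of this waiter */
    	bool			priority;	/* if set, never move to the secondary queue */
    };

    static inline void cna_init_node(struct mcs_spinlock *node)
    {
    	struct cna_node *cn = (struct cna_node *)node;

    	cn->numa_node = numa_node_id();

    	/*
    	 * in_task() is a cheap percpu preempt_count test: false means
    	 * we are in {soft|hard}irq or NMI context.  rt_task() has to
    	 * dereference the current task structure, so it costs slightly
    	 * more.  If softirqs need no boost, !in_task() could be
    	 * replaced with in_irq(), per Longman's follow-up.
    	 */
    	cn->priority = !in_task() || rt_task(current);
    }

The CNA slowpath would then skip nodes with ->priority set when moving
remote waiters to the secondary queue, and could hand the lock to such a
node even across NUMA-node boundaries, as suggested above.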