Re: [RFC -v3 PATCH 2/3] sched: add yield_to function

On 01/05/2011 11:30 AM, KOSAKI Motohiro wrote:
>  On 01/05/2011 10:40 AM, KOSAKI Motohiro wrote:
>  >  >   On 01/05/2011 04:39 AM, KOSAKI Motohiro wrote:
>  >  >   >   >    On 01/04/2011 08:14 AM, KOSAKI Motohiro wrote:
>  >  >   >   >    >    Also, if pthread_cond_signal() called sys_yield_to implicitly, we
>  >  >   >   >    >    could avoid most of the Nehalem (and other P2P cache arch) lock
>  >  >   >   >    >    unfairness problem. (Probably a pthread_condattr_setautoyield_np
>  >  >   >   >    >    or similar knob would be a good way to expose it.)
>  >  >   >   >
>  >  >   >   >    Often, the thread calling pthread_cond_signal() wants to continue
>  >  >   >   >    executing, not yield.
>  >  >   >
>  >  >   >   Then, it doesn't work.
>  >  >   >
>  >  >   >   After calling pthread_cond_signal(), T1 (the cond_signal caller) and
>  >  >   >   T2 (the woken thread) start racing to grab the GIL. But T1 almost
>  >  >   >   always wins, because the lock variable is in T1's CPU cache. Why do
>  >  >   >   the kernel and userland show such different results? One reason is
>  >  >   >   that glibc doesn't have any ticket lock scheme.
>  >  >   >
>  >  >   >   If you are interested in the GIL mess and its issues, please feel
>  >  >   >   free to ask more.
>  >  >
>  >  >   I suggest looking into an explicit round-robin scheme, where each thread
>  >  >   adds itself to a queue and an unlock wakes up the first waiter.
>  >
>  >  I'm sure you haven't tried your scheme, but I did. It's slow.
>
>  Won't anything with a heavily contended global/giant lock be slow?
>  What's the average lock hold time per thread? 10%? 50%? 90%?

>  Well, of course everything under heavy contention is slow. But we
>  shouldn't compare the heavily contended case with the lightly
>  contended one; we have to compare heavily contended with heavily
>  contended, and lightly contended with lightly contended. If we are
>  talking about a scripting language VM, a pipe benchmark shows the
>  overhead of a FIFO scheme like the one you propose very clearly,
>  because the pipe benchmark generates a storm of frequent GIL
>  grab/ungrab cycles. A similar pipe benchmark exposed our (very)
>  old kernel's bottleneck. Sadly, userspace has no way to implement
>  a per-cpu runqueue, I think.
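
For reference, the FIFO scheme under discussion is essentially a
ticket lock. A minimal userspace sketch with C11 atomics (the names
here are mine, purely for illustration; glibc implements nothing like
this, which is the point above):

#include <stdatomic.h>

/* Illustrative ticket lock: strictly FIFO handoff, so the
 * releasing CPU cannot simply re-grab the lock. */
struct ticket_lock {
        atomic_uint next;       /* next ticket to hand out */
        atomic_uint serving;    /* ticket currently being served */
};

static void ticket_lock_acquire(struct ticket_lock *l)
{
        unsigned int me = atomic_fetch_add(&l->next, 1);

        while (atomic_load(&l->serving) != me)
                ;       /* spin; a real version would futex-wait */
}

static void ticket_lock_release(struct ticket_lock *l)
{
        /* Hand the lock to the next waiter, in strict FIFO order. */
        atomic_fetch_add(&l->serving, 1);
}

Under a GIL grab/ungrab storm, every release forces a cross-CPU
handoff to the next ticket holder, which is exactly the FIFO overhead
the pipe benchmark makes visible.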

A completely fair lock will likely be slower than an unfair lock.
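
For comparison, the unfair variant is a plain test-and-set lock
(again only a sketch, not glibc's actual implementation). A thread
that just released the lock usually wins the next grab, because the
lock word is still hot in its cache:

#include <stdatomic.h>

/* Illustrative unfair lock: whoever wins the test-and-set gets the
 * lock; the previous owner, whose cache still holds the line,
 * usually beats a freshly woken thread to it. */
static atomic_flag gil = ATOMIC_FLAG_INIT;

static void gil_acquire(void)
{
        while (atomic_flag_test_and_set(&gil))
                ;       /* spin */
}

static void gil_release(void)
{
        atomic_flag_clear(&gil);
}

This is fast in the common case, but it is exactly the unfairness
described above: the woken T2 loses the race to T1 almost every time.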

>  And, if we are talking about a language VM, I can't give any average
>  lock hold time. It depends on the running script.

Pick some parallel, compute-intensive script, please.

--
error compiling committee.c: too many arguments to function


