Re: Review request for 6977034 Thread.getState() very slow

 On 12/6/10 5:27 PM, Robert Lougher wrote:
Hi Mandy,

On 6 December 2010 19:26, Mandy Chung<mandy.chung@xxxxxxxxxx>  wrote:
Remi, Eamonn, Brian, David, Doug,

Thanks for the feedback.

I don't know if you welcome external feedback, but I'd like to point
out (if you're not already aware) that this change modifies the VM
interface.

Any feedback is welcome; that is the reason for posting this for discussion on the OpenJDK lists. Thanks for letting us know that this change will break CACAO with OpenJDK.
While I'm cognisant of the reason for the change, my understanding of
the existing map mechanism is that it makes the JDK class library
independent of the underlying VM thread status values.  The value of
Thread.threadStatus is opaque, with the mapping from VM thread status
being determined by the following VM interface functions (see
hotspot/src/share/vm/prims/jvm.h):

------------------
/*
  * Returns an array of the threadStatus values representing the
  * given Java thread state.  Returns NULL if the VM version is
  * incompatible with the JDK or doesn't support the given
  * Java thread state.
  */
JNIEXPORT jintArray JNICALL
JVM_GetThreadStateValues(JNIEnv* env, jint javaThreadState);

/*
  * Returns an array of the substate names representing the
  * given Java thread state.  Returns NULL if the VM version is
  * incompatible with the JDK or the VM doesn't support
  * the given Java thread state.
  * values must be the jintArray returned from JVM_GetThreadStateValues
  * and javaThreadState.
  */
JNIEXPORT jobjectArray JNICALL
JVM_GetThreadStateNames(JNIEnv* env, jint javaThreadState, jintArray values);

------------------

These two functions are used by the native method
sun.misc.VM.getThreadStateValues to set up the arrays which are then
used to initialise the map.
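
As a rough illustration of the mechanism described above (a sketch only, not the JDK source): the VM reports, for each Java thread state, the raw threadStatus values it uses, and the library builds a lookup map from them, so the numbers stay opaque to the class library. The class name, the vmValuesFor stand-in for the native call, and the numeric values are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class ThreadStateMapSketch {
    // Hypothetical stand-in for what a VM might report through
    // JVM_GetThreadStateValues. The numbers are arbitrary on purpose:
    // with the map mechanism the library never interprets them.
    static int[] vmValuesFor(Thread.State state) {
        switch (state) {
            case NEW:           return new int[] {0};
            case RUNNABLE:      return new int[] {7};
            case BLOCKED:       return new int[] {11};
            case WAITING:       return new int[] {20, 21};
            case TIMED_WAITING: return new int[] {30, 31, 32};
            case TERMINATED:    return new int[] {99};
            default:            return null;
        }
    }

    // Build the threadStatus -> Thread.State map once, at start-up.
    static Map<Integer, Thread.State> buildMap() {
        Map<Integer, Thread.State> map = new HashMap<>();
        for (Thread.State state : Thread.State.values()) {
            int[] values = vmValuesFor(state);
            if (values == null) {
                continue; // state not supported by this (hypothetical) VM
            }
            for (int v : values) {
                map.put(v, state);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Integer, Thread.State> map = buildMap();
        System.out.println(map.get(21)); // prints WAITING
    }
}
```

Under this scheme a VM with different internal status values only has to return different arrays; the lookup code itself does not change.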

This change breaks this abstraction, and requires Thread.threadStatus
to be a JVM TI thread state (which happens to match Hotspot's internal
thread state).  This change will therefore break any VM which does not
have an internal thread state based on JVM TI.


Indeed, this fix will depend on this private interface between the VM and the JDK to set threadStatus to a JVM TI thread state. The VM implementation that sets Thread.threadStatus hasn't changed since 1.5, and the performance required by profiling tools was one of the reasons it was done that way.

As far as I'm aware, IKVM and CACAO are currently the only other users
of OpenJDK (I'm also nearing completion of a port to JamVM).
Unfortunately, from looking at CACAO I can see that this change will
break it.  It may also break IKVM, but I haven't checked.  I, of
course, can modify the internal thread states of JamVM to be
compatible.

I'm CC'ing CACAO's mailing list and GNU Classpath so that affected
parties can be made aware of this change.

Thanks for doing that.

As an aside, will there be
any later clean-up of the native method implementation and VM
interface?

I don't know the status of that project. I suggest you post this question to the Common VM Interface OpenJDK project:
   http://openjdk.java.net/projects/cvmi/
   cvmi-dev@xxxxxxxxxxxxxxxx

Thanks
Mandy
Thanks,
Rob.

On 12/04/10 04:22, Eamonn McManus wrote:

Hi Mandy,

This test:

         if ((threadStatus & JVMTI_THREAD_STATE_RUNNABLE) == 1) {

is always false, since JVMTI_THREAD_STATE_RUNNABLE is 4. (NetBeans 7.0
helpfully flags this; I'm not sure if earlier versions do.)
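
A minimal sketch of why that comparison can never succeed: masking with a single bit yields either 0 or that bit's value (4 here), never 1. The two constant values come from the JVM TI specification; the class and method names are made up for this illustration.

```java
final class MaskTestSketch {
    // Values as defined by the JVM TI specification.
    static final int JVMTI_THREAD_STATE_ALIVE    = 0x0001;
    static final int JVMTI_THREAD_STATE_RUNNABLE = 0x0004;

    // The test as originally written: the masked value is 0 or 4, so == 1 never holds.
    static boolean buggyIsRunnable(int threadStatus) {
        return (threadStatus & JVMTI_THREAD_STATE_RUNNABLE) == 1;
    }

    // The corrected test: any non-zero result means the bit is set.
    static boolean fixedIsRunnable(int threadStatus) {
        return (threadStatus & JVMTI_THREAD_STATE_RUNNABLE) != 0;
    }

    public static void main(String[] args) {
        int status = JVMTI_THREAD_STATE_ALIVE | JVMTI_THREAD_STATE_RUNNABLE; // 0x5
        System.out.println("buggy: " + buggyIsRunnable(status)); // prints buggy: false
        System.out.println("fixed: " + fixedIsRunnable(status)); // prints fixed: true
    }
}
```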

Good catch.   This explains why the speedup for RUNNABLE was not as high in
the micro-benchmark measurement.  With the test corrected, Thread.getState()
gets a 3.5X speedup on a thread in the RUNNABLE state.

But, once corrected, I think you could use this idea further to write a much
simpler and faster method, on these lines:

     public static Thread.State toThreadState(int threadStatus) {
         if ((threadStatus & JVMTI_THREAD_STATE_RUNNABLE) != 0) {
             return RUNNABLE;
         } else if ((threadStatus & JVMTI_THREAD_STATE_BLOCKED_ON_MONITOR_ENTER) != 0) {
             return BLOCKED;
         } else if ((threadStatus & JVMTI_THREAD_STATE_WAITING_WITH_TIMEOUT) != 0) {
             return TIMED_WAITING;
         } else if ((threadStatus & JVMTI_THREAD_STATE_WAITING_INDEFINITELY) != 0) {
             return WAITING;
         } else if ((threadStatus & JVMTI_THREAD_STATE_TERMINATED) != 0) {
             return TERMINATED;
         } else {
             return NEW;
         }
     }
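
To make the sequence of tests concrete, here is a self-contained sketch that applies it to a few composite status values. The constant values are taken from the JVM TI specification; the class name and the sample statuses (what a sleeping thread or a thread in Object.wait() might report) are assumptions for illustration, not taken from the webrev.

```java
final class ThreadStateTestsSketch {
    // Bit values as defined by the JVM TI specification.
    static final int JVMTI_THREAD_STATE_ALIVE                    = 0x0001;
    static final int JVMTI_THREAD_STATE_TERMINATED               = 0x0002;
    static final int JVMTI_THREAD_STATE_RUNNABLE                 = 0x0004;
    static final int JVMTI_THREAD_STATE_WAITING_INDEFINITELY     = 0x0010;
    static final int JVMTI_THREAD_STATE_WAITING_WITH_TIMEOUT     = 0x0020;
    static final int JVMTI_THREAD_STATE_SLEEPING                 = 0x0040;
    static final int JVMTI_THREAD_STATE_WAITING                  = 0x0080;
    static final int JVMTI_THREAD_STATE_IN_OBJECT_WAIT           = 0x0100;
    static final int JVMTI_THREAD_STATE_BLOCKED_ON_MONITOR_ENTER = 0x0400;

    static Thread.State toThreadState(int threadStatus) {
        if ((threadStatus & JVMTI_THREAD_STATE_RUNNABLE) != 0) {
            return Thread.State.RUNNABLE;
        } else if ((threadStatus & JVMTI_THREAD_STATE_BLOCKED_ON_MONITOR_ENTER) != 0) {
            return Thread.State.BLOCKED;
        } else if ((threadStatus & JVMTI_THREAD_STATE_WAITING_WITH_TIMEOUT) != 0) {
            return Thread.State.TIMED_WAITING;
        } else if ((threadStatus & JVMTI_THREAD_STATE_WAITING_INDEFINITELY) != 0) {
            return Thread.State.WAITING;
        } else if ((threadStatus & JVMTI_THREAD_STATE_TERMINATED) != 0) {
            return Thread.State.TERMINATED;
        } else {
            return Thread.State.NEW;
        }
    }

    public static void main(String[] args) {
        // Assumed status of a sleeping thread: several bits set at once.
        int sleeping = JVMTI_THREAD_STATE_ALIVE | JVMTI_THREAD_STATE_WAITING
                | JVMTI_THREAD_STATE_WAITING_WITH_TIMEOUT | JVMTI_THREAD_STATE_SLEEPING;
        // Assumed status of a thread in Object.wait() without a timeout.
        int inWait = JVMTI_THREAD_STATE_ALIVE | JVMTI_THREAD_STATE_WAITING
                | JVMTI_THREAD_STATE_WAITING_INDEFINITELY | JVMTI_THREAD_STATE_IN_OBJECT_WAIT;

        System.out.println(toThreadState(0));        // no bits set: NEW
        System.out.println(toThreadState(sleeping)); // TIMED_WAITING
        System.out.println(toThreadState(inWait));   // WAITING
    }
}
```

Only one bit per state needs testing because the other bits (SLEEPING, IN_OBJECT_WAIT and so on) merely refine a state that Thread.State does not distinguish.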

I forgot to mention in the email that I implemented this simpler approach to
compare with the table lookup approach.   There was no significant
difference.  I have now rerun with the corrected fix (checking != 0 rather
than == 1) and the table lookup approach is about 2-6% faster than the
sequence of tests approach.

I also prefer the simpler approach, but I posted the table lookup approach
as the proposed fix to get opinions on its performance.

Given that the Fork-Join framework doesn't depend on it, I will go for the
simpler approach (sequence of tests) and tune its performance further when
there is a use case requiring a perf improvement.

New webrev:
    http://cr.openjdk.java.net/~mchung/6977034/webrev.01/

Can you review this version?

Thanks
Mandy

You could tweak the order of the tests based on what might be the relative
frequency of the different states but it probably isn't worth it.

Regards,

Éamonn

On 3/12/10 11:52 PM, Mandy Chung wrote:

Fix for 6977034: Thread.getState() very slow

Webrev at:
    http://cr.openjdk.java.net/~mchung/6977034/webrev.00/

This is an improvement to how a Thread's threadStatus field is mapped to
Thread.State.  The VM updates the Thread.threadStatus field directly at each
state transition with the value defined in JVM TI [1].  The
java.lang.Thread.getState() implementation can therefore read the
threadStatus value and do a direct lookup in an array of Thread.State.
The threadStatus value is a bit vector, so a direct mapping would need an
array of at least 1061 (0x425) elements.   Instead, I use the highest-order
bit set to 1 in the masked threadStatus value as the index of the
Thread.State element, and cache only 32 elements (it could be fewer).  I
wrote a micro-benchmark that measures Thread.getState() on threads in
different states; it shows a 1.7X to 6X speedup (see below).  There may be
some issue with my micro-benchmark, since I didn't observe the 14X speedup
that Doug saw in his experiment.  However, I'd like to get this reviewed and
pushed to the repository so that anyone can experiment further with the
performance measurement.
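
The table lookup idea can be sketched roughly as follows. This is not the code in the webrev: the mask, the class and method names, and the handling of the all-zero (NEW) case are assumptions made for illustration; only the constant values come from the JVM TI specification.

```java
final class ThreadStateTableSketch {
    // Bit values as defined by the JVM TI specification.
    static final int JVMTI_THREAD_STATE_TERMINATED               = 0x0002;
    static final int JVMTI_THREAD_STATE_RUNNABLE                 = 0x0004;
    static final int JVMTI_THREAD_STATE_WAITING_INDEFINITELY     = 0x0010;
    static final int JVMTI_THREAD_STATE_WAITING_WITH_TIMEOUT     = 0x0020;
    static final int JVMTI_THREAD_STATE_BLOCKED_ON_MONITOR_ENTER = 0x0400;

    // Assumed mask: keep only the bits that decide the Thread.State.
    static final int MASK = JVMTI_THREAD_STATE_TERMINATED
            | JVMTI_THREAD_STATE_RUNNABLE
            | JVMTI_THREAD_STATE_WAITING_INDEFINITELY
            | JVMTI_THREAD_STATE_WAITING_WITH_TIMEOUT
            | JVMTI_THREAD_STATE_BLOCKED_ON_MONITOR_ENTER;

    // One slot per bit position of an int; most slots stay null.
    static final Thread.State[] TABLE = new Thread.State[32];

    static {
        put(JVMTI_THREAD_STATE_TERMINATED, Thread.State.TERMINATED);
        put(JVMTI_THREAD_STATE_RUNNABLE, Thread.State.RUNNABLE);
        put(JVMTI_THREAD_STATE_WAITING_INDEFINITELY, Thread.State.WAITING);
        put(JVMTI_THREAD_STATE_WAITING_WITH_TIMEOUT, Thread.State.TIMED_WAITING);
        put(JVMTI_THREAD_STATE_BLOCKED_ON_MONITOR_ENTER, Thread.State.BLOCKED);
    }

    private static void put(int singleBit, Thread.State state) {
        TABLE[Integer.numberOfTrailingZeros(singleBit)] = state;
    }

    static Thread.State toThreadState(int threadStatus) {
        int masked = threadStatus & MASK;
        if (masked == 0) {
            return Thread.State.NEW; // no deciding bit set
        }
        // Position of the highest-order set bit, in the range 0..31.
        int index = 31 - Integer.numberOfLeadingZeros(masked);
        return TABLE[index];
    }

    public static void main(String[] args) {
        System.out.println(toThreadState(0x401)); // ALIVE | BLOCKED_ON_MONITOR_ENTER: BLOCKED
        System.out.println(toThreadState(0xE1));  // assumed sleeping-thread status: TIMED_WAITING
    }
}
```

The trade-off is a 32-entry array and one bit-scan per call in place of up to five conditional tests.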

Thanks
Mandy
P.S. The discussion on this thread can be found at [2] [3].

[1]
http://download.java.net/jdk7/docs/platform/jvmti/jvmti.html#GetThreadState
[2]
http://mail.openjdk.java.net/pipermail/core-libs-dev/2010-July/004567.html
[3]
http://mail.openjdk.java.net/pipermail/core-libs-dev/2010-August/004721.html


                JDK 7 b120 (in ms)   With fix (in ms)   Speed up
main            46465                22772              2.04
NEW             50676                29921              1.69
RUNNABLE        42202                14690              2.87
BLOCKED         72773                12296              5.92
WAITING         48811                13041              3.74
TIMED_WAITING   45737                12849              3.56
TERMINATED      40314                16376              2.46







