On Jan 28, 2007, at 12:15 AM, Robin Garner wrote:
I've been looking into why JikesRVM sucks so badly on the DaCapo xalan benchmark (it's approx 20x slower than the commercial JVMs).
One of the characteristics of this benchmark (at least in its current incarnation) is that it does a very large number of 1- and 2-byte file write operations. JikesRVM currently uses the default reference VMChannel implementation, which crosses the Java->native boundary with the call

static native int read(int fd, ByteBuffer buf);
The C implementation then uses JNI to invoke methods on the ByteBuffer to get its length and I/O start position, obtain a pointer to the backing array, release the array, and write the new position back to the buffer.
Allocating a direct buffer for buf, adding the interface method

static native int read(int fd, ByteBuffer buf, int len, int pos);

and writing the buffer position back in Java halves the execution time of the benchmark. (The 'write' method is implemented identically.)
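To make the Java-side bookkeeping concrete, here is a minimal sketch of what the wrapper around that proposed signature could look like. The names are assumed, and the native call is replaced by a stub that pretends to read a few bytes, so the snippet is self-contained; the point is only that `len` and `pos` are derived in Java and the position writeback never crosses into C:

```java
import java.nio.ByteBuffer;

public class ChannelSketch {
    // Stands in for the proposed native method
    //   static native int read(int fd, ByteBuffer buf, int len, int pos);
    // Stubbed here to "read" up to 4 bytes so the snippet runs anywhere.
    static int nativeRead(int fd, ByteBuffer buf, int len, int pos) {
        int n = Math.min(len, 4);          // pretend the OS returned 4 bytes
        for (int i = 0; i < n; i++)
            buf.put(pos + i, (byte) 'x');  // absolute puts: no position change
        return n;
    }

    // The Java wrapper derives len/pos itself and writes the new position
    // back, so the native side needs no JNI calls for either.
    static int read(int fd, ByteBuffer buf) {
        int pos = buf.position();
        int len = buf.remaining();
        int n = nativeRead(fd, buf, len, pos);
        if (n > 0)
            buf.position(pos + n);         // position update stays in Java
        return n;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        int n = read(0, buf);
        System.out.println(n + " " + buf.position());
    }
}
```

With a real native method behind it, the only JNI work left per call is obtaining the backing storage itself.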
Is there much difference in using the same buffer (not necessarily direct) but passing in the `len' and `pos' values from Java? If a lot of the slowdown is caused by the JNI calls used to figure these values out, then we should add those parameters to the interface method (which IIRC is private anyway, and not part of the Java VM layer).
I guess what's left is getting the buffer's backing array, which is another few JNI calls. That can be optimized on a VM-by-VM basis, but may be harder to abstract into an interface. If we split the `JCL_buffer' stuff in the VMChannel code into its own library, VM implementors could provide their own version of that library, optimized for their VM.
This still leaves the issue of arranging for buf to be a direct byte buffer. In my current patch I've modified FileInputStream.java to arrange for a direct buffer to be used, but this doesn't seem very satisfactory. What I would like to do is intercept the ByteBuffer.wrap method and do 'the optimal thing', e.g. use the user's buffer as a DirectBuffer if it is in a non-moving space, or something less optimal otherwise, and back this up with a static analysis that allocates I/O buffers in a non-moving space if it can.
We could add a new VM method (in VMChannels or something) that does the job of wrap, and set the reference implementation to the one we have now.
There's stuff I can do in JikesRVM to speed up the native calls, eliminate some of the buffer copies, etc., but the further back in the call chain I do it (i.e. in this code) the more scope there is for optimization.
So my questions to the list are:
- If I were to contribute the above reimplementation of read and write, would it be accepted? Or should I pursue a JikesRVM-specific approach?
In general I like the approach of keeping the number of JNI calls in the native code as low as possible, so I think this would be fine.
- What is the best way to put a VM-specific hook into ByteBuffer.wrap? Is there an existing facility I've missed?
Like I said above, we'd add a method 'wrap' to some VM class, then
change ByteBuffer.wrap so it just calls the same method of the VM
class. Maybe adding a new VM class 'java.nio.VMBuffers' would be
appropriate.
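A reference implementation of that hook could be almost trivial. The sketch below assumes the class and method names from the discussion ('VMBuffers', 'wrap'); the reference version just builds an ordinary heap buffer exactly as ByteBuffer.wrap does today, and a VM like JikesRVM would substitute its own version:

```java
import java.nio.ByteBuffer;

public class VMBuffersSketch {
    // Hypothetical VM class as suggested above (java.nio.VMBuffers).
    // The reference version returns a plain heap buffer; a VM could
    // instead hand back a direct buffer when it knows the array
    // lives in a non-moving space.
    static ByteBuffer wrap(byte[] array, int offset, int length) {
        return ByteBuffer.wrap(array, offset, length);
    }

    public static void main(String[] args) {
        byte[] data = new byte[32];
        ByteBuffer buf = wrap(data, 4, 8);
        // ByteBuffer.wrap sets position = offset and limit = offset + length
        System.out.println(buf.position() + " " + buf.limit() + " " + buf.hasArray());
    }
}
```

ByteBuffer.wrap itself would then become a one-line delegation to this method, keeping the public java.nio API unchanged.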
Of course adding a BufferedWriter also solves the performance problem for xalan, but the commercial class libraries seem to have optimized this call (adding the buffer results in no measurable speedup on the Sun JDK), so I think we can expect unoptimized user code to become more widespread, and there is probably an (albeit less dramatic) payoff for other code.
Another approach would be to just make File*putStream do their own
buffering, though I suppose that approach may have drawbacks in some
situations.
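The kind of internal buffering meant here could look like the following sketch (a plain illustrative wrapper, not the actual Classpath stream code), which coalesces the benchmark's 1- and 2-byte writes before touching the underlying stream:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BufferingSketch extends OutputStream {
    private final OutputStream out;
    private final byte[] buf = new byte[8192];
    private int count;

    BufferingSketch(OutputStream out) { this.out = out; }

    @Override public void write(int b) throws IOException {
        if (count == buf.length)
            flush();
        buf[count++] = (byte) b;       // tiny writes land here, not in a syscall
    }

    @Override public void flush() throws IOException {
        if (count > 0) {
            out.write(buf, 0, count);  // one large write instead of many small ones
            count = 0;
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferingSketch s = new BufferingSketch(sink);
        for (int i = 0; i < 3; i++)
            s.write('a');              // nothing reaches the sink yet
        int before = sink.size();
        s.flush();
        System.out.println(before + " " + sink.size());
    }
}
```

One drawback alluded to above is visible even in this sketch: data written to the stream is not observable by other readers of the file descriptor until a flush happens, which changes behaviour for code that interleaves writes with external reads.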
Thanks.