I've been looking into why JikesRVM sucks so badly on the DaCapo xalan benchmark (it's approximately 20x slower than the commercial JVMs).

One of the characteristics of this benchmark (at least in its current incarnation) is that it does a very large number of 1- and 2-byte file write operations. JikesRVM currently uses the default reference VMChannel implementation, which crosses the Java->native boundary with the call

    static native int read(int fd, ByteBuffer buf);

The C implementation then uses JNI to invoke methods on the ByteBuffer to get its length and I/O start position, obtain a pointer to the backing array, release the array, and write the new position back to the buffer.

Allocating a direct buffer for buf, adding the interface method

    static native int read(int fd, ByteBuffer buf, int len, int pos);

and writing the buffer position back in Java halves the execution time of the benchmark. (The 'write' method is implemented identically.)

This still leaves the issue of arranging for buf to be a direct byte buffer. In my current patch I've modified FileInputStream.java to arrange for a direct buffer to be used, but this doesn't seem very satisfactory. What I would like to do is intercept the ByteBuffer.wrap method and do 'the optimal thing', e.g. use the user's buffer as a DirectBuffer if it is in a non-moving space, or something less optimal otherwise, and back this up with a static analysis that allocates I/O buffers in a non-moving space if it can.

There's stuff I can do in JikesRVM to speed up the native calls, eliminate some of the buffer copies, etc., but the further back in the call chain I do it (i.e. in this code), the more scope there is for optimization.

So my questions to the list are:

- If I were to contribute the above reimplementation of read and write, would it be accepted? Or should I pursue a JikesRVM-specific approach?
- What is the best way to put a VM-specific hook into ByteBuffer.wrap? Is there an existing facility I've missed?
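
To make the proposed interface concrete, the Java side might look roughly like the sketch below, shown for write since that is what xalan exercises. This is an illustration under assumptions, not the actual patch: the class name VMChannelSketch, the helper write0 and the in-memory SINK are made-up placeholders so that the example runs without JNI. In a real implementation write0 would be the native method with the (fd, buf, len, pos) signature above, and its C side would presumably fetch the direct buffer address once (e.g. via GetDirectBufferAddress) and never call back into Java.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Sketch only: all names here are hypothetical, chosen for illustration.
final class VMChannelSketch {
    // Stand-in for the file, so the sketch runs without any native code.
    static final ByteArrayOutputStream SINK = new ByteArrayOutputStream();

    // Stand-in for the proposed
    //   static native int write(int fd, ByteBuffer buf, int len, int pos);
    // It only looks at the bytes in [pos, pos + len) and never touches the
    // buffer's position, which is the point of passing len and pos explicitly.
    static int write0(int fd, ByteBuffer buf, int len, int pos) {
        for (int i = 0; i < len; i++) {
            SINK.write(buf.get(pos + i)); // absolute get, position unchanged
        }
        return len;
    }

    // Java-side wrapper: position and length are read here, and the new
    // position is written back here, so the native side needs no JNI upcalls.
    static int write(int fd, ByteBuffer buf) {
        int pos = buf.position();
        int len = buf.remaining();
        int n = write0(fd, buf, len, pos);
        if (n > 0) {
            buf.position(pos + n);
        }
        return n;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(16);
        buf.put((byte) 'o').put((byte) 'k');
        buf.flip(); // switch from filling the buffer to draining it
        int n = write(1, buf);
        System.out.println(n + " bytes written, position now " + buf.position());
    }
}
```

The design choice is simply to keep all ByteBuffer bookkeeping on the Java side, where it can be inlined and optimized, and leave the native call with nothing to do but the I/O itself.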

Of course, adding a BufferedWriter also solves the performance problem for xalan, but the commercial class libraries seem to have optimized this call (adding the buffer results in no measurable speedup in the Sun JDK). So I think we can expect unoptimized user code to become more widespread, and there is probably a payoff (albeit a less dramatic one) for other code.

Cheers,
Robin