Hardware Performance
Use distributed computing to exploit coarse-grained parallelism.
Use low-level code (C, assembly, or optimized Fortran) to achieve optimal on-chip performance for key kernels.
Use JNI (the Java Native Interface) to tie low-level code into distributed objects.
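A minimal sketch of how these three points could fit together, under stated assumptions. The work is split into a few large chunks (coarse-grained parallelism). Each chunk calls a kernel through a JNI native method. The library name "dotkernel", the method nativeDot, and the matching C symbol Java_KernelDemo_nativeDot are hypothetical. If no such native library is on java.library.path, the code falls back to an equivalent pure-Java loop, so it still runs. Threads in one JVM stand in for distributed nodes here. In a real distributed setup, each chunk would be handed to a remote object instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class KernelDemo {
    // Hypothetical native library built from C/Fortran sources; may be absent.
    private static final boolean NATIVE_AVAILABLE = tryLoad("dotkernel");

    private static boolean tryLoad(String name) {
        try {
            System.loadLibrary(name);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false; // library not found: use the Java fallback
        }
    }

    // JNI declaration (hypothetical); the C side would export Java_KernelDemo_nativeDot.
    private static native double nativeDot(double[] a, double[] b, int from, int to);

    // Pure-Java kernel with the same semantics as the assumed native one.
    static double javaDot(double[] a, double[] b, int from, int to) {
        double sum = 0.0;
        for (int i = from; i < to; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Dispatch to the native kernel when it was loaded, else to Java.
    static double dot(double[] a, double[] b, int from, int to) {
        return NATIVE_AVAILABLE ? nativeDot(a, b, from, to) : javaDot(a, b, from, to);
    }

    // Coarse-grained decomposition: one large task per chunk, partial results summed.
    static double parallelDot(double[] a, double[] b, int chunks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            List<Future<Double>> parts = new ArrayList<>();
            int n = a.length;
            for (int c = 0; c < chunks; c++) {
                final int from = (int) ((long) n * c / chunks);
                final int to = (int) ((long) n * (c + 1) / chunks);
                Callable<Double> task = () -> dot(a, b, from, to);
                parts.add(pool.submit(task));
            }
            double total = 0.0;
            for (Future<Double> f : parts) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        double[] a = new double[1_000];
        double[] b = new double[1_000];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
            b[i] = 2.0;
        }
        System.out.println(parallelDot(a, b, 4)); // prints 999000.0
    }
}
```

Keeping the chunks few and large means each native call does a substantial amount of work. This helps amortize the per-call overhead of crossing the JNI boundary and of scheduling a task.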