Objectively comparing the performance of a Java program with that of an equivalent program written in another language, such as C or C++, is a tricky and controversial task. The target platform of Java's bytecode compiler is the Java platform, and the bytecode is either interpreted or compiled into machine code by the JVM. The two approaches, native compilation ahead of time and bytecode executed by a JVM, give rise to very different and hard-to-compare scenarios: static versus dynamic compilation and recompilation, the availability of precise information about the runtime environment, and others.
The performance of a compiled Java program depends on how well the host JVM manages the program's particular tasks, and on how well the JVM exploits the features of the hardware and operating system in doing so. Any Java performance test or comparison should therefore always report the version and vendor of the JVM used, together with the OS and hardware architecture it ran on. Similarly, the performance of the equivalent natively compiled program depends on the quality of its generated machine code, so the test or comparison should also report the name, version and vendor of the compiler used, and the optimization options that were enabled.
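As a minimal sketch of the reporting requirement above, the JVM and platform details can be read from standard system properties. The class name BenchmarkEnvironment and the method describe() are hypothetical names chosen for this illustration; the property keys are standard Java system properties.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BenchmarkEnvironment {
    // Collects the JVM and platform details that a benchmark report should state.
    // (Hypothetical helper, not part of any standard benchmarking tool.)
    public static Map<String, String> describe() {
        Map<String, String> env = new LinkedHashMap<>();
        String[] keys = {
            "java.version", "java.vendor", "java.vm.name", "java.vm.version",
            "os.name", "os.version", "os.arch"
        };
        for (String key : keys) {
            env.put(key, System.getProperty(key));
        }
        // Number of processors visible to this JVM.
        env.put("availableProcessors",
                Integer.toString(Runtime.getRuntime().availableProcessors()));
        return env;
    }

    public static void main(String[] args) {
        describe().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

A report for the natively compiled counterpart would list the compiler name, version and optimization options in the same way.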
Historically, the execution speed of Java programs improved significantly due to the introduction of just-in-time compilation (in 1997/1998 for Java 1.1), the addition of language features supporting better code analysis, and optimizations in the Java Virtual Machine itself (such as HotSpot becoming the default for Sun's JVM in 2000).
Virtual machine optimization techniques
Many optimizations have improved the performance of the Java Virtual Machine over time. Although the JVM was often the first virtual machine to implement them successfully, they have frequently been adopted by other, similar platforms as well.
Just-In-Time compilation
Further information: Just-in-time compilation and HotSpot
Early Java Virtual Machines always interpreted bytecode. This carried a large performance penalty (a factor of between 10 and 20 for Java versus C in average applications).
Java 1.1 saw the introduction of a JIT compiler.
Java 1.2 saw the introduction of an optional system called HotSpot: the virtual machine continually analyzes the program's performance for "hot spots" that are executed frequently or repeatedly. These are then targeted for optimization, leading to high-performance execution with a minimum of overhead for less performance-critical code.
With Java 1.3, HotSpot became the default system.
With the HotSpot technique, code is first interpreted, then "hot spots" are compiled on the fly. This is why a program has to be executed a few times before its performance is measured in a benchmark.
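The warm-up effect described above can be sketched as follows. This is an illustration only, under the assumption that the workload becomes "hot" after enough repetitions; the class WarmupBenchmark, its methods and the iteration counts are hypothetical, and the measured times will vary by JVM and machine. Serious measurements are usually done with a dedicated harness such as JMH rather than hand-written timing loops.

```java
public class WarmupBenchmark {
    // Workload under test: sums the squares of 0..n-1.
    public static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    // Runs the workload `iterations` times and returns the elapsed nanoseconds.
    public static long timeRuns(int iterations, int n) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += sumOfSquares(n);
        }
        long elapsed = System.nanoTime() - start;
        // Use the result so the JIT cannot discard the loop as dead code.
        if (sink == 42) {
            System.out.println(sink);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long cold = timeRuns(1_000, 10_000);  // first runs: likely interpreted at the start
        timeRuns(20_000, 10_000);             // warm-up: gives the JIT time to compile hot code
        long warm = timeRuns(1_000, 10_000);  // measured after warm-up
        System.out.println("cold: " + cold + " ns, warm: " + warm + " ns");
    }
}
```

Only the measurement taken after the warm-up phase is representative of steady-state performance.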
HotSpot compilation uses many optimization techniques, such as inline expansion, loop unwinding, bounds-checking elimination, and architecture-dependent register allocation.
Some benchmarks show a 10-fold speed gain from this technique.
Adaptive optimization
Further information: Adaptive optimization
Adaptive optimization is a technique in computer science that performs dynamic recompilation of portions of a program based on the current execution profile. In a simple implementation, an adaptive optimizer may merely make a trade-off between just-in-time compilation and interpreting instructions. At another level, adaptive optimization may take advantage of local data conditions to optimize away branches and use inline expansion to reduce the overhead of method calls.
A virtual machine like HotSpot is also able to deoptimize previously JIT-compiled code. This allows it to perform aggressive (and potentially unsafe) optimizations, while still being able to deoptimize the code and fall back on a safe path later on.
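The following sketch shows the kind of situation in which such deoptimization can occur. It is an assumption-laden illustration, not a description of any particular JVM's behaviour: the names DeoptimizationSketch, Shape, Circle and Square are hypothetical, and whether the JIT actually specializes the call site and later deoptimizes it depends on the virtual machine and its settings. On HotSpot, running with -XX:+PrintCompilation may show the affected method being marked "made not entrant".

```java
import java.util.Arrays;

public class DeoptimizationSketch {
    public interface Shape {
        double area();
    }

    public static final class Circle implements Shape {
        private final double radius;
        public Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    public static final class Square implements Shape {
        private final double side;
        public Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    // Single call site for Shape.area(). While only Circle has been seen here,
    // a JIT may speculatively inline Circle.area() at this call.
    public static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape s : shapes) {
            total += s.area();
        }
        return total;
    }

    public static void main(String[] args) {
        // Phase 1: only circles, so the call site looks monomorphic.
        Shape[] circles = new Shape[1_000];
        Arrays.fill(circles, new Circle(1.0));
        double sink = 0;
        for (int i = 0; i < 20_000; i++) {
            sink += totalArea(circles);
        }
        // Phase 2: a second receiver type appears. If the JIT specialized the
        // call for Circle, it must now fall back to a safe, general path.
        Shape[] mixed = { new Circle(1.0), new Square(2.0) };
        sink += totalArea(mixed);
        System.out.println(sink > 0);
    }
}
```

The program's result is the same whether or not any speculation took place; only the execution path inside the virtual machine changes.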
Garbage collection
Further information: Garbage collection (computer science)
The 1.0 and 1.1 virtual machines used a mark-sweep collector, which could fragment the heap after a garbage collection. Starting with Java 1.2, the virtual machines switched to a generational collector, which has much better defragmentation behaviour. Modern virtual machines use a variety of techniques that have further improved garbage collection performance.
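Because the collector in use differs between virtual machines and versions, it can be useful to record it alongside benchmark results. A minimal sketch, assuming the standard java.lang.management API is available; the class name CollectorReport and the method collectorNames() are hypothetical, and the names printed depend entirely on the JVM and its configuration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class CollectorReport {
    // Returns the names of the garbage collectors the running JVM reports.
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Print each collector with its collection count and accumulated time so far.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": "
                    + gc.getCollectionCount() + " collections, "
                    + gc.getCollectionTime() + " ms");
        }
    }
}
```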
Other optimization techniques
Split bytecode verification
Prior to executing a class, the Sun JVM verifies its bytecodes (see Bytecode verifier). This verification is performed lazily: a class's bytecodes are loaded and verified only when that class is loaded and prepared for use, not at the beginning of the program. (Note that other verifiers, such as the Java/400 verifier for IBM System i, can perform most verification in advance and cache verification information from one use of a class to the next.) However, as the Java class libraries are also regular Java classes, they too must be loaded when they are used, which means that the start-up time of a Java program is often longer than that of a C++ program, for example.
A technique named split-time verification, first introduced in the J2ME edition of the Java platform, has been used in the Java Virtual Machine since Java version 6. It splits the verification of bytecode into two phases:
- design time - during the compilation of the class from source to bytecode
- runtime - when loading the class.
In practice this technique works by capturing knowledge that the Java compiler has of class flow and annotating the compiled method bytecodes with a synopsis of the class flow information. This does not make runtime verification appreciably less complex, but does allow some shortcuts.
Escape analysis and lock coarsening
Further information: Lock (computer science) and Escape analysis
Java is able to manage multithreading at the language level. Multithreading is a technique that allows one to
- improve the user's perception of program speed, by allowing user actions while the program performs tasks, and
- take advantage of multi-core architectures, enabling two unrelated tasks to be performed at the same time by two different cores.
However, programs that use multithreading need to take extra care with objects shared between threads, locking access to shared methods or blocks while they are being used by one of the threads. Locking a block or an object is a time-consuming operation due to the nature of the underlying operating-system-level operation involved (see concurrency control and lock granularity).
As the Java library does not know which methods will be used by more than one thread, the standard library always locks blocks wherever this would be necessary in a multithreaded environment.
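The locking of shared state described above can be sketched as follows. This is a minimal illustration using the language's synchronized keyword; the class SharedCounter and the method runThreads are hypothetical names introduced for the example.

```java
public class SharedCounter {
    private long count = 0;

    // synchronized: only one thread at a time may run these methods on a given instance.
    public synchronized void increment() {
        count++;
    }

    public synchronized long get() {
        return count;
    }

    // Starts `threads` workers that each increment one shared counter
    // `incrementsPerThread` times, waits for them, and returns the final count.
    public static long runThreads(int threads, int incrementsPerThread) {
        SharedCounter counter = new SharedCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runThreads(4, 100_000)); // prints 400000
    }
}
```

Without the synchronized keyword on increment(), concurrent updates could be lost and the final count could fall short of the expected total; the price of correctness is that every call acquires and releases a lock.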
Prior to Java 6, the virtual machine always locked objects and blocks when asked to by the program (see Lock Implementation), even if there was no risk of an object being modified by two different threads at the same time. For example, in this case, a local Vector was locked before each of the add operations to ensure that it would not be modified by other threads (Vector is synchronized), but because it is strictly local to the method this is not necessary:

    public String getNames() {
        Vector<String> v = new Vector<String>();
        v.add("Me");
        v.add("You");
        v.add("Her");
        return v.toString();
    }

Starting with Java 6, code blocks and objects are locked only when necessary, so in the above case, the virtual machine would not lock the Vector object at all.
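To make the point concrete, the sketch below puts the synchronized Vector version next to a version written by hand with an unsynchronized ArrayList. Both return the same string; the difference is only whether locks are requested. The class name LocalCollectionNames and the method names are hypothetical, and whether a given virtual machine actually removes the Vector's locks is an assumption that depends on the JVM and its settings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

public class LocalCollectionNames {
    // Vector's methods are synchronized, so each add() requests a lock,
    // even though `v` never leaves this method.
    public static String getNamesWithVector() {
        Vector<String> v = new Vector<>();
        v.add("Me");
        v.add("You");
        v.add("Her");
        return v.toString();
    }

    // ArrayList is unsynchronized: this is what the code is effectively
    // equivalent to once the unnecessary locks have been removed.
    public static String getNamesWithArrayList() {
        List<String> names = new ArrayList<>();
        names.add("Me");
        names.add("You");
        names.add("Her");
        return names.toString();
    }

    public static void main(String[] args) {
        System.out.println(getNamesWithVector());    // prints [Me, You, Her]
        System.out.println(getNamesWithArrayList()); // prints [Me, You, Her]
    }
}
```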
As of version 6u14, Java includes experimental support for escape analysis.
Register allocation improvements
Prior to Java 6, register allocation was very primitive in the "client" virtual machine (registers did not live across blocks), which was a problem on architectures with few available registers, such as x86. If no more registers are available for an operation, the compiler must copy from register to memory (or memory to register), which takes time (registers are typically much faster to access). The "server" virtual machine, however, used a graph-coloring allocator and did not suffer from this problem.
An optimization of register allocation was introduced in Sun's JDK 6; it then became possible to use the same registers across blocks (when applicable), reducing memory accesses. This led to a reported performance gain of approximately 60% in some benchmarks.
Class data sharing
Class data sharing (called CDS by Sun) is a mechanism that reduces the start-up time of Java applications and also reduces their memory footprint. When the JRE is installed, the installer loads a set of classes from the system jar file (the jar file containing the entire Java class library, called rt.jar) into a private internal representation, and dumps that representation to a file, called a "shared archive". During subsequent JVM invocations, this shared archive is memory-mapped in, saving the cost of loading those classes and allowing much of the JVM's metadata for these classes to be shared among multiple JVM processes.
The corresponding improvement for start-up