JPA Under The Hood – Understanding the Dynamics of Your JPA Framework
I recently gave talks on the behaviour of different JPA frameworks at W-JAX (Germany) and TheServerSide Java Symposium (Prague). As some people have asked for them, I am publishing the samples as well. I would also give away the Eclipse project, but with all the third-party libraries involved I am sure I would end up violating some license. In addition, I can add some comments on the samples and explain why they are the way they are.
The goal of my experiment was to compare different JPA frameworks regarding their runtime characteristics. I addressed the following points:
- Object Loading
- Object Creation
- Update Behaviour
- Caching
- Connection Handling
Preparation – SQL Scripts, Entity Classes and Persistence Unit Definitions
Let us start with the SQL scripts for creating the necessary tables. I use two tables, users and accounts. A user can have multiple accounts.
CREATE TABLE users (
`id` INT(10) NOT NULL,
`username` VARCHAR(15) NOT NULL,
`password` VARCHAR(15) NOT NULL,
`firstname` VARCHAR(30) NOT NULL,
`lastname` VARCHAR(30) NOT NULL,
`street` VARCHAR(30) NOT NULL,
`town` VARCHAR(15) NOT NULL,
`zip` VARCHAR(10) NOT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE accounts (
`id` INT(10) NOT NULL AUTO_INCREMENT,
`IBAN` VARCHAR(34) NOT NULL,
`BIC` VARCHAR(11) NOT NULL,
`userID` INT(10) NOT NULL,
`amount` DECIMAL(16,2) NOT NULL,
PRIMARY KEY (`id`),
FOREIGN KEY (`userID`) REFERENCES `users` (`id`)
);
Next we need to define the persistence classes. We define a User class and an Account class. Getter and setter methods are omitted for brevity.
@Entity
@Table(name="users")
// @Cache(usage=CacheConcurrencyStrategy.READ_WRITE)
public class User {
private long id;
private String firstName;
private String lastName;
private String userName;
private String password;
private String street;
private String town;
private String zip;
private List<Account> accounts;
@Id
public long getId() {
return id;
}
@OneToMany(mappedBy="user")
public List<Account> getAccounts (){
return accounts;
}
}
@Entity
@Table(name="accounts")
public class Account {
private long id;
private User user;
private String BIC;
@Id
public long getId() {
return id;
}
@ManyToOne
@JoinColumn(name="userID")
public User getUser(){
return user;
}
}
So far, no rocket science. In the next step we define the persistence units. I defined a single persistence unit per persistence provider. According to the JPA spec this should work fine. However, some strange things might happen.
<persistence xmlns="http://java.sun.com/xml/ns/persistence" version="1.0">
<persistence-unit name="netPayEclipse" transaction-type="RESOURCE_LOCAL"
xmlns="http://java.sun.com/xml/ns/persistence"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/persistence
http://java.sun.com/xml/ns/persistence/persistence_1_0.xsd" >
<provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
<!-- Entities -->
<class>com.dynatrace.talks.jpahood.entity.User</class>
<class>com.dynatrace.talks.jpahood.entity.Transaction</class>
<class>com.dynatrace.talks.jpahood.entity.Account</class>
<properties>
<property name="eclipselink.jdbc.user" value="root"/>
<property name="eclipselink.jdbc.password" value="admin" />
<property name="eclipselink.jdbc.driver" value="com.mysql.jdbc.Driver"/>
<property name="eclipselink.jdbc.url" value="jdbc:mysql://localhost/netpay"/>
<property name="eclipselink.target-database" value="MySQL4" />
<!-- <property name="eclipselink.cache.shared.default" value="false"/> -->
<property name="eclipselink.jdbc.read-connections.min" value="1" />
<property name="eclipselink.jdbc.read-connections.max" value="1" />
<property name="eclipselink.jdbc.write-connections.min" value="1" />
<property name="eclipselink.jdbc.write-connections.max" value="1" />
</properties>
</persistence-unit>
<persistence-unit name="netPayOpenJPA" transaction-type="RESOURCE_LOCAL"
xmlns="http://java.sun.com/xml/ns/persistence"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/persistence
http://java.sun.com/xml/ns/persistence/persistence_1_0.xsd" >
<provider>org.apache.openjpa.persistence.PersistenceProviderImpl</provider>
<!-- Entities -->
<class>com.dynatrace.talks.jpahood.entity.User</class>
<class>com.dynatrace.talks.jpahood.entity.Transaction</class>
<class>com.dynatrace.talks.jpahood.entity.Account</class>
<properties>
<property name="openjpa.ConnectionProperties"
value="DriverClassName=com.mysql.jdbc.Driver,
Url=jdbc:mysql://localhost/netpay,
MaxActive=1000,
MaxWait=10000,
TestOnBorrow=false,
Username=root,
Password=admin"/>
<property name="openjpa.ConnectionDriverName"
value="org.apache.commons.dbcp.BasicDataSource"/>
<!--
<property name="openjpa.DataCache" value="true"/>
<property name="openjpa.RemoteCommitProvider" value="sjvm"/>
-->
<property name="openjpa.QueryCache"
value="CacheSize=1000, SoftReferenceSize=100"/>
</properties>
</persistence-unit>
<persistence-unit name="netPayHib" transaction-type="RESOURCE_LOCAL"
xmlns="http://java.sun.com/xml/ns/persistence"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/persistence
http://java.sun.com/xml/ns/persistence/persistence_1_0.xsd" >
<provider>org.hibernate.ejb.HibernatePersistence</provider>
<!-- Entities -->
<class>com.dynatrace.talks.jpahood.entity.User</class>
<class>com.dynatrace.talks.jpahood.entity.Transaction</class>
<class>com.dynatrace.talks.jpahood.entity.Account</class>
<properties>
<property name="hibernate.dialect" value="org.hibernate.dialect.MySQLDialect"/>
<property name="hibernate.connection.driver_class" value="com.mysql.jdbc.Driver"/>
<property name="hibernate.connection.username" value="root"/>
<property name="hibernate.connection.password" value="admin"/>
<property name="hibernate.connection.url" value="jdbc:mysql://localhost/netpay"/>
<property name="hibernate.max_fetch_depth" value="3"/>
<property name="hibernate.connection.pool_size" value="500"/>
<property name="hibernate.ejb.cfgfile"
value="/com/dynatrace/talks/jpahood/hibernate.cfg.xml"/>
</properties>
</persistence-unit>
</persistence>
That’s it for the preparation. Now we are ready to look at the samples, which will help us understand the inner workings of JPA frameworks.
Dynamic Behaviour of JPA Frameworks
Now let us go through the various samples. They are deliberately kept very simple, but they show typical usage scenarios.
Sample 1 – It depends on what you make out of it
The goal of this sample is to test whether a framework detects literal values in query strings and automatically turns them into parameters of a proper prepared statement. Here is the sample for querying the user with id 1.
public static void simpleLoadSample() {
EntityManager em = EntityManagerUtil.getEMFactory(provider).createEntityManager();
Query query = em.createQuery("select u from User u where u.id=1");
iterateOverItems(query.getResultList());
em.close();
}
Actually, a JPA framework should produce the same SQL statement as for the code below.
public static void simpleLoadwithParameter() {
EntityManager em = EntityManagerUtil.getEMFactory(provider).createEntityManager();
Query query = em.createQuery("select u from User u where u.id=?1");
query.setParameter(1, 1L);
iterateOverItems(query.getResultList());
em.close();
}
In my tests both OpenJPA and EclipseLink create proper prepared statements in both cases. Hibernate, however, in the first case creates a statement that looks like “select … from user where id=1” and also prepares this statement. Because every distinct id then yields a distinct SQL string, prepared statements like this can render PreparedStatement caching as well as database query caching useless.
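To make the caching argument a bit more tangible, here is a minimal sketch in plain Java. It assumes a statement cache that is keyed by the SQL text, which is how such caches commonly work. The class StatementCacheSketch and its methods are hypothetical names for illustration and not part of any JPA provider or JDBC driver.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of a statement cache keyed by SQL text (hypothetical, for illustration only). */
final class StatementCacheSketch {
    // SQL text -> number of times this exact text was requested
    private final Map<String, Integer> cache = new LinkedHashMap<>();

    /** "Prepares" a statement; identical SQL text reuses the existing cache entry. */
    void prepare(String sql) {
        cache.merge(sql, 1, Integer::sum);
    }

    int distinctStatements() {
        return cache.size();
    }

    /** Queries users 1..n with the id inlined as a literal. */
    static int withLiterals(int n) {
        StatementCacheSketch c = new StatementCacheSketch();
        for (int id = 1; id <= n; id++) {
            c.prepare("select * from users where id=" + id);
        }
        return c.distinctStatements();
    }

    /** Queries users 1..n with a bind parameter; the id would be bound separately. */
    static int withParameter(int n) {
        StatementCacheSketch c = new StatementCacheSketch();
        for (int id = 1; id <= n; id++) {
            c.prepare("select * from users where id=?");
        }
        return c.distinctStatements();
    }

    public static void main(String[] args) {
        System.out.println("literals:  " + withLiterals(100));  // 100 cache entries
        System.out.println("parameter: " + withParameter(100)); // 1 cache entry
    }
}
```

With inlined literals the cache grows with the number of distinct ids, while the parameterized form keeps a single entry that is reused for every lookup.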
Sample 2 – The Magic Value
This sample deals with object construction. As I have seen in my presentations, a lot of people are not sure what is actually happening here. We load an object with a query. While waiting for input, we modify the value in the database, and then we run the query again.
public static void loadTwiceWithQuery (){
EntityManager em = EntityManagerUtil.getEMFactory(provider).createEntityManager();
Query query = em.createQuery("select u from User u where u.id=1");
iterateOverItems(query.getResultList());
em.close();
try {
System.in.read();
// change value in database
} catch (IOException e) {
e.printStackTrace();
}
em = EntityManagerUtil.getEMFactory(provider).createEntityManager();
query = em.createQuery("select u from User u where u.id=1");
iterateOverItems(query.getResultList());
em.close();
}
When trying this example with different JPA frameworks you will see that two database queries are executed unless query caching is enabled. However, the second query returns the object with the “old” values. Why is that? The query result is only used to retrieve the id of the user. As the framework realizes that the object has already been loaded, it does not construct that object again. In case you always want the latest state, you have to use refresh().
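The behaviour described above can be modelled with a few lines of plain Java. This is only a toy sketch under the assumption that already constructed objects are kept in a map keyed by id; IdentityMapSketch and all of its members are hypothetical names and do not reflect how any particular provider implements its cache.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model: the query always runs, but an already built object is reused (hypothetical). */
final class IdentityMapSketch {
    static final class User {
        final long id;
        String firstName;

        User(long id, String firstName) {
            this.id = id;
            this.firstName = firstName;
        }
    }

    private final Map<Long, String> table = new HashMap<>(); // stand-in for the database table
    private final Map<Long, User> loaded = new HashMap<>();  // objects constructed so far
    private int queries;                                     // simulated database round trips

    /** Changes the row directly in the "database", bypassing the loaded objects. */
    void writeRow(long id, String firstName) {
        table.put(id, firstName);
    }

    /** Hits the "database" every time, but builds a User only if none exists for this id. */
    User query(long id) {
        queries++;
        String row = table.get(id);
        return loaded.computeIfAbsent(id, key -> new User(key, row));
    }

    /** Overwrites the object's state with the current row, comparable to refresh(). */
    void refresh(User user) {
        user.firstName = table.get(user.id);
    }

    int queries() {
        return queries;
    }

    public static void main(String[] args) {
        IdentityMapSketch db = new IdentityMapSketch();
        db.writeRow(1L, "Jane");
        db.query(1L);
        db.writeRow(1L, "Janet");            // value changed behind the framework's back
        User second = db.query(1L);
        System.out.println(second.firstName); // still the old value
        db.refresh(second);
        System.out.println(second.firstName); // now the current value
    }
}
```

Both calls to query() count as a round trip, yet only refresh() brings the changed first name into the object.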
Sample 3 – Staying up to date
In this sample we look at updating. We again load a user and then update the first name in a very creative way.
public static void simpleUpdate (){
EntityManager em1= EntityManagerUtil.getEMFactory(provider).createEntityManager();
em1.getTransaction().begin();
User user = em1.find(User.class, 1L);
user.setFirstName("otherFirstName" + System.currentTimeMillis());
em1.getTransaction().commit();
em1.close ();
}
Guess what happens … the object gets updated. Well, that is what you would expect. The interesting part here is again what the statement looks like. Actually we only want the firstname column to be updated. EclipseLink and OpenJPA do so by default. Hibernate, however, will update all fields. In case you have defined triggers in the database this can cause serious performance problems, as triggers or stored procedures might be invoked although they should not be. As Gavin mentioned in his comment below, using a specific query for each different update results in a much higher number of distinct statements, which can lead to problems with the JDBC PreparedStatement cache.
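The two update strategies can be sketched as follows. This is a simplified illustration that assumes the framework keeps a snapshot of the loaded column values and compares it with the current state. UpdateSqlSketch is a hypothetical helper, and the generated SQL is only meant to show the shape of the statement, not the exact output of any provider.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.StringJoiner;

/** Toy generator for "update all columns" versus "update changed columns only" (hypothetical). */
final class UpdateSqlSketch {

    /** Always the same SQL text per table, no matter which column changed. */
    static String updateAll(String table, Map<String, Object> current) {
        return build(table, current.keySet());
    }

    /** SQL text depends on which columns differ from the snapshot taken at load time. */
    static String updateChanged(String table, Map<String, Object> snapshot,
                                Map<String, Object> current) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, Object> e : current.entrySet()) {
            if (!Objects.equals(e.getValue(), snapshot.get(e.getKey()))) {
                changed.add(e.getKey());
            }
        }
        // A real implementation would skip the UPDATE entirely if nothing changed.
        return build(table, changed);
    }

    private static String build(String table, Iterable<String> columns) {
        StringJoiner set = new StringJoiner(", ");
        for (String column : columns) {
            set.add(column + "=?");
        }
        return "update " + table + " set " + set + " where id=?";
    }

    public static void main(String[] args) {
        Map<String, Object> snapshot = new LinkedHashMap<>();
        snapshot.put("firstname", "Jane");
        snapshot.put("lastname", "Doe");
        snapshot.put("town", "Linz");
        Map<String, Object> current = new LinkedHashMap<>(snapshot);
        current.put("firstname", "Janet");

        System.out.println(updateAll("users", current));
        System.out.println(updateChanged("users", snapshot, current));
    }
}
```

The trade-off from the text shows up directly: updateAll yields one statement text per table that a statement cache can reuse, while updateChanged touches fewer columns but can produce a different statement text for every combination of changed fields.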
Sample 4 – Having good references
This sample deals with the getReference method of the EntityManager. The JavaDoc says:
Get an instance, whose state may be lazily fetched. … The application should not expect that the instance state will be available upon detachment, unless it was accessed by the application while the entity manager was open.
Hmmm, I do not know how you feel about this, but the word “may” confused me a bit here. Effectively it means I do not know whether the object will be fetched or not. I used the following code sample to see what is happening.
public static void getReferenceSample (){
EntityManager em= EntityManagerUtil.getEMFactory(provider).createEntityManager();
em.getReference(User.class, 1L);
em.close ();
}
Here my experiments show that EclipseLink loads the data, while Hibernate and OpenJPA do not.
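The difference between the two behaviours boils down to when the select is issued. The following plain Java sketch models a reference that defers loading until first access and one that loads right away. LazyReferenceSketch, UserRef and selectFirstName are hypothetical names; real providers typically use generated proxies or bytecode enhancement rather than a Supplier.

```java
import java.util.function.Supplier;

/** Toy model of a lazily versus eagerly resolved reference (hypothetical, for illustration). */
final class LazyReferenceSketch {

    /** Handle to a user whose state is fetched on first access at the latest. */
    static final class UserRef {
        private final Supplier<String> loader;
        private String firstName;
        private boolean loaded;

        UserRef(Supplier<String> loader) {
            this.loader = loader;
        }

        String firstName() {
            if (!loaded) {              // first access triggers the load
                firstName = loader.get();
                loaded = true;
            }
            return firstName;
        }
    }

    private int loads; // simulated database hits

    int loads() {
        return loads;
    }

    private String selectFirstName(long id) {
        loads++;
        return "user-" + id;
    }

    /** Returns immediately; nothing is loaded until the state is accessed. */
    UserRef lazyReference(long id) {
        return new UserRef(() -> selectFirstName(id));
    }

    /** Loads the state right away, before the reference is handed out. */
    UserRef eagerReference(long id) {
        UserRef ref = lazyReference(id);
        ref.firstName();
        return ref;
    }

    public static void main(String[] args) {
        LazyReferenceSketch lazyDb = new LazyReferenceSketch();
        lazyDb.lazyReference(1L);
        System.out.println("lazy loads:  " + lazyDb.loads());

        LazyReferenceSketch eagerDb = new LazyReferenceSketch();
        eagerDb.eagerReference(1L);
        System.out.println("eager loads: " + eagerDb.loads());
    }
}
```

In the sample above the reference is never accessed before the EntityManager is closed, so a lazy implementation never has a reason to issue a select at all.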
Sample 5 – Staying in good relations
In the next sample we look at the behaviour for loading detail-master relationships. Hey, that should be master-detail, not the other way round! Yes, I know, but here we first load the detail and then the master.
public static void loadRelationSample () {
EntityManager em= EntityManagerUtil.getEMFactory(provider).createEntityManager();
Query query = em.createQuery("select acc from Account acc where acc.id = 1");
Account account = (Account) query.getSingleResult();
User user = account.getUser();
em.close ();
}
Very interestingly, all frameworks I used load the master record as well by default. How they actually do this depends on the framework as well as on the database used. OpenJPA, for example, uses a join by default, EclipseLink does not, and with Hibernate it depends on the dialect (and database) used.
Sample 6 – Yam Session
In this sample we look at connection handling and sessions. The first example creates more and more EntityManagers and queries for an object with each of them. The second sample does the same, but it also uses transactions. … and what is the ArrayList for? Well, we want to avoid garbage collection and automatic closing of the EntityManagers.
public static void checkMaxSessions() {
ArrayList<EntityManager> myEMs = new ArrayList<EntityManager>();
for (int i = 1; i < 51; ++i) {
try {
EntityManager em = EntityManagerUtil.getEMFactory(provider)
.createEntityManager();
myEMs.add(em);
User u = (User) em.find(User.class, Long.valueOf(i));
u.getFirstName();
System.out.println("Concurrent sessions: " + i);
} catch (Exception ex) {
System.err.println(ex);
break;
}
try {
Thread.sleep(700);
} catch (InterruptedException e) {
}
}
}
public static void checkMaxSessionsWithTransaction() {
ArrayList<EntityManager> myEMs = new ArrayList<EntityManager>();
for (int i = 1; i < 51; ++i) {
try {
EntityManager em = EntityManagerUtil.getEMFactory(provider)
.createEntityManager();
myEMs.add(em);
em.getTransaction().begin();
User u = (User) em.find(User.class, Long.valueOf(i));
u.getFirstName();
System.out.println("Concurrent sessions: " + (i));
} catch (Exception ex) {
System.err.println(ex);
break;
}
try {
Thread.sleep(300);
} catch (InterruptedException e) {
}
}
}
What we can see here is that without transactions we can do all the work with one connection. When we use transactions, however, Hibernate will open a new connection per EntityManager. So, if you do not need transactions (when you just load a single list on a website, for example) you are better off not using them. However, you should be aware of the implications of not using transactions across multiple queries (which I assume you are).
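The effect on the number of connections can be illustrated with a small counting model. It assumes that a session without a transaction borrows a connection only for the duration of a query, while a session with an open transaction keeps its connection until commit. ConnectionUsageSketch is a hypothetical class and does not model any real connection pool.

```java
/** Toy counter for concurrently held connections (hypothetical, for illustration only). */
final class ConnectionUsageSketch {
    private int inUse; // connections currently held
    private int peak;  // highest number held at the same time

    void acquire() {
        inUse++;
        peak = Math.max(peak, inUse);
    }

    void release() {
        inUse--;
    }

    int peak() {
        return peak;
    }

    /** Each session borrows a connection for its query and returns it right away. */
    static int peakWithoutTransactions(int sessions) {
        ConnectionUsageSketch pool = new ConnectionUsageSketch();
        for (int i = 0; i < sessions; i++) {
            pool.acquire(); // run the query
            pool.release(); // connection is free for the next session
        }
        return pool.peak();
    }

    /** Each session begins a transaction and holds its connection; no commit happens. */
    static int peakWithOpenTransactions(int sessions) {
        ConnectionUsageSketch pool = new ConnectionUsageSketch();
        for (int i = 0; i < sessions; i++) {
            pool.acquire(); // held until commit or rollback, which never comes here
        }
        return pool.peak();
    }

    public static void main(String[] args) {
        System.out.println("without transactions: " + peakWithoutTransactions(50));
        System.out.println("with transactions:    " + peakWithOpenTransactions(50));
    }
}
```

For 50 sessions the first variant never holds more than one connection at a time, whereas the second ends up holding 50, which is where a small pool or a database connection limit would be hit.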
Conclusion
Although JPA standardizes the interface for persistence frameworks, there is still a lot of freedom regarding runtime behaviour. This can easily impact the performance of your application. It also shows that you should not rely on the default settings of a framework. In case you need consistent behaviour across JPA providers, you have to test the runtime behaviour and tweak it to your needs. Ideally you write something like a Reset CSS for JPA.
Further Readings
Below you find a number of links to other persistence-related posts, specifically on caching in Hibernate. Additionally I recommend checking out the database diagnosis section of dynaTrace.
Thank you everybody for the feedback!
Related posts:
- Understanding Caching in Hibernate – Part Three : The Second Level Cache In the last posts I already covered the session cache...
- Understanding Caching in Hibernate – Part Two : The Query Cache In the last post I wrote on caching in Hibernate...
- Understanding Caching in Hibernate – Part One : The Session Cache Hibernate offers caching functionality which is designed to reduces...
- JPA Frameworks under the Hood @ TheServerSide Prague For all...
- ADO.NET Entity Framework: unexpected behaviour with MergeOptions I’ve started to look closer at the ADO.NET Entity Framework...
One of several annoyances in the JPA spec is the lack of defined behaviour in various situations, hence you don’t know what to expect. Another is what happens when you access a field that wasn’t loaded upon detachment … spec says nothing.
DataNucleus is another JPA implementation FWIW, also implementing JDO, and not just for RDBMS. PS JDO standardised the persistence process many years ago.
can you post a link to your presentation pdf/ppt?
thanks
can you post a link to your original presentation, i.e. the on which this blog post elaborates?
“However Hibernate creates a statement that looks like this “select … from user where id=1″ and also prepares this statement.”
This is *definitely* not correct. Hibernate *always* uses JDBC parameters for JPA query parameters. You have made a mistake somewhere.
“Hibernate however will update all fields.”
This is a default behavior that is actually faster in many cases, due to more efficient use of the prepared statement cache. (Yes, we actually studied this.) You can of course enable the use of per-field update. It’s just not the default behavior.
Gavin,
it is true that Hibernate uses parameters for JPA query parameters. However, they are not used for queries like “select u from User u where u.id=1”. Other frameworks use parameters here as well. One of our customers even ran into a memory leak because of this. However, this was not Hibernate's fault.
Regarding the update behaviour, I agree with you in principle that this optimizes prepared statement cache usage.
The goal of this post and presentation is to show the different default behaviour of frameworks, as many people are not aware of it. It is also not meant as criticism of Hibernate. I like the work you are doing.
billmil,
the actual presentation mainly consists of live samples of the code snippets and their dynamic analysis, so there is not much to post. What are you specifically interested in?
It would maybe be nice if you could link to a page which shows the exact queries that the different JPA providers generated.
Also, in similar tests that we did, we found it extremely important to mention the exact versions being used. It’s quite hard to reproduce anything a year later if you have forgotten which versions were used.
It might also be nice to test more complex queries and test these with different versions of the same JPA provider (e.g. Hibernate). Does Hibernate 3.4 generate more efficient queries than Hibernate 3.0, or is there no progress in this department? That would be a very interesting question too.
All the release notes of the various JPA providers mostly focus on bugs being fixed or features being added, but is there any focus on creating more efficient SQL?
Henk,
I was using Hibernate 3.3.2, OpenJPA 1.2.1 and EclipseLink 1.1.2. I will extend the article to show the exact traces and queries.
I also agree that you have to re-run it for different versions. However as I said my goal was to create more awareness that there are differences.
Good input
Updated the post based on some of the comments (text in italic)
I guess Hibernate does update all fields because they have problems with dirty tracking.
This is not the case for all frameworks.