Apache Hadoop HBase plays nice with JPA

Posted by: Matthias Wessendorf on March 17, 2010

Google App Engine uses Google's BigTable implementation as its storage system. Instead of offering only a native (and not so common) API to persist data, Google worked with the DataNucleus guys to get support for JPA and JDO. There are some restrictions on usage, but generally folks who already know these APIs (JPA in particular has a pretty high adoption rate) can easily store their data in BigTable, thanks to these (common) APIs.

Now, not everybody wants to host their application (and store their data) at Google. A decent alternative is a “home-grown” system: the BigTable implementation of Apache Hadoop, “HBase”, can be used the same way! You can easily use JDO/JPA (via DataNucleus) to persist objects in the HBase BigTable implementation. This is really good news!

Installing HBase is not too complicated, and the gotchas are covered in its documentation. To start using HBase with JPA, just use a regular persistence.xml file, which lists your classes and the actual configuration, such as:

<persistence...>
  <persistence-unit...>

    <class>net.wessendorf...</class>
    ...

    <properties>
      <property name="datanucleus.ConnectionURL" value="hbase"/>
      <property name="datanucleus.ConnectionUserName" value=""/>
      <property name="datanucleus.ConnectionPassword" value=""/>
      <property name="datanucleus.autoCreateSchema" value="true"/>
      <property name="datanucleus.validateTables" value="false"/>
      <property name="datanucleus.Optimistic" value="false"/>
      <property name="datanucleus.validateConstraints" value="false"/>
    </properties>

  </persistence-unit>
</persistence>
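
The same DataNucleus settings can also be supplied from code, which is handy when the values differ between environments. The sketch below only builds the property map in plain Java; the class name HBaseJpaProperties and the helper hbaseProperties() are made up for illustration. With a JPA provider on the classpath, the map could then be handed to the standard Persistence.createEntityManagerFactory(String, Map) overload, which is shown only as a comment so the block runs without extra dependencies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of DataNucleus or JPA.
final class HBaseJpaProperties {

    /** Collects the same property names and values as the persistence.xml above. */
    static Map<String, String> hbaseProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("datanucleus.ConnectionURL", "hbase");
        props.put("datanucleus.ConnectionUserName", "");
        props.put("datanucleus.ConnectionPassword", "");
        props.put("datanucleus.autoCreateSchema", "true");
        props.put("datanucleus.validateTables", "false");
        props.put("datanucleus.Optimistic", "false");
        props.put("datanucleus.validateConstraints", "false");
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> props = hbaseProperties();

        // With JPA available, the overrides would be passed like this
        // ("yourPersistenceUnit" is a placeholder name):
        // EntityManagerFactory emf =
        //     Persistence.createEntityManagerFactory("yourPersistenceUnit", props);

        // Print the settings so they can be checked by eye.
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Properties passed this way take precedence over those declared in persistence.xml, so the XML file can keep sensible defaults.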

Your entities are almost “normal”, but there are some restrictions as well (such as how the @Id is managed)… Generally, just annotate your class with @Entity and deal with the limitations. Once the data model is done, you could (naively) start using the EntityManager like:

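import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.EntityTransaction;
import javax.persistence.Persistence;

// look up the persistence unit declared in persistence.xml
EntityManagerFactory emf = Persistence.createEntityManagerFactory(...);
EntityManager entityManager = emf.createEntityManager();
EntityTransaction entityTransaction = entityManager.getTransaction();
entityTransaction.begin();

// store the entity inside the running transaction
entityManager.persist(myJPAentity);

entityTransaction.commit();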

But the best way (as usual with JPA) is to move that JPA-handling code into a Data Access Object… But this is nothing new and well known…

During your (Maven) build you have to run byte-code enhancement on your entity classes. The DataNucleus guys offer a decent Maven plugin for that:

<plugin>
  <groupId>org.datanucleus</groupId>
  <artifactId>maven-datanucleus-plugin</artifactId>
  <version>2.0.0-release</version>
  <configuration>
    <log4jConfiguration>${basedir}/log4j.properties</log4jConfiguration>
    <verbose>true</verbose>
    <api>JPA</api>
    <persistenceUnitName>nameOfyourPU</persistenceUnitName>
  </configuration>
  <executions>
    <execution>
      <phase>compile</phase>
      <goals>
        <goal>enhance</goal>
      </goals>
    </execution>
  </executions>
</plugin>

Now you should be good to go. I wrote a JSF/MyFaces application that uses the DataNucleus JPA API to store and read objects from Apache HBase. I will make the code available soon… But the snippets above give you an idea of how to configure things, if you are interested in using JPA (or JDO) with the Apache Hadoop HBase project.

Generally, the combination of these two is pretty interesting, especially when hosting “regular” Java EE applications, which nowadays mostly use JPA as their storage API. So integrating HBase into a “normal” Java EE application is not too complicated. There is also the Spring Framework. Currently there is no explicit support for DataNucleus JPA/JDO, but I saw some blog posts about Spring and App Engine, so integration there is possible too… Perhaps the mentioned “issue” gets fixed soon as well ;-)

Note: It is possible to use the “native” HBase API to read/store data in a JPA/JDO-managed HBase “table”, but you need some code that is not so straightforward at first glance, because the DataNucleus plugin/JPA implementation uses class-based metadata to manage the table, the column family and its qualifiers. I have some sample code for that as well. With some love for _reflection_ you can get it done in a generic way (I will post an example soon).
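
To give a rough idea of what such generic, reflection-based code might look like, here is a minimal sketch in plain Java. It is not the sample code mentioned above, and it does not touch the HBase API. It only derives “family:qualifier” column names from a class and reads the matching values from an object. The naming scheme (table and column family named after the simple class name, qualifier named after the field) is an assumption for illustration only and has to be checked against what DataNucleus actually writes; the class ColumnMappingSketch and the Person type are hypothetical as well.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not DataNucleus code.
final class ColumnMappingSketch {

    // Stand-in for an entity; a real one would carry @Entity and @Id.
    static final class Person {
        private final String id;
        private final String name;

        Person(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** ASSUMPTION: the table (and column family) is named after the simple class name. */
    static String tableName(Class<?> type) {
        return type.getSimpleName();
    }

    /** Maps every instance field to an assumed "family:qualifier" column name and its value. */
    static Map<String, Object> toColumns(Object entity) {
        Class<?> type = entity.getClass();
        String family = tableName(type);
        Map<String, Object> columns = new LinkedHashMap<>();
        for (Field field : type.getDeclaredFields()) {
            // Skip static and compiler-generated fields; they are not persistent state.
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            field.setAccessible(true);
            try {
                columns.put(family + ":" + field.getName(), field.get(entity));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        Map<String, Object> columns = toColumns(new Person("42", "Ada"));
        columns.forEach((column, value) -> System.out.println(column + " -> " + value));
    }
}
```

The resulting map is the kind of structure you would then translate into reads or writes with the native HBase client, once the real table, family and qualifier names have been confirmed.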



About Matthias Wessendorf

Matthias Wessendorf is a software developer at Oracle. He currently works on ADF Faces, which is an Ajax-based JSF component suite. Matthias also contributes to the OpenSource community, mainly Apache MyFaces and Apache Trinidad. Before joining Oracle, he worked as a CMS-Developer at pironet, where he was building a next-generation CMS, using UI technologies like XUL and Ajax.
