Code for this blog can be found on github at
https://github.com/galapagosfinch

Saturday, September 17, 2011

UUID and Spring Data JPA

OK, I had a break.  After my tangent, I got a new job.  It has been very busy at work, which has kept me from writing, and I have moved from working in Grails to working directly with a lot of different Spring projects, including Spring Security, Spring Integration, and something new for me, Spring Data JPA.

The Spring Data project makes it easier to integrate Spring projects with new data technologies.  I have always heard it mentioned alongside technologies like Redis, Hadoop, and MongoDB.  The subprojects are fairly independent of each other, because each of these data technologies has a different way of doing business, but they all bring standard Spring-isms like dependency injection and Template-based access.  There is also a subproject to make JPA easier to use.  Just about everyone knows what JPA is: the Sun-standardized way to perform Object Relational Mapping.  Our project is using JPA to do the mappings and Hibernate as the Entity Provider.



Now, about UUIDs.  I had worked with UUIDs as primary keys before, when I was a short-term contractor a while back.  It seemed like an odd choice - a 36-byte string rather than a 4- or 8-byte integer.  Seems like a lot of overhead, if you ask me.  However, when I asked why, they weren't able to answer the question or to tell me who made the choice.  So, I let it drop (I had more urgent things to do).  When I joined Altisource, lo and behold, all of the PKs in the proposed data model were CHAR(36).  So, once again, I asked why.  And for the first time, someone answered the question intelligently and won me over.

UUIDs have the benefit of being "Universally Unique".  That means that there should never be a collision between tables.  A UUID in your USER table is distinct from a UUID in your ORG table.  Even more importantly, a UUID for a USER in your production environment will not collide with the UUID for a USER in your production support environment.  Importing your production database into the support environment is a simple import - no need to even drop existing values unless they cause a performance problem (in which case, you don't want to change the data distribution).  Think of how much trouble it is to move a database and get the sequences back in order.  Depending on the database, this can take a number of scripts all by itself.  With UUIDs, there is no need to maintain long-term state in order to generate values - everything is gathered from information available at the time of creation (time, host, a sequence since startup, etc.).
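To make the "no long-term state" point concrete, here is a minimal sketch using the JDK's java.util.UUID.  Two caveats: UUID.randomUUID() produces random (version 4) UUIDs rather than the time/host-based scheme described above, and the class name UuidDemo and its newId() helper are made up for this illustration.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class UuidDemo {

    /** Generates a new key with no sequence, table, or other shared state. */
    static String newId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // Simulate two environments generating keys independently of each other.
        Set<String> production = new HashSet<>();
        Set<String> support = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            production.add(newId());
            support.add(newId());
        }

        // "Importing" one environment into the other is a plain union;
        // nothing has to be renumbered and no sequence has to be reset.
        Set<String> merged = new HashSet<>(production);
        merged.addAll(support);

        System.out.println("sample id:   " + newId());
        System.out.println("id length:   " + newId().length()); // 36
        System.out.println("merged size: " + merged.size());
    }
}
```

With integer keys from a sequence, the same merge would need the two ranges to be coordinated ahead of time.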

UUIDs do have the overhead of being big strings that use only 17 distinct characters (16 hex digits plus the dash).  This means the index will have to work a bit harder to get its job done.  However, we decided that the other benefits significantly outweigh this cost, so our data model (which in a very short while has already grown to 120 tables) uses them throughout.

Now, to the code.  Spring Data JPA provides a Persistable<ID> interface to make it easier for Spring to interact with persistable entities.  We began using this when we adopted the Auditable<U, ID> interface, which I will talk about later.  Spring Data also provides a base class to help you get started, AbstractPersistable.  The base class provides automatic ID generation, equals, hashCode - the usual suspects.  But automatic UUID generation is not supported directly by the JPA standard.  So, we had to look elsewhere.

Since we are using Hibernate as our Entity Provider, we decided to leverage its built-in support for UUIDs.  This required us to fork off our own copy of AbstractPersistable and place it in our framework.  No harm, no foul - the Spring folks are smart enough to program to the interface, not to the base class.  Here's what we did (I snipped out the internal Javadoc to condense the code):

import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.MappedSuperclass;

import org.hibernate.annotations.GenericGenerator;
import org.springframework.data.domain.Persistable;

/**
 * Abstract base class for entities. Provides a String ID that will contain a
 * UUID, leverages Hibernate's auto-generation for UUIDs, and
 * implements {@link #equals(Object)} and {@link #hashCode()} based on that id.
 * 
 * Inspired by Spring Data JPA's AbstractPersistable implementation.
 * 
 * @author Steve Finch
 */
@MappedSuperclass
public abstract class AbstractPersistable implements Persistable<String> {

	private static final long serialVersionUID = 2535090450811888936L;

	@Id
	@GeneratedValue(generator = "system-uuid")
	@GenericGenerator(name = "system-uuid", strategy = "uuid")
	private String id;

	public String getId() {
		return id;
	}

	protected void setId(final String id) {
		this.id = id;
	}

	public boolean isNew() {
		return this.id == null;
	}


	@Override
	public String toString() {
		return String.format("Entity of type %s with id: %s", this.getClass().getName(), getId());
	}


	@Override
	public boolean equals(Object obj) {

		if (obj == null) {
			return false;
		}

		if (this == obj) {
			return true;
		}

		if (!getClass().equals(obj.getClass())) {
			return false;
		}

		AbstractPersistable rhs = (AbstractPersistable) obj;
		return this.id == null ? false : this.id.equals(rhs.id);
	}


	@Override
	public int hashCode() {
		int hashCode = 17;
		hashCode += (this.id == null) ? 0 : this.id.hashCode() * 31;
		return hashCode;
	}
}

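The three annotations on the id field (@Id, @GeneratedValue, and @GenericGenerator) are the important ones.  Instead of the stock GenerationType.AUTO strategy, we specify a named "system-uuid" generator, which we define on the next line.  The "uuid" strategy is a Hibernate extension which will create the identifier on the application server at the time of INSERT.  One caveat: Hibernate's "uuid" strategy produces a 32-character hexadecimal string with no dashes, built from the IP address, JVM start time, current time, and a counter, so it is not an RFC 4122 UUID.  If you want the standard 36-character form, the "uuid2" strategy is the one to look at.  This is documented in the Generators section of the Hibernate Annotations documentation.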
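The identity rules in the base class can be exercised without a database.  The following is a minimal sketch using a hypothetical stand-in, Entity, that copies only the isNew, equals, and hashCode logic from the listing (no JPA annotations).  The id is set by hand where Hibernate would normally assign it at INSERT time, and the id value used is just a made-up example.

```java
public class IdentityDemo {

    /** Hypothetical stand-in mirroring the id-based logic of our AbstractPersistable. */
    static class Entity {
        private String id;

        /** Stands in for the id assignment Hibernate performs at INSERT time. */
        void setId(String id) {
            this.id = id;
        }

        boolean isNew() {
            return this.id == null;
        }

        @Override
        public boolean equals(Object obj) {
            if (obj == null) {
                return false;
            }
            if (this == obj) {
                return true;
            }
            if (!getClass().equals(obj.getClass())) {
                return false;
            }
            Entity rhs = (Entity) obj;
            return this.id == null ? false : this.id.equals(rhs.id);
        }

        @Override
        public int hashCode() {
            int hashCode = 17;
            hashCode += (this.id == null) ? 0 : this.id.hashCode() * 31;
            return hashCode;
        }
    }

    public static void main(String[] args) {
        Entity a = new Entity();
        Entity b = new Entity();

        System.out.println(a.isNew());   // true: no id has been assigned yet
        System.out.println(a.equals(b)); // false: two unsaved instances are never equal
        System.out.println(a.equals(a)); // true: an instance always equals itself

        // Pretend both objects were loaded from the same row (example id value).
        a.setId("8a8080843274a1b1013274a1b4d20000");
        b.setId("8a8080843274a1b1013274a1b4d20000");

        System.out.println(a.isNew());   // false
        System.out.println(a.equals(b)); // true: same class, same id
    }
}
```

One consequence worth knowing: because the hash code depends on the id, an entity added to a HashSet before it is saved will generally not be found there after the id is assigned.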

Well, that's it for now.  Next time I will talk about how we integrated AbstractAuditable with Spring Security to ensure our records are properly stamped.

6 comments:

  1. I think one needs to be careful where they adopt UUIDs as PKs, and I would always caution against it. A USER table is a good example, unless you're Facebook with millions of USERs. Furthermore, a USER table is rarely an island. USER IDs are usually used as FKs to other tables that hold USER preferences and ROLE information, so the choice would impact performance across those operations too.

    For instance, if you have a table that logs user activity... "SELECT * FROM LOGTBL WHERE USERID=". In this case, your USER table may have 100 rows, but your log table has a million.

  2. As they're a representation of a 128-bit number, UUIDs can generally be converted to binary(16) (e.g. SELECT REVERSE(UNHEX(REPLACE(UUID(),'-',''))) in MySQL), or you can use two 64-bit integers with a composite key (low, high).

    my 2¢

    Replies
    1. Yes, very good point. UUIDs are most compressed when they are in binary form. Unfortunately, they are much harder for us humans to deal with.

      Another form that I was toying with was "compressed UUID", where the binary form is Base64-encoded. This becomes a 22 character string instead of 36 characters. Much more condensed for the same amount of randomness, having a bit density of about 5.8 bits of information per character. And you can still copy/paste the values.

  3. Have you guys run into these problems before?
    http://stackoverflow.com/questions/25255183/jpa-changes-uuid-value
    and
    http://stackoverflow.com/questions/25319880/jpa-not-returning-the-text-value-of-a-mysql-varbinary-field

    Replies
    1. No, we've not hit that problem. It is probably because we use CHAR(36) in the database as well as String in the code. No translation, no problem!

      Do you have a requirement to be binary in the database and stringified in memory? Can one side morph to meet the other? If not, you may need a custom type to manage the data translation.

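The binary(16) and Base64 forms that came up in the comments can be sketched with nothing but the JDK.  This is an illustration under a few assumptions, not what our project does (we store CHAR(36) and String): it needs Java 8 or later for java.util.Base64, the class name UuidCodec and its methods are hypothetical, and the sample UUID is a made-up value.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

public class UuidCodec {

    /** Packs the 128 bits of a UUID into 16 bytes, most significant bits first. */
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    /** Rebuilds a UUID from the 16 bytes produced by toBytes. */
    static UUID fromBytes(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        long high = buffer.getLong();
        long low = buffer.getLong();
        return new UUID(high, low);
    }

    /** "Compressed UUID": URL-safe Base64 of the 16 bytes, without '=' padding. */
    static String toCompact(UUID uuid) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(toBytes(uuid));
    }

    /** Reverses toCompact. */
    static UUID fromCompact(String compact) {
        return fromBytes(Base64.getUrlDecoder().decode(compact));
    }

    public static void main(String[] args) {
        // Made-up sample value, used only to show the three representations.
        UUID uuid = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");

        System.out.println(uuid.toString().length()); // 36 characters as text
        System.out.println(toBytes(uuid).length);     // 16 bytes as binary
        System.out.println(toCompact(uuid).length()); // 22 characters as Base64

        // Both encodings are lossless.
        System.out.println(fromBytes(toBytes(uuid)).equals(uuid));     // true
        System.out.println(fromCompact(toCompact(uuid)).equals(uuid)); // true
    }
}
```

Note that the byte order here is plain big-endian; it is not the REVERSE'd layout from the MySQL snippet above, so the two are not interchangeable without an extra conversion step.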