Sr. Engineer with Terracotta Inc.
Alex Miller is a Sr. Engineer with Terracotta Inc, the makers of the open-source Java clustering product Terracotta. Prior to Terracotta, Alex worked at BEA Systems on the AquaLogic product line and was Chief Architect at MetaMatrix. His interests include Java, concurrency, distributed systems, query languages, and software design. Alex enjoys writing his blog at http://tech.puredanger.com and has spoken at a number of Java user group meetings and conferences.
Presentations by Alex Miller
Cluster your Cache with Hibernate and Terracotta
Terracotta (an open source technology) provides a clustered, durable, virtual heap. You can reduce the load on your database by allowing Terracotta to handle sharing and persistence of temporary conversational state in your web application. One option is to simply cluster your existing Hibernate L2 cache (for instance with ehcache).
A higher performance option is to disconnect your POJOs from the Hibernate session and manage them entirely in Terracotta shared heap until they are ready to be written back to the system of record. This option can yield extremely high performance while simultaneously reducing the load on your database, allowing you to scale your system with significantly less hardware.
Java Concurrency Idioms
This presentation will look at the many new additions in Java 5 and 6 for concurrent programming such as Atomics, Locks, synchronizers, and concurrent collections. In particular, we will be looking at common concurrency idioms around locking and access to shared state, thread coordination, thread pooling, and work execution. Each of these topics will be presented with code examples demonstrating common idioms and the usage of these new concurrency primitives.
Design Patterns Reconsidered
The Design Patterns book launched a revolution in object-oriented design and provided a vocabulary for OO developers to communicate their ideas. However, in some cases, patterns used blindly can lead to awkward, confusing, or hard to maintain code. It is time for some common patterns used in Java to be reconsidered so that we can derive the benefits from patterns while minimizing their downsides.
This talk will re-evaluate key patterns like Singleton, Template Method, Visitor, and Proxy. These patterns have downsides and in some cases, do more harm than good. Examples of each pattern will be given in Java and examined for clarity, testability, and flexibility. Important problems will be discussed and examples of alternate solutions will be given.
Java Collections API
Did you know that Java 5 and 6 added 8 new interfaces and 16 new collection implementations to the JDK, more than doubling the size of the collection API? Collections 201 gives you an update on all of the interfaces, implementations, and utilities and gives you guidance on picking the perfect collection. In particular, Java 5 introduced a new major collection type Queue and a whole new java.util.concurrent package with data structures optimized for concurrent use.
Exploring Terracotta
Terracotta is an open-source Java clustering technology. It creates a virtual, durable Java heap that is shared across a cluster of Java Virtual Machines. This is done by dynamically instrumenting bytecode at load time to intercept calls to read and write fields, and also to enter and exit monitor locks. Information about these calls is then transmitted to the Terracotta Server (which can also be clustered) and out to other nodes in the cluster as needed. The advantage of this approach is that many Java programs can be clustered without code changes by providing just external Terracotta configuration. Many performance optimizations are performed to minimize communication and locking costs. Terracotta is commonly used for session sharing in web applications, distributed caching, and distributed workflow processing.
This presentation will give an overview of the Terracotta technology, how it's implemented, and common use cases that can benefit from the technology. We will look at some code and cluster some Java applications during the presentation.
Clustered Spring with Terracotta
Spring provides a solid framework for building applications. Terracotta technology provides the ability to cluster portions of the Java heap and multi-threading primitives like synchronized and wait/notify. Terracotta provides integration with Spring that lets you seamlessly extend Spring across multiple Java Virtual Machines! This allows you to take existing beans and spread them across a cluster, turn Spring application context events into distributed events, export clustered beans via Spring JMX, and make your Spring Web Flow apps highly available and clustered.
Books by Alex Miller
by Ari Zilka, Alex Miller, and more
Get the definitive guide on all the fundamentals of Terracotta as well as user secrets, recipes, and prepackaged frameworks.
Written by Terracotta CTO Ari Zilka and his team, The Definitive Guide to Terracotta: Cluster the JVM for Spring, Hibernate and POJO Scalability covers the following:
* High Availability (HA) nth degree scaling and clustering for traditional J2EE and Java EE 5 applications (using Seam or other application) as well as Spring-based enterprise applications
* Everyday Terracotta using its prepackaged frameworks and integration recipes, including configuration and customization for your application tuning, no matter the scale
* Power user secrets available, including config modules, customized advanced performance tuning, SDLC, Maven, and more
- Available At: http://www.amazon.com/Definitive-Guide-Terracotta-Hibernate-..
Pure Danger Tech
Alex Miller's technical blog
Saturday, May 10, 2008
This talk was by Gil Tene and Michael Wolf from Azul. Azul has their own concurrent garbage collector although this talk focused mostly on the ideas and concepts of concurrent collectors in general and didn’t really dive into their own collector in detail (my only real disappointment in an otherwise fascinating talk).
Concurrent garbage collectors are ones that run while your app is running. This is desirable because it allows your garbage to be cleaned up while minimizing stop-the-world pauses. This makes app performance more predictable. In a non-concurrent collector, failure consists solely of collecting something that’s not garbage (or other such bugs). In a concurrent collector, failure also happens if the pause time of a collector causes an application to fail to meet its necessary response times.
Concurrent collectors run alongside your app while it’s running and creating garbage. That means the concurrent collector has to keep up with the garbage to avoid a stop-the-world pause. The problem is that this keeping up usually requires some tuning and it also becomes very sensitive. If load increases or the app changes slightly, it can suddenly fall off the cliff with no warning.
Some terms… Mark is the process of traversing references and determining live objects. Sweep is a phase to collect dead (non-marked) objects. An alternative to sweep is compaction, which should be obvious.
There was a lengthy discussion about different metrics to consider like heap population (the live set), allocation rate (new objects), mutation rate (modified references), cycle time, etc and a discussion of how these differ according to load. I don’t think I can do that discussion justice but it was pretty interesting.
In talking about testing, there were several important points. One was that all concurrent GCs have to deal with fragmentation (either by sweep or compaction). These are the most taxing parts of any GC (and often stop-the-world operations). Any test load that doesn’t experience those worst parts of the GC cycle is not really testing the garbage collector. It’s important a) not to engineer your test so that GC doesn’t occur (because that’s what you want to understand) and b) to actually design your test to incur the worst GC quickly enough that you can rerun the test a lot. They gave 20-30 minutes as a good target for a stable test and you typically want to see at least 5 bad GCs during the test to make sure you’re ok.
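One cheap way to check that a test run actually exercised the collector is to read the JVM's standard GarbageCollectorMXBean counters before and after the load. Here is a minimal sketch (the GcStats class and its method names are mine, not from the talk). It only counts collections and cannot tell a "bad" GC from a cheap one, so you would still want GC logs for the actual pause times.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: sums the counters of every collector the running JVM
// exposes through the standard java.lang.management API.
class GcStats {

    // Total number of collections so far, across all collectors.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds, across all collectors.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if undefined for this collector
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }
}
```

Sampling totalCollections() at the start and end of a run and diffing the two gives a rough sanity check that the test did not accidentally avoid GC altogether.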
They actually have an open source tool called Fragger that will add small amounts of objects to the heap, but in such a way that it induces fragmentation which will quickly force bad GC. You can run this load generator in your app alongside the normal load. They demonstrated it and in a 1 GB heap they were able to cause bad GC pauses while only using 70 MB of memory, so the heap was mostly empty. Pretty cool. I could definitely see this being useful on performance testing we do at Terracotta.
It turns out there are some common patterns in real apps (that are often NOT in benchmarks) that can cause exactly this kind of fragmentation. One of the most popular is an LRU cache - eviction causes the “active” set to turn over at regular intervals creating garbage. If the concurrent collector can’t keep up, the collector will ultimately get in trouble. Apparently the specjbb benchmark does a really bad job at simulating this kind of “real app” behavior.
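For reference, the LRU pattern in question can be as small as this. It is a sketch using the JDK's LinkedHashMap in access order (the LruCache name is mine, and this is not code from the talk). Every put beyond the capacity drops the least recently used entry, which has typically lived long enough to sit in the old generation, and that steady turnover is the garbage the collector has to keep up with.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal LRU cache built on LinkedHashMap's access-order mode.
// Not thread-safe; wrap or synchronize externally if shared between threads.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = order entries by access, not insertion
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts
    // the least recently used entry, which then becomes garbage.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```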
Another interesting point was what they called the “mostly” concurrent secret. It’s really important what exactly happens in stop-the-world. Some things that aren’t talked about much but can be important depending on your app are things like class unloading, perm gen collection, weak/soft reference management, stack scanning, code cache cleanup, etc. Since the length of stop-the-world ultimately drives your worst case, this stuff happens to be pretty important.
The Azul collector sounds very cool as they do concurrent young gc and old gc with a guaranteed single pass mark which is oblivious to mutation rate. They also have a concurrent compactor where objects can be moved without stopping the mutator and an entire generation can be relocated in every gc cycle. There is no stop-the-world fallback cliff as with CMS.
But they only had one slide about the Azul collector without much more detail than that. I’d love to see a more detailed description (maybe it’s there in papers already) and also some more detailed comparison of Azul’s collector vs G1.
If anyone out there has some deep experience in garbage collection and is interested in distributed GC, we are going to be doing some heavy rework of the Terracotta DGC and memory manager this year and would hire the right person to help.
Friday, May 9, 2008
Cliff has been working for a while on developing highly concurrent data structures for use on the Azul hardware which supports 700+ hardware threads. We’re going through the transition right now from 1 to small numbers of cores. Cliff is trying to address the next order of magnitude.
A non-blocking algorithm means that stopping any particular thread does not prevent global progress. It means that no thread locks any resource, then gets preempted or blocked in I/O and also that there are no critical sections (synchronization). This is achieved by using CAS (compare and swap) hardware support via the Atomic* classes. These are fast with no contention and almost as fast even with contention, much much faster than synchronization.
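To make the CAS idea concrete, here is the classic retry loop written against java.util.concurrent.atomic. This is a minimal sketch of the general idiom, not anything taken from Cliff's code, and the CasCounter name is mine. A thread that loses the race never blocks; it just re-reads and tries again, and some thread always wins each round.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical lock-free counter showing the compare-and-swap retry idiom.
class CasCounter {
    private final AtomicLong value = new AtomicLong();

    // Increments without locking: read, compute, then publish only if
    // nobody else changed the value in between.
    long increment() {
        while (true) {
            long current = value.get();
            long next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
            // CAS failed: another thread updated the value first, so retry.
        }
    }

    long get() {
        return value.get();
    }
}
```

In real code you would just call AtomicLong.incrementAndGet(); the loop is spelled out here only to show what the CAS is doing.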
Most large CPU shared memory hardware systems allow for very fast concurrent read but still are limited to the speed of a cache miss on write. So, we must avoid all cpus writing to the same location. Even with reader-writer lock, it is not possible to scale past the 50-100 cpu range.
Cliff has worked through 2.5 different data structures, developing non blocking versions of them and this talk is a first attempt to pull out a process for developing such data structures. He did a talk last year at JavaOne on his non-blocking hash table implementation, which has since been simplified a bit and posted on sourceforge. He has also now developed a lock-free bit vector and is in the process of working out the details of a non-blocking queue.
The basic style consists of having an array (resizeable) to hold the shared data. Different threads will generally write to different slots in the array (to avoid write contention) the majority of the time. You then develop a finite state machine to model the state of the data and you must include in the FSM resizing of the array. Any time data is changed, you use CAS instructions via Atomic classes - this avoids ever locking. The only place you need a memory barrier is at one point during the array resize. The resize happens incrementally and never blocks everyone. The CAS reliance ensures that even when there is contention, SOME thread makes progress.
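Here is a very stripped-down sketch of that style, under heavy assumptions: the array is fixed-size (the incremental resize, which is the hard part, is left out entirely), and the state machine per slot is just EMPTY -> value -> TOMBSTONE. The SlotTable class and its method names are made up for illustration and are not Cliff's implementation.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical fixed-size slot table. Each slot moves through a tiny state
// machine using only CAS: null (EMPTY) -> value -> TOMBSTONE. No resize.
class SlotTable {
    private static final Object TOMBSTONE = new Object();
    private final AtomicReferenceArray<Object> slots;

    SlotTable(int capacity) {
        slots = new AtomicReferenceArray<>(capacity);
    }

    // EMPTY -> value. Starts probing at 'hint' so different threads tend to
    // hit different slots. Returns the claimed index, or -1 if none is free.
    int claim(Object value, int hint) {
        int n = slots.length();
        for (int i = 0; i < n; i++) {
            int idx = Math.floorMod(hint + i, n);
            if (slots.compareAndSet(idx, null, value)) {
                return idx;
            }
        }
        return -1;
    }

    // value -> TOMBSTONE. Succeeds only if the slot still holds 'expected'
    // (compared by reference), so a stale caller cannot clobber a newer state.
    boolean release(int idx, Object expected) {
        return slots.compareAndSet(idx, expected, TOMBSTONE);
    }

    // Returns the live value in the slot, or null if it is EMPTY or TOMBSTONE.
    Object read(int idx) {
        Object v = slots.get(idx);
        return v == TOMBSTONE ? null : v;
    }
}
```

A caller would typically derive the hint from something thread-specific (a hash of the thread id, say) to spread writes across the array.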
I won’t even try to repeat the details of the FSMs and other detailed descriptions… :)
One interesting result is that with the non-blocking hashtable he saw linear scaling on 768 CPUs maxing at 1 billion reads/sec and 10 million updates/sec.
I asked a question about how this intersected with the JSR 166 work on fork-join and parallel ops. They definitely both seem to be attacking the same problem of lock contention with queue based work processing, although fork-join seems more focused on the 8-32 core range and Cliff is focused on the 100+ range. It didn’t sound like they had talked much.
Someone else asked about when you might want to use one of these data structures and it sounded like the main determinants were the # of cores and write contention. Cliff’s is slightly faster till you get out around 32 cores and then becomes much faster. Also, even as low as 4 cores if you have 50% write contention (admittedly rare), he can beat ConcurrentHashMap.
Friday, May 9, 2008
The Sun Hotspot guys have been working on a new garbage collector to replace CMS called G1. This presentation went over the differences between the old CMS and the new G1 collectors and also included some perspective from a guy at the Chicago Board Options Exchange who has been beta testing it.
CMS divides the world into the young and old generations. This is done to take advantage of the observation that the lifetime of objects is highly uneven - the vast majority of objects die young glorious deaths and a very small number of objects live for a very long time (effectively the life of the app). Also important is that there tend to be very few references from the old generation to the young generation. Because of this, it’s ok to focus our collection attention on the young gen.
In CMS, new objects are created in the young generation which is further broken up into eden and two survivor spaces. Young gen GC checks to find live objects and those are put either in a survivor space or in the old generation, depending on age. Old gen gc is mostly concurrent but does stop-the-world pauses to finish up. There is also a stop-the-world pause for reference marking. The old gen becomes fragmented, and sweep finds the holes and manages them in free lists. There is a fallback to full stop-the-world collection and compaction.
G1 (“garbage first”) takes a different approach - all memory (except perm gen) is broken into 1 MB “regions”. Young and old are both comprised of some set of non-contiguous regions but these change over time. During young gc survivors of a region are either copied to a new young gen region or to an old gen region as appropriate.
In G1's old generation GC there is one stop-the-world pause to mark. If any region is found to contain no live objects, the region is immediately reclaimed (this happens more frequently than you’d expect due to locality). The remaining old regions are then compacted into a new old region. Old gen collections are piggybacked on young gen collections.
The technique for how G1 manages references into a region is called “remembered sets”. Every region has a small data structure (<5% of total heap) that reduces work needed to do marking. The remembered sets contain all external references into that region (references within the region are not included).
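As a toy model of the idea (and only that; this is not how HotSpot implements it), you can think of addresses as plain numbers, compute the region with a shift for 1 MB regions, and record only the references that cross a region boundary. The RememberedSets class, its method names, and the use of raw long addresses are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of per-region remembered sets. Addresses are plain
// longs; a real collector tracks this very differently and far more compactly.
class RememberedSets {
    static final int REGION_SHIFT = 20; // 1 MB regions = 2^20 bytes

    // For each region: the addresses outside it that hold a reference into it.
    private final Map<Long, Set<Long>> incoming = new HashMap<>();

    static long regionOf(long address) {
        return address >>> REGION_SHIFT;
    }

    // Called whenever a reference field at fromAddress is set to point at
    // toAddress. References within a single region are not recorded.
    void recordReference(long fromAddress, long toAddress) {
        long from = regionOf(fromAddress);
        long to = regionOf(toAddress);
        if (from == to) {
            return;
        }
        incoming.computeIfAbsent(to, r -> new HashSet<>()).add(fromAddress);
    }

    // The external locations to scan when collecting this region, instead of
    // walking the whole heap.
    Set<Long> referencesInto(long region) {
        return incoming.getOrDefault(region, Collections.emptySet());
    }
}
```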
After this initial layout by Tony Printezis (who was entertaining and explained things well), Paul Ciciora talked about how they test things at CBOE. Probably most important, Paul said it is still a work in progress and not production-ready yet.
One interesting item from the Q&A was that this will definitely be in Java SE 7 (probably committed in the next few weeks) and that it will also be released in a Java 6 update as well.
Friday, May 9, 2008
Thursday night was a lot of fun. I guess Smashmouth was playing but I don’t know anyone that went - anyone hear how it went? I was ready to relax a bit after my talk so it was a fun night. I went to dinner with some other NFJS speakers and had a great time and met some new people.
Then I headed to the Guice BOF, which was a fun crowd. They talked about the upcoming Guice 2.0 and the interesting new features. Unfortunately, I’ve only played a bit with Guice so I can’t say that I found it personally meaningful, but it sounded like some good stuff. I thought the idea of using custom annotations was interesting. Plus they had copious quantities of beer.
Then it seemed like that whole crowd migrated en masse to the Java Posse BOF (also beer enabled from Atlassian). The Java Posse do a podcast of Java news of course and this was a live episode. They took a bunch of polls of the audience. Given the exuberant crowd, I’m not sure how much you can draw from the responses. :)
Some little tidbits that they threw out that I found interesting.
- Scala is the new Java - I know the posse love the Scala and I think it’s an amazing language but I’m just not seeing it as the “new Java”.
- Kindle runs Java - I guess it was unknown before that the new Kindle e-reader device from Amazon runs Java (on Linux). Pretty cool.
- What is Duke? - there was a brief discussion of what the Java mascot actually is. Someone said it’s based on the shape of the communicator device in Star Trek. None other than Josh Bloch said “Duke is the bastard child of a scrubbing bubble”. Sounded like an authoritative source to me.
I asked a question and got a free copy of the new Effective Java at the BOF, which I am quite happy about.
Overall a fun night! (I don’t think Smashmouth could have competed.)
Friday, May 9, 2008
I went to dinner last night with a bunch of other speakers from the No Fluff tour and had a great time. One little interesting tidbit of conversation was about shared flow state. As programmers, we’re all familiar with solo flow state when you’re in the zone, writing code. We were talking about pair programming, which when it’s good, is a shared flow state and can be even richer than solo flow.
I mentioned that another case of this I’ve experienced is playing viola in a quartet. You are very intensely aware of what the other three people are doing (even though that may not be evident externally) and you are all trying to accomplish the same thing. It’s exhilarating in a very unique way.
Of course, sex is another good example of shared flow state. :)
Anyone have any others?
