Implementing global indexes on WebSphere eXtreme Scale
We get asked a lot how to create global indexes with WebSphere eXtreme Scale (Scale). Scale usually keeps K/V pairs in a large distributed hash table (DHT). Customers want to index attributes of the value and then run queries to find all the keys whose values match some criteria based on the index.
You can, of course, index the attribute, run a parallel agent that queries each partition, and then union the results. That works, and what it does well is speed up searching over the records: the larger the grid, the more processors are running the search. But it isn't great from a throughput scaling point of view, because every server in the grid does a part of every query. The throughput is then limited to the throughput of the slowest box in the grid.
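To make the cost of the parallel approach concrete, here is a minimal sketch that uses plain in-memory maps in place of grid partitions. The class name ScatterGatherSketch and its query method are made up for illustration and are not part of the Scale or wxsutils APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Illustration only: each Map stands in for one partition of the grid.
class ScatterGatherSketch {
    // Runs the predicate against every record in every partition and unions the matching keys.
    static Set<String> query(List<Map<String, String>> partitions, Predicate<String> matches) {
        Set<String> result = new TreeSet<>();
        for (Map<String, String> partition : partitions) {
            // Every partition does work for every query, which is what caps throughput.
            for (Map.Entry<String, String> entry : partition.entrySet()) {
                if (matches.test(entry.getValue())) {
                    result.add(entry.getKey());
                }
            }
        }
        return result;
    }
}
```

Each call touches every partition, so adding servers makes a single query faster but does not let the grid answer more queries per second.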
But there is a better way. It's not so difficult to implement a global index, that is, a grid-wide index for a specific attribute. It can be implemented on top of a DHT (like WebSphere eXtreme Scale) and makes it possible to find all matching keys in a single operation. Basically, we make a Map whose key is the search term and whose value is the list of keys of the records containing an attribute with that search term. Thus, an index lookup is a Map.get(term) returning the list of keys. The list of keys can then be passed to WXSUtils.getAll to bulk fetch the records quickly. The throughput of this system is much better than a parallel search implementation. Each index lookup results in one RPC to exactly one server. Thus, if one server can do N of those lookups, M servers can do M x N lookups, and so we have a better response time as well as linear scaling of index lookups from a throughput point of view.
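The idea can be sketched with two ordinary maps, one for the data and one for the index. This is only an in-memory illustration under my own assumptions: the class GlobalIndexSketch and its put, lookup and getAll methods are hypothetical stand-ins, and the getAll here is a local loop, not WXSUtils.getAll.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: two local maps stand in for the grid's data map and index map.
class GlobalIndexSketch {
    // Data map: record key -> indexed attribute value.
    private final Map<String, String> data = new HashMap<>();
    // Index map: search term -> keys of the records whose attribute equals that term.
    private final Map<String, Set<String>> index = new HashMap<>();

    // Stores a record and keeps the index entry for its attribute in step.
    void put(String key, String attribute) {
        String previous = data.put(key, attribute);
        if (previous != null && !previous.equals(attribute)) {
            Set<String> stale = index.get(previous);
            if (stale != null) {
                stale.remove(key);
                if (stale.isEmpty()) {
                    index.remove(previous);
                }
            }
        }
        index.computeIfAbsent(attribute, term -> new TreeSet<>()).add(key);
    }

    // An index lookup is a single get on the index map.
    Set<String> lookup(String term) {
        return index.getOrDefault(term, Collections.emptySet());
    }

    // Local stand-in for a bulk fetch of the matching records.
    Map<String, String> getAll(Collection<String> keys) {
        Map<String, String> found = new LinkedHashMap<>();
        for (String key : keys) {
            String value = data.get(key);
            if (value != null) {
                found.put(key, value);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        GlobalIndexSketch grid = new GlobalIndexSketch();
        grid.put("p1", "red");
        grid.put("p2", "blue");
        grid.put("p3", "red");
        System.out.println(grid.getAll(grid.lookup("red")));
    }
}
```

In a real grid the index map is itself partitioned by search term, which is why a lookup lands on exactly one server.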
The code for this is available on my GitHub repository in the wxsutils jar with full source code. It allows three types of index to be created: an exact match index (==), an index which matches any attribute starting with the query string (LIKE XXXX%), and finally one which matches any record whose attribute contains the query anywhere (LIKE %XXX%).
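One way to support all three kinds of match on a map that only offers get is to expand each attribute into several index terms when the record is written. The sketch below shows that expansion; it is an assumption about how such indexes can be built, not a description of what the wxsutils classes do internally, and the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Illustration only: generates the index terms under which a record would be filed.
class IndexTermsSketch {
    // Exact match: the attribute itself is the only term.
    static Set<String> exactTerms(String attribute) {
        return Collections.singleton(attribute);
    }

    // Prefix match (LIKE XXXX%): every non-empty leading part of the attribute.
    static Set<String> prefixTerms(String attribute) {
        Set<String> terms = new LinkedHashSet<>();
        for (int end = 1; end <= attribute.length(); end++) {
            terms.add(attribute.substring(0, end));
        }
        return terms;
    }

    // Substring match (LIKE %XXX%): every non-empty contiguous part of the attribute.
    static Set<String> substringTerms(String attribute) {
        Set<String> terms = new LinkedHashSet<>();
        for (int start = 0; start < attribute.length(); start++) {
            for (int end = start + 1; end <= attribute.length(); end++) {
                terms.add(attribute.substring(start, end));
            }
        }
        return terms;
    }
}
```

The trade-off in this approach is space: a substring index stores a number of terms that grows with the square of the attribute length, in exchange for every query staying a single get.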
I'll post a blog entry shortly showing some code on how to use these indexes. The test cases on GitHub also include code showing how it works.
Why would you want a global index? Let's suppose you wanted to find all matching products containing a substring anywhere in the product name. The SubStringIndex in this code allows you to easily index that attribute in a Map containing these records and then find all matching products in low single digit milliseconds or so (depending on the processor), end to end. WebSphere eXtreme Scale can maintain that response time as you scale up the grid by adding boxes. More boxes means larger indexes are possible and there is more CPU and network capacity for requests, so scaling is linear. Remember, the query throughput is M x N, where M is the number of servers in the grid and N is the number of lookups one server can handle.
The classes to look at in github are in the package "com.devwebsphere.wxssearch". The IndexManager is the main class from which you obtain an Index instance to search an individual attribute or you can run multiple attribute searches directly using the IndexManager. There are annotations to annotate which attribute on the value object you want to index. The annotations specify the kind of index as well as index configuration properties.
About Billy Newport
Billy is a Distinguished Engineer at IBM. He's been at IBM since 2001. Billy was the lead on the WorkManager/Scheduler APIs which were later standardized by IBM and BEA and are now the subject of JSR 236 and JSR 237. Billy led the design of the WebSphere 6.0 non-blocking IO framework (channel framework) and the WebSphere 6.0 high availability/clustering (HAManager). Billy currently works on WebSphere XD and ObjectGrid. He's also the lead persistence architect and runtime availability/scaling architect for the base application server.
Before IBM, Billy worked as an independent consultant at investment banks, telcos, publishing companies and travel reservation companies. He wrote video games in C and assembler on the ZX Spectrum, Atari ST and Commodore Amiga as a teenager. He started programming on an Apple IIe when he was eleven; his first programming language was 6502 assembler.
Billy's current interests are lightweight non-invasive middleware, complex event processing systems and grid-based OLTP frameworks.