SpringOne 2GX 2011

Chicago, October 25-28, 2011

Implementation details of the Apache Lucene WebSphere eXtreme Scale store

Posted by: Billy Newport on

I used 4 maps.

MetaData Map

This map uses the file name as the key and a FileMetaData object as the value. The FileMetaData object holds attributes such as the last modification time and the number of bytes in the file.
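As a rough illustration of this map, here is a minimal sketch. The class names, field names, and helper methods are all assumptions made for the example, and a plain HashMap stands in for the grid map; the actual FileMetaData class is not shown in this post.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical value type for the metadata map; field names are assumed.
final class FileMetaDataSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    final long lastModified; // milliseconds since the epoch
    final long length;       // number of bytes in the file

    FileMetaDataSketch(long lastModified, long length) {
        this.lastModified = lastModified;
        this.length = length;
    }
}

// A plain HashMap stands in for the grid map: file name -> metadata.
final class MetaDataMapSketch {
    private final Map<String, FileMetaDataSketch> metaData = new HashMap<>();

    // Store (or overwrite) the metadata entry for a file.
    void record(String fileName, long lastModified, long length) {
        metaData.put(fileName, new FileMetaDataSketch(lastModified, length));
    }

    // A file exists if and only if it has a metadata entry.
    boolean fileExists(String fileName) {
        return metaData.containsKey(fileName);
    }

    // Look up the byte count without touching any chunk data.
    long fileLength(String fileName) {
        FileMetaDataSketch md = metaData.get(fileName);
        if (md == null) {
            throw new IllegalArgumentException("No such file: " + fileName);
        }
        return md.length;
    }
}
```

Keeping the length in the metadata entry means a length or existence check needs only one small lookup rather than a scan of the file's blocks.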

Chunk Map

This has a String key and a byte[] value. The String key is the fileName + "#" + block number. Every file is stored as a set of blocks, and each block has an entry in the chunk map. The chunks are a fixed size, which is currently defined as a constant in GridOutputStream.
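The key scheme can be sketched as follows. This is only an illustration: the class name, the helper methods, and the choice to store a shorter final block are assumptions, and the chunk size is passed in as a parameter because the actual constant in GridOutputStream is not given here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper showing how a file maps onto chunk map entries.
final class ChunkSketch {

    // Key format described in the post: fileName + "#" + block number.
    static String chunkKey(String fileName, long blockNumber) {
        return fileName + "#" + blockNumber;
    }

    // Block that holds a given byte offset, for fixed-size chunks.
    static long blockFor(long position, int chunkSize) {
        return position / chunkSize;
    }

    // Split file contents into chunk map entries (key -> block bytes).
    // Assumption: the last block is stored trimmed to the remaining bytes.
    static Map<String, byte[]> split(String fileName, byte[] data, int chunkSize) {
        Map<String, byte[]> chunks = new LinkedHashMap<>();
        long block = 0;
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, data.length);
            chunks.put(chunkKey(fileName, block), Arrays.copyOfRange(data, offset, end));
            block++;
        }
        return chunks;
    }
}
```

With fixed-size chunks, a read at any file position needs only integer division to find the block number, so random access costs a single map lookup per block.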

Directory Map

This stores Lucene Directory objects. The key is the directory name and the value is a Set of Strings. Each string in the set is a file name used by this directory.
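A minimal sketch of this structure, with a plain HashMap standing in for the grid map. The class and method names are assumptions for illustration. Against a real grid, the modified Set would most likely need to be written back with a put after each change, which this in-memory version does not have to do.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the directory map: directory name -> file names.
final class DirectoryMapSketch {
    private final Map<String, Set<String>> directories = new HashMap<>();

    // Register a file under a directory, creating the directory entry if needed.
    void addFile(String directoryName, String fileName) {
        directories.computeIfAbsent(directoryName, d -> new TreeSet<>()).add(fileName);
    }

    // Remove a file name from a directory; does nothing if either is unknown.
    void removeFile(String directoryName, String fileName) {
        Set<String> files = directories.get(directoryName);
        if (files != null) {
            files.remove(fileName);
        }
    }

    // List the file names in a directory, as a Lucene Directory listing would.
    String[] list(String directoryName) {
        Set<String> files = directories.getOrDefault(directoryName, Collections.emptySet());
        return files.toArray(new String[0]);
    }
}
```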

Lock Map

This uses a String key that combines the directory name and the file name. The value is irrelevant; I use a Boolean. If an entry exists in the map for a file, then someone has locked it. The locking protocol therefore just attempts to insert a 'lock' entry for the file, which succeeds only if there is no current entry. Unlock simply removes the entry for the file. Locks can be given lease behavior by adding a TTL evictor to this map. For example, with a default TTL of 10 minutes, a locked file would in the worst case be automatically unlocked after 10 minutes. A possible improvement would be to 'touch' the lock every N reads or writes of data to the file and use a LAST_ACCESS_TIME eviction policy, which would likely prevent the lock from being evicted while it is still in use.
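The insert-if-absent protocol can be sketched as below. This is an illustration only: a ConcurrentHashMap stands in for the grid map, the class and method names are hypothetical, the "/" separator in the key is an assumption (the post only says the key combines the two names), and the TTL lease is not modeled.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the lock map: the presence of a key means "locked".
final class LockMapSketch {
    private final ConcurrentMap<String, Boolean> locks = new ConcurrentHashMap<>();

    // Assumed key format; the real separator is not stated in the post.
    static String lockKey(String directoryName, String fileName) {
        return directoryName + "/" + fileName;
    }

    // Try to take the lock: the insert succeeds only if no entry exists yet.
    boolean tryLock(String directoryName, String fileName) {
        return locks.putIfAbsent(lockKey(directoryName, fileName), Boolean.TRUE) == null;
    }

    // Release the lock by removing the entry.
    void unlock(String directoryName, String fileName) {
        locks.remove(lockKey(directoryName, fileName));
    }

    // A file is locked if and only if its entry is present.
    boolean isLocked(String directoryName, String fileName) {
        return locks.containsKey(lockKey(directoryName, fileName));
    }
}
```

The important property is that the existence check and the insert happen as one atomic step. Here putIfAbsent provides that; in the grid, the equivalent guarantee would have to come from an insert that fails when the key already exists.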

No near cache is used. I used my wxsutils library to interact with the grid because it reduces the application code to simple static put/remove operations.


About Billy Newport

Billy is a Distinguished Engineer at IBM. He's been at IBM since 2001. Billy was the lead on the WorkManager/Scheduler APIs, which were later standardized by IBM and BEA and are now the subject of JSR 236 and JSR 237. Billy led the design of the WebSphere 6.0 non-blocking I/O framework (channel framework) and the WebSphere 6.0 high availability/clustering (HAManager). Billy currently works on WebSphere XD and ObjectGrid. He's also the lead persistence architect and runtime availability/scaling architect for the base application server.

Before IBM, Billy worked as an independent consultant at investment banks, telcos, publishing companies and travel reservation companies. He wrote video games in C and assembler on the ZX Spectrum, Atari ST and Commodore Amiga as a teenager. He started programming on an Apple IIe when he was eleven; his first programming language was 6502 assembler.

Billy's current interests are lightweight non-invasive middleware, complex event processing systems and grid-based OLTP frameworks.
