Implementation details of the Apache Lucene WebSphere eXtreme Scale store
I used four maps in the grid.
MetaData Map
This map uses the file name as the key, and the value is a FileMetaData object. The FileMetaData object holds details such as the last modification time and the file length in bytes.
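As a minimal sketch, the MetaData map can be pictured as a map from file name to a small serializable value object. The field names `lastModified` and `length` are assumptions, since the real FileMetaData fields are not listed here, and a local ConcurrentHashMap stands in for the grid map.

```java
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shape of the FileMetaData value; field names are illustrative.
// Serializable because values in a grid map are sent between JVMs.
final class FileMetaData implements Serializable {
    private static final long serialVersionUID = 1L;
    private final long lastModified; // last modification time, epoch millis
    private final long length;       // file length in bytes

    FileMetaData(long lastModified, long length) {
        this.lastModified = lastModified;
        this.length = length;
    }

    long getLastModified() { return lastModified; }
    long getLength() { return length; }
}

final class MetaDataMapDemo {
    public static void main(String[] args) {
        // Local stand-in for the grid's MetaData map: file name -> metadata.
        Map<String, FileMetaData> metaData = new ConcurrentHashMap<>();
        metaData.put("_0.cfs", new FileMetaData(System.currentTimeMillis(), 4096L));
        System.out.println(metaData.get("_0.cfs").getLength());
    }
}
```

Keeping the length in this map means a length query, such as Lucene's `Directory.fileLength(name)`, can be answered without reading any chunks.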
Chunk Map
This map has a String key and a byte[] value. Every file is stored as a set of fixed-size blocks, and each block has its own entry in the chunk map, keyed by the file name + "#" + block number. The block size is currently defined as a constant in GridOutputStream.
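The key scheme and block arithmetic can be sketched as follows. This assumes a plain `Map<String, byte[]>` in place of the grid map and a hypothetical `CHUNK_SIZE` of 4 bytes to keep the example small; the real constant lives in GridOutputStream and its value is not given here. The `ChunkStore` class and its methods are also hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class ChunkStore {
    // Hypothetical block size; the real value is a constant in GridOutputStream.
    static final int CHUNK_SIZE = 4;

    // Key format described above: fileName + "#" + block number.
    static String chunkKey(String fileName, long block) {
        return fileName + "#" + block;
    }

    // Split data into fixed-size blocks; only the last block may be shorter.
    static void write(Map<String, byte[]> chunks, String fileName, byte[] data) {
        for (int off = 0, block = 0; off < data.length; off += CHUNK_SIZE, block++) {
            int end = Math.min(off + CHUNK_SIZE, data.length);
            chunks.put(chunkKey(fileName, block), Arrays.copyOfRange(data, off, end));
        }
    }

    // Random access: block = pos / CHUNK_SIZE, offset within block = pos % CHUNK_SIZE.
    static byte readByte(Map<String, byte[]> chunks, String fileName, long pos) {
        byte[] chunk = chunks.get(chunkKey(fileName, pos / CHUNK_SIZE));
        return chunk[(int) (pos % CHUNK_SIZE)];
    }

    // Reassemble a whole file, given its length (e.g. from the MetaData map).
    static byte[] read(Map<String, byte[]> chunks, String fileName, int length) {
        byte[] out = new byte[length];
        for (int off = 0, block = 0; off < length; off += CHUNK_SIZE, block++) {
            byte[] chunk = chunks.get(chunkKey(fileName, block));
            System.arraycopy(chunk, 0, out, off, Math.min(CHUNK_SIZE, length - off));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, byte[]> chunks = new HashMap<>(); // stand-in for the chunk map
        write(chunks, "_0.cfs", new byte[] {1, 2, 3, 4, 5, 6});
        System.out.println(chunks.keySet());
    }
}
```

Because every block except the last is exactly `CHUNK_SIZE` bytes, a reader can locate any file position with one division and one remainder and fetch only the block it needs.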
Directory Map
This map tracks Lucene directories. The key is the directory name and the value is a Set of Strings, each of which is the name of a file used by that directory.
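A sketch of maintaining the Directory map, with a local HashMap standing in for the grid. It assumes that a value fetched from a remote grid is a copy, so the file set is copied, modified and put back rather than mutated in place. The class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class DirectoryMapSketch {
    // Stand-in for the grid's Directory map: directory name -> file names.
    private final Map<String, Set<String>> directoryMap = new HashMap<>();

    // Read-modify-write: copy the stored set, change the copy, put it back.
    void addFile(String dir, String fileName) {
        Set<String> files = new HashSet<>(directoryMap.getOrDefault(dir, Collections.emptySet()));
        files.add(fileName);
        directoryMap.put(dir, files);
    }

    void removeFile(String dir, String fileName) {
        Set<String> stored = directoryMap.get(dir);
        if (stored == null) {
            return; // unknown directory: nothing to remove
        }
        Set<String> files = new HashSet<>(stored);
        files.remove(fileName);
        directoryMap.put(dir, files);
    }

    // A Lucene-style listAll() becomes a single map lookup.
    String[] listAll(String dir) {
        return directoryMap.getOrDefault(dir, Collections.emptySet()).toArray(new String[0]);
    }
}
```

This read-modify-write is not atomic: two clients adding files to the same directory at the same time could lose an update unless the update is guarded, for example by a lock or an optimistic version check.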
Lock Map
This map uses a String key that combines the directory name and the file name. The value is irrelevant; I use a Boolean. If an entry exists in the map for a file, then someone has locked it. The locking protocol therefore just attempts to insert a 'lock' entry into the map for a given file, which succeeds only if there is no current entry. Unlocking simply removes the entry for the file. Locks can be given lease semantics by adding a TTL evictor to this map. For example, with a default TTL of 10 minutes, a locked file would in the worst case be unlocked automatically after 10 minutes. A possible improvement would be to 'touch' the lock every N reads or writes of the file's data and use a LAST_ACCESS_TIME eviction policy. This would likely prevent a lock that is still in use from being evicted.
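The locking protocol can be sketched with `ConcurrentMap.putIfAbsent`, which, like the grid insert described above, succeeds only when no entry exists. To emulate the TTL evictor without a grid, this sketch stores the lease expiry time instead of a Boolean and treats an expired entry as absent. The key separator, the explicit `now` parameter and the `touch` method (modelling the LAST_ACCESS_TIME idea) are assumptions for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class GridLockSketch {
    // Stand-in for the Lock map. Value = lease expiry (millis) rather than a
    // Boolean, so expiry can be emulated locally instead of by a TTL evictor.
    private final ConcurrentMap<String, Long> lockMap = new ConcurrentHashMap<>();
    private final long ttlMillis;

    GridLockSketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Hypothetical key format combining directory name and file name.
    static String lockKey(String dir, String fileName) {
        return dir + "/" + fileName;
    }

    // Obtain succeeds only if no live entry exists (insert-if-absent).
    boolean obtain(String dir, String fileName, long now) {
        String key = lockKey(dir, fileName);
        Long expiry = now + ttlMillis;
        Long existing = lockMap.putIfAbsent(key, expiry);
        if (existing == null) {
            return true;
        }
        // Emulated eviction: an expired lease may be taken over atomically.
        return existing <= now && lockMap.replace(key, existing, expiry);
    }

    // Extend a live lease on each read/write, like LAST_ACCESS_TIME eviction.
    void touch(String dir, String fileName, long now) {
        lockMap.computeIfPresent(lockKey(dir, fileName),
                (k, v) -> v > now ? now + ttlMillis : v);
    }

    // Unlock simply removes the entry.
    void release(String dir, String fileName) {
        lockMap.remove(lockKey(dir, fileName));
    }

    boolean isLocked(String dir, String fileName, long now) {
        Long expiry = lockMap.get(lockKey(dir, fileName));
        return expiry != null && expiry > now;
    }
}
```

Passing `now` explicitly keeps the sketch deterministic; in the real store the evictor, not the caller, would remove stale entries.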
No near cache is used. I used my wxsutils library to interact with the grid because it reduces the application code to simple static put/remove operations.
About Billy Newport
Billy is a Distinguished Engineer at IBM. He's been at IBM since 2001. Billy was the lead on the WorkManager/Scheduler APIs, which were later standardized by IBM and BEA and are now the subject of JSR 236 and JSR 237. Billy led the design of the WebSphere 6.0 non-blocking I/O framework (channel framework) and the WebSphere 6.0 high availability/clustering (HAManager). Billy currently works on WebSphere XD and ObjectGrid. He's also the lead persistence architect and runtime availability/scaling architect for the base application server.
Before IBM, Billy worked as an independent consultant at investment banks, telcos, publishing companies and travel reservation companies. As a teenager, he wrote video games in C and assembler on the ZX Spectrum, Atari ST and Commodore Amiga. He started programming on an Apple IIe when he was eleven; his first programming language was 6502 assembler.
Billy's current interests are lightweight non-invasive middleware, complex event processing systems and grid-based OLTP frameworks.