State of the art multi-master replication in IBM WebSphere eXtreme Scale 7.1
IBM WebSphere eXtreme Scale already supported a single grid running in multiple data centers, even over a WAN. This was an implementation of CP (see the CAP theorem for details). We've just released V7.1 and are now the only product on the market implementing both single grid across multiple data centers as well as now supporting mirroring independent grids over WAN links.
Multi-master Replication
This new feature allows a customer to host a grid in multiple locations connected through user defined links. Each grid is fully independant and runs it's own catalog service. The locations need to have the same grid defined with the same number of partitions and map/template definitions. The customer can create a link between two locations and from that point forward, WXS will try to make both locations identical. You may also see locations referred to as domains in the documentation. This isn't just a trigger forwarding changes from one side to the other like the competition, this is true replication. If you create a link between a location with data and another empty location then WXS will copy the data from the non empty location to the empty location automatically.
These replication links are bi-directional. Changes made on either side of the link will be propagated to the other side. Besides, the simple topology of two locations, more complex topologies can be constructed. For example, a 'line' topology is easy to implement. If we have 3 locations, A, B and C then make a link between A and B and B and C and all three locations will be replicating to each other. C will get changes from A through B and A gets changes from C through B. A ring topology can be readily constructed also by closing the loop by creating another link between A and C. A ring has the advantage over a line in that changes can be pulled either way around the ring (we do both ways simultaneously actually). This means a link can fail and data can still be pulled the other direction. Rings are more durable than lines for this reason. Changes are sent around the ring in both directions and the latency before changes from a node A appears on another node is dependent on how many links are between them.
Hub and spoke topologies can also be constructed using the same approach. Imagine a location H as the hub. Locations around the hub can simply create a link between themselves and the hub. For example, three locations A, B and C. We would make links A-H,B-H,C-H and this gives us a simple hub/spoke topology. You can use links to create almost any topology you want. If a link goes down then replication is suspended along that link until it restarts.
This is not queue based replication
WXS does not use a queue for replication. We use an approach using constant memory independent of how far back a location becomes out of sync as well as the number of links defined to other locations. This basically means you don't need to worry if a link is down for a while, you won't run out of memory buffering changes like some of the competitors would.
No gateways for replication
Each container JVM communicates directly with like shards in the other location. There are no gateways to define. If you want to define a concentrator then just make a 'fake' location with a link to the left grid and the right grid and presto, it's a gateway but you don't need to make one.
Collision Handling
Collisions are a way of life with a distributed system using AP characteristics. WXS uses a arbitrator to resolve collisions. It has a default arbitration algorithm which simply picks the change from the lexically lower location name. Very simple but will likely be enough for many customers. WXS can also use an application provided arbitrator that makes a new version reconciling the conflicting versions received. You should designate one location as the principal arbitration location so there is a single 'truth' resulting from any collisions.
Summary
The multi-master support in IBM WebSphere eXtreme Scale V7.1 complements the existing state of the art single grid multi-data center support already proven in the product. It allows data centers to be more independent and still provides application level consistency through arbitration. It's implementation is very scalable and it's designed to use constant resources regardless of the topology or reliability of the links between locations. It's also built in to the product.
I'm looking forward to working with customers to exploit both styles of multi-data center replication.
You can read more about multi-master replication in the V7.1 info center.
About Billy Newport
Billy is a Distinguished Engineer at IBM. He's been at IBM since 2001. Billy was the lead on the WorkManager/ Scheduler APIs which were later standardized by IBM and BEA and are now the subject of JSR 236 and JSR 237. Billy lead the design of the WebSphere 6.0 non blocking IO framework (channel framework) and the WebSphere 6.0 high availability/clustering (HAManager). Billy currently works on WebSphere XD and ObjectGrid. He's also the lead persistence architect and runtime availability/scaling architect for the base application server.
Before IBM, Billy worked as an independant consultant at investment banks, telcos, publishing companies and travel reservation companies. He wrote video games in C and assembler on the ZX Spectrum, Atari ST and Commodore Amiga as a teenager. He started programming on an Apple IIe when he was eleven, his first programming language was 6502 assembler.
Billys current interests are lightweight non invasive middleware, complex event processing systems and grid based OLTP frameworks.
More About Billy »NFJS, the Magazine
December Issue Now AvailableBDD and REST
by Brian SlettenMocks and Stubs in Groovy Tests
by Kenneth KousenAlgorithms for Better Text Search Results
by John GriffinKnowns and Unknowns of Scrum and Agile
by Brian Tarbox