Billy Newport's complete blog can be found at: http://www.devwebsphere.com/devwebsphere
2011-10-05 20:15:00.0
2011-09-20 08:18:00.0
I'd tried to use a distributed grid with Lucene before. I was at a customer doing a face-off type engagement against a competitor. We (the competitor and I) each implemented a Directory implementation for Lucene on top of our respective caching products. The implementation I wrote is on github here. The customer had a very large Lucene index in memory, about 30GB, and they were seeing very long GC pauses (minutes) using the RAMDirectory supplied with Lucene. The plan was to use a grid-based Directory to move the indexes into a shared remote grid.
This proved much slower than the RAMDirectory the customer was already using. Lucene was too 'chatty' with the Directory, and this meant lots of RPCs to the grid. The Directory abstraction is too low-level. We tried solutions like near caches but, depending on the query, the near cache could need to be very large, making the solution not viable. This ultimately meant that the customer didn't go forward with either grid as a Directory store.
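To make the 'chatty' point concrete, here is a minimal sketch (not the actual github code) of what a grid-backed Directory boils down to: each index file is chunked into fixed-size blocks stored in a map keyed by file name and block number. The class name, block size, and the in-memory map standing in for the grid are all illustrative assumptions; the point is that in a real grid every block access is a network round trip.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative stand-in for a grid-backed Lucene Directory: each index file is
// split into fixed-size blocks stored in a map keyed by "fileName/blockNumber".
// A ConcurrentHashMap stands in for the grid map here; in a real grid every
// get() and put() below is a network RPC.
public class GridBlockStore {
    private static final int BLOCK_SIZE = 16 * 1024; // purely illustrative

    private final Map<String, byte[]> blocks = new ConcurrentHashMap<String, byte[]>();

    public void writeFile(String fileName, byte[] contents) {
        for (int block = 0; block * BLOCK_SIZE < contents.length; block++) {
            int from = block * BLOCK_SIZE;
            int to = Math.min(from + BLOCK_SIZE, contents.length);
            byte[] chunk = new byte[to - from];
            System.arraycopy(contents, from, chunk, 0, chunk.length);
            blocks.put(fileName + "/" + block, chunk); // one remote put per block
        }
    }

    // Reading a byte range touches one map entry per block it spans.
    public byte[] read(String fileName, long offset, int length) {
        byte[] result = new byte[length];
        int copied = 0;
        while (copied < length) {
            long pos = offset + copied;
            int block = (int) (pos / BLOCK_SIZE);
            int blockOffset = (int) (pos % BLOCK_SIZE);
            byte[] chunk = blocks.get(fileName + "/" + block); // one remote get per block
            int n = Math.min(length - copied, chunk.length - blockOffset);
            System.arraycopy(chunk, blockOffset, result, copied, n);
            copied += n;
        }
        return result;
    }
}
```

A single Lucene query can scan large stretches of the term and postings files, so at this level it turns into a long chain of those per-block reads; that is where the RPC count explodes.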
However, I was reading about Solandra recently and they have what looks like a better approach. The problem we had was that just implementing a Directory is too low-level; there end up being too many RPCs between the Lucene engine and the grid. Solandra implements a Cassandra store underneath Lucene, but they did not implement a Cassandra Directory, which would have had the same issues I just discussed. Instead, they implemented a higher-level plugin. They moved up to implement IndexReader and IndexWriter, and this allows them to push some of the search down into Cassandra's native index/query support rather than just use it as a block store. This seems to reduce the number of RPCs between the store and the Lucene engine considerably.
This looks like a much more performant approach and one that should work well with a data grid also. If I get time, I may try to implement this with a data grid and post results.
2011-06-16 20:53:00.0
I've been working with one of our biggest customers and they have a high-quality problem: under extreme load, the grid (WebSphere eXtreme Scale) is holding up just fine. The problem is the database the grid runs on top of. They are using write-behind to asynchronously write changes made to the grid to the database. This basically only writes changes to the database every 5 minutes or every 1000 entries, whichever happens first.
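For readers not familiar with write-behind, the mechanics are roughly what this sketch shows: changes are buffered in memory and flushed as a batch either when a count threshold is reached or when a timer fires, whichever comes first. This is purely illustrative; WXS provides this behaviour itself when write-behind is enabled on a backing map, so the class and thresholds below are stand-ins, not product code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative write-behind buffer: flush to the database every 5 minutes or
// every 1000 buffered changes, whichever happens first.
public class WriteBehindBuffer<K, V> {
    private static final int MAX_BATCH = 1000;           // count trigger
    private static final long FLUSH_INTERVAL_SECS = 300; // time trigger (5 minutes)

    public interface BatchWriter<K, V> {
        void writeBatch(List<Change<K, V>> batch); // e.g. a JDBC batch update
    }

    public static class Change<K, V> {
        public final K key;
        public final V value;
        public Change(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<Change<K, V>> buffer = new ArrayList<Change<K, V>>();
    private final BatchWriter<K, V> writer;
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    public WriteBehindBuffer(BatchWriter<K, V> writer) {
        this.writer = writer;
        // time-based trigger
        timer.scheduleAtFixedRate(new Runnable() {
            public void run() { flush(); }
        }, FLUSH_INTERVAL_SECS, FLUSH_INTERVAL_SECS, TimeUnit.SECONDS);
    }

    public synchronized void record(K key, V value) {
        buffer.add(new Change<K, V>(key, value));
        if (buffer.size() >= MAX_BATCH) {
            flush(); // count threshold reached, don't wait for the timer
        }
    }

    public synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        writer.writeBatch(new ArrayList<Change<K, V>>(buffer));
        buffer.clear();
    }
}
```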
The issue is that even with this batching, the database cannot keep up with the write load. They could buy a bigger database, but all they really use the database for is a disk-based log of what's in the grid. They load the grid from the database and then all applications use the grid as the system of record for that data. Updates are made to the data in the grid and written back to the database asynchronously.
If the database cannot keep up even with the batching, it starts to fall behind and, more importantly, the buffered changes not yet flushed from the grid build up in memory.
Buying a bigger database box is also somewhat pointless, as the whole point of the grid was to avoid having to buy a large SMP box to run the database. Scaling up is expensive, and database boxes with SANs that can keep up with extreme loads are not really what we want to purchase. We want the grid to act as a shock absorber so that a smaller database can be used. But this is proving a problem at this customer using normal write-behind techniques.
The real problem here is that databases are not designed to be used like this. WebSphere eXtreme Scale is basically using a SQL database as a write-only disk store and, frankly, databases are not very good at this.
So, what is designed as a write-only, disk-based store? Believe it or not, a messaging system is. JMS queues are designed to implement two operations, PUT to the queue and GET from the queue, and they can do this very efficiently, more efficiently than a database. This is all the grid really needs here. Flushing the writes from memory to disk through a database doesn't make a lot of sense.
The idea here is to put a JMS queue between the grid and the database. This way, the grid can flush its changes to 'disk' faster than before, and then we can apply the changes from the queue to the database at our leisure. This decouples the database from the grid. The grid can now deal with flash loads or tsunamis of load when they occur and flush the changes to disk using the messaging system. The database can apply those changes at its own slower rate because the JMS queue is buffering the changes durably on disk. Thus, we don't need to spend a lot of money on an expensive vertically or horizontally scaled DBMS.
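The write path could look something like the following sketch, using nothing beyond the standard javax.jms API. The ChangeBatch type and the way changes are represented are assumptions for illustration; the point is that the flush becomes a single persistent PUT to the queue rather than a set of SQL statements.

```java
import java.io.Serializable;
import java.util.List;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.DeliveryMode;
import javax.jms.MessageProducer;
import javax.jms.ObjectMessage;
import javax.jms.Queue;
import javax.jms.Session;

// Sketch of the write path: the grid-side flush serializes a batch of changes
// and PUTs it onto a durable JMS queue instead of writing to the database.
public class QueueChangeWriter {

    // Placeholder for whatever per-entry change record the grid side produces.
    public static class ChangeBatch implements Serializable {
        private static final long serialVersionUID = 1L;
        public final List<Object[]> changes; // e.g. {key, newValue, operation}
        public ChangeBatch(List<Object[]> changes) { this.changes = changes; }
    }

    private final ConnectionFactory factory;
    private final Queue queue;

    public QueueChangeWriter(ConnectionFactory factory, Queue queue) {
        this.factory = factory;
        this.queue = queue;
    }

    public void send(ChangeBatch batch) throws Exception {
        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
            MessageProducer producer = session.createProducer(queue);
            producer.setDeliveryMode(DeliveryMode.PERSISTENT); // survives broker restart
            ObjectMessage message = session.createObjectMessage(batch);
            producer.send(message);
            session.commit(); // the batch is now safely on disk in the queue
        } finally {
            connection.close();
        }
    }
}
```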
This is very like a seat belt in a car: we're trying to absorb a shock by stretching it out over a longer period of time. The JMS queue allows this to happen. While the flash load is happening, the queue is able to keep up with the grid and everything stays stable. The database empties the queue as fast as it can, which is slower than the grid. Once the flash load is gone, the grid goes back to normal load while the DBMS can keep applying the queued changes for another couple of hours if need be.
The implementation of this could be a Loader that writes the changes to the JMS queue. A JMS listener can then pick up the changes and apply them to the database. This is basically like the write-behind built into WXS but implemented instead with an external JMS queue.
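The drain side is then a plain javax.jms MessageListener (or an MDB) that takes each batch off the queue and applies it to the database with a JDBC batch. This is a sketch under the same assumptions as the writer above; the table and column names are invented, and a production version would need XA or idempotent writes so a redelivered message can't apply a batch twice.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.ObjectMessage;
import javax.sql.DataSource;

// Sketch of the drain side: a JMS listener pulls batches off the queue and
// applies them to the database at whatever rate the database can sustain.
public class DatabaseApplier implements MessageListener {

    private final DataSource dataSource;

    public DatabaseApplier(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void onMessage(Message message) {
        try {
            QueueChangeWriter.ChangeBatch batch =
                (QueueChangeWriter.ChangeBatch) ((ObjectMessage) message).getObject();
            Connection connection = dataSource.getConnection();
            try {
                connection.setAutoCommit(false);
                // Invented table/columns, purely for illustration.
                PreparedStatement statement = connection.prepareStatement(
                    "UPDATE grid_data SET value = ? WHERE key = ?");
                for (Object[] change : batch.changes) {
                    statement.setObject(1, change[1]); // new value
                    statement.setObject(2, change[0]); // key
                    statement.addBatch();
                }
                statement.executeBatch();
                connection.commit();
            } finally {
                connection.close();
            }
        } catch (Exception e) {
            // In a real deployment this would roll back and let the message redeliver.
            throw new RuntimeException("Failed to apply batch to database", e);
        }
    }
}
```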
2011-05-29 14:49:00.0
I'm maintaining a live Google document with the best practices that we're seeing. You can see the document at this link http://bit.ly/wxsbestpractice.
2011-05-26 10:56:00.0
I had the opportunity recently to compare my iPhone 4 on AT&T, Wi-Fi tethered to my laptop, against a friend's Verizon 4G phone, also over Wi-Fi. These were the results:
[Speed test result: AT&T tether]
I did both within 10 minutes of each other from the same laptop. The main differences were download speed and latency.