Shay Banon's complete blog can be found at: http://www.kimchy.org


2010-07-07

It's been a long time since I blogged about Compass, and I guess it's about time to discuss Compass, ElasticSearch, and how the two relate to one another.

I started Compass six years ago with a real belief that search is something any application should have (and search here is not just full text search), and the aim was to make integrating search into a Java application as simple as possible.

Compass pioneered some really exciting features, including the ability to map your domain model to a search engine (OSEM), later also XML and JSON support, and integration with ORM libraries (Hibernate, JPA, and so on), all to make the integration of search as seamless as possible in a typical Java stack application (an approach that has been copied quite nicely by others as well ;) ).

During the lifecycle of Compass, I also tried to address the scalability aspects of a search solution. By integrating with solutions such as GigaSpaces, Coherence, and Terracotta, the aim was to make a search-based solution more scalable and usable by applications.

About 8 months ago, I started to think about Compass 3.0. I knew that it required a major rewrite in how it uses Lucene (Lucene 2.9 came with several major internal changes, mainly in how it handles low level readers and search), and also in how to create a more scalable search solution, one able to scale from a single box to many easily and seamlessly. The changes did not end there: I also wanted a solution where adding more advanced search features, such as facets, would be simple.

The more I thought about it, the more I understood that this basically entails a complete rewrite of Compass if it is going to be done correctly. I also wanted to bring to the table the experience I had gained with search over the past years, and my view of how search should be done, which is hard with an existing codebase.

This is an important point, especially when it comes to scalable search, which I would like to touch on. I started trying to solve scalable search in Compass by creating a distributed Lucene Directory implementation. Of course, this does get you a bit further down the scalability road, but it is very evident to people who know how Lucene (or, for that matter, any search engine) works that this is not the preferred solution (I knew it as I was writing it). Even going up the stack and creating something similar to Lucandra won't cut it. The proper way to solve the scalability problem is to run a "local" index (a shard) on each node, scatter the query to all shards and reduce the results when you search, and use routing when you index (this is a very simplistic explanation).
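To make that concrete, here is a minimal sketch of the model just described. The class and method names are hypothetical, purely for illustration; this is not elasticsearch's actual API:

// Each node holds a "local" index (a shard); writes are routed to one
// shard by hashing the document id, while searches are scattered to all
// shards and the hits gathered (the map/reduce step).
import java.util.ArrayList;
import java.util.List;

interface LocalShard {
    void index(String id, String document);
    List<String> search(String query);
}

class ShardedIndex {

    private final List<LocalShard> shards;

    ShardedIndex(List<LocalShard> shards) {
        this.shards = shards;
    }

    // indexing: route the document to exactly one shard
    void index(String id, String document) {
        int shard = Math.abs(id.hashCode() % shards.size());
        shards.get(shard).index(id, document);
    }

    // searching: scatter to every shard, then gather the hits
    List<String> search(String query) {
        List<String> hits = new ArrayList<String>();
        for (LocalShard shard : shards) {
            hits.addAll(shard.search(query)); // in practice, in parallel
        }
        return hits; // in practice, re-sorted globally by score
    }
}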

So I started out building elasticsearch. It is basically a solution built from the ground up to be distributed. I also wanted to create a search solution that can be used easily from any other programming language, which basically means JSON over HTTP, without sacrificing the ease of use within the Java programming language (or, more specifically, the JVM).

To be honest, I am amazed at what has happened in just 8 months. ElasticSearch is up and running, providing all the core features I wanted it to have at the beginning. It is a scalable search solution, with a JSON over HTTP interface as well as a really nice "native" Java API (which gets even nicer in the upcoming 0.9 release).

Sadly, I have been spending a lot of time on elasticsearch, and almost no time on Compass itself, especially around the forum. For that, I deeply apologize to the amazing Compass users that have been there over the years.

So, what about the future of Compass? I see ElasticSearch as Compass 3.0. That is how I would have wanted the next major version of Compass to look. This is not to say that the current ElasticSearch version implements all of Compass's features, but the basis is there. The two main features that are missing are OSEM and ORM integration.

As for OSEM, ElasticSearch can already index JSON (and JSON-like structures, for example in the form of a Map). What is left to be done is to create a mapping layer from the object model to this JSON-like structure. ORM-level integration should work very similarly to how Compass implements it today.
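For illustration, the missing mapping layer boils down to something like the following sketch (the Trade class and field names are made up for the example; a real OSEM layer would of course derive this from mappings or annotations):

// a hypothetical OSEM-style step: flatten a domain object into the
// JSON-like Map structure that elasticsearch can already index
import java.util.HashMap;
import java.util.Map;

class Trade {
    long id;
    float value;
}

class SimpleObjectMapper {
    static Map<String, Object> toJsonLike(Trade trade) {
        Map<String, Object> source = new HashMap<String, Object>();
        source.put("id", trade.id);
        source.put("value", trade.value);
        return source;
    }
}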

In terms of Java (JVM) level integration, ElasticSearch can easily run either embedded in the Java process or remote to it, in both distributed mode and single node mode.
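For example, starting an embedded node and getting a client from it looks roughly like this (based on the early 0.x Java API; exact names may differ between versions):

// start a full elasticsearch node inside this JVM and talk to it locally
import static org.elasticsearch.node.NodeBuilder.nodeBuilder;

import org.elasticsearch.client.Client;
import org.elasticsearch.node.Node;

public class EmbeddedNode {
    public static void main(String[] args) {
        Node node = nodeBuilder().node(); // joins the cluster like any other node
        Client client = node.client();    // same Client API as a remote client

        // ... index and search using the client ...

        node.close();
    }
}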

So, what should someone do today? If you are going to start a new project, I would suggest you take ElasticSearch for a spin; I am sure you will like it. Existing Compass users should start giving serious thought to how to migrate to ElasticSearch. Hopefully, once OSEM is implemented in ElasticSearch, the migration will be simpler.

Regarding the current Compass 2.x version, it is basically in maintenance mode. I will try to help in the forum as much as I can, will gladly accept patches and apply them to trunk, and maybe even release a minor version. If someone would like to get more involved (administering the forum, helping with patches, releases, commit permission, and so on), I would be happy for it.

As far as I am concerned, the future is ElasticSearch. It is probably the most advanced open source distributed search solution you can find today, and its Java (JVM) integration is a first class citizen. I hope the Compass user base will follow.

-shay.banon


2010-02-14

I must be missing something here… I went ahead and looked at WebSockets, since it sounded like a great solution for improving the communication from client to server compared to HTTP. It seems like the WebSocket send API only allows you to send either binary data or a text string. And this got me wondering: how the frack do you implement REST over WebSockets in a uniform way?

First and foremost, how do you represent a URI? Second, how do you represent the HTTP methods (GET, PUT, POST, …)? And last, how do you represent HTTP URI parameters and headers? It seems like a solution for this might be to build some sort of schema into the content that goes into that text string, something like a JSON string that has a "uri" field, "params", and so on. But that's annoying, since with HTTP you can create very simple gateways that simply use the headers or parameters without needing to parse the body…
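For example, such an envelope might look something like this (a made-up schema, exactly the kind of thing everyone would have to invent for themselves):

{
    "uri" : "/trades/1",
    "method" : "GET",
    "params" : { "pretty" : "true" },
    "headers" : { "Accept" : "application/json" },
    "body" : null
}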

Am I missing something here? Why doesn't the WebSocket send method have a notion of a URI and headers (which could be passed in an optimized binary format for fast parsing)? It seems like REST has taken a beating with WebSockets…

On top of that, it seems like (and correct me if I'm wrong here) you register a single listener that accepts all responses. If we want to simulate an async request/response scenario, the client needs to create a unique request id, and the server must send it back so the client can correlate a response with a specific request.
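Again, something everyone would end up inventing on their own, along these lines (a hypothetical correlation scheme, not any spec): the client tags each request with an id, and the server echoes it back on the response:

{ "id" : 42, "uri" : "/trades/1", "method" : "GET" }

{ "id" : 42, "status" : 200, "body" : { "value" : 17.5 } }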

If all the above holds, it means that people will reinvent what both REST and async request/response communication already do, and probably not in a uniform manner. Do we now need a REST over WebSockets specification?

p.s. I understand the benefits of WebSockets for comet-style, long polling HTTP requests; I just want to use it for typical async RESTful request/response interactions instead of HTTP.


2010-02-08

Well, the gig is out. What I have been working on for the past several months is now alive. ElasticSearch is an open source, distributed, RESTful search engine. More info about it can be found in the "You Know, for Search" blog post. There is also a nice overview page.

How does that relate to Compass? Good question, and one that deserves a proper blog post. I will be maintaining another blog just for ElasticSearch, and there is also a Twitter account you should follow.

Enjoy!


2009-07-22

With the 7.0 release of GigaSpaces, I thought I would write a few posts about the new and exciting features that are part of the release. The first feature I would like to talk about is GigaSpaces executors support (which premiered in 6.6) and one of the cool things you can do with it.

In essence, GigaSpaces executor support allows you to define custom Tasks that will be executed within GigaSpaces, collocated with the data. Aside from collocation, one of its nice features is the fact that the code does not have to be predefined within the GigaSpaces cluster; instead, it is loaded on demand. Here is an example of such a simple task:

public static class TradeValueTask implements Task<Float> {

    private long tradeId;

    // injected at execution time with a proxy to the collocated space
    @TaskGigaSpace
    private transient GigaSpace gigaSpace;

    public TradeValueTask(long tradeId) {
        this.tradeId = tradeId;
    }

    // routes the task to the partition that owns this trade id
    @SpaceRouting
    public long getTradeId() {
        return tradeId;
    }

    public Float execute() throws Exception {
        Trade trade = gigaSpace.readById(Trade.class, tradeId);
        return trade.getValue();
    }
}

The above task allows one to get *just* the trade value for a given trade id, instead of reading the whole trade and extracting the value from it on the client side.

The execution will be directed to the partition that the trade exists on (assuming that the trade id is also the routing field for Trade) and executed there. This means that the readById operation will be executed collocated with the space instance.

Here is an example of how it can be executed:

// gigaSpace here is a clustered proxy of the whole cluster
AsyncFuture<Float> result = gigaSpace.execute(new TradeValueTask(1));
float value = result.get();

Now, let's assume that calculating the trade value is complex, and we already have a service that knows how to do it, configured in the Processing Unit descriptor file:

<bean id="tradeValueCalculator" class="eg.TradeValueCalculator" />

We would love to be able to use it to calculate the trade value from within the processing unit. What we actually want is to get a handle to the TradeValueCalculator from within our class. There are several ways to do so, but one of the nicest is to use Spring's autowiring capabilities to inject it, which we can easily do:

@AutowiredTask
public static class TradeValueTask implements Task<Float> {

    private long tradeId;

    @TaskGigaSpace
    private transient GigaSpace gigaSpace;

    // injected from the processing unit's application context
    @Autowired
    private transient TradeValueCalculator tradeValueCalculator;

    public TradeValueTask(long tradeId) {
        this.tradeId = tradeId;
    }

    @SpaceRouting
    public long getTradeId() {
        return tradeId;
    }

    public Float execute() throws Exception {
        Trade trade = gigaSpace.readById(Trade.class, tradeId);
        return tradeValueCalculator.calculate(trade);
    }
}

By marking the class with the @AutowiredTask annotation, it will automatically be autowired with the beans defined within the processing unit. Pretty cool, no?

(As a side note, autowiring takes time, which might be of the essence ;). Another option is to just implement ApplicationContextAware and use the ApplicationContext to get the bean by its id.)
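For completeness, here is a sketch of that alternative (using the tradeValueCalculator bean id from the descriptor above):

public static class TradeValueTask implements Task<Float>, ApplicationContextAware {

    private long tradeId;

    @TaskGigaSpace
    private transient GigaSpace gigaSpace;

    private transient TradeValueCalculator tradeValueCalculator;

    public TradeValueTask(long tradeId) {
        this.tradeId = tradeId;
    }

    // called on the executing node, where the processing unit context lives
    public void setApplicationContext(ApplicationContext applicationContext) {
        this.tradeValueCalculator =
                (TradeValueCalculator) applicationContext.getBean("tradeValueCalculator");
    }

    @SpaceRouting
    public long getTradeId() {
        return tradeId;
    }

    public Float execute() throws Exception {
        Trade trade = gigaSpace.readById(Trade.class, tradeId);
        return tradeValueCalculator.calculate(trade);
    }
}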

So, we can autowire beans into a Task executed using GigaSpaces. One last thing before we end this post: I would like to show another nifty feature of GigaSpaces executors, the executor builder.

Let's assume that we now want to calculate the sum of the values of several trades by their ids. We could have executed each task separately, waited on all the futures, and summed the results. But why work hard when we (GigaSpaceians) can work hard for you (my personal motto):

AsyncFuture<Float> result = 
          gigaSpace.executorBuilder(new SumReducer<Float, Float>(Float.class))
                                 .add(new TradeValueTask(1)) 
                                 .add(new TradeValueTask(2)) 
                                 .add(new TradeValueTask(3))
                                 .execute(); 
float value = result.get();

All three tasks will be executed in parallel, in a non-blocking mode, each on its respective partition, and the results will be automatically reduced using our built-in SumReducer.

Enjoy!

