I've been looking at Apache Thrift lately, mostly because it's had a lot of hype. We ran some tests and, while it's good, it's nothing amazing. Single-threaded RPC speed is average, I'd say. It's quicker than some of the stacks we compared it against, but only because it's less sophisticated: less sophistication means a shorter code path, which means it's faster on a single thread.
Thrift uses a socket-per-client-thread model, which means a lot of sockets. There's no way to tell Thrift to pool connections, which would cap the number of sockets opened per client JVM and, in turn, on the server side. Some customers worry about the number of open sockets, especially when connections pass through firewalls that consume resources per socket. Most of Thrift's performance advantage probably comes down to the absence of this feature alone.
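To make that concrete, here's a minimal sketch of what the client side looks like with the Thrift Java bindings. `UserService` and its `ping()` method are hypothetical IDL-generated names, and the host/port are made up; the point is just that every client thread that wants to make calls ends up opening and holding its own `TSocket`, with no pool in between.

```java
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class ThriftClientPerThread {
    public static void main(String[] args) {
        // Spin up a handful of client threads; each one opens its own socket.
        for (int i = 0; i < 4; i++) {
            new Thread(() -> {
                try {
                    // One dedicated TCP connection per thread -- nothing shared, nothing pooled.
                    TTransport transport = new TSocket("thrift-server.example.com", 9090);
                    transport.open();
                    // UserService is a hypothetical service generated from an IDL file.
                    UserService.Client client =
                            new UserService.Client(new TBinaryProtocol(transport));
                    client.ping();              // hypothetical method from that IDL
                    transport.close();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }).start();
        }
    }
}
```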
Next, it seems you can't register multiple services per socket; each service requires its own server socket. You can't declare several server-side components and multiplex them over a single server socket. That's a pain if you want to expose multiple components with remote interfaces. There may be a way, I just don't know what it is.
Server-side scaling is very good precisely because of this simplicity. One thread owns one socket and one service and doesn't care about any other sockets or services. Fully independent threading means excellent vertical scaling: there are no synchronized blocks or other shared state between threads to slow them down, so vertical scaling is about as close to perfect as it gets. But it costs a lot of sockets and threads.
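For reference, this is roughly how that thread-per-connection model gets wired up with the Java bindings (a sketch only; `UserService` and `UserServiceHandler` are hypothetical generated/implementation classes, and the exact `Args` API depends on the Thrift release). Each accepted connection gets a thread that does nothing but serve that one socket.

```java
import org.apache.thrift.server.TServer;
import org.apache.thrift.server.TThreadPoolServer;
import org.apache.thrift.transport.TServerSocket;

public class ThreadPerConnectionServer {
    public static void main(String[] args) throws Exception {
        // UserServiceHandler is a hypothetical implementation of the
        // generated UserService.Iface interface.
        UserService.Processor<UserService.Iface> processor =
                new UserService.Processor<>(new UserServiceHandler());

        TServerSocket serverTransport = new TServerSocket(9090);

        // One worker thread per connected client; threads never touch each
        // other's sockets, so there is no cross-thread synchronization.
        TThreadPoolServer.Args serverArgs = new TThreadPoolServer.Args(serverTransport);
        serverArgs.processor(processor);

        TServer server = new TThreadPoolServer(serverArgs);
        server.serve();
    }
}
```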
Non-blocking I/O can be used to manage threads better on the server side, but the default implementation is very simple: a single thread managing one selector for all the sockets. The problem is that with a lot of sockets, that one thread has to service all of them, and that can add latency.
Note: the selector thread only spots sockets with work and then hands the request off to a separate thread pool for execution. I'm not saying Thrift is single-threaded, just that it uses a single selector thread.
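That arrangement corresponds, roughly and assuming a reasonably recent Thrift Java release, to the half-sync/half-async server: one selector thread detects readable sockets, and the actual service calls run on the server's internal worker pool. Again, `UserService` and its handler are hypothetical generated names; this is a sketch, not a tuned configuration.

```java
import org.apache.thrift.server.THsHaServer;
import org.apache.thrift.server.TServer;
import org.apache.thrift.transport.TNonblockingServerSocket;

public class SingleSelectorServer {
    public static void main(String[] args) throws Exception {
        UserService.Processor<UserService.Iface> processor =
                new UserService.Processor<>(new UserServiceHandler());

        // One non-blocking server socket; a single selector thread watches
        // every client connection registered against it.
        TNonblockingServerSocket serverTransport = new TNonblockingServerSocket(9090);

        // The selector thread only detects ready sockets; execution of the
        // request is handed off to the server's worker thread pool.
        THsHaServer.Args serverArgs = new THsHaServer.Args(serverTransport);
        serverArgs.processor(processor);

        TServer server = new THsHaServer(serverArgs);
        server.serve();
    }
}
```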
Most of the non-blocking code around here uses multiple selectors with a thread apiece. We round-robin sockets across the selectors, which bounds the latency of servicing any one socket. We also use AIO, which is an improvement over NIO. Thrift, however, was probably developed on fast x86 cores, so for that load the latency added by a single selector thread was small. Running it on older hardware, or on the newer multicore Niagaras and the like, would highlight the issue. The benefit of only having to run it on a small set of hardware, I guess.
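Here's a rough, generic sketch of that multi-selector pattern in plain Java NIO (not our actual code, and not Thrift's, and it doesn't show the AIO part): an accept loop deals sockets out round-robin to a fixed set of selector threads, so no single selector ever has to watch every connection. Request framing and the hand-off to a worker pool are elided.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class MultiSelectorServer {

    /** One selector, one thread, servicing only the sockets assigned to it. */
    static class SelectorLoop implements Runnable {
        private final Selector selector;
        private final Queue<SocketChannel> pending = new ConcurrentLinkedQueue<>();

        SelectorLoop() throws IOException {
            selector = Selector.open();
        }

        /** Called from the accept thread; registration happens on the selector thread. */
        void assign(SocketChannel channel) {
            pending.add(channel);
            selector.wakeup();
        }

        @Override
        public void run() {
            ByteBuffer buffer = ByteBuffer.allocate(4096);
            try {
                while (true) {
                    selector.select();
                    // Register any newly assigned sockets with this selector.
                    SocketChannel newChannel;
                    while ((newChannel = pending.poll()) != null) {
                        newChannel.configureBlocking(false);
                        newChannel.register(selector, SelectionKey.OP_READ);
                    }
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        if (key.isValid() && key.isReadable()) {
                            buffer.clear();
                            int read = ((SocketChannel) key.channel()).read(buffer);
                            if (read < 0) {
                                key.channel().close();
                            }
                            // ...hand completed requests off to a worker pool here...
                        }
                    }
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int selectorCount = Runtime.getRuntime().availableProcessors();
        SelectorLoop[] loops = new SelectorLoop[selectorCount];
        for (int i = 0; i < selectorCount; i++) {
            loops[i] = new SelectorLoop();
            new Thread(loops[i], "selector-" + i).start();
        }

        ServerSocketChannel acceptor = ServerSocketChannel.open();
        acceptor.bind(new InetSocketAddress(9090));
        int next = 0;
        while (true) {
            SocketChannel channel = acceptor.accept();   // blocking accept is fine here
            loops[next].assign(channel);                 // round-robin assignment
            next = (next + 1) % selectorCount;
        }
    }
}
```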
In short, Thrift is good but not brilliant. The multiple language bindings for the IDL are cool, and it's very fast, but only because the hard problems, chiefly socket multiplexing and connection pooling, aren't solved. However, it may be that in the environment where it was developed, having tens of thousands of open sockets (a socket PER client thread, think about it...) is fine, and if so then it's a very fast solution in that environment.
Pretty cool, none the less.
