SPRINGONE 2GX 2012: THE SPRING, GROOVY, GRAILS, & CLOUD EVENT OF THE YEAR!


Paco Nathan

Data Scientist

Data Scientist @ http://ConcurrentInc.com. Developer Evangelist for the http://Cascading.org open source project. Expert in Hadoop, R, cloud computing, machine learning, predictive analytics, NLP. BS MathSci and MS CompSci from Stanford, 25+ yrs in tech industry. For the past several years, I've been leading Data Science teams, working with large-scale MapReduce applications.



Blog

Cascading for the Impatient, Part 6

Posted 2012-08-07 15:22:00.0

In our fifth installment of this series we showed how to implement TF-IDF in a Cascading application. If you haven’t read that yet, it’s probably best to start there. Today’s post extends the TF-IDF app to show best practices for test-driven develop…

Cascading for the Impatient, Part 5

Posted 2012-07-31 18:17:00.0

In our fourth installment of this series we showed how to use HashJoin on two pipes, to perform “stop words” filtering at scale in a Cascading 2.0 application. If you haven’t read that yet, it’s probably best to start there…

Cascading for the Impatient, Part 4

Posted 2012-07-24 10:13:00.0

In our third installment of this series we showed how to write a custom Operation for a Cascading 2.0 application. If you haven’t read that yet, it’s probably best to start there…

Presentations

Introduction to Cascading

Introduction to Cascading, an application framework for Java developers to deploy robust, enterprise-grade applications on Apache Hadoop. We'll start with the simplest Cascading program possible (file copy in a distributed file system) and progress in sma…