SPRINGONE 2GX 2012: THE SPRING, GROOVY, GRAILS, & CLOUD EVENT OF THE YEAR!


Paco Nathan

Data Scientist

Data Scientist @ http://ConcurrentInc.com. Developer Evangelist for http://Cascading.org open source project. Expert in Hadoop, R, cloud computing, machine learning, predictive analytics, NLP. BS MathSci and CS CompSci from Stanford, 25+ yrs in tech industry. For the past several years, I've been leading Data Science teams, working with large scale MapReduce applications.

Blog

Cascading for the Impatient, Part 6

Posted 2012-08-07 15:22:00.0

In our fifth installment of this series we showed how to implement TF-IDF in Cascading application. If you haven’t read that yet, it’s probably best to start there. Today’s post extends the TF-IDF app to show best practices for test-driven developmore »

Cascading for the Impatient, Part 5

Posted 2012-07-31 18:17:00.0

In our fourth installment of this series we showed how to use HashJoin on two pipes, to perform “stop words” filtering at scale in a Cascading 2.0 application. If you haven’t read that yet, it’s probably best to start thermore »

Cascading for the Impatient, Part 4

Posted 2012-07-24 10:13:00.0

In our third installment of this series we showed how to write a custom Operation for a Cascading 2.0 application. If you haven’t read that yet, it’s probably best to start thermore »
Read More Blog Entries »

Presentations

Introduction to Cascading

Introduction to Cascading, an application framework for Java developers to deploy robust, enterprise-grade applications on Apache Hadoop. We'll start with the simplest Cascading program possible (file copy in a distributed file system) and progress in smamore »

Introduction to Cascading

close

Paco Nathan By Paco Nathan

Introduction to Cascading, an application framework for Java developers to deploy robust, enterprise-grade applications on Apache Hadoop. We'll start with the simplest Cascading program possible (file copy in a distributed file system) and progress in small steps to show a Java-based social recommender system based on Twitter feeds.



Introduction to Cascading, an application framework for Java developers to deploy robust, enterprise-grade applications on Apache Hadoop. We'll start with the simplest Cascading program possible (file copy in a distributed file system) and progress in small steps to show a Java-based social recommender system based on Twitter feeds.

The objective is to show how to work with “Big Data”, starting on a laptop with sample data sets, to generate JAR-based apps which can be deployed on very large clusters.

We'll show best practices for scalable apps in Cascading, how to leverage TDD features, etc.