Friday, July 22, 2011
Open Source ETL with Hadoop

As a follow-up to my article on BI projects for 2012, I got a few questions about ETL and Hadoop. Here are some of the leading options for doing ETL projects with Hadoop.
Cloudera/Sqoop
Lots of nifty tools. Sqoop moves data between relational databases (RDBMSs) and HDFS, in both directions. Flume moves log files into HDFS. Transformation logic gets written as part of the map() step of a MapReduce job. I think they are bundling connectors for Netezza and something from Quest for Oracle, but I'm fuzzy on the licensing terms.
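To make "transform logic gets written as part of map()" concrete, here is a minimal sketch of the "T" step as a Hadoop Streaming mapper in Python. Streaming lets any executable act as the mapper: it reads raw records on stdin and writes tab-separated key/value lines to stdout. Everything specific here is a hypothetical assumption for illustration, not Cloudera's or Sqoop's API. That includes the CSV layout (`date,customer_id,amount`), the field names, and the cleaning rules.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop Streaming mapper doing the "T" of ETL in the map phase.

Assumed input (made-up layout): CSV lines of the form  date,customer_id,amount
Output: customer_id<TAB>date<TAB>amount. Hadoop Streaming treats the text
before the first tab as the key by default.
"""
import sys
from datetime import datetime


def transform(line):
    """Clean one raw record; return an output line, or None to drop it."""
    parts = [p.strip() for p in line.rstrip("\n").split(",")]
    if len(parts) != 3:
        return None  # malformed: wrong number of fields
    raw_date, customer_id, raw_amount = parts
    try:
        # Normalize US-style dates such as 07/22/2011 to ISO 8601 (2011-07-22).
        date = datetime.strptime(raw_date, "%m/%d/%Y").date().isoformat()
        amount = float(raw_amount)
    except ValueError:
        return None  # unparseable date or amount
    if not customer_id:
        return None  # missing key field
    # Upper-case the key so "c42" and "C42" land on the same reducer.
    return f"{customer_id.upper()}\t{date}\t{amount:.2f}"


def main(stream=sys.stdin, out=sys.stdout):
    """Read raw records from stdin, write cleaned records to stdout."""
    for line in stream:
        record = transform(line)
        if record is not None:
            out.write(record + "\n")


if __name__ == "__main__":
    main()
```

For a pure transform you would typically run this as a map-only streaming job, passing the script with `-mapper` and setting `-numReduceTasks 0`. The location of the streaming jar varies by distribution and version. Dropping bad records inside the mapper keeps them from ever reaching HDFS output, which is the usual trade-off against writing them to a separate reject file.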