
About Ambleside

Ambleside Logic is led by Aaron Rosenbaum. Father of 3, Programming since 7, DevOps since 11 (hacking RSTS), exIngres, exCTP, exCohera. Sold two companies to Oracle, one to HP. Research + Strategy for NoSQL/BigData ecosystem implementors, vendors and investors.

Friday, Jul 22, 2011

Open Source ETL with Hadoop

As a follow-up to my article on BI projects for 2012, I got a few questions about ETL and Hadoop.  Here are some of the leading options for doing ETL projects with Hadoop.

 

Cloudera/Sqoop

Lots of nifty tools.  Sqoop moves data between HDFS and RDBMSs.  Flume moves log files.  Transform logic gets written as part of Map().  I think they are bundling connectors for Netezza and some stuff from Quest for Oracle, but I'm fuzzy on the licensing terms.  I know all these tools are in CDH3; if you are running CDH2, they may not be there.
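To make "transform logic gets written as part of Map()" a bit more concrete, here is a minimal sketch of the idea in Python.  It assumes a Hadoop Streaming style mapper rather than a native Java Mapper, and the record layout (tab-separated id, name, amount) and the function names are made up for illustration; they are not part of Sqoop or CDH.

```python
def transform(line):
    """Clean one tab-separated record (hypothetical layout: id, name, amount).

    Returns the transformed line, or None if the row should be dropped.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        return None  # malformed row: wrong number of columns
    rec_id, name, amount = fields
    try:
        # Store money as integer cents to avoid float drift downstream.
        cents = int(round(float(amount) * 100))
    except (ValueError, OverflowError):
        return None  # amount is not a usable number
    return "\t".join([rec_id.strip(), name.strip().upper(), str(cents)])


def map_records(lines):
    """Map phase: apply the transform to each input line, skipping bad rows."""
    for line in lines:
        result = transform(line)
        if result is not None:
            yield result
```

In a Streaming job the script would feed sys.stdin into map_records() and print each result to stdout; data landed in HDFS by Sqoop would be the input.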

Pentaho

Pentaho has a well-known open source BI suite.  I think they are leveraging Hive/JDBC.  I haven't used it, but it's worth looking at, especially if you have played with Pentaho before.  Kettle certainly can migrate relational data to JSON (seems sort of backwards, but for some applications you can't argue with the performance).  I know they were trying to get Hadoop integration into Kettle 4.x...any comments?  I'll revise with more info.
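For anyone wondering what "relational to JSON" looks like in practice, here is a rough sketch of the idea in plain Python.  This is not how Kettle does it; it only shows the shape of the conversion.  The in-memory SQLite database, the customers table, and the function names are all assumptions made up for the example.

```python
import json
import sqlite3


def rows_to_json_lines(conn, query):
    """Yield one JSON document per result row, keyed by column name."""
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]
    for row in cur:
        yield json.dumps(dict(zip(columns, row)), sort_keys=True)


def demo():
    """Build a tiny hypothetical table and convert its rows to JSON lines."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Ann"), (2, "Bob")],
    )
    return list(
        rows_to_json_lines(conn, "SELECT id, name FROM customers ORDER BY id")
    )
```

One JSON document per line is a convenient format to drop into HDFS, since each line can be parsed independently by a mapper.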

Oozie/Pig

If you are tackling things in a more native way with Pig, don't forget about Oozie for controlling your external calls to existing transform logic.  Oozie is part of Cloudera's distribution and I'm sure it will be in Hortonworks' as well.

HIHO

New - I haven't used it.  Looks interesting.  HDFS-centric instead of RDBMS-centric.

Clover and Talend

Both are great open source ETL/EAI tools, but neither seems to be making any Hadoop-specific efforts.  Of course both could leverage Sqoop.  I'm not sure Pentaho is really that far ahead, but at least they seem to be making an effort.
