
About Ambleside

Ambleside Logic is led by Aaron Rosenbaum. Father of 3, Programming since 7, DevOps since 11 (hacking RSTS), exIngres, exCTP, exCohera. Sold two companies to Oracle, one to HP. Research + Strategy for NoSQL/BigData ecosystem implementors, vendors and investors.

Wednesday, Oct 19, 2011

The Stonebraker Uncertainty Principle

Today at the XLDB conference at the Stanford Linear Accelerator Center, Mike Stonebraker gave a very nice talk on shared-disk vs. shared-nothing architectures. I won't rehash the basics here - Mike has been writing about this for a long time.

Several folks sitting around me commented on an inconsistency in his talk:

Hadoop is a shared file system with a computation dispatch mechanism (one that can route work based on data location, which is very nice for certain problems). Without a distributed file system, there is no Hadoop, right? What is unique about HDFS is how cheap it is compared with the alternatives. It's certainly not shared-nothing - the NameNode is a single shared point (and a single point of failure). The data awareness that is so nice with Hadoop goes away if the storage itself is virtualized.
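To make the "route on data location" point concrete, here is a minimal sketch of locality-aware dispatch. It is not Hadoop's actual scheduler; the function name `assign_tasks` and its inputs (a block-to-replica map and a per-node count of free task slots) are hypothetical and made up for illustration.

```python
def assign_tasks(block_locations, free_slots):
    """Assign one task per block, preferring a node that already holds the data.

    block_locations: dict of block id -> list of nodes holding a replica
    free_slots:      dict of node -> number of free task slots
    Returns a dict of block id -> (node, is_local).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment = {}
    for block, replicas in block_locations.items():
        # First choice: a node that holds a replica and still has capacity.
        local = [n for n in replicas if slots.get(n, 0) > 0]
        if local:
            node, is_local = max(local, key=lambda n: slots[n]), True
        else:
            # Fallback: any node with capacity; the data must cross the network.
            candidates = [n for n, s in slots.items() if s > 0]
            if not candidates:
                raise RuntimeError("no free task slots")
            node, is_local = max(candidates, key=lambda n: slots[n]), False
        slots[node] -= 1
        assignment[block] = (node, is_local)
    return assignment


blocks = {"b1": ["n1"], "b2": ["n2"], "b3": ["n1"]}
print(assign_tasks(blocks, {"n1": 1, "n2": 1, "n3": 1}))
```

If the storage layer is virtualized and reports no replica locations (every list in `block_locations` is empty), every task falls through to the non-local branch, which is the loss of data awareness described above.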

He claimed that all of the major web shops (Facebook, Google, LinkedIn, Zynga, eBay) had built around shared-nothing architectures.

Is this a conflict? Or just a change in perception depending on how you are looking at the problem?

For these very simple databases, where the file system ends and the database begins is quite confusing. Oracle's Big Data Appliance virtualizes storage and then puts HDFS on top of that. Or you can swap in some other storage layer (for example, Hadoop can use S3 instead of HDFS).

Right now none of the NoSQL players deal with data skew or rebalance automatically, and nary a query planner is in sight, so it doesn't matter much yet. But a statistics-based optimizer for managing the file/compute balance across these systems is in order. And the result will look as much like a distributed file system as a database...
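As a rough illustration of what statistics-driven rebalancing could mean, here is a toy sketch. It is not taken from any existing NoSQL product; `plan_rebalance`, its inputs, and the `tolerance` threshold are all hypothetical. It uses per-partition sizes as its only statistic and greedily moves partitions off the most loaded node until that node is within `tolerance` times the mean load.

```python
def plan_rebalance(sizes, placement, nodes, tolerance=1.2):
    """Plan partition moves that reduce storage skew.

    sizes:     dict of partition -> size (the only statistic used here)
    placement: dict of partition -> node currently holding it
    nodes:     list of all nodes, including empty ones
    Returns (moves, load), where moves is a list of (partition, src, dst).
    """
    placement = dict(placement)  # work on a copy
    load = {n: 0 for n in nodes}
    for part, node in placement.items():
        load[node] += sizes[part]
    mean = sum(load.values()) / len(nodes)

    moves = []
    while True:
        hot = max(load, key=load.get)
        cold = min(load, key=load.get)
        if load[hot] <= tolerance * mean:
            break  # skew is within tolerance
        gap = load[hot] - load[cold]
        # Moving a partition smaller than the gap always lowers the peak load.
        movable = [p for p, n in placement.items()
                   if n == hot and 0 < sizes[p] < gap]
        if not movable:
            break  # nothing can be moved without making things worse
        part = max(movable, key=lambda p: sizes[p])
        placement[part] = cold
        load[hot] -= sizes[part]
        load[cold] += sizes[part]
        moves.append((part, hot, cold))
    return moves, load


sizes = {"a": 50, "b": 30, "c": 20, "d": 10}
placement = {"a": "n1", "b": "n1", "c": "n1", "d": "n2"}
print(plan_rebalance(sizes, placement, ["n1", "n2"]))
```

A real optimizer would also weigh query load and the cost of moving data, which is exactly where it starts to look like both a file system and a database.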
