Home - Hillsborough, CA

Friday

Mar022012

Two sides of big data: Better Answers vs. Cheaper Answers

Friday, March 2, 2012 at 12:24PM

One of the most frequently uttered thoughts at Strata 2012 was "This is nothing new - folks have been doing this for years." Yet something new and creative is going on. There is certainly a lot of fluff, but I see three major legitimate trends that overlap, though in a fairly confusing manner.


Aaron Rosenbaum | Comments Off |

Thursday

Dec082011

Too many management consoles in NoSQL/NewSQL DBMS industry!

Thursday, December 8, 2011 at 10:37AM

The database industry used to be more monolithic. Now many enterprises have a hodge-podge of pieces in their data pipeline. Why is it that each company is coming out with its own management console front-end? Recently we've had Cloudera, Greenplum, MongoDB and Cassandra all show off their snazzy new management consoles. Only MarkLogic seems to have taken the integrate-instead-of-build route (OpenView + Nagios, I think).

If there are 6-8 things touching the data in a pipeline, shouldn't they be reporting up to a central console? Do we really need another central console technology? Look to the networking industry for guidance here....

Next up, layer cakes of file systems and social networking portals - everyone seems to want one of those too....

Aaron Rosenbaum | Comments Off |

Data Management,

Storage

Monday

Dec052011

Federated Queries - enabling a Cloudy future?

Monday, December 5, 2011 at 10:04PM

I had a very enjoyable dinner last night hosted by Ken Oestreich, cloud-marketing god at EMC. Jeff Nick, CTO of EMC, was there along with Lew Tucker, Cloud CTO at Cisco; Rodrigo Flores, founder of NewScale; Tim Crawford, CIO of AllCovered; Glenn Donithan, from Exceptional SW strategies; and Bernard Golden, CEO of HyperStratus and cloud blogger for CIO.com.

We talked about a variety of topics in the cloud world. I'll let the cloud Twitterati write about those. But we keep all those boxes running cloud systems in places called "data" centers for a reason - and eventually the conversation came to data.

Tim Crawford asked Jeff Nick - "What are the challenges that EMC faces that you don't know the answers or path for?" He thought for quite a while. It was nice to see a considered response. He saw resource sharing - Amazon AWS, server virtualization, etc. - as a temporary/small part of the promise of cloud computing. The real value comes when data flows from organization to organization and each can work on adding its unique value to the overall value. There is so much more data outside the organization's four walls than inside - it cannot all be brought in.

This brought me back to Cohera right away - the federated query has value, 10 years on (now? Or in 5 years?). In today's architecture, what do federated query systems look like?

I'd maintain that you still need a declarative language for the queries. You need to be able to query catalogs/metadata. Organizations need to have control over provisioning, not only for query responses but also for pub/sub streams - rates, obfuscation, use rules, even billing. Caching rules would also apply. It must be over a RESTful interface... The future is closer than it may appear.
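To make the provisioning controls a little more concrete, here is a rough sketch of what a per-consumer sharing policy might look like when applied to a query result. Everything in it is an assumption for illustration: the `SharingPolicy` fields, the `apply_policy` function and the row cap standing in for a rate limit are all hypothetical, not taken from Cohera or any real federated query product.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set, Tuple

Row = Dict[str, Any]


@dataclass
class SharingPolicy:
    """Hypothetical per-consumer controls set by the data owner."""
    max_rows_per_query: int                      # crude stand-in for a rate limit
    obfuscated_fields: Set[str] = field(default_factory=set)
    price_per_row: float = 0.0                   # billing, in arbitrary units


def apply_policy(rows: List[Row], policy: SharingPolicy) -> Tuple[List[Row], float]:
    """Cap the result size, mask protected fields, and compute the charge."""
    served = rows[: policy.max_rows_per_query]
    masked = [
        {k: ("***" if k in policy.obfuscated_fields else v) for k, v in row.items()}
        for row in served
    ]
    charge = len(masked) * policy.price_per_row
    return masked, charge


rows = [
    {"customer": "a", "email": "a@example.com", "spend": 10},
    {"customer": "b", "email": "b@example.com", "spend": 20},
    {"customer": "c", "email": "c@example.com", "spend": 30},
]
policy = SharingPolicy(max_rows_per_query=2, obfuscated_fields={"email"}, price_per_row=0.5)
result, charge = apply_policy(rows, policy)
```

A real system would enforce this at the owning organization's endpoint, so the consumer never sees the unmasked rows.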

Aaron Rosenbaum | Comments Off |

Cloud/Virtualization,

NoSQL

Wednesday

Oct192011

The Stonebraker Uncertainty Principle

Wednesday, October 19, 2011 at 04:04PM

Today at the XLDB conference at the Stanford Linear Accelerator Center, Mike Stonebraker gave a very nice talk concerning shared-disk vs. shared-nothing architectures. I won't rehash the basics here - Mike has been writing about this for a long time.

Several folks sitting around me commented on an inconsistency in his talk:

Hadoop is a shared file system with a computation dispatch mechanism (that can route on data location, which is very nice for certain problems). Without a distributed file system, there is no Hadoop, right? What is unique about HDFS is the cheapness of the file system vs. other alternatives. It's certainly not shared-nothing - the NameNode is a single point of failure. The data awareness that is so nice with Hadoop goes away if the storage itself is virtualized.

He claimed that all of the major web shops (Facebook, Google, LinkedIn, Zynga, eBay) had built around shared-nothing architectures.

Is this a conflict? Or just a change in perception depending on how you are looking at the problem?

For these very simple databases, where the file system ends and the database begins is quite confusing. Oracle's Big Data Appliance virtualizes storage, then puts HDFS on top of that. Or you use some other storage layer (for example, Hadoop can use S3 instead of HDFS).
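The point about data awareness going away under virtualized storage can be shown with a toy scheduler. This is a minimal sketch, not Hadoop's actual scheduler: the function names, the one-slot-per-node setup and the block/node labels are all made up for illustration. The scheduler prefers a node that holds a replica of the block; once the storage layer stops exposing replica locations, the same code still runs but every task becomes a remote read.

```python
from typing import Dict, List, Optional, Set


def pick_node(replicas: Set[str], free_nodes: List[str]) -> Optional[str]:
    """Prefer a free node holding a replica; otherwise fall back to any free node."""
    for node in free_nodes:
        if node in replicas:
            return node
    return free_nodes[0] if free_nodes else None


def schedule(block_locations: Dict[str, Set[str]], slots: Dict[str, int]) -> Dict[str, str]:
    """Assign one task per block, respecting the number of task slots on each node."""
    remaining = dict(slots)
    assignment: Dict[str, str] = {}
    for block in sorted(block_locations):
        free = [n for n in sorted(remaining) if remaining[n] > 0]
        node = pick_node(block_locations[block], free)
        if node is None:
            raise RuntimeError("no free task slots left")
        remaining[node] -= 1
        assignment[block] = node
    return assignment


def local_fraction(assignment: Dict[str, str], block_locations: Dict[str, Set[str]]) -> float:
    """Fraction of tasks that run on a node holding their block."""
    local = sum(1 for b, n in assignment.items() if n in block_locations[b])
    return local / len(assignment)


slots = {"n1": 1, "n2": 1, "n3": 1}

# Storage that exposes replica locations, as HDFS does.
visible = {"b1": {"n1", "n2"}, "b2": {"n2", "n3"}, "b3": {"n3"}}
aware = schedule(visible, slots)

# Virtualized storage: the scheduler no longer knows where blocks live.
opaque = {block: set() for block in visible}
blind = schedule(opaque, slots)
```

With visible locations every task lands next to its data; with opaque locations the assignment looks identical, but measured against where the blocks really are, whether a task is local is down to luck.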

Right now none of the NoSQL players deal with data skew or rebalance automatically, and nary a query planner is in sight, so it doesn't matter much yet. But a statistics-based optimizer for rebalancing the file/compute balance across these systems is in order. And the result will look as much like a distributed file system as a database...
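As a very rough idea of what statistics-driven rebalancing could mean, here is a greedy sketch that uses nothing but per-partition sizes. It is an assumption-laden toy, not how any of the products mentioned work: the `rebalance` function, the tolerance parameter and the partition names are hypothetical, and a real optimizer would also weigh access frequency, network cost and replica placement.

```python
from typing import Dict, List, Tuple

Placement = Dict[str, Dict[str, int]]  # node -> {partition: size}


def rebalance(placement: Placement, tolerance: float = 0.1) -> Tuple[List[Tuple[str, str, str]], Placement]:
    """Greedily move partitions from the fullest node to the emptiest one.

    Stops when the gap between them is within `tolerance` of the mean load,
    or when no single move would narrow the gap.
    """
    nodes = {n: dict(parts) for n, parts in placement.items()}  # work on a copy
    moves: List[Tuple[str, str, str]] = []
    while True:
        load = {n: sum(parts.values()) for n, parts in nodes.items()}
        src = max(load, key=load.get)
        dst = min(load, key=load.get)
        gap = load[src] - load[dst]
        mean = sum(load.values()) / len(load)
        if mean == 0 or gap <= tolerance * mean:
            break
        # Only partitions strictly smaller than the gap reduce the imbalance.
        candidates = [(size, name) for name, size in nodes[src].items() if 0 < size < gap]
        if not candidates:
            break
        size, name = max(candidates)
        del nodes[src][name]
        nodes[dst][name] = size
        moves.append((name, src, dst))
    return moves, nodes


skewed = {
    "n1": {"p1": 50, "p2": 30, "p3": 20},
    "n2": {"p4": 10},
    "n3": {"p5": 10},
}
moves, balanced = rebalance(skewed)
final_load = {n: sum(parts.values()) for n, parts in balanced.items()}
```

Note that the output is just a list of "move this chunk of data from here to there" instructions, which is exactly the kind of decision a distributed file system makes.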

Aaron Rosenbaum | Comments Off |

Cloud/Virtualization,

Data Management,

NoSQL,

Storage

Monday

Oct172011