
About Ambleside

Ambleside Logic is led by Aaron Rosenbaum. Father of 3, Programming since 7, DevOps since 11 (hacking RSTS), exIngres, exCTP, exCohera. Sold two companies to Oracle, one to HP. Research + Strategy for NoSQL/BigData ecosystem implementors, vendors and investors.

Sunday, Aug 21, 2011

Horizontal scale-out and Data Warehousing

Horizontal scale-out with a shared-nothing architecture is the cheapest way to scale

Many of the specialty databases used in data warehousing have shared-nothing architectures. It is the way to achieve the greatest scalability in both data and query volume. Shared-nothing architectures also scale over commodity hardware rather than relying on extremely expensive scaled-up disk/server systems. That is another way they scale: the hardware needed to grow is actually affordable.
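To make the shared-nothing idea concrete, here is a minimal sketch in Python. It is not how any product named in this post is implemented; the function names (node_for, partition, local_sum, scatter_gather_sum) are hypothetical, and plain in-memory lists stand in for independent commodity servers. Each row is hash-partitioned to exactly one node, each node aggregates only its own slice, and a coordinator merges the partial results.

```python
import hashlib
from collections import defaultdict


def node_for(key: str, num_nodes: int) -> int:
    """Map a key to a node with a stable hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(rows, num_nodes):
    """Distribute (key, value) rows so that each node owns a disjoint slice of the keys."""
    nodes = [[] for _ in range(num_nodes)]
    for key, value in rows:
        nodes[node_for(key, num_nodes)].append((key, value))
    return nodes


def local_sum(node_rows):
    """Work done on a single node: it touches only its own rows, no shared disk or memory."""
    totals = defaultdict(int)
    for key, value in node_rows:
        totals[key] += value
    return dict(totals)


def scatter_gather_sum(nodes):
    """Coordinator: run the local aggregate on every node and merge the partial results."""
    merged = {}
    for node_rows in nodes:
        # A given key always hashes to the same node, so partial results never collide.
        merged.update(local_sum(node_rows))
    return merged


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
nodes = partition(rows, num_nodes=3)
result = scatter_gather_sum(nodes)
```

In this toy model, adding capacity means adding another list to the cluster. Note that the simple modulo scheme would force most keys to move when the node count changes; real systems typically use something more careful to limit that reshuffling.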

Vendors of the most expensive solutions are the leading acquirers of scale-out DW technologies.

Not surprisingly, many of the leading products in the field have been acquired by firms that primarily sell hardware. It is their way to benefit from commodity hardware: they package their software with commodity hardware into profitable appliances to mitigate the loss of expensive disk appliance sales. Teradata, EMC (Greenplum), HP (Vertica) and Oracle (Exadata) have all taken this approach.

Actually letting the customer capture the benefit of falling hardware costs when scaling has really been limited to privately held vendors and the open-source community. MarkLogic, HBase and the Voldemort project fit this description. HBase has been solidly proven to scale out massively, but its performance is sometimes slow. MarkLogic has been commercially proven but has acquisition costs well above $0, albeit much lower than comparable solutions from Oracle, EMC, etc. Voldemort is interesting but has had minimal commercial deployment.

Same thing happens in storage

This same dynamic is playing out on the storage side - each storage vendor has made multiple acquisitions of start-ups that were eroding its high-end base. After acquisition, few of these companies are able to focus on truly massive-scale storage applications - they get pigeonholed into "storage for beginners" within these large companies. It has certainly happened to Isilon (EMC). Network Appliance has shown no real indication that Engenio would replace FAS - it's an entry into lower-end block storage systems. The vendors will naturally tier these products. A vendor has to work from an overall account plan, so there is little to push these companies to make horizontal scale-out solutions work as well as their scaled-up versions. CorAid is a good example of a company that is still independent.

Something is going to break

We've seen recent database/DW/storage-dependent businesses go the way of in-house development, eschewing most of the vendors mentioned above. From Facebook to LinkedIn, many of these organizations have built their own infrastructure software instead of licensing it. Is this a Silicon Valley one-off or a trend? Time will tell, but there seems to be little uptake of this approach in the F1000, even with their successes. Will the large organizations cannibalize themselves more quickly than the independents can? It depends on the willingness of investors to look past a few quarters of revenue to the future. The recent massive drop in HP's stock price upon announcing a move from lower-value to higher-value business seems to say "No" - investors do not have the stomach for giving up current but declining profits in exchange for better success in new markets. Where does that leave projects today? There are a few independent software vendors with the size to deliver mission-critical enterprise projects that don't tie you to buying a lot of expensive hardware. Customers with lots and lots of data had better make sure those vendors succeed; otherwise their only choice will be which expensive hardware provider to buy their DW systems from.

Who to look at now?

There are 4 major systems today that support shared-nothing, transactional, unstructured text over massively scalable architectures - Netezza (IBM), Sybase IQ (SAP), MarkLogic (Private) and HBase (part of Hadoop). HBase is notoriously slow - most of the commercial Hadoop packages have replacements for it, and a number of companies are bringing out replacements to increase its speed. Sybase IQ has been very inward-facing since the acquisition - it's hard to tell what they are doing outside of SAP-related business. Netezza fits well into IBM's hardware salesforce. Netezza's 3 GB, $100K Skimmer unit seems very reminiscent of an AS/400. Software big-data solutions can be brought in at 10% or less of the cost of proprietary appliance or relational system pricing. MarkLogic has already demonstrated developer productivity gains and performance advantages over many of its competitors but remains quiet in much of the BI market. They are certainly worth a call before spending another $10M on a big storage system.

 
