Snowflake vs Netezza
Fifteen years ago, Netezza (acquired by IBM in 2010) introduced an appliance-based, on-prem analytics solution. It was purpose-built, load-ready, and met many of the needs of the day (back when on-prem was still largely the preferred choice for data warehousing). It's fair to say Netezza hit the ball out of the park, and it has enjoyed a good, solid run since then. But that was fifteen years ago, and times have changed quite a bit.
There's no question that its impact at the time was significant. But technology rarely stays in the same place for long, so, at the risk of sounding like Captain Obvious, it's worth getting to the heart of what constitutes a genuine game-changer (to the same degree, if not more) today.
Today, more and more people are catching on to the incredible benefits of a completely cloud-based data warehousing solution. That solution is Snowflake.
Built from the ground up for the cloud, Snowflake is delivered as Software as a Service (SaaS), eliminating the headaches that users of traditional data warehouses now realize are no longer necessary. Those headaches come in many shapes and sizes, and even when a legacy solution seemed like a godsend on delivery, it's only a matter of time before certain limitations begin to reveal themselves.
Netezza nailed the needs of many companies at the time, but users still ran into several issues:
- Data Tuning - Netezza's Clustered Base Tables (CBTs) added the administrative overhead of organizing and grooming tables manually, and of distributing the data so that related rows are co-located on the same disks.
- Backups - On-prem solutions of any kind add administrative overhead in the form of backups (incremental backups alone take hours in most cases, and a full backup often takes an entire weekend day).
- Scalability - With Netezza, the only way to scale out is to buy another appliance, which locks you into paying for far more storage or compute than you likely need, because the two are bundled together and cannot be expanded independently of each other.
- Support - End of support for all Netezza legacy appliances is scheduled for June 20, 2019. IBM's newer Integrated Analytics System (IAS) is touted as "cloud ready" (offering a choice of on-prem, private, or public storage systems), but it still carries all of the administrative overhead that Snowflake eliminates entirely. Also, for the support that remains available until then (and beyond, with IAS), the appliances themselves are built on a highly complicated architecture that only IBM engineers can fully understand and support. This means replacements can only be done by an IBM engineer; components are neither generic nor available elsewhere on the market; many of the systems are IBM patented; and service support costs are considerable.
- Concurrency - Probably the biggest issue with legacy data warehouse solutions (Netezza being no exception) is handling the all-too-common, unexpected concurrency problems that plague any growing enterprise, or that arise when multiple departments (and/or partners) query the same database simultaneously.
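To make the Scalability point above concrete, here is a small arithmetic sketch in Python. It assumes a hypothetical appliance unit that bundles a fixed amount of storage with a fixed amount of compute (the `ApplianceUnit` sizes are made-up illustration values, not real Netezza specifications) and shows how the larger of the two requirements dictates how many units you must buy.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ApplianceUnit:
    """Hypothetical appliance that bundles fixed storage and compute."""
    storage_tb: float
    compute_nodes: int


def units_needed(unit: ApplianceUnit, storage_tb: float, compute_nodes: int) -> int:
    """Units to buy when storage and compute only come together.

    Whichever requirement needs more units dictates the purchase.
    """
    by_storage = math.ceil(storage_tb / unit.storage_tb)
    by_compute = math.ceil(compute_nodes / unit.compute_nodes)
    return max(by_storage, by_compute)


def idle_capacity(unit: ApplianceUnit, storage_tb: float, compute_nodes: int) -> Tuple[float, int]:
    """Storage (TB) and compute nodes that are paid for but not required."""
    n = units_needed(unit, storage_tb, compute_nodes)
    return n * unit.storage_tb - storage_tb, n * unit.compute_nodes - compute_nodes


# Illustrative sizes only, not real appliance specifications.
unit = ApplianceUnit(storage_tb=100, compute_nodes=8)
print(units_needed(unit, storage_tb=120, compute_nodes=40))   # 5
print(idle_capacity(unit, storage_tb=120, compute_nodes=40))  # (380, 0)
```

With 120 TB of data but a need for 40 compute nodes, the compute requirement forces five units, leaving 380 TB of storage paid for but idle. When storage and compute can be bought separately, you would provision exactly 120 TB and 40 nodes.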
With Snowflake, all of the above is handled automatically. The savings on the administrative maintenance that comes with housing an on-prem solution are, on their own, eye-watering.
Snowflake is the only data warehousing solution purpose-built for the cloud, with an all-new cloud data architecture that scales granularly, only when needed, eliminating entirely the overhead of locally administered on-prem systems such as HDFS clusters.
So what does "software-driven automated management" actually look like?
It looks something like this:
Okay, but what does an "all-new cloud data architecture" consist of?
In the diagram below, the Cloud Services layer handles the automated management mentioned above and much more, while each of the individual "Compute Engines" beneath it (the colored dots represent compute clusters of varying size, chosen according to your needs) provides the means to eliminate concurrency issues going forward.
With Snowflake, compute and storage are completely separate from each other, so you can easily spin up an entirely separate virtual warehouse (the Compute Engine components in the diagram) for each department within your organization, whenever necessary.
You decide how many of them your organization needs at any given time. Just set the parameters, and they'll spin up as your needs require and spin down again (reducing costs) automatically. Your departments, partners, and customers can all keep working from the same "single version of the truth" (with access rules applied per group and/or user, of course) without interfering with one another or causing concurrency problems. And because your cloud storage can now come from either Amazon S3 or Microsoft Azure (with those providers' pricing passed directly on to you), your options for inexpensive, high-performance cloud storage are no longer limited to the single (and singularly pricey) option IBM offers with Netezza: its own.
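As a rough illustration of why separate compute per department avoids contention, here is a toy Python model. It assumes, purely for simplicity, that a single shared cluster runs one query at a time, serving departments in round-robin order, while each department's own warehouse only ever waits on that department's queries. The department names and query durations are hypothetical; real engines run queries concurrently, but they still compete for the same shared resources.

```python
from itertools import zip_longest
from typing import Dict, List

Workloads = Dict[str, List[float]]


def shared_cluster_finish(workloads: Workloads) -> Dict[str, float]:
    """One shared cluster, one query at a time, departments served round-robin.

    Returns the time at which each department's last query completes.
    """
    clock = 0.0
    finish: Dict[str, float] = {}
    departments = list(workloads)
    for batch in zip_longest(*workloads.values()):
        for dept, duration in zip(departments, batch):
            if duration is None:  # this department has no more queries
                continue
            clock += duration
            finish[dept] = clock
    return finish


def isolated_warehouses_finish(workloads: Workloads) -> Dict[str, float]:
    """One warehouse per department: each waits only on its own queries."""
    return {dept: float(sum(durations)) for dept, durations in workloads.items()}


# Hypothetical query durations in minutes, all submitted at time 0.
workloads = {"finance": [2, 2], "marketing": [5], "risk": [1, 1, 1]}
print(shared_cluster_finish(workloads))
# {'finance': 10.0, 'marketing': 7.0, 'risk': 12.0}
print(isolated_warehouses_finish(workloads))
# {'finance': 4.0, 'marketing': 5.0, 'risk': 3.0}
```

In the shared model, the risk team's three one-minute queries don't finish until minute 12 because they wait behind everyone else; with isolated warehouses they finish at minute 3, and each warehouse can suspend as soon as its own queue is empty.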
Now let's look at a real-world example of the difference a switch from Netezza to Snowflake in the cloud can actually make.
OnDeck® Financial Services was using real-time analytics to transform small-business lending. With over 80,000 SMB customers, it soon faced the following serious issues:
- Struggling with MPP (Massively Parallel Processing) scaling
- Challenges with managing distribution keys and data skew, and with integrating semi-structured data
- Difficulties supporting multiple internal teams
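The distribution-key and data-skew challenges listed above come from how MPP systems spread rows across parallel workers. The Python sketch below is a simplified, hypothetical model of hash distribution (it is not Netezza's actual hashing scheme): each row goes to a slice chosen by hashing its distribution key, so a low-cardinality key piles rows onto a few slices, while a unique key spreads them evenly. The sample loan rows and column names are made up for illustration.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def slice_for(key: str, num_slices: int) -> int:
    """Map a distribution-key value to a slice with a stable hash.

    Equal keys always land on the same slice, which is what makes
    co-located joins possible.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def distribute(rows: List[Dict[str, object]], key_column: str, num_slices: int) -> List[int]:
    """Count how many rows end up on each slice."""
    counts = Counter(slice_for(str(row[key_column]), num_slices) for row in rows)
    return [counts.get(i, 0) for i in range(num_slices)]


def skew_ratio(slice_counts: List[int]) -> float:
    """Largest slice divided by the average; 1.0 means perfectly even."""
    average = sum(slice_counts) / len(slice_counts)
    return max(slice_counts) / average


# Hypothetical loan rows: unique IDs, but 90% of loans come from one state.
rows = [{"loan_id": i, "state": "NY" if i % 10 else "CA"} for i in range(1000)]
print(skew_ratio(distribute(rows, "loan_id", 4)))  # low skew, near 1.0
print(skew_ratio(distribute(rows, "state", 4)))    # at least 3.6
```

Distributing on `state` leaves at most two of the four slices doing all the work, so the slowest slice sets the pace for every query. In a distribution-key-based MPP system, picking and maintaining a better key is manual tuning work.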
MPP scaling and supporting multiple internal teams both boil down to concurrency, which, from an end-user and operational standpoint, really is the biggest bugbear of legacy on-prem solutions. Snowflake's shared data architecture lets multiple compute engines (virtual warehouses) eliminate concurrency contention between departments, so they no longer interfere with each other, even while working with the same data source simultaneously.
Security with Snowflake is bleeding-edge and as granular as you could want, with even the option of securely monetizing your data through data sharehousing. Dedicated, supported analytics and BI platforms such as Tableau® and Looker® let you dive deep into your data and visualize what's happening with incredible precision. Semi-structured data is also a native capability in Snowflake: thanks to its VARIANT data type, it can be loaded as-is and queried with industry-standard ANSI SQL.
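Inside Snowflake, this is done in SQL: a JSON document is stored in a VARIANT column and nested fields are read with path notation (for example, `raw:customer.address.city`). The Python sketch below only mimics that idea outside the database, to show what querying a path into semi-structured data means. The `get_path` helper and the sample loan records are hypothetical, and returning `None` for a missing path is meant to loosely mirror SQL's NULL rather than reproduce Snowflake's exact semantics.

```python
import json
from typing import Any


def get_path(document: Any, path: str) -> Any:
    """Follow a dotted path such as "customer.address.city" through parsed JSON.

    Returns None when any step is missing, loosely mirroring how a path lookup
    on semi-structured data in SQL yields NULL rather than an error.
    """
    current = document
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return None
    return current


# Hypothetical raw records: the second one has no address at all.
raw_rows = [
    '{"loan_id": 1, "customer": {"name": "Acme", "address": {"city": "Denver"}}}',
    '{"loan_id": 2, "customer": {"name": "Beta"}}',
]
rows = [json.loads(text) for text in raw_rows]
cities = [get_path(row, "customer.address.city") for row in rows]
print(cities)  # ['Denver', None]
```

The point of the design is that records with different shapes can live side by side, and a missing field simply comes back empty instead of breaking the query.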
What impact did eliminating these challenges have on OnDeck® Financial Services?
As impressive as OnDeck®'s transition proved to be, it's worth remembering how much the company also saved by offloading its on-prem administrative expenses and migrating completely to the cloud, where that work is handled automatically. But, again, that's not the only place it saved a fortune! It's also worth looking at a visual representation of what it means to no longer be locked into paying 24 x 7 for resources you don't even use, and instead to finally pay only for what you actually use.
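The pay-for-what-you-use point comes down to simple arithmetic. The Python sketch below compares capacity billed around the clock with compute billed only for the hours it actually runs. The hourly rate and the usage pattern are hypothetical placeholders, not Snowflake, IBM, or OnDeck® figures.

```python
from typing import List


def always_on_cost(rate_per_hour: float, days: int) -> float:
    """Fixed capacity billed 24 x 7, whether or not anyone is using it."""
    return rate_per_hour * 24 * days


def usage_based_cost(rate_per_hour: float, active_hours_per_day: List[float]) -> float:
    """Compute billed only while running; suspended hours cost nothing."""
    return rate_per_hour * sum(active_hours_per_day)


# Hypothetical month: 22 working days at 6 busy hours each, 8 idle days.
rate = 4.0
usage = [6.0] * 22 + [0.0] * 8
print(always_on_cost(rate, days=len(usage)))  # 2880.0
print(usage_based_cost(rate, usage))          # 528.0
```

Under these made-up numbers, the usage-based bill is about 18% of the always-on one; the real ratio depends entirely on how many hours your workloads actually run.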
Find out more about all the benefits Snowflake has to offer you and your business. Sign up for a free proof of concept!