Data to Value – Part 3

Data to Value helps prioritize data-related investments

This is the third article in my three-part series on Data to Value.  The key takeaway from this series is that we always need to understand the value of our data.  We also need to measure how fast we can go from data to business value.  C-level execs and others focused on strategic data initiatives need to use Data to Value metrics.  Then we can understand the true value derived from our data creation, collection, extraction, transformation, loading, and analytics, which allows us to invest better in data initiatives for our organizations and ourselves.  Finally, data can only produce true value if it is accurate and of a known quality.
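One rough way to picture a Data to Value speed metric is the elapsed time between when data is created and when it first supports a business decision.  The sketch below is only an illustration under that assumption; the `DataAsset` class, its field names, and the choice of a median are hypothetical and not a standard definition.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class DataAsset:
    """Hypothetical record of one data set's journey to business value."""
    name: str
    created_at: datetime      # when the data was created or collected
    first_value_at: datetime  # when it first supported a decision or product


def hours_to_value(asset: DataAsset) -> float:
    """Elapsed hours between data creation and first business use."""
    return (asset.first_value_at - asset.created_at).total_seconds() / 3600


def median_hours_to_value(assets: list[DataAsset]) -> float:
    """Median Data to Value time across assets (less skewed by outliers than a mean)."""
    return median(hours_to_value(a) for a in assets)
```

Tracking a number like this per pipeline over time would show whether a new tool or process is actually shortening the path from data to value.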

I summarize my 7 Data to Value trends below.  More details, including my initial Data to Value Trends 1 to 4, are in Data to Value – Part 1 and Data to Value – Part 2.

Data to Value Trends – Summary and focus on Trends 5, 6, and 7

Here are the Data to Value Trends that I think you need to be aware of: 

(there are a few others though as well!)

Trend #1 – Non-stop push for faster speed of Data to Value. 

Within our non-stop, predominantly capitalist world, faster is better!  Organizations that improve their Data to Value speed, especially around their value chains, can create massive business advantages.

Trend #2 – Data Sharing.  See Part 2

Trend #3 – Creating Data with the End in Mind.  See Part 2

Trend #4 – Automated Data Applications.  See Part 2

(We will see how Snowflake integrates Streamlit.  It could be “transformational”)

Trend #5 – Fully Automated Data Direct Copy Tools. 

Growth of Fivetran and now Hightouch.

The growth of Fivetran and Stitch (now part of Talend) has been amazing.  We are now also seeing huge growth in automated data copy pipelines going the other way, focused on Reverse ETL (Reverse Extract, Transform, and Load), like our partner Hightouch.  At our IT Strategists consulting firm, we became a partner with Stitch, Fivetran, and Matillion back in 2018.  At Snowflake’s Partner Summit back in 2018, I sat next to Jake Stein, one of the founders of Stitch, on the bus from San Francisco to the event in Sonoma, and we quickly became good friends.  (Jake is an excellent entrepreneur and, after selling Stitch to Talend, is now focused on a new startup, Common Paper, a structured contracts platform.)  I also met George Fraser from Fivetran at the event and mentioned how he was killing it with his post comparing all the cloud databases back in 2018.  [There was no other content like that back then.]

Resistance to “ease of use” and “cost reductions” is futile.

Part of me, as a consultant at the time, wanted to resist these “Automated EL Tools”: EL (Extract and Load) vs. ETL (Extract, Transform, and Load) or ELT (Extract, Load, and then Transform within the database).  As I tested out Stitch and Fivetran, though, I knew that resistance was futile.  The ease of use of these tools and the reduction in development and maintenance costs cannot be overlooked.  There was no way to stop the data market from embracing these easier-to-use data pipeline automation tools.  What was even more compelling is that you can set up automated extract and load jobs within minutes or hours most of the time.  This is UNLIKE any of the previous ETL tools we have been using for decades, which were mostly software installations.  Those installations took capacity planning, procurement, and all sorts of organizational friction to EVEN get started.

With Fivetran and Hightouch, no engineering or developer expertise is needed for almost all of the work.  [In certain situations, it definitely helps to have data engineers and architects involved.]  Overall though, it really is just a simple concept of connecting CONNECTORS and DESTINATIONS to each other.  Within Fivetran, DESTINATIONS are databases or data stores.  CONNECTORS are sources of data (Zendesk, Salesforce, or one of the hundreds of other connectors in Fivetran).  Fivetran and Hightouch are excellent examples of data service/tool trends that truly improve the speed of Data to Value.
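To make the connector and destination idea concrete, here is a minimal sketch of an extract-and-load (EL) sync.  It is purely illustrative: the `Connector`, `Destination`, and `sync` names are hypothetical, the rows are hard-coded stand-ins for a real source API, and none of this is Fivetran’s or Hightouch’s actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Destination:
    """Hypothetical stand-in for a database or data store."""
    name: str
    tables: dict = field(default_factory=dict)

    def load(self, table: str, rows: list) -> None:
        # Append rows as-is; no transformation happens in an EL pipeline.
        self.tables.setdefault(table, []).extend(rows)


@dataclass
class Connector:
    """Hypothetical stand-in for a data source such as a SaaS application."""
    source: str
    table: str
    rows: list  # placeholder for rows a real connector would pull from an API

    def extract(self) -> list:
        return list(self.rows)


def sync(connector: Connector, destination: Destination) -> int:
    """Copy rows from a connector into a destination; return the row count."""
    rows = connector.extract()
    destination.load(f"{connector.source}.{connector.table}", rows)
    return len(rows)


warehouse = Destination("warehouse")
tickets = Connector("zendesk", "tickets", [{"id": 1}, {"id": 2}])
print(sync(tickets, warehouse))
```

The point of the sketch is how little there is to configure: once a source and a target are named, the copy itself needs no custom pipeline code.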

Trend #6 – Full Automation of Data Pipelines and more integrated ML and Data Pipelines. 

With the introduction of a fully automated data object and pipeline service at Coalesce, we saw, for the first time, a way for data professionals to improve Data to Value through fully automated data objects and pipelines.  Some of our customers are referring to parts of Coalesce as a Terraform-like product for data engineering.  What I personally see is, again, a massive removal of data engineering friction, similar to what Fivetran and Hightouch did but in a separate area of the data processing stack.  We have become an early partner with Coalesce because we view it much as we viewed Snowflake at the beginning of 2018.  We view Coalesce as making Snowflake even more amazing to use.

Trend #7 – Data Mesh, Data Observability, and related concepts. 

Love these concepts or hate them, understand them or misunderstand them, they are taking hold within the overall data professionals’ braintrust.  Zhamak Dehghani (previously at Thoughtworks) and Thoughtworks have, from 2019 until now, succeeded in communicating the concept of a Data Mesh to the market.  Meanwhile, Barr Moses from Monte Carlo has been beating the drum very hard on the concept of Data Observability.  I’m highlighting these data concepts as trends that are aligned with improving Data to Value speed, quality, and accessibility.  There are many more data concepts besides these two.  Time will reveal which of them will gain mind and market share and which will go by the wayside.

Also, a MAJOR MAJOR trend that has been “trying” to push the needle forward on Data to Value for quite a while is the growth of automated, integrated Machine Learning pipelines with data.  This is what DataRobot, Dataiku, H2O, SageMaker, and tons of others are attempting to do.  It still seems very early stage, and no single vendor has large mindshare or adoption yet.  Overall, the space is fragmented right now, and it’s hard to tell which of these tools and vendors will survive and thrive.

*What data to value trends am I missing?  I put the top ones I see but hit me up in the comments or directly if you have additional trends.

Finally, I’ll recap the Snowflake announcements where I see Snowflake’s product actually focused on Data to Value.  [If you read Part 1 or Part 2 of Data to Value, this content is currently exactly the same.  I just copied it here for new readers.]

Snowflake’s Announcements related to Data to Value (past several months)

Snowflake is making massive investments and strides to continue to push Data to Value.  Their announcements earlier this year at Snowflake Summit included Data to Value features such as:

*Snowflake’s support of Hybrid Tables and announcement of the concept of Unistore – the move into some type of OLTP (Online Transaction Processing).  Organizations are looking for this single source of truth and want to reduce the latency of Data to Value from OLAP to OLTP type apps with Hybrid Tables.

*Snowflake’s Native Apps announcements.  If Snowflake can get this right, it’s a game changer for Data to Value and for decreasing the cost of deploying Data Applications. 

*Streamlit integration into Snowflake.  Again, if Snowflake gets this right then it could be another Data to Value game-changer.  

***Also note that the 2 items above make the development of data apps and combined OLTP/OLAP applications much less costly and more achievable for companies of “all” sizes.  They could remove the massive friction that exists today from needing high-end full stack development.  Streamlit really is attempting to remove the Front-End and Middle Tier complexity from developing data applications.  (Aren’t most applications data applications, though?)  It’s really another low-code data development environment.

*Snowpipe Streaming announcement.  (This was super interesting to me since I had worked with Issaic from Snowflake back before the 2019 Summit using the original Kafka to Snowflake Connector.  I also did a presentation on it at Snowflake Summit 2019.)  It was awesome to see that Snowflake refactored the old Kafka connector and made it much faster with lower latency.  This again is another major win around Streaming Data to Value, with an announced 10 times lower latency.  (Public Preview later in 2022)

*Snowpark for Python and Snowpark announcements in general.  This is really new tech and the verdict is still out, but it is a major attempt by Snowflake to provide ML Pipeline Data to Value speed.  Snowflake is looking to have the full data event processing and Machine Learning processes all within Snowflake.

Summary

This article is part of my Frank’s Future of Data series I put together to prepare myself for taking advantage of new paradigms that the “Snowflake Data Cloud” and other “Modern Data Stack” tools/clouds provide.  Before I started my Snowflake Journey I was often speaking around the intersection of Data, Automation, and AI/ML.  I truly believe these forces have been changing our world everywhere and will continue to do so for many years. 

Data to Value is a key concept that helps us prioritize how to invest in our data-related initiatives.

I hope you found this useful for thinking about how you should decide on data-related investments and initiatives.  Focusing specifically on Data to Value can help you prioritize and simplify what is truly most important for your organization!  Good luck to you all!
