Welcome to Part 2 of our Data to Value series. If you’ve read Part 1, you’ve learned about some of the trends happening across the data industry as a whole.
In this part, we’ll explore additional trends to consider, as well as some of Snowflake’s announcements in relation to Data to Value.
As a refresher on this series, we are making a fundamental point: data professionals and data users of all types need to focus on more than creating, collecting, and transforming data. We need to make a conscious effort to focus on and measure the true value that each set of data creates. We also need to measure how fast we can get to that value, and whether it provides any real business advantage. There is also an argument for adjusting the value of data over time, since data often loses value as it ages.
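To make the time-dependence point concrete, here is a minimal sketch of two such measurements. It assumes a simple exponential decay in which a data set loses half its value every fixed period; the half-life model, the function names, and the numbers are illustrative assumptions of mine, not an established Data to Value metric.

```python
from datetime import datetime, timedelta


def decayed_value(initial_value: float, age: timedelta, half_life: timedelta) -> float:
    """Value of a data set after `age`, assuming it halves every `half_life`."""
    if half_life <= timedelta(0):
        raise ValueError("half_life must be positive")
    periods = age / half_life  # dividing two timedeltas yields a float
    return initial_value * 0.5 ** periods


def time_to_value(created_at: datetime, first_used_at: datetime) -> timedelta:
    """Elapsed time between data creation and its first business use."""
    return first_used_at - created_at


if __name__ == "__main__":
    created = datetime(2022, 6, 1, 9, 0)   # hypothetical creation time
    used = datetime(2022, 6, 3, 9, 0)      # hypothetical first use
    lag = time_to_value(created, used)
    # With an assumed one-day half-life, two days of lag leaves a quarter of the value.
    print(lag, decayed_value(100.0, lag, timedelta(days=1)))
```

In practice the decay curve would differ by data set; the point is only that both the lag and the remaining value can be tracked as numbers.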
Data to Value Trends – Part 2:
8) – Growth of Fivetran and now Hightouch.
The growth and success of Fivetran and Stitch (now part of Talend) have been remarkable. There is now a significant surge in the popularity of automated data copy pipelines that work in the reverse direction, known as Reverse ETL (Reverse Extract, Transform, and Load), such as those offered by our trusted partner, Hightouch. Our consulting firm, IT Strategists, became partners with Stitch, Fivetran, and Matillion in 2018.
At the Snowflake Partner Summit that same year, I had the pleasure of sitting next to Jake Stein, one of the founders of Stitch, on the bus from San Francisco to Sonoma. We quickly became friends, and I was impressed by his entrepreneurial spirit. After selling Stitch to Talend, Jake moved on to a new startup, Common Paper, a structured contracts platform. At the same event, I also had the opportunity to meet George Fraser from Fivetran, who impressed me with his post comparing all the cloud databases back in 2018. At that time, such content was scarce.
9) – Resistance to “ease of use” and “cost reductions” is futile.
As a consultant at the time, part of me wanted to resist these “Automated EL Tools”: EL (Extract and Load), as opposed to ETL (Extract, Transform, and Load) or ELT (Extract, Load, and then Transform within the database). As I tested out Stitch and Fivetran, though, I knew that resistance was futile. The ease of use of these tools and the reduction in development and maintenance costs cannot be overlooked. There was no way to stop the data market from embracing these easier-to-use data pipeline automation tools.
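For readers newer to the acronyms, the difference between EL and ELT can be shown in a few lines. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a cloud database; the table names, the sample rows, and the function names are all hypothetical, and this is not how Stitch or Fivetran are implemented.

```python
import sqlite3

# Hypothetical raw records, standing in for what a source system might return.
SOURCE_ROWS = [
    {"id": 1, "email": " Ada@Example.com ", "amount": "19.99"},
    {"id": 2, "email": "grace@example.com", "amount": "5.00"},
]


def extract_and_load(conn: sqlite3.Connection, rows: list) -> None:
    """EL step: land the source records as-is, with no cleanup."""
    conn.execute("CREATE TABLE raw_orders (id INTEGER, email TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (:id, :email, :amount)", rows)


def transform_in_database(conn: sqlite3.Connection) -> None:
    """T step of ELT: clean the data with SQL after it has been loaded."""
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT id,
               LOWER(TRIM(email)) AS email,
               CAST(amount AS REAL) AS amount
        FROM raw_orders
        """
    )


def run_elt() -> list:
    conn = sqlite3.connect(":memory:")
    extract_and_load(conn, SOURCE_ROWS)   # EL: what the automated tools handle
    transform_in_database(conn)           # T: done later, inside the database
    return conn.execute("SELECT id, email, amount FROM orders ORDER BY id").fetchall()


if __name__ == "__main__":
    print(run_elt())
```

In a traditional ETL flow, the cleanup in `transform_in_database` would instead run before the load, outside the database.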
What was even more compelling is that you can set up automated extract and load jobs within minutes or hours most of the time. This is unlike the ETL tools we had been using for decades, which were mostly software installations. Those installations required capacity planning, procurement, and all sorts of organizational friction before you could even get started. With Fivetran and Hightouch, almost all of the work requires no engineering or developer expertise, although in some cases it can still be beneficial to involve data engineers and architects.
Overall, the concept behind Fivetran is simple: you link connectors to destinations. Connectors are sources of data, such as Zendesk, Salesforce, or one of the many other connectors in Fivetran. Destinations are the databases or data stores that receive the data. Fivetran and Hightouch are great examples of the trend toward data services and tools that really speed up the process of getting value from your data.
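The connector and destination idea can be sketched as a tiny data model. This is purely illustrative: the class names, the `sync` function, and the in-memory "warehouse" are my own assumptions and do not reflect Fivetran's or Hightouch's actual APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Connector:
    """A source of data (hypothetical stand-in for, say, a Zendesk connector)."""
    name: str
    fetch: Callable[[], List[Row]]


@dataclass
class Destination:
    """A database or data store, modeled here as in-memory tables."""
    name: str
    tables: Dict[str, List[Row]] = field(default_factory=dict)

    def write(self, table: str, rows: List[Row]) -> int:
        """Append rows to a table and return how many were written."""
        self.tables.setdefault(table, []).extend(rows)
        return len(rows)


def sync(connector: Connector, destination: Destination) -> int:
    """Copy every row the connector returns into a table named after it."""
    return destination.write(connector.name, connector.fetch())


def demo() -> Destination:
    tickets = Connector("zendesk_tickets", lambda: [{"id": 1, "status": "open"}])
    warehouse = Destination("warehouse")
    sync(tickets, warehouse)
    return warehouse


if __name__ == "__main__":
    print(demo().tables)
```

Reverse ETL follows the same shape in the opposite direction: the warehouse plays the role of the source and a business application becomes the target.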
10) – Growth of Automated and Integrated Machine Learning Pipelines with Data.
Many vendors and products, including DataRobot, Dataiku, H2O.ai, and Amazon SageMaker, are working to automate machine learning pipelines and integrate them with data. However, this field appears to be in its early stages, with no single vendor having gained widespread adoption or mindshare. Currently, the market is fragmented, and it is difficult to predict which of these tools and vendors will succeed in the long run.
Snowflake’s Announcements related to Data to Value
Snowflake is making significant investments and progress in the field of data analysis, with a focus on delivering value to its clients. Their recent announcements at the Snowflake Summit this year, as detailed in this source, highlight new features that are designed to enhance the Data to Value experience.
Snowflake recently announced its support of Hybrid Tables and the concept of Unistore.
This move is aimed at bringing Online Transaction Processing (OLTP) workloads to Snowflake. There has been great interest from customers in this concept, which allows web-based, OLTP-type applications to run on Snowflake with Hybrid Tables and share a single source of truth with analytics.
Announcements about Snowflake’s Native Apps:
- Integrating Streamlit into Snowflake.
If done correctly, this could be yet another game-changer in turning data into value.
Please note that these two items (Unistore with Hybrid Tables, and Native Apps with Streamlit) not only enable data to be processed more quickly, but also significantly reduce the cost and complexity of developing data apps and combining OLTP/OLAP applications. This removes many of the barriers that come with requiring expensive, full-stack development. Streamlit aims to simplify the development of data applications by removing the complexity of the front-end and middle-tier components. (After all, aren’t most applications data-driven?) It is yet another low-code data development environment.
- Announcement of Snowpipe Streaming.
I found this particularly fascinating, as I had collaborated with Isaac from Snowflake before the 2019 Summit using the original Kafka to Snowflake Connector. I also gave a presentation on the topic at Snowflake Summit 2019. It was truly amazing to see Snowflake refactor the old Kafka connector, with significant improvements in speed and latency as a result. This is yet another major win for getting from data to value faster, with an anticipated 10 times lower latency. The public preview is slated for later in 2022.
- Announcement: Snowpark for Python and Snowpark in General
Snowflake has recently introduced a new technology called Snowpark. While the jury is still out on this new technology, it represents a major attempt by Snowflake to speed up how data moves through ML pipelines. Snowflake is looking to integrate full data event processing and machine learning processes within Snowflake itself.
If Snowflake can execute this correctly, it will revolutionize how we approach data value. It would also reduce the costs associated with deploying data applications.
In Part 2 of the “Data to Value” series, we explored additional trends in the data industry, including the growth of automated data copy pipelines and integrated machine learning pipelines. We also discussed Snowflake’s recent announcements related to data analysis and delivering value to clients, including support for Hybrid Tables and Native Apps. The key takeaway is the importance of understanding the value of data and measuring how quickly we get from data to business value.
Executives and others who prioritize strategic data initiatives should make use of Data to Value metrics. These metrics help us understand the actual value that stems from our data creation, collection, extraction, transformation, loading, and analytics. With that understanding, we can make better investments in data initiatives for our organizations and ourselves. Ultimately, data can only generate genuine value if it is reliable and of confirmed quality.