Snowflake's Data Cloud is powered by a data platform provided as a self-managed service. It enables data storage, processing, and analytic solutions that are faster, easier to use, and more flexible than traditional offerings.
Snowflake is not built on any existing database technology or "big data" software platform such as Hadoop. Instead, it combines a new SQL query engine with an architecture designed natively for the cloud. To the user, Snowflake provides all the functionality of an enterprise analytic database, along with many additional features and capabilities.
Data Platform as a Self-managed Service:
Snowflake is self-managed in the sense that the service manages itself, so users do not have to. In practice this means:
- No hardware (virtual or physical) needs to be selected, installed, configured, or managed.
- Virtually no software requires installation, configuration, or management on the user's part.
- Continuous maintenance, management, upgrades, and tuning are seamlessly handled by Snowflake.
Snowflake runs entirely on cloud infrastructure. All components of the service, other than the optional command line clients, drivers, and connectors, run in public cloud infrastructure. Snowflake uses virtual compute instances for computation and a storage service for persistent storage of data. It cannot be run on private cloud infrastructure, whether on-premises or hosted.
Snowflake is not a packaged software offering that users install themselves. Snowflake manages all aspects of software installation and updates.
Snowflake Architecture:
Snowflake's architecture is a hybrid of traditional shared-disk and shared-nothing database architectures. As in shared-disk architectures, Snowflake uses a central data repository for persisted data that is accessible from all compute nodes in the platform. As in shared-nothing architectures, Snowflake processes queries using MPP (massively parallel processing) compute clusters in which each node stores a portion of the entire data set locally. This approach offers the data management simplicity of a shared-disk architecture together with the performance and scale-out benefits of a shared-nothing architecture.
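The hybrid idea can be illustrated with a toy model. The sketch below is not Snowflake code: `CENTRAL_STORAGE`, `assign_partitions`, `node_partial_sum`, and `parallel_sum` are hypothetical names, threads stand in for compute nodes, and a dictionary stands in for the central repository. It only shows the pattern of a shared store that every node can read, combined with per-node local work on a portion of the data.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical central repository: any node may read any partition (shared-disk side).
CENTRAL_STORAGE = {
    "part-0": [3, 8, 1],
    "part-1": [7, 2],
    "part-2": [5, 9, 4],
}


def assign_partitions(partition_ids, node_count):
    """Round-robin the partitions across nodes (shared-nothing side)."""
    assignment = {node: [] for node in range(node_count)}
    for index, partition_id in enumerate(sorted(partition_ids)):
        assignment[index % node_count].append(partition_id)
    return assignment


def node_partial_sum(partition_ids):
    """Copy the assigned partitions into a local cache and aggregate only those."""
    local_cache = {pid: list(CENTRAL_STORAGE[pid]) for pid in partition_ids}
    return sum(sum(rows) for rows in local_cache.values())


def parallel_sum(node_count):
    """Run one partial aggregation per node, then combine the partial results."""
    assignment = assign_partitions(CENTRAL_STORAGE.keys(), node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(node_partial_sum, assignment.values()))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum(node_count=2))
```

Changing `node_count` changes how the work is split but not the result, because every node reads from the same central store.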
Snowflake's architecture consists of three key layers:
- Database Storage
- Query Processing
- Cloud Services
Database Storage:
When data is loaded into Snowflake, Snowflake reorganizes it into its internal optimized, compressed, columnar format and stores the optimized data in cloud storage.
Snowflake manages every aspect of how this data is stored: organization, file size, structure, compression, metadata, statistics, and so on. The data objects stored by Snowflake are not directly visible or accessible to customers. They can be accessed only through SQL query operations run using Snowflake.
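To make "compressed, columnar format" concrete, here is a minimal sketch in plain Python. Snowflake's actual storage format is internal and not exposed to customers, so this is only an illustration of the general idea: the helper names, the JSON encoding, and the use of `zlib` are all assumptions. Values are regrouped by column, each column is compressed on its own, and a query that needs one column decompresses only that column.

```python
import json
import zlib


def rows_to_columns(rows):
    """Pivot a list of row dicts (all with the same keys) into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def compress_columns(columns):
    """Compress each column independently of the others."""
    return {
        name: zlib.compress(json.dumps(values).encode("utf-8"))
        for name, values in columns.items()
    }


def read_column(compressed, name):
    """Decompress only the requested column, leaving the others untouched."""
    return json.loads(zlib.decompress(compressed[name]).decode("utf-8"))


rows = [
    {"id": 1, "region": "EU", "amount": 10.5},
    {"id": 2, "region": "EU", "amount": 7.0},
    {"id": 3, "region": "US", "amount": 3.25},
]
columns = rows_to_columns(rows)
compressed = compress_columns(columns)

# An aggregate over "amount" never touches the "id" or "region" columns.
total = sum(read_column(compressed, "amount"))
```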
Query Processing:
Queries are executed in the query processing layer using "virtual warehouses". Each virtual warehouse is an MPP compute cluster composed of multiple compute nodes allocated by Snowflake. Each virtual warehouse is independent and does not share compute resources with other virtual warehouses, so the load on one has no impact on the performance of the others. Refer to the documentation on Virtual Warehouses for more details.
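One practical consequence of this independence is that separate workloads can each get their own warehouse. The sketch below builds standard `CREATE WAREHOUSE` statements as strings. The helper `create_warehouse_sql`, the warehouse names `etl_wh` and `bi_wh`, and the restricted size list are assumptions made for illustration. The statements would still need to be run through a Snowflake client by a role allowed to create warehouses.

```python
# Subset of warehouse sizes accepted by this helper (an assumption for the sketch).
ALLOWED_SIZES = {"XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE"}


def create_warehouse_sql(name, size="XSMALL", auto_suspend_seconds=60):
    """Build a CREATE WAREHOUSE statement for one independent compute cluster."""
    if not name.isidentifier():
        raise ValueError(f"unsafe warehouse name: {name!r}")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size!r}")
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WAREHOUSE_SIZE = '{size}' "
        f"AUTO_SUSPEND = {int(auto_suspend_seconds)} "
        f"AUTO_RESUME = TRUE"
    )


# Hypothetical setup: heavy loading jobs and dashboards do not compete for compute.
statements = [
    create_warehouse_sql("etl_wh", size="LARGE"),
    create_warehouse_sql("bi_wh", size="SMALL", auto_suspend_seconds=300),
]
```

Both warehouses would query the same stored data, but a long-running load on `etl_wh` would not slow down queries on `bi_wh`.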
Cloud Services:
The cloud services layer orchestrates activities across Snowflake, managing authentication, infrastructure, metadata, query optimization, and access control. These services run on compute instances provisioned by Snowflake from the cloud provider.
Connecting to Snowflake:
Snowflake offers various connection methods:
- Web-based User Interface: Access all aspects of Snowflake management and usage.
- Command Line Clients (e.g., SnowSQL): Comprehensive access to Snowflake management and usage.
- ODBC and JDBC Drivers: Enable other applications (e.g., Tableau) to connect to Snowflake.
- Native Connectors (e.g., Python, Spark): Develop applications connecting to Snowflake.
- Third-party Connectors: Link applications like ETL tools (e.g., Informatica) and BI tools (e.g., ThoughtSpot) to Snowflake.
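As an illustration of the native connector route, the following is a minimal sketch using the Snowflake Connector for Python (a third-party package, installed with `pip install snowflake-connector-python`). The environment variable names and the helpers `connection_params` and `current_version` are assumptions chosen for this example. Password authentication is used only for brevity, and an account may require a different authentication method.

```python
import os


def connection_params(env=os.environ):
    """Collect connection settings from environment variables (names are assumptions)."""
    required = ("SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError(f"missing settings: {', '.join(missing)}")
    params = {
        "account": env["SNOWFLAKE_ACCOUNT"],
        "user": env["SNOWFLAKE_USER"],
        "password": env["SNOWFLAKE_PASSWORD"],
    }
    # The warehouse is optional here; without one, the user's default applies.
    if env.get("SNOWFLAKE_WAREHOUSE"):
        params["warehouse"] = env["SNOWFLAKE_WAREHOUSE"]
    return params


def current_version(params):
    """Open a session, run one query, and return the Snowflake version string."""
    # Imported lazily so the rest of the module works without the package installed.
    import snowflake.connector

    conn = snowflake.connector.connect(**params)
    try:
        cur = conn.cursor()
        try:
            cur.execute("SELECT CURRENT_VERSION()")
            return cur.fetchone()[0]
        finally:
            cur.close()
    finally:
        conn.close()
```

With the environment variables set, `current_version(connection_params())` would open a session, run a single query, and close the session again.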