# Virtualize a Delta table with PolyBase

[!INCLUDE [SQL Server 2022](../../includes/applies-to-version/sqlserver2022.md)] and later

[!INCLUDE[sssql22-md](../../includes/sssql22-md.md)] can query data directly from a Delta table folder. This concept, commonly referred to as data virtualization, allows the data to stay in its original location while it is queried from a SQL Server instance with T-SQL commands, like any other table. This feature uses PolyBase connectors and minimizes the need to copy data through ETL processes.

In the following examples, the Delta table folder is accessed through `OPENROWSET` or an external table. In the `OPENROWSET` example, the Delta table is stored in Azure Data Lake Storage Gen2. The external table example reads a Delta table from S3-compatible object storage.

For more information on data virtualization, see [Introducing data virtualization with PolyBase](polybase-guide.md).

## Pre-configuration

### 1. Enable PolyBase in `sp_configure`

```sql
EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;

RECONFIGURE;
```

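To confirm the change, you can query the server properties and the configuration catalog view. The following is a minimal sketch: `SERVERPROPERTY('IsPolyBaseInstalled')` returns `1` only if the PolyBase feature was installed during setup. In `sys.configurations`, the `value` column holds the configured setting and `value_in_use` holds the setting currently in effect.

```sql
-- Returns 1 if the PolyBase feature is installed on this instance, 0 if not.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Compare the configured value with the value currently in effect.
-- is_dynamic = 1 means the option takes effect after RECONFIGURE, without a restart.
SELECT name, value, value_in_use, is_dynamic
FROM sys.configurations
WHERE name = 'polybase enabled';
```

If `value` and `value_in_use` differ, the new setting isn't active yet.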
### 2. Create a user database

This exercise creates a sample database named `Delta_demo` with default settings and location. You'll use this empty database to work with the data and to store the database scoped credential.

```sql
CREATE DATABASE [Delta_demo];
```

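If you rerun the setup script, `CREATE DATABASE` fails when `Delta_demo` already exists. The following guarded version is a sketch. It uses the built-in `DB_ID` function, which returns `NULL` for a database name that doesn't exist, and skips the creation when the database is already there:

```sql
-- Create the sample database only if it doesn't exist yet,
-- so the setup script can be run more than once.
IF DB_ID(N'Delta_demo') IS NULL
BEGIN
    CREATE DATABASE [Delta_demo];
END;
```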
### 3. Create a master key and database scoped credential

The database master key in the user database is required to encrypt the secret of the database scoped credential `delta_storage_dsc`. For this example, the Delta table resides in Azure Data Lake Storage Gen2, and the credential stores a shared access signature (SAS) token for that storage account.

```sql
USE [Delta_demo];
-- Replace the placeholder with a strong password.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong_password>';
```

```sql
-- SECRET is the SAS token for the storage account, without the leading '?'.
CREATE DATABASE SCOPED CREDENTIAL delta_storage_dsc
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '<SAS Token>';
```

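A database can have only one database master key, so running `CREATE MASTER KEY` a second time fails. The following sketch makes that step safe to rerun. Before creating the key, it checks `sys.symmetric_keys` for the master key, which always has the fixed name `##MS_DatabaseMasterKey##`. It then lists the credential from `sys.database_scoped_credentials`, which exposes the identity but never the secret. `<strong_password>` is a placeholder.

```sql
USE [Delta_demo];

-- The database master key always appears under this fixed name.
IF NOT EXISTS (SELECT 1 FROM sys.symmetric_keys
               WHERE name = '##MS_DatabaseMasterKey##')
BEGIN
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong_password>';
END;

-- Confirm that the credential exists; this view doesn't return the secret.
SELECT name, credential_identity, create_date
FROM sys.database_scoped_credentials
WHERE name = 'delta_storage_dsc';
```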
### 4. Create external data source

The external data source uses the database scoped credential. In this example, the Delta table resides in Azure Data Lake Storage Gen2, so use the `adls` prefix and the `SHARED ACCESS SIGNATURE` identity method. For more information about the connectors and prefixes, including new settings for [!INCLUDE[sssql22-md](../../includes/sssql22-md.md)], see [CREATE EXTERNAL DATA SOURCE](../../t-sql/statements/create-external-data-source-transact-sql.md?view=sql-server-ver16&preserve-view=true#location--prefixpathport-3).

```sql
CREATE EXTERNAL DATA SOURCE Delta_ED
WITH
(
 LOCATION = 'adls://<container>@<storage_account>.dfs.core.windows.net'
,CREDENTIAL = delta_storage_dsc
);
```

For example, if your storage account is named `deltalakesample` and the container is named `sink`, the code would be as follows (storage account names can contain only lowercase letters and numbers):

```sql
CREATE EXTERNAL DATA SOURCE Delta_ED
WITH
(
 LOCATION = 'adls://sink@deltalakesample.dfs.core.windows.net'
,CREDENTIAL = delta_storage_dsc
);
```

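To check which location and credential the new data source points to, you can join two catalog views. The following sketch is meant to be run in the `Delta_demo` database:

```sql
-- List the external data source with its location and the credential it uses.
SELECT ds.name AS data_source_name,
       ds.location,
       c.name  AS credential_name
FROM sys.external_data_sources AS ds
LEFT JOIN sys.database_scoped_credentials AS c
       ON c.credential_id = ds.credential_id
WHERE ds.name = 'Delta_ED';
```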
## Use OPENROWSET to access the data

In this example, the Delta table folder is named `Rockstar`.

The external data source `Delta_ED` is mapped at the container level, and the `Rockstar` Delta table folder is located at the container root, so the `BULK` path is simply `/Rockstar`. To query a Delta table folder that is nested deeper in the folder structure, provide its path relative to the external data source's `LOCATION` parameter.

```sql
SELECT *
FROM OPENROWSET
    ( BULK '/Rockstar'
    , FORMAT = 'DELTA'
    , DATA_SOURCE = 'Delta_ED'
    ) AS [result];
```

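Before you write an external table definition, it can help to see which column names and data types PolyBase infers for the Delta table. One way to do this is the `sp_describe_first_result_set` system procedure, which describes the result set of a query without returning its rows. Every single quote inside the query string must be doubled. The second statement in this sketch limits the preview to a few rows.

```sql
-- Describe the inferred schema (name, system_type_name, is_nullable, ...).
EXEC sp_describe_first_result_set N'
SELECT *
FROM OPENROWSET
    ( BULK ''/Rockstar''
    , FORMAT = ''DELTA''
    , DATA_SOURCE = ''Delta_ED''
    ) AS [result]';

-- Preview a handful of rows instead of reading the whole table.
SELECT TOP (10) *
FROM OPENROWSET
    ( BULK '/Rockstar'
    , FORMAT = 'DELTA'
    , DATA_SOURCE = 'Delta_ED'
    ) AS [result];
```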
## Query data with an external table

`CREATE EXTERNAL TABLE` can also be used to virtualize the Delta table data in SQL Server. The columns must be defined and strongly typed. While external tables take more effort to create, they provide additional benefits over querying an external data source with `OPENROWSET`. You can:

1. Strengthen the data type definition for a given column.
2. Define nullability.
3. Define collation.
4. Create statistics for a column to optimize the quality of the query plan.
5. Create a more granular model within SQL Server for data access to enhance your security model.

For more information, see [CREATE EXTERNAL TABLE](../../t-sql/statements/create-external-table-transact-sql.md).

The following example uses a different external data source, `s3_eds`, which points to S3-compatible object storage and is assumed to be configured already.

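The external table example in this section references a data source named `s3_eds`. If you don't have one yet, the following is a minimal sketch for S3-compatible object storage in SQL Server 2022, assuming access key authentication. The credential name `s3_dc` is hypothetical, and the angle-bracket values are placeholders. Check the `s3` prefix and endpoint format for your storage in the CREATE EXTERNAL DATA SOURCE reference.

```sql
-- Hypothetical credential for S3-compatible object storage.
-- The secret is the access key ID and secret key separated by a colon.
CREATE DATABASE SCOPED CREDENTIAL s3_dc
WITH IDENTITY = 'S3 Access Key',
SECRET = '<AccessKeyID>:<SecretKeyID>';

-- Data source referenced as s3_eds by the external table in this section.
CREATE EXTERNAL DATA SOURCE s3_eds
WITH
(
 LOCATION = 's3://<ip_address>:<port>/'
,CREDENTIAL = s3_dc
);
```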
### 1. Create external file format

An external file format is required to define the format of the files. External file formats are also recommended because a single format can be reused by multiple external tables. For more information, see [CREATE EXTERNAL FILE FORMAT](../../t-sql/statements/create-external-file-format-transact-sql.md).

```sql
CREATE EXTERNAL FILE FORMAT DeltaTableFormat WITH (FORMAT_TYPE = DELTA);
```

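Like other PolyBase objects, the file format can be created conditionally so that a setup script is safe to rerun. The following sketch uses the `sys.external_file_formats` catalog view, which also lets you confirm the stored format type:

```sql
-- Create the file format only if it doesn't exist yet.
IF NOT EXISTS (SELECT 1 FROM sys.external_file_formats
               WHERE name = 'DeltaTableFormat')
BEGIN
    CREATE EXTERNAL FILE FORMAT DeltaTableFormat WITH (FORMAT_TYPE = DELTA);
END;

-- Confirm the file format and its type.
SELECT name, format_type
FROM sys.external_file_formats
WHERE name = 'DeltaTableFormat';
```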
### 2. Create external table

The Delta table files are located at `/delta/Delta_yob/`. The external data source for this example is S3-compatible object storage, previously configured as the data source `s3_eds`. PolyBase can use either the Delta table folder or the absolute path of the file itself as the `LOCATION`. In this example, that file would be located at `delta/Delta_yob/_delta_log/00000000000000000000.json`.

```sql
-- Create an external table over the Delta table folder
CREATE EXTERNAL TABLE ext_bandmember
( id int,
 name VARCHAR(200),
 dob date
) WITH
( LOCATION = '/delta/Delta_yob/'
 , FILE_FORMAT = DeltaTableFormat
 , DATA_SOURCE = s3_eds
);
GO
```

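After it's created, the external table can be queried like a local table. It also supports capabilities from the list earlier in this section, such as statistics and permissions. The following sketch assumes the table was created in the `dbo` schema. The statistics name `stats_bandmember_dob` and the role name `delta_readers` are hypothetical.

```sql
-- Query the external table like any other table.
SELECT TOP (10) id, name, dob
FROM dbo.ext_bandmember
ORDER BY dob;

-- Statistics on a frequently filtered column can improve the query plan.
CREATE STATISTICS stats_bandmember_dob
ON dbo.ext_bandmember (dob)
WITH FULLSCAN;

-- Grant read access to a role rather than to individual users.
CREATE ROLE delta_readers;
GRANT SELECT ON OBJECT::dbo.ext_bandmember TO delta_readers;
```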
## Next steps

For more tutorials on creating external data sources and external tables to a variety of data sources, see [PolyBase Transact-SQL reference](polybase-t-sql-objects.md).

- [CREATE EXTERNAL DATA SOURCE](../../t-sql/statements/create-external-data-source-transact-sql.md)
- [CREATE EXTERNAL FILE FORMAT](../../t-sql/statements/create-external-file-format-transact-sql.md)
- [CREATE EXTERNAL TABLE](../../t-sql/statements/create-external-table-transact-sql.md)