
Commit 3bcb342

HugoMSFT authored and WilliamDAssafMSFT committed
20220720 01 delta
1 parent 4c13c97 commit 3bcb342

3 files changed

Lines changed: 156 additions & 2 deletions

docs/relational-databases/polybase/polybase-guide.md

Lines changed: 2 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -56,6 +56,8 @@ PolyBase provides these same functionalities for the following SQL products from
5656
| S3-compatible object storage | [!INCLUDE[sssql22-md](../../includes/sssql22-md.md)] adds a new connector, S3-compatible object storage, using the S3 REST API. You can use both `OPENROWSET` and `EXTERNAL TABLES` to query data files in S3-compatible object storage. |
5757
| Some connectors separate from PolyBase services | The S3-compatible object storage connector, as well as ADLS Gen2 and Azure Blob Storage, are no longer dependent on PolyBase services. PolyBase services must still run to support connectivity with Oracle, Teradata, MongoDB, and Generic ODBC. The PolyBase feature must still be installed on your SQL Server instance. |
5858
| Parquet file format | PolyBase is now capable of querying data from Parquet files stored on S3-compatible object storage. For more information, see [Virtualize parquet file in a S3-compatible object storage with PolyBase](polybase-virtualize-parquet-file.md). |
59+
| Delta Table format | PolyBase is now capable of querying data in Delta Table format stored on S3-compatible object storage, Azure Storage Account V2, and Azure Data Lake Storage Gen2. For more information, see [Virtualize Delta Table format](virtualize-delta.md). |
60+
| Create External Table as Select (CETAS) | PolyBase can now use CETAS to create an external table and then export, in parallel, the result of a [!INCLUDE[tsql](../../includes/tsql-md.md)] SELECT statement to Azure Data Lake Storage Gen2, Azure Storage Account V2, and S3-compliant object storage. For more information, see [CREATE EXTERNAL TABLE AS SELECT](../../t-sql/statements/create-external-table-as-select-sql.md). |
5961

6062
For more new features of [!INCLUDE[sssql22-md](../../includes/sssql22-md.md)], see [What's new in SQL Server 2022?](../../sql-server/what-s-new-in-sql-server-2022.md)
6163

Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
1+
2+
# Virtualize Delta Table file with PolyBase
3+
[!INCLUDE [SQL Server 2022](../../includes/applies-to-version/sqlserver2022.md)] and later
4+
5+
[!INCLUDE[sssql22-md](../../includes/sssql22-md.md)] can query data directly from a Delta table folder. This concept, commonly referred to as data virtualization, allows the data to stay in its original location while still being queried from a SQL Server instance with T-SQL commands like any other table. This feature uses PolyBase connectors and minimizes the need to copy data via ETL processes.
6+
7+
In the example below, the Delta table folder is stored on Azure Data Lake Storage Gen2 and accessed via OPENROWSET or an external table.
8+
9+
For more information on data virtualization, see [Introducing data virtualization with PolyBase](polybase-guide.md).
10+
11+
## Pre-configuration
12+
13+
### 1. Enable PolyBase in `sp_configure`
14+
15+
```sql
16+
exec sp_configure @configname = 'polybase enabled', @configvalue = 1;
17+
18+
RECONFIGURE;
19+
```
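
As a quick check after this step, the following sketch confirms that the PolyBase feature is installed and that the option is in effect. It is an illustrative addition that assumes PolyBase was selected during SQL Server setup; `SERVERPROPERTY('IsPolyBaseInstalled')` and `sys.configurations` are standard SQL Server objects.

```sql
-- Returns 1 when the PolyBase feature is installed on this instance.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- value_in_use should be 1 once RECONFIGURE has completed.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'polybase enabled';
```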
20+
21+
### 2. Create a user database
22+
23+
This exercise creates a sample database with default settings and location. You'll use this empty sample database to work with the data and store the scoped credential. In this example, a new empty database named `Delta_demo` will be used.
24+
25+
```sql
26+
CREATE DATABASE [Delta_demo];
27+
```
28+
29+
### 3. Create a master key and database scoped credential
30+
31+
The database master key in the user database is required to encrypt the database scoped credential secret, `delta_storage_dsc`. For this example the Delta table resides on Azure Data Lake Storage Gen2.
32+
33+
```sql
34+
USE [Delta_demo];
35+
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'password';
36+
```
37+
38+
```sql
39+
CREATE DATABASE SCOPED CREDENTIAL delta_storage_dsc
40+
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
41+
SECRET = '<SAS Token>';
42+
```
43+
44+
### 4. Create external data source
45+
46+
The database scoped credential is used by the external data source. In this example, the Delta table resides in Azure Data Lake Storage Gen2, so use the prefix `adls` and the `SHARED ACCESS SIGNATURE` identity method. For more information about the connectors and prefixes, including new settings for [!INCLUDE[sssql22-md](../../includes/sssql22-md.md)], refer to [CREATE EXTERNAL DATA SOURCE](../../t-sql/statements/create-external-data-source-transact-sql.md?view=sql-server-ver16&preserve-view=true#location--prefixpathport-3).
47+
48+
```sql
49+
CREATE EXTERNAL DATA SOURCE Delta_ED
50+
WITH
51+
(
52+
LOCATION = 'adls://<container>@<storage_account>.dfs.core.windows.net'
53+
,CREDENTIAL = delta_storage_dsc
54+
);
55+
```
56+
57+
For example, if your storage account is named `delta_lake_sample` and the container is named `sink`, the code would be:
58+
59+
```sql
60+
CREATE EXTERNAL DATA SOURCE Delta_ED
61+
WITH
62+
(
63+
LOCATION = 'adls://sink@delta_lake_sample.dfs.core.windows.net'
64+
,CREDENTIAL = delta_storage_dsc
65+
)
66+
```
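
To confirm what was registered, a minimal sketch (assuming the `Delta_ED` data source and `delta_storage_dsc` credential from the steps above were created in the current database) can join the catalog views `sys.external_data_sources` and `sys.database_scoped_credentials`:

```sql
-- List the external data source together with the credential it references.
SELECT eds.name     AS data_source_name,
       eds.location AS data_source_location,
       dsc.name     AS credential_name
FROM sys.external_data_sources AS eds
LEFT JOIN sys.database_scoped_credentials AS dsc
    ON dsc.credential_id = eds.credential_id
WHERE eds.name = 'Delta_ED';
```

The location returned should start with the `adls://` prefix used in the `CREATE EXTERNAL DATA SOURCE` statement.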
67+
68+
## Use OPENROWSET to access the data
69+
70+
In this example, the Delta table folder is named `Rockstar`.
71+
72+
Because the external data source `Delta_ED` is mapped at the container level, the `Rockstar` Delta table folder is located at the root of the container. To query a file in a folder structure, provide a folder mapping relative to the external data source's LOCATION parameter.
73+
74+
```sql
75+
SELECT *
76+
FROM OPENROWSET
77+
( BULK '/Rockstar'
78+
, FORMAT = 'DELTA'
79+
, DATA_SOURCE = 'Delta_ED'
80+
) as [result]
81+
```
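
Before defining an external table, it can help to inspect the column names and types that the query returns. The sketch below wraps the same query in `sp_describe_first_result_set`; it assumes that this procedure handles `FORMAT = 'DELTA'` the same way it handles other `OPENROWSET` queries, which is not stated in this article.

```sql
-- Describe the result set of the OPENROWSET query without returning rows.
-- Single quotes inside the batch text are doubled.
EXEC sp_describe_first_result_set N'
SELECT *
FROM OPENROWSET
( BULK ''/Rockstar''
, FORMAT = ''DELTA''
, DATA_SOURCE = ''Delta_ED''
) AS [result]';
```

The `name` and `system_type_name` columns in the output are a reasonable starting point for the column list of a `CREATE EXTERNAL TABLE` statement.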
82+
83+
## Query data with an external table
84+
85+
CREATE EXTERNAL TABLE can also be used to virtualize the Delta table data in SQL Server. The columns must be defined and strongly typed. While external tables take more effort to create, they also provide additional benefits over querying an external data source with OPENROWSET. You can:
86+
87+
1. Strengthen the definition of the data typing for a given column.
88+
2. Define nullability.
89+
3. Define COLLATION.
90+
4. Create statistics for a column to optimize the quality of the query plan.
91+
5. Create a more granular model within SQL Server for data access to enhance your security model.
92+
93+
For more information, see [CREATE EXTERNAL TABLE](../../t-sql/statements/create-external-table-transact-sql.md).
94+
95+
For the following example, the same data source will be used.
96+
97+
### 1. Create external file format
98+
99+
To define the file's formatting, an external file format is required. External file formats are also recommended due to reusability. For more information, see [CREATE EXTERNAL FILE FORMAT](../../t-sql/statements/create-external-file-format-transact-sql.md).
100+
101+
102+
```sql
103+
CREATE EXTERNAL FILE FORMAT DeltaTableFormat WITH(FORMAT_TYPE = DELTA);
104+
```
105+
106+
### 2. Create external table
107+
108+
The Delta table files are located at `/delta/Delta_yob/`, and the external data source for this example is S3-compliant object storage, previously configured under the data source `s3_eds`. PolyBase can use as LOCATION either the Delta table folder or the absolute file itself, which would be located at `delta/Delta_yob/_delta_log/00000000000000000000.json`.
109+
110+
```sql
111+
-- Create External Table using Delta
112+
CREATE EXTERNAL TABLE ext_bandmember
113+
( id int,
114+
name VARCHAR(200),
115+
dob date
116+
)WITH
117+
( LOCATION = '/delta/Delta_yob/'
118+
, FILE_FORMAT = DeltaTableFormat
119+
, DATA_SOURCE = s3_eds
120+
)
121+
GO
122+
```
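
Once created, the external table can be queried like any other table, and statistics can be added to a column as mentioned in the list of benefits above. The following is a minimal sketch; the statistics name `stats_bandmember_name` and the filter date are hypothetical and chosen only for illustration.

```sql
-- Query the external table with ordinary T-SQL.
SELECT TOP (10) id, name, dob
FROM ext_bandmember
WHERE dob >= '1960-01-01';

-- Create statistics on a column to help the optimizer (hypothetical name).
CREATE STATISTICS stats_bandmember_name
ON ext_bandmember (name) WITH FULLSCAN;
```

`FULLSCAN` reads all of the external data to build the statistics, so on large Delta tables a sampled scan may be a better trade-off.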
123+
124+
## Next steps
125+
126+
For more tutorials on creating external data sources and external tables to a variety of data sources, see [PolyBase Transact-SQL reference](polybase-t-sql-objects.md).
127+
128+
- [CREATE EXTERNAL DATA SOURCE](../../t-sql/statements/create-external-data-source-transact-sql.md)
129+
- [CREATE EXTERNAL FILE FORMAT](../../t-sql/statements/create-external-file-format-transact-sql.md)
130+
- [CREATE EXTERNAL TABLE](../../t-sql/statements/create-external-table-transact-sql.md)

docs/t-sql/statements/create-external-file-format-transact-sql.md

Lines changed: 24 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -38,7 +38,9 @@ The following file formats are supported:
3838
- JSON - Applies to Azure SQL Edge only. For information on using OPENROWSET to import JSON data in other platforms, see [Import JSON documents into SQL Server](../../relational-databases/json/import-json-documents-into-sql-server.md) or [Query JSON files using serverless SQL pool in Azure Synapse Analytics](/azure/synapse-analytics/sql/query-json-files).
3939

4040
![Topic link icon](../../database-engine/configure-windows/media/topic-link.gif "Topic link icon") [Transact-SQL Syntax Conventions](../../t-sql/language-elements/transact-sql-syntax-conventions-transact-sql.md)
41-
41+
42+
- Delta Table - Applies to SQL Server 2022.
43+
4244
## Syntax
4345

4446
### [Delimited text](#tab/delimited)
@@ -118,6 +120,17 @@ WITH (
118120
| 'org.apache.hadoop.io.compress.DefaultCodec' }
119121
]);
120122
```
123+
### [Delta Table](#tab/Delta)
124+
```syntaxsql
125+
-- Create an external file format for Delta Table files.
126+
CREATE EXTERNAL FILE FORMAT file_format_name
127+
WITH (
128+
FORMAT_TYPE = DELTA
129+
[ , DATA_COMPRESSION = {
130+
'org.apache.hadoop.io.compress.SnappyCodec'
131+
| 'org.apache.hadoop.io.compress.GzipCodec' }
132+
]);
133+
```
121134

122135
> [!NOTE]
123136
> [!INCLUDE[synapse-analytics-od-unsupported-syntax](../../includes/synapse-analytics-od-unsupported-syntax.md)]
@@ -409,7 +422,7 @@ WITH (
409422
DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
410423
);
411424
```
412-
### E. Create a Delimited Text File Skipping Header Row (Azure Synapse Analytics Only)
425+
### E. Create a Delimited Text File Skipping Header Row (Azure Synapse Analytics and SQL Server 2022 Only)
413426
This example creates an external file format for CSV file with a single header row.
414427

415428
```sql
@@ -432,6 +445,15 @@ WITH (
432445
DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
433446
);
434447
```
448+
### G. Create a Delta table external file format
449+
This example creates an external file format for the Delta table format. This example applies to [!INCLUDE[sssql22-md](../../includes/sssql22-md.md)]. If DATA_COMPRESSION isn't specified, the default is no compression.
450+
```sql
451+
CREATE EXTERNAL FILE FORMAT DeltaFileFormat
452+
WITH(
453+
FORMAT_TYPE = DELTA,
454+
DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
455+
);
456+
```
435457

436458
## See Also
437459
[CREATE EXTERNAL DATA SOURCE &#40;Transact-SQL&#41;](../../t-sql/statements/create-external-data-source-transact-sql.md)
