Lokasi ngalangkungan proxy:   [ UP ]  
[Ngawartoskeun bug]   [Panyetelan cookie]                
Skip to content

Latest commit

 

History

History
501 lines (379 loc) · 26.8 KB

File metadata and controls

501 lines (379 loc) · 26.8 KB
title CREATE EXTERNAL DATA SOURCE (Transact-SQL) | Microsoft Docs
ms.custom
ms.date 04/01/2019
ms.prod sql
ms.prod_service database-engine, sql-database, sql-data-warehouse, pdw
ms.reviewer
ms.technology t-sql
ms.topic language-reference
f1_keywords
CREATE EXTERNAL DATA SOURCE
CREATE_EXTERNAL_DATA_SOURCE
dev_langs
TSQL
helpviewer_keywords
External
External, data source
PolyBase, create data source
author CarlRabeler
ms.author carlrab
manager craigg
monikerRange >=aps-pdw-2016||=azuresqldb-current||=azure-sqldw-latest||>=sql-server-2016||=sqlallproducts-allversions||>=sql-server-linux-2017||=azuresqldb-mi-current

CREATE EXTERNAL DATA SOURCE (Transact-SQL)

[!INCLUDEtsql-appliesto-ss2016-all-md]

External data sources are used to establish connectivity and support four primary use cases:

  1. Data virtualization and data load using PolyBase
  2. Bulk load operations using SQL Server or SQL Database using BULK INSERT or OPENROWSET
  3. Query remote SQL Database or SQL Data Warehouse instances using SQL Database with elastic query
  4. Query a sharded Azure SQL Database using elastic query

Note

PolyBase is supported on SQL Server (2016 or higher), Azure SQL Data Warehouse and Parallel Data Warehouse. Elastic queries are supported only on Azure SQL Database v12 or later.

Topic link icon Transact-SQL Syntax Conventions

Syntax

CREATE EXTERNAL DATA SOURCE <data_source_name>  
WITH
(    LOCATION                  = '<prefix>://<path>[:<port>]'
[,   CONNECTION_OPTIONS        = '<name_value_pairs>']
[,   CREDENTIAL                = <credential_name> ]
[,   PUSHDOWN                  = ON | OFF]
[,   TYPE                      = HADOOP | BLOB_STORAGE | RDBMS | SHARD_MAP_MANAGER ]
[,   RESOURCE_MANAGER_LOCATION = '<resource_manager>[:<port>]']
[,   DATABASE_NAME             = '<database_name>' ]
[,   SHARD_MAP_NAME            = '<shard_map_manager>' ]
)
[;]

Arguments

Data_Source_Name

Specifies the user-defined name for the data source. The name must be unique within the database in SQL Server, SQL Database (SQL DB), and SQL Data Warehouse (SQL DW). The name must be unique within the server in Parallel Data Warehouse (PDW).

LOCATION = '<prefix>://<path[:port]>'

Provides the connectivity protocol and path to the external data source.

External Data Source Location prefix Location path Supported locations by product / service
Cloudera or Hortonworks hdfs <Namenode>[:port] SQL Server (2016+), PDW
Azure Blob Storage wasb[s] <container>@<storage_account>.blob.core.windows.net SQL Server (2016+), PDW, SQL DW
ADLS Gen 1 adl <storage_account>.azuredatalake.net SQL DW
ADLS Gen 2 abfs <container>@<storage_account>.dfs.core.windows.net SQL DW
SQL Server sqlserver <server_name>[\<instance_name>][:port] SQL Server (2019+)
Oracle oracle <server_name>[:port] SQL Server (2019+)
Teradata teradata <server_name>[:port] SQL Server (2019+)
MongoDB or CosmosDB mongodb <server_name>[:port] SQL Server (2019+)
ODBC odbc <server_name>{:port] SQL Server (2019+) - Windows only
Bulk Operations https <storage_account>.blob.core.windows.net/<container> SQL Server (2017+), SQL DB
Elastic Query (shard) Not required <shard_map_server_name>.database.windows.net SQL DB
Elastic Query (remote) Not required <remote_server_name>.database.windows.net SQL DB

Location path parameters:

  • <Namenode> = the machine name, name service URI or IP address of the Namenode in the Hadoop cluster. PolyBase must be able to resolve any DNS names used by the Hadoop cluster. For highly available hadoop configurations you may alternatively provide the Nameservice ID as the LOCATION.
  • port = The port that the external data source is listening on. For Hadoop this can be found using the fs.default.name configuration parameter in Hadoop. The default is 8020.
  • <container> = the container of the storage account holding the data. Root containers are read-only, so data cannot be written back to the container.
  • <storage_account> = the storage account name of the azure resource.
  • <server_name> = the host name.
  • <instance_name> = the name of the SQL Server named instance. Used if you have SQL Server Browser Service running on the target instance.
  • <shard_map_server_name> = The logical server name in Azure that is hosting the shard map manager. The DATABASE_NAME argument provides the database used to host the shard map and SHARD_MAP_NAME is used for the shard map itself.
  • <remote_server_name> = The target logical server name for the elastic query. The database name is specified using the DATABASE_NAME argument.

Additional notes and guidance when setting the location:

  • The SQL engine does not verify the existence of the external data source when the object is created. To validate the external create an external table over the data source.
  • Use the same external data source for all tables when querying Hadoop. This ensures consistent semantics in the query.
  • You can use the sqlserver location prefix when connecting from SQL Server 2019 to also connect to SQL Database or SQL Data Warehouse.
  • You must also specify the Driver={<Name of Driver>} when connecting via ODBC
  • wasb is the default protocol for Azure blob storage. wasbs is optional but strongly recommended as data will be sent using a secure SSL connection.
  • To ensure successful PolyBase queries in the event of Hadoop Namenode fail-over, consider using a virtual IP address for the Namenode of the Hadoop cluster. If you do not use a virtual IP address for the Hadoop Namenode, in the event of a Hadoop Namenode fail-over you will have to ALTER EXTERNAL DATA SOURCE object to point to the new location.

CONNECTION_OPTIONS = key_value_pair

Specifies additional options when connecting over ODBC to an external data source.

The name of the driver is required as a minimum but there are other options such as APP='<your_application_name>' or ApplicationIntent= ReadOnly|ReadWrite which are also useful to set and can assist with troubleshooting.

Please refer to the ODBC product documentation for a list of permitted CONNECTION_OPTIONS

PUSHDOWN = ON | OFF

The push-down argument defines whether computation is allowed to be pushed down to the external data source. It is on by default.

Enabling or disabling push-down at the external data source level is supported when connecting to SQL Server, Oracle, Teradata, MongoDB or over ODBC.

Enabling or disabling push-down at the query level is achieved through a hint.

CREDENTIAL = credential_name

Specifies a database-scoped credential for authenticating to the external data source.

Additional notes and guidance when creating a credential:

  • To load from Azure Blob storage or ADLS Gen 2 into SQL DW or Parallel Data Warehouse, the secret must be the Azure Storage Key.
  • CREDENTIAL is only required if the blob has been secured. CREDENTIAL is not required for public data sets that allow anonymous access.
  • When the TYPE = BLOB_STORAGE the credential must be created using SHARED ACCESS SIGNATURE as the identity. Furthermore, the SAS token should be configured as follows:
    • Exclude the leading ? when configured as the secret
    • Have at least read permission on the file that should be loaded (for example srt=o&sp=r)
    • the expiration period should be valid (all dates are in UTC time).

For an example of using a CREDENTIAL with SHARED ACCESS SIGNATURE and TYPE = BLOB_STORAGE refer to I. Create an external data source to perform bulk operations and retrieve data from Azure Blob Storage into SQL Database

To create a database scoped credential, see [CREATE DATABASE SCOPED CREDENTIAL (Transact-SQL)][CREATE_DATABASE_SCOPED_CREDENTIAL].

TYPE = [ HADOOP | BLOB_STORAGE | RDBMS | SHARD_MAP_MANAGER]

Specifies the type of the external data source being configured. This parameter is not always required.

  • Use HADOOP when the external data source is Cloudera, Hortonworks, Azure Blob Storage, Azure Data Lake Store Gen 1 or Azure Data Lake Store Gen 2.
  • Use RDBMS for cross-database queries using elastic query from SQL Database.
  • Use SHARD_MAP_MANAGER when creating an external data source when connecting to a sharded SQL Database.
  • Use BLOB_STORAGE when performing bulk operations using BULK INSERT or OPENROWSET with [!INCLUDEssSQLv14_md].

Important

Do not set TYPE if using any other external data source.

For an example of using TYPE = HADOOP to load data from Azure Blob Storage refer to "Create external data source to reference Azure blob storage".

RESOURCE_MANAGER_LOCATION = 'ResourceManager_URI[:port]'

Configure this optional value when connecting to Hortonworks or Cloudera.

When the RESOURCE_MANAGER_LOCATION is defined, the query optimizer will consider optimizing each query by initiating a map reduce job on the external Hadoop source and pushing down computation. This is entirely a cost-based decision and can significantly reduce the volume of data transferred between Hadoop and SQL, and therefore improve query performance.

If the resource manager is not specified, pushing compute to Hadoop is disabled for PolyBase queries.

If the port is not specified, the default value is determined using the current setting for 'hadoop connectivity' configuration.

Hadoop Connectivity Default Resource Manager Port
1 50300
2 50300
3 8021
4 8032
5 8050
6 8032
7 8050

For a complete list of Hadoop distributions and versions supported by each connectivity value, see PolyBase Connectivity Configuration (Transact-SQL).

Important

The RESOURCE_MANAGER_LOCATION value is not validated when you create the external data source. Entering an incorrect value may cause query failure at execution time whenever push-down is attempted as the provided value would not be able to resolve.

For an example showing the RESOURCE_MANAGER_LOCATION enabled refer to "Create external data source to reference Hadoop with push-down enabled"

DATABASE_NAME = database_name

Configure this argument when the TYPE is set to RDBMS or SHARD_MAP_MANAGER.

TYPE Value of DATABASE_NAME
RDBMS The name of the remote database on the server provided using LOCATION
SHARD_MAP_MANAGER Name of the database operating as the shard map manager

For an example showing how to create an external data source where TYPE = RDBMS refer to "Create an RDBMS external data source"

SHARD_MAP_NAME = shard_map_name

Used when the TYPE argument is set to SHARD_MAP_MANAGER only to set the name of the shard map.

For an example showing how to create an external data source where TYPE = SHARD_MAP_MANAGER refer to "Create a shard map manager external data source"

Permissions

Requires CONTROL permission on database in SQL Server, Parallel Data Warehouse, SQL Database and SQL Data Warehouse.

Note

In previous releases of PDW, create external data source required ALTER ANY EXTERNAL DATA SOURCE permissions.

Locking

Takes a shared lock on the EXTERNAL DATA SOURCE object.

Security

PolyBase supports proxy based authentication for most external data sources. Create a database scoped credential to create the proxy account.

When you connect to the storage or data pool in a SQL 2019 Big Data Cluster the users credentials are passed through to the back-end system. The data pool requires that you must also create logins in the data pool itself for pass through authentication to be successful.

Examples: SQL Server (2016+) and Parallel Data Warehouse

A. Create external data source in SQL 2019 to reference Oracle

To create an external data source to reference Oracle ensure you have a database scoped credential. You may optionally also enable or disable push-down of computation against this data source.

-- Create a database master key if one does not already exist, using your own password. This key is used to encrypt the credential secret in next step.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '!MyC0mpl3xP@ssw0rd!
;

-- Create a database scoped credential with Azure storage account key as the secret.
CREATE DATABASE SCOPED CREDENTIAL OracleProxyAccount
WITH
     IDENTITY   = 'oracle_username'
,    SECRET     = 'oracle_password'
;

CREATE EXTERNAL DATA SOURCE MyOracleServer
WITH
(    LOCATION   = 'oracle://145.145.145.145:1521'
,    CREDENTIAL = OracleProxyAccount
,    PUSHDOWN   = ON
;

A. Create external data source to reference Hadoop

To create an external data source to reference your Hortonworks or Cloudera Hadoop cluster, specify the machine name, or IP address of the Hadoop Namenode and port. For highly available hadoop configurations you may alternatively provide the Nameservice ID as the LOCATION

CREATE EXTERNAL DATA SOURCE MyHadoopCluster
WITH
(    LOCATION = 'hdfs://10.10.10.10:8050'
,    TYPE     = HADOOP
)
;

B. Create external data source to reference Hadoop with push-down enabled

Specify the RESOURCE_MANAGER_LOCATION option to enable push-down computation to Hadoop for PolyBase queries. Once enabled, PolyBase uses a cost-based decision to determine whether the query computation should be pushed to Hadoop or all the data should be moved to process the query in SQL Server.

CREATE EXTERNAL DATA SOURCE MyHadoopCluster
WITH
(    LOCATION                  = 'hdfs://10.10.10.10:8020'
,    TYPE                      = HADOOP
,    RESOURCE_MANAGER_LOCATION = '10.10.10.10:8050'
)
;

C. Create external data source to reference Kerberos-secured Hadoop

To verify if the Hadoop cluster is Kerberos-secured, check the value of hadoop.security.authentication property in Hadoop core-site.xml. To reference a Kerberos-secured Hadoop cluster, you must specify a database scoped credential that contains your Kerberos username and password. The database master key is used to encrypt the database scoped credential secret.

-- Create a database master key if one does not already exist, using your own password. This key is used to encrypt the credential secret in next step.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'S0me!nfo'
;

-- Create a database scoped credential with Kerberos user name and password.
CREATE DATABASE SCOPED CREDENTIAL HadoopUser1
WITH
     IDENTITY   = '<hadoop_user_name>'
,    SECRET     = '<hadoop_password>'
;

-- Create an external data source with CREDENTIAL option.
CREATE EXTERNAL DATA SOURCE MyHadoopCluster
WITH
(    LOCATION                  = 'hdfs://10.10.10.10:8050'
,    CREDENTIAL                = HadoopUser1
,    TYPE                      = HADOOP
,    RESOURCE_MANAGER_LOCATION = '10.10.10.10:8050'
)
;

Examples: SQL Server (2016+), SQL Data Warehouse and Parallel Data Warehouse

D. Create external data source to reference Azure blob storage

In this example, the external data source is an Azure blob storage container called daily under Azure storage account named logs. The Azure storage external data source is for data transfer only; and it does not support predicate push-down.

This example shows how to create the database scoped credential for authentication to Azure storage. Specify the Azure storage account key in the database credential secret. Specify any string in database scoped credential identity, it is not used for authentication to Azure storage. Then, the credential is used in the statement that creates an external data source.

-- Create a database master key if one does not already exist, using your own password. This key is used to encrypt the credential secret in next step.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'S0me!nfo'
;

-- Create a database scoped credential with Azure storage account key as the secret.
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH
     IDENTITY   = '<my_account>'
,    SECRET     = '<azure_storage_account_key>'
;

-- Create an external data source with CREDENTIAL option.
CREATE EXTERNAL DATA SOURCE MyAzureStorage
WITH
(    LOCATION   = 'wasbs://daily@logs.blob.core.windows.net/'
,    CREDENTIAL = AzureStorageCredential
,    TYPE       = HADOOP
)
;

Examples: SQL Database

E. Create a Shard map manager external data source

To create an external data source to reference a SHARD_MAP_MANAGER, specify the SQL Database server name that hosts the shard map manager in SQL Database or a SQL Server database on a virtual machine.

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>'
;

CREATE DATABASE SCOPED CREDENTIAL ElasticDBQueryCred
WITH
     IDENTITY   = '<username>'
,    SECRET     = '<password>'
;

CREATE EXTERNAL DATA SOURCE MyElasticDBQueryDataSrc
WITH
(    TYPE             = SHARD_MAP_MANAGER
,    LOCATION         = '<server_name>.database.windows.net'
,    DATABASE_NAME    = 'ElasticScaleStarterKit_ShardMapManagerDb'
,    CREDENTIAL       = ElasticDBQueryCred
,    SHARD_MAP_NAME   = 'CustomerIDShardMap'
)
;

For a step-by-step tutorial, see Getting started with elastic queries for sharding (horizontal partitioning).

F. Create an RDBMS external data source

To create an external data source to reference a RDBMS, specifies the SQL Database server name of the remote database in SQL Database.

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>'
;

CREATE DATABASE SCOPED CREDENTIAL SQL_Credential  
WITH
     IDENTITY  = '<username>'
,    SECRET    = '<password>'
;

CREATE EXTERNAL DATA SOURCE MyElasticDBQueryDataSrc
WITH
(    TYPE          = RDBMS
,    LOCATION      = '<server_name>.database.windows.net'
,    DATABASE_NAME = 'Customers'
,    CREDENTIAL    = SQL_Credential
)
;

For a step-by-step tutorial on RDBMS, see Getting started with cross-database queries (vertical partitioning).

Examples: SQL Data Warehouse

G. Create external data source to reference Azure Data Lake Store Gen 1

Azure Data lake Store connectivity is based on your ADLS URI and your Azure Active directory Application's service principle. Documentation for creating this application can be found at[Data lake store authentication using Active Directory][azure_ad[].

-- If you do not have a Master Key on your DW you will need to create one.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>'
;

-- These values come from your Azure Active Directory Application used to authenticate to ADLS
CREATE DATABASE SCOPED CREDENTIAL ADLS_credential
WITH
--   IDENTITY   = '<clientID>@<OAuth2.0TokenEndPoint>'
     IDENTITY   = '536540b4-4239-45fe-b9a3-629f97591c0c@https://login.microsoftonline.com/42f988bf-85f1-41af-91ab-2d2cd011da47/oauth2/token'
--,  SECRET     = '<KEY>'
,    SECRET     = 'BjdIlmtKp4Fpyh9hIvr8HJlUida/seM5kQ3EpLAmeDI='
;
CREATE EXTERNAL DATA SOURCE AzureDataLakeStore
WITH
(    LOCATION       = 'adl://newyorktaxidataset.azuredatalakestore.net'
,    CREDENTIAL     = ADLS_credential
,    TYPE           = HADOOP
)
;

H. Create external data source to reference Azure Data Lake Store (ADLS) Gen 2

Currently connecting to ADLS Gen 2 requires the storage account key as the secret for the database scoped credential. Oauth2.0 support is not available today. This is a temp

-- If you do not have a Master Key on your DW you will need to create one.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>'
;

-- These values come from your Azure Active Directory Application used to authenticate to ADLS
CREATE DATABASE SCOPED CREDENTIAL ADLS_credential
WITH
--   IDENTITY   = '<storage_account_name>'
     IDENTITY   = 'newyorktaxidata'
--,  SECRET     = '<storage_account_key>'
,    SECRET     = 'yz5N4+bxSb89McdiysJAzo+9hgEHcJRJuXbF/uC3mhbezES/oe00vXnZEl14U0lN3vxrFKsphKov16C0w6aiTQ=='
;

CREATE EXTERNAL DATA SOURCE <data_source_name>
WITH
(    LOCATION   = 'abfs://2013@newyorktaxidataset.dfs.core.windows.net'
,    CREDENTIAL = ADLS_credential
,    TYPE       = HADOOP
)
[;]

Examples: Bulk Operations

Note

Do not put a trailing /, file name, or shared access signature parameters at the end of the LOCATION URL when configuring an external data source for bulk operations.

I. Create an external data source for bulk operations retrieving data from Azure Blob storage

Applies to: [!INCLUDEssSQLv14_md]. Use the following data source for bulk operations using BULK INSERT or OPENROWSET. The credential used, must be created using SHARED ACCESS SIGNATURE as the identity, should not have the leading ? in SAS token, must have at least read permission on the file that should be loaded (for example srt=o&sp=r), and the expiration period should be valid (all dates are in UTC time). For more information on shared access signatures, see Using Shared Access Signatures (SAS).

CREATE DATABASE SCOPED CREDENTIAL AccessAzureInvoices
WITH
     IDENTITY = 'SHARED ACCESS SIGNATURE'
--   REMOVE ? FROM THE BEGINNING OF THE SAS TOKEN
,    SECRET = '******srt=sco&sp=rwac&se=2017-02-01T00:55:34Z&st=2016-12-29T16:55:34Z***************'
;

CREATE EXTERNAL DATA SOURCE MyAzureInvoices
WITH
(    LOCATION   = 'https://newinvoices.blob.core.windows.net/week3'
,    CREDENTIAL = AccessAzureInvoices
,    TYPE       = BLOB_STORAGE
)
;

To see this example in use, see BULK INSERT.

See Also