---
title: "CREATE EXTERNAL DATA SOURCE (Transact-SQL) | Microsoft Docs"
ms.custom: ""
ms.date: "04/01/2019"
ms.prod: sql
ms.prod_service: "database-engine, sql-database, sql-data-warehouse, pdw"
ms.reviewer: ""
ms.technology: t-sql
ms.topic: "language-reference"
f1_keywords: []
dev_langs: []
helpviewer_keywords: []
author: CarlRabeler
ms.author: carlrab
manager: craigg
monikerRange: ">=aps-pdw-2016||=azuresqldb-current||=azure-sqldw-latest||>=sql-server-2016||=sqlallproducts-allversions||>=sql-server-linux-2017||=azuresqldb-mi-current"
---
# CREATE EXTERNAL DATA SOURCE (Transact-SQL)

[!INCLUDE[tsql-appliesto-ss2016-all-md](../../includes/tsql-appliesto-ss2016-all-md.md)]
External data sources are used to establish connectivity and support four primary use cases:

- Data virtualization and data load using PolyBase
- Bulk load operations using SQL Server or SQL Database via `BULK INSERT` or `OPENROWSET`
- Query remote SQL Database or SQL Data Warehouse instances using SQL Database with elastic query
- Query a sharded Azure SQL Database using elastic query
> [!NOTE]
> PolyBase is supported on SQL Server (2016 or later), Azure SQL Data Warehouse, and Parallel Data Warehouse. Elastic queries are supported only on Azure SQL Database v12 or later.
Transact-SQL Syntax Conventions

## Syntax
```sql
CREATE EXTERNAL DATA SOURCE <data_source_name>
WITH
(    LOCATION                  = '<prefix>://<path>[:<port>]'
[ ,  CONNECTION_OPTIONS        = '<name_value_pairs>' ]
[ ,  CREDENTIAL                = <credential_name> ]
[ ,  PUSHDOWN                  = ON | OFF ]
[ ,  TYPE                      = HADOOP | BLOB_STORAGE | RDBMS | SHARD_MAP_MANAGER ]
[ ,  RESOURCE_MANAGER_LOCATION = '<resource_manager>[:<port>]' ]
[ ,  DATABASE_NAME             = '<database_name>' ]
[ ,  SHARD_MAP_NAME            = '<shard_map_name>' ]
)
[ ; ]
```

## Arguments

### data_source_name

Specifies the user-defined name for the data source. The name must be unique within the database in SQL Server, SQL Database (SQL DB), and SQL Data Warehouse (SQL DW). The name must be unique within the server in Parallel Data Warehouse (PDW).
### LOCATION

Provides the connectivity protocol and path to the external data source.
| External Data Source | Location prefix | Location path | Supported locations by product / service |
|---|---|---|---|
| Cloudera or Hortonworks | `hdfs` | `<Namenode>[:port]` | SQL Server (2016+), PDW |
| Azure Blob Storage | `wasb[s]` | `<container>@<storage_account>.blob.core.windows.net` | SQL Server (2016+), PDW, SQL DW |
| ADLS Gen 1 | `adl` | `<storage_account>.azuredatalakestore.net` | SQL DW |
| ADLS Gen 2 | `abfs` | `<container>@<storage_account>.dfs.core.windows.net` | SQL DW |
| SQL Server | `sqlserver` | `<server_name>[\<instance_name>][:port]` | SQL Server (2019+) |
| Oracle | `oracle` | `<server_name>[:port]` | SQL Server (2019+) |
| Teradata | `teradata` | `<server_name>[:port]` | SQL Server (2019+) |
| MongoDB or CosmosDB | `mongodb` | `<server_name>[:port]` | SQL Server (2019+) |
| ODBC | `odbc` | `<server_name>[:port]` | SQL Server (2019+) - Windows only |
| Bulk Operations | `https` | `<storage_account>.blob.core.windows.net/<container>` | SQL Server (2017+), SQL DB |
| Elastic Query (shard) | Not required | `<shard_map_server_name>.database.windows.net` | SQL DB |
| Elastic Query (remote) | Not required | `<remote_server_name>.database.windows.net` | SQL DB |
Location path parameters:
- `<Namenode>` = the machine name, name service URI, or IP address of the `Namenode` in the Hadoop cluster. PolyBase must be able to resolve any DNS names used by the Hadoop cluster. For highly available Hadoop configurations, you may alternatively provide the Nameservice ID as the `LOCATION`.
- `port` = the port that the external data source is listening on. For Hadoop, this can be found using the `fs.default.name` configuration parameter. The default is 8020.
- `<container>` = the container of the storage account holding the data. Root containers are read-only, so data cannot be written back to the container.
- `<storage_account>` = the storage account name of the Azure resource.
- `<server_name>` = the host name.
- `<instance_name>` = the name of the SQL Server named instance. Used if you have SQL Server Browser Service running on the target instance.
- `<shard_map_server_name>` = the logical server name in Azure that hosts the shard map manager. The `DATABASE_NAME` argument provides the database used to host the shard map, and `SHARD_MAP_NAME` is used for the shard map itself.
- `<remote_server_name>` = the target logical server name for the elastic query. The database name is specified using the `DATABASE_NAME` argument.
Additional notes and guidance when setting the location:

- The SQL engine does not verify the existence of the external data source when the object is created. To validate the external data source, create an external table over it (see the sketch after this list).
- Use the same external data source for all tables when querying Hadoop to ensure consistent querying semantics.
- You can use the `sqlserver` location prefix when connecting from SQL Server 2019 to connect to SQL Server, SQL Database, or SQL Data Warehouse.
- You must specify `Driver={<Name of Driver>}` when connecting via `ODBC`.
- `wasb` is the default protocol for Azure Blob Storage. `wasbs` is optional but strongly recommended, as data will be sent using a secure SSL connection.
- To ensure successful PolyBase queries in the event of a Hadoop `Namenode` fail-over, consider using a virtual IP address for the `Namenode` of the Hadoop cluster. If you do not, you will have to run an `ALTER EXTERNAL DATA SOURCE` command to point to the new location after a fail-over.
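As a hedged sketch of that validation step (the table name, columns, file format, and path below are hypothetical, not taken from this article), creating an external table is what first exercises the connection:

```sql
-- Hypothetical names and path, for illustration only: creating an external
-- table over the data source is what first validates the LOCATION.
CREATE EXTERNAL FILE FORMAT SampleTextFormat
WITH ( FORMAT_TYPE = DELIMITEDTEXT )
;

CREATE EXTERNAL TABLE dbo.SampleExternalTable
( id INT
, payload NVARCHAR(100)
)
WITH
( LOCATION = '/sample/data/'
, DATA_SOURCE = MyHadoopCluster   -- an external data source defined as in the examples below
, FILE_FORMAT = SampleTextFormat
)
;
```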
### CONNECTION_OPTIONS

Specifies additional options when connecting over `ODBC` to an external data source.

The name of the driver is required as a minimum, but there are other options, such as `APP='<your_application_name>'` or `ApplicationIntent=ReadOnly|ReadWrite`, that are also useful to set and can assist with troubleshooting.

Refer to the `ODBC` product documentation for a list of permitted `CONNECTION_OPTIONS`.
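As a hedged sketch (the server, port, driver name, and credential below are placeholders, not values from this article), `CONNECTION_OPTIONS` slots into the statement like this:

```sql
-- Placeholder server, port, driver, and credential, for illustration only.
CREATE EXTERNAL DATA SOURCE MyGenericOdbcSource
WITH
( LOCATION = 'odbc://mydatasource.example.com:4444'
, CONNECTION_OPTIONS = 'Driver={My ODBC Driver}; APP=polybase_example'
, CREDENTIAL = OdbcProxyAccount   -- a database scoped credential created beforehand
, PUSHDOWN = ON
)
;
```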
### PUSHDOWN

The push-down argument defines whether computation is allowed to be pushed down to the external data source. It is on by default.

Enabling or disabling push-down at the external data source level is supported when connecting to SQL Server, Oracle, Teradata, MongoDB, or over ODBC.

Enabling or disabling push-down at the query level is achieved through a hint, as in the sketch below.
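For instance (a sketch; the external table name is a placeholder), the documented PolyBase query hints are `FORCE EXTERNALPUSHDOWN` and `DISABLE EXTERNALPUSHDOWN`:

```sql
-- Placeholder external table name; the OPTION clauses are the documented hints.
SELECT COUNT(*) FROM dbo.SensorReadings OPTION (FORCE EXTERNALPUSHDOWN);
SELECT COUNT(*) FROM dbo.SensorReadings OPTION (DISABLE EXTERNALPUSHDOWN);
```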
### CREDENTIAL

Specifies a database-scoped credential for authenticating to the external data source.

Additional notes and guidance when creating a credential:

- To load data from Azure Blob Storage or ADLS Gen 2 into SQL DW or Parallel Data Warehouse, the secret must be the Azure Storage Key.
- `CREDENTIAL` is only required if the blob has been secured. `CREDENTIAL` is not required for public data sets that allow anonymous access.
- When `TYPE` = `BLOB_STORAGE`, the credential must be created using `SHARED ACCESS SIGNATURE` as the identity. Furthermore, the SAS token should be configured as follows:
  - Exclude the leading `?` when configured as the secret
  - Have at least read permission on the file that should be loaded (for example `srt=o&sp=r`)
  - Use a valid expiration period (all dates are in UTC time)
For an example of using a `CREDENTIAL` with `SHARED ACCESS SIGNATURE` and `TYPE` = `BLOB_STORAGE`, refer to "Create an external data source to perform bulk operations and retrieve data from Azure Blob Storage into SQL Database".

To create a database scoped credential, see CREATE DATABASE SCOPED CREDENTIAL (Transact-SQL).
### TYPE

Specifies the type of the external data source being configured. This parameter is not always required.

- Use `HADOOP` when the external data source is Cloudera, Hortonworks, Azure Blob Storage, Azure Data Lake Store Gen 1, or Azure Data Lake Store Gen 2.
- Use `RDBMS` for cross-database queries using elastic query from SQL Database.
- Use `SHARD_MAP_MANAGER` when creating an external data source for connecting to a sharded SQL Database.
- Use `BLOB_STORAGE` when performing bulk operations using `BULK INSERT` or `OPENROWSET` with SQL Server 2017 (14.x).
> [!IMPORTANT]
> Do not set `TYPE` if using any other external data source.

For an example of using `TYPE` = `HADOOP` to load data from Azure Blob Storage, refer to "Create external data source to reference Azure blob storage".
### RESOURCE_MANAGER_LOCATION

Configure this optional value when connecting to Hortonworks or Cloudera.
When the `RESOURCE_MANAGER_LOCATION` is defined, the query optimizer will consider optimizing each query by initiating a MapReduce job on the external Hadoop source and pushing down computation. This is entirely a cost-based decision and can significantly reduce the volume of data transferred between Hadoop and SQL, and therefore improve query performance.
If the resource manager is not specified, pushing compute to Hadoop is disabled for PolyBase queries.
If the port is not specified, the default value is determined using the current setting for the 'hadoop connectivity' configuration, as shown in the following table (a sketch for changing this setting appears after the table).
| Hadoop Connectivity | Default Resource Manager Port |
|---|---|
| 1 | 50300 |
| 2 | 50300 |
| 3 | 8021 |
| 4 | 8032 |
| 5 | 8050 |
| 6 | 8032 |
| 7 | 8050 |
For a complete list of Hadoop distributions and versions supported by each connectivity value, see PolyBase Connectivity Configuration (Transact-SQL).
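As a quick sketch (the connectivity value 7 is only an example), the setting can be inspected or changed with `sp_configure`; note that, per the PolyBase connectivity configuration topic, changing it also requires restarting the PolyBase services:

```sql
-- Example value only: set 'hadoop connectivity' to 7, then reconfigure.
EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
RECONFIGURE;
```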
> [!IMPORTANT]
> The `RESOURCE_MANAGER_LOCATION` value is not validated when you create the external data source. Entering an incorrect value may cause query failure at execution time whenever push-down is attempted, because the provided value cannot be resolved.

For an example showing `RESOURCE_MANAGER_LOCATION` enabled, refer to "Create external data source to reference Hadoop with push-down enabled".
### DATABASE_NAME

Configure this argument when the `TYPE` is set to `RDBMS` or `SHARD_MAP_MANAGER`.
| TYPE | Value of DATABASE_NAME |
|---|---|
| RDBMS | The name of the remote database on the server provided using LOCATION |
| SHARD_MAP_MANAGER | Name of the database operating as the shard map manager |
For an example showing how to create an external data source where `TYPE` = `RDBMS`, refer to "Create an RDBMS external data source".
### SHARD_MAP_NAME

Used only when the `TYPE` argument is set to `SHARD_MAP_MANAGER`, to set the name of the shard map.

For an example showing how to create an external data source where `TYPE` = `SHARD_MAP_MANAGER`, refer to "Create a shard map manager external data source".
## Permissions

Requires `CONTROL` permission on the database in SQL Server, Parallel Data Warehouse, SQL Database, and SQL Data Warehouse.
> [!NOTE]
> In previous releases of PDW, CREATE EXTERNAL DATA SOURCE required `ALTER ANY EXTERNAL DATA SOURCE` permissions.

## Locking

Takes a shared lock on the `EXTERNAL DATA SOURCE` object.
## Security

PolyBase supports proxy-based authentication for most external data sources. Create a database scoped credential to create the proxy account.

When you connect to the storage or data pool in a SQL Server 2019 big data cluster, the user's credentials are passed through to the back-end system. The data pool additionally requires that you create logins in the data pool itself for pass-through authentication to succeed.

## Examples

### Create external data source to reference Oracle

To create an external data source that references Oracle, ensure you have a database scoped credential. You may optionally also enable or disable push-down of computation against this data source.
```sql
-- Create a database master key if one does not already exist, using your own password.
-- This key is used to encrypt the credential secret in the next step.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '!MyC0mpl3xP@ssw0rd!'
;

-- Create a database scoped credential with the Oracle username and password.
CREATE DATABASE SCOPED CREDENTIAL OracleProxyAccount
WITH
  IDENTITY = 'oracle_username'
, SECRET = 'oracle_password'
;

CREATE EXTERNAL DATA SOURCE MyOracleServer
WITH
( LOCATION = 'oracle://145.145.145.145:1521'
, CREDENTIAL = OracleProxyAccount
, PUSHDOWN = ON
)
;
```

### Create external data source to reference Hadoop

To create an external data source to reference your Hortonworks or Cloudera Hadoop cluster, specify the machine name or IP address of the Hadoop `Namenode` and port. For highly available Hadoop configurations, you may alternatively provide the Nameservice ID as the `LOCATION`.
```sql
CREATE EXTERNAL DATA SOURCE MyHadoopCluster
WITH
( LOCATION = 'hdfs://10.10.10.10:8050'
, TYPE = HADOOP
)
;
```

### Create external data source to reference Hadoop with push-down enabled

Specify the `RESOURCE_MANAGER_LOCATION` option to enable push-down computation to Hadoop for PolyBase queries. Once enabled, PolyBase uses a cost-based decision to determine whether the query computation should be pushed to Hadoop or whether all the data should be moved to process the query in SQL Server.
```sql
CREATE EXTERNAL DATA SOURCE MyHadoopCluster
WITH
( LOCATION = 'hdfs://10.10.10.10:8020'
, TYPE = HADOOP
, RESOURCE_MANAGER_LOCATION = '10.10.10.10:8050'
)
;
```

### Create external data source to reference Kerberos-secured Hadoop

To verify whether the Hadoop cluster is Kerberos-secured, check the value of the `hadoop.security.authentication` property in the Hadoop `core-site.xml` file. To reference a Kerberos-secured Hadoop cluster, you must specify a database scoped credential that contains your Kerberos username and password. The database master key is used to encrypt the database scoped credential secret.
```sql
-- Create a database master key if one does not already exist, using your own password.
-- This key is used to encrypt the credential secret in the next step.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'S0me!nfo'
;

-- Create a database scoped credential with Kerberos user name and password.
CREATE DATABASE SCOPED CREDENTIAL HadoopUser1
WITH
  IDENTITY = '<hadoop_user_name>'
, SECRET = '<hadoop_password>'
;

-- Create an external data source with CREDENTIAL option.
CREATE EXTERNAL DATA SOURCE MyHadoopCluster
WITH
( LOCATION = 'hdfs://10.10.10.10:8050'
, CREDENTIAL = HadoopUser1
, TYPE = HADOOP
, RESOURCE_MANAGER_LOCATION = '10.10.10.10:8050'
)
;
```

### Create external data source to reference Azure blob storage

In this example, the external data source is an Azure blob storage container called `daily` under the Azure storage account named `logs`. The Azure storage external data source is for data transfer only; it does not support predicate push-down.

This example shows how to create the database scoped credential for authentication to Azure storage. Specify the Azure storage account key in the database credential secret. Specify any string in the database scoped credential identity; it is not used for authentication to Azure storage. The credential is then used in the statement that creates an external data source.
```sql
-- Create a database master key if one does not already exist, using your own password.
-- This key is used to encrypt the credential secret in the next step.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'S0me!nfo'
;

-- Create a database scoped credential with Azure storage account key as the secret.
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH
  IDENTITY = '<my_account>'
, SECRET = '<azure_storage_account_key>'
;

-- Create an external data source with CREDENTIAL option.
CREATE EXTERNAL DATA SOURCE MyAzureStorage
WITH
( LOCATION = 'wasbs://daily@logs.blob.core.windows.net/'
, CREDENTIAL = AzureStorageCredential
, TYPE = HADOOP
)
;
```

### Create a shard map manager external data source

To create an external data source to reference a `SHARD_MAP_MANAGER`, specify the SQL Database server name that hosts the shard map manager in SQL Database or a SQL Server database on a virtual machine.
```sql
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>'
;

CREATE DATABASE SCOPED CREDENTIAL ElasticDBQueryCred
WITH
  IDENTITY = '<username>'
, SECRET = '<password>'
;

CREATE EXTERNAL DATA SOURCE MyElasticDBQueryDataSrc
WITH
( TYPE = SHARD_MAP_MANAGER
, LOCATION = '<server_name>.database.windows.net'
, DATABASE_NAME = 'ElasticScaleStarterKit_ShardMapManagerDb'
, CREDENTIAL = ElasticDBQueryCred
, SHARD_MAP_NAME = 'CustomerIDShardMap'
)
;
```

For a step-by-step tutorial, see Getting started with elastic queries for sharding (horizontal partitioning).

### Create an RDBMS external data source

To create an external data source to reference an RDBMS, specify the SQL Database server name of the remote database in SQL Database.
```sql
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>'
;

CREATE DATABASE SCOPED CREDENTIAL SQL_Credential
WITH
  IDENTITY = '<username>'
, SECRET = '<password>'
;

CREATE EXTERNAL DATA SOURCE MyElasticDBQueryDataSrc
WITH
( TYPE = RDBMS
, LOCATION = '<server_name>.database.windows.net'
, DATABASE_NAME = 'Customers'
, CREDENTIAL = SQL_Credential
)
;
```

For a step-by-step tutorial on RDBMS, see Getting started with cross-database queries (vertical partitioning).

### Create external data source to reference Azure Data Lake Store

Azure Data Lake Store connectivity is based on your ADLS URI and your Azure Active Directory application's service principal. Documentation for creating this application can be found at Data Lake Store authentication using Azure Active Directory.
```sql
-- If you do not have a Master Key on your DW you will need to create one.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>'
;

-- These values come from your Azure Active Directory Application used to authenticate to ADLS.
CREATE DATABASE SCOPED CREDENTIAL ADLS_credential
WITH
-- IDENTITY = '<clientID>@<OAuth2.0TokenEndPoint>'
  IDENTITY = '536540b4-4239-45fe-b9a3-629f97591c0c@https://login.microsoftonline.com/42f988bf-85f1-41af-91ab-2d2cd011da47/oauth2/token'
-- , SECRET = '<KEY>'
, SECRET = 'BjdIlmtKp4Fpyh9hIvr8HJlUida/seM5kQ3EpLAmeDI='
;

CREATE EXTERNAL DATA SOURCE AzureDataLakeStore
WITH
( LOCATION = 'adl://newyorktaxidataset.azuredatalakestore.net'
, CREDENTIAL = ADLS_credential
, TYPE = HADOOP
)
;
```

### Create external data source to reference Azure Data Lake Store Gen 2

Currently, connecting to ADLS Gen 2 requires the storage account key as the secret for the database scoped credential. OAuth 2.0 support is not yet available; this is a temporary limitation.
```sql
-- If you do not have a Master Key on your DW you will need to create one.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>'
;

-- These values come from the storage account used to authenticate to ADLS Gen 2.
CREATE DATABASE SCOPED CREDENTIAL ADLS_credential
WITH
-- IDENTITY = '<storage_account_name>'
  IDENTITY = 'newyorktaxidata'
-- , SECRET = '<storage_account_key>'
, SECRET = 'yz5N4+bxSb89McdiysJAzo+9hgEHcJRJuXbF/uC3mhbezES/oe00vXnZEl14U0lN3vxrFKsphKov16C0w6aiTQ=='
;

CREATE EXTERNAL DATA SOURCE <data_source_name>
WITH
( LOCATION = 'abfs://2013@newyorktaxidataset.dfs.core.windows.net'
, CREDENTIAL = ADLS_credential
, TYPE = HADOOP
)
[;]
```

### Create an external data source to perform bulk operations and retrieve data from Azure Blob Storage into SQL Database

> [!NOTE]
> Do not put a trailing /, file name, or shared access signature parameters at the end of the `LOCATION` URL when configuring an external data source for bulk operations.

Applies to: SQL Server 2017 (14.x).

Use the following data source for bulk operations using `BULK INSERT` or `OPENROWSET`. The credential used must be created using `SHARED ACCESS SIGNATURE` as the identity, must not have the leading `?` in the SAS token, must have at least read permission on the file that should be loaded (for example `srt=o&sp=r`), and must have a valid expiration period (all dates are in UTC time). For more information on shared access signatures, see Using Shared Access Signatures (SAS).
```sql
CREATE DATABASE SCOPED CREDENTIAL AccessAzureInvoices
WITH
  IDENTITY = 'SHARED ACCESS SIGNATURE'
-- REMOVE ? FROM THE BEGINNING OF THE SAS TOKEN
, SECRET = '******srt=sco&sp=rwac&se=2017-02-01T00:55:34Z&st=2016-12-29T16:55:34Z***************'
;

CREATE EXTERNAL DATA SOURCE MyAzureInvoices
WITH
( LOCATION = 'https://newinvoices.blob.core.windows.net/week3'
, CREDENTIAL = AccessAzureInvoices
, TYPE = BLOB_STORAGE
)
;
```

To see this example in use, see BULK INSERT.
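As a hedged usage sketch (the file name is a placeholder, not from this article), the same data source can also be referenced from `OPENROWSET`:

```sql
-- Placeholder file name; reads one blob from the week3 container that
-- MyAzureInvoices points to, using the SAS credential defined above.
SELECT * FROM OPENROWSET
( BULK 'inv-2017-01-19.csv'
, DATA_SOURCE = 'MyAzureInvoices'
, SINGLE_CLOB
) AS DataFile
;
```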
## See Also

- ALTER EXTERNAL DATA SOURCE (Transact-SQL)
- CREATE DATABASE SCOPED CREDENTIAL (Transact-SQL)
- CREATE EXTERNAL FILE FORMAT (Transact-SQL)
- CREATE EXTERNAL TABLE (Transact-SQL)
- CREATE EXTERNAL TABLE AS SELECT (Azure SQL Data Warehouse)
- CREATE TABLE AS SELECT (Azure SQL Data Warehouse)
- sys.external_data_sources (Transact-SQL)
- Using Shared Access Signatures (SAS)
- Introduction to elastic query