docs/t-sql/statements/copy-into-transact-sql.md
+47 −23 (47 additions & 23 deletions)
@@ -5,7 +5,7 @@ description: Use the COPY statement in Azure Synapse Analytics and Warehouse in
 author: periclesrocha
 ms.author: procha
 ms.reviewer: wiassaf, mikeray
-ms.date: 06/05/2023
+ms.date: 10/27/2023
 ms.service: sql
 ms.subservice: t-sql
 ms.topic: language-reference
@@ -37,7 +37,7 @@ Use COPY for the following capabilities:
 - Specify a finer permission model without exposing storage account keys using Share Access Signatures (SAS)
 - Use a different storage account for the ERRORFILE location (REJECTED_ROW_LOCATION)
 - Customize default values for each target column and specify source data fields to load into specific target columns
-- Specify a custom row terminator for CSV files
+- Specify a custom row terminator, field terminator, and field quote for CSV files
 - Use SQL Server Date formats for CSV files
 - Specify wildcards and multiple files in the storage location path
 - Automatic schema discovery simplifies the process of defining and mapping source data into target tables
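To illustrate these capabilities, a minimal COPY statement using a SAS credential, a wildcard path, and a skipped header row might look like the following sketch (the account, container, table names, and SAS token are hypothetical placeholders, not values from this pull request):

```sql
-- Hypothetical source: all January 2023 CSV files under /trips/, loaded with a SAS token.
COPY INTO dbo.Trips
FROM 'https://myaccount.blob.core.windows.net/mycontainer/trips/2023/01/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<sas-token>'),
    FIRSTROW = 2  -- skip the header row
);
```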
@@ -79,7 +79,7 @@ WITH
 
 #### *schema_name*
 
-Optional if the default schema for the user performing the operation is the schema of the specified table. If *schema* isn't specified, and the default schema of the user performing the COPY operation is different from the specified table, COPY is canceled, and an error message is returned.
+Optional if the default schema for the user performing the operation is the schema of the specified table. If *schema* isn't specified, and the default schema of the user performing the COPY operation is different from the schema of the specified table, COPY is canceled, and an error message is returned.
 
 #### *table_name*
@@ -97,7 +97,7 @@ Don't specify a *column_list* when `AUTO_CREATE_TABLE = 'ON'`.
 
 -*Column_name* - the name of the column in the target table.
 -*Default_value* - the default value that replaces any NULL value in the input file. Default value applies to all file formats. COPY attempts to load NULL from the input file when a column is omitted from the column list or when there's an empty input file field. Default value precedes the keyword 'default'
--*Field_number* - the input file field number that is mapped to the target column name.
+-*Field_number* - the input file field number that is mapped to the target column.
 - The field indexing starts at 1.
 
 When a column list isn't specified, COPY maps columns based on the source and target order: Input field 1 goes to target column 1, field 2 goes to column 2, etc.
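The column-list mapping described above can be sketched as follows; the table and file layout are hypothetical, and each entry follows the documented `Column_name [default Default_value] [Field_number]` pattern:

```sql
-- Hypothetical target dbo.Sales(Id, Region, Amount): file field 1 -> Id,
-- field 3 -> Region, field 2 -> Amount; NULLs in Amount become the default 0.
COPY INTO dbo.Sales (Id 1, Region 3, Amount DEFAULT 0 2)
FROM 'https://myaccount.blob.core.windows.net/mycontainer/sales.csv'
WITH (FILE_TYPE = 'CSV', FIRSTROW = 2);
```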
@@ -195,9 +195,12 @@ Multiple file locations can only be specified from the same storage account and
 
 *ERRORFILE* only applies to CSV. Specifies the directory within the COPY statement where the rejected rows and the corresponding error file should be written. The full path from the storage account can be specified or the path relative to the container can be specified. If the specified path doesn't exist, one is created on your behalf. A child directory is created with the name "\_rejectedrows". The "\_" character ensures that the directory is escaped for other data processing unless explicitly named in the location parameter.
 
+> [!NOTE]
+> When a relative path is passed to *ERRORFILE*, the path is relative to the container path specified in *external_location*.
+
 Within this directory, there's a folder created based on the time of load submission in the format YearMonthDay -HourMinuteSecond (Ex. 20180330-173205). In this folder, two types of files are written, the reason (Error) file and the data (Row) file each preappending with the queryID, distributionID, and a file guid. Because the data and the reason are in separate files, corresponding files have a matching prefix.
 
-If ERRORFILE has the full path of the storage account defined, then the ERRORFILE_CREDENTIAL is used to connect to that storage. Otherwise, the value mentioned for CREDENTIAL is used.
+If ERRORFILE has the full path of the storage account defined, then the ERRORFILE_CREDENTIAL is used to connect to that storage. Otherwise, the value mentioned for CREDENTIAL is used. When the same credential that is used for the source data is used for ERRORFILE, restrictions that apply to ERRORFILE_CREDENTIAL also apply.
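A sketch of ERRORFILE pointing at a different storage account than the source, which is the case where ERRORFILE_CREDENTIAL comes into play. All account, container, and table names and SAS tokens are hypothetical, and the credential type shown is illustrative only, subject to the ERRORFILE_CREDENTIAL restrictions mentioned above:

```sql
-- Rejected rows are written under .../copyerrors/_rejectedrows in a second account.
COPY INTO dbo.Trips
FROM 'https://sourceaccount.blob.core.windows.net/data/trips.csv'
WITH (
    FILE_TYPE = 'CSV',
    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<source-sas>'),
    ERRORFILE = 'https://erroraccount.blob.core.windows.net/logs/copyerrors',
    ERRORFILE_CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<error-sas>')
);
```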
@@ -236,7 +239,7 @@ If ERRORFILE has the full path of the storage account defined, then the ERRORFIL
 
 #### *MAXERRORS = max_errors*
 
-*MAXERRORS* specifies the maximum number of reject rows allowed in the load before the COPY operation is canceled. Each row that can't be imported by the COPY operation is ignored and counted as one error. If max_errors isn't specified, the default is 0.
+*MAXERRORS* specifies the maximum number of reject rows allowed in the load before the COPY operation fails. Each row that can't be imported by the COPY operation is ignored and counted as one error. If max_errors isn't specified, the default is 0.
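For example, to tolerate up to ten malformed rows before the load fails (table, path, and error directory are hypothetical):

```sql
COPY INTO dbo.Trips
FROM 'https://myaccount.blob.core.windows.net/mycontainer/trips.csv'
WITH (
    FILE_TYPE = 'CSV',
    MAXERRORS = 10,           -- default is 0: any bad row fails the load
    ERRORFILE = '/copyerrors' -- rejected rows and reasons land here for inspection
);
```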
@@ -255,22 +258,22 @@ The COPY command autodetects the compression type based on the file extension wh
 
 #### *FIELDQUOTE = 'field_quote'*
 
-*FIELDQUOTE* applies to CSV and specifies a single character that is used as the quote character (string delimiter) in the CSV file. If not specified, the quote character (") is used as the quote character as defined in the RFC 4180 standard. Extended ASCII and multi-byte characters and aren't supported with UTF-8 for FIELDQUOTE.
+*FIELDQUOTE* applies to CSV and specifies a single character that is used as the quote character (string delimiter) in the CSV file. If not specified, the quote character (") is used as the quote character as defined in the RFC 4180 standard. Hexadecimal notation is also supported for FIELDQUOTE. Extended ASCII and multi-byte characters aren't supported with UTF-8 for FIELDQUOTE.
 
 > [!NOTE]
 > FIELDQUOTE characters are escaped in string columns where there is a presence of a double FIELDQUOTE (delimiter).
 
 #### *FIELDTERMINATOR = 'field_terminator'*
 
-*FIELDTERMINATOR* Only applies to CSV. Specifies the field terminator that is used in the CSV file. The field terminator can be specified using hexadecimal notation. The field terminator can be multi-character. The default field terminator is a (,). Extended ASCII and multi-byte characters and aren't supported with UTF-8 for FIELDTERMINATOR.
+*FIELDTERMINATOR* Only applies to CSV. Specifies the field terminator that is used in the CSV file. The field terminator can be specified using hexadecimal notation. The field terminator can be multi-character. The default field terminator is a (,). Extended ASCII and multi-byte characters aren't supported with UTF-8 for FIELDTERMINATOR.
 
 #### ROW TERMINATOR = 'row_terminator'
 
 *ROW TERMINATOR* Only applies to CSV. Specifies the row terminator that is used in the CSV file. The row terminator can be specified using hexadecimal notation. The row terminator can be multi-character. By default, the row terminator is `\r\n`.
 
 The COPY command prefixes the `\r` character when specifying `\n` (newline) resulting in `\r\n`. To specify only the `\n` character, use hexadecimal notation (`0x0A`). When specifying multi-character row terminators in hexadecimal, don't specify 0x between each character.
 
-Extended ASCII and multi-byte characters and aren't supported with UTF-8 for ROW TERMINATOR.
+Extended ASCII and multi-byte characters aren't supported with UTF-8 for ROW TERMINATOR.
 
 #### *FIRSTROW = First_row_int*
 
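Combining the three CSV options documented above, here is a sketch for a pipe-delimited file quoted with single quotes whose rows end in a bare `\n`; hexadecimal `0x0A` avoids the implicit `\r` prefix described in the text, and all names are hypothetical:

```sql
COPY INTO dbo.Trips
FROM 'https://myaccount.blob.core.windows.net/mycontainer/trips.txt'
WITH (
    FILE_TYPE = 'CSV',
    FIELDTERMINATOR = '|',
    FIELDQUOTE = '''',      -- a single quote as the string delimiter
    ROWTERMINATOR = '0x0A'  -- \n only; a literal '\n' would be treated as \r\n
);
```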
@@ -354,7 +357,7 @@ The default values of the COPY command are:
 
 - COMPRESSION default is uncompressed
 
-- FIELDQUOTE = ''
+- FIELDQUOTE = '"'
 
 - FIELDTERMINATOR = ','
 
@@ -549,7 +552,7 @@ Use COPY for the following capabilities:
 - Specify a finer permission model without exposing storage account keys using Share Access Signatures (SAS).
 - Use a different storage account for the ERRORFILE location (REJECTED_ROW_LOCATION).
 - Customize default values for each target column and specify source data fields to load into specific target columns.
-- Specify a custom row terminatorfor CSV files.
+- Specify a custom row terminator, field terminator, and field quote for CSV files.
 - Specify wildcards and multiple files in the storage location path.
 - For more on data ingestion options and best practices, see [Ingest data into your [!INCLUDE [fabricdw](../../includes/fabric-dw.md)] using the COPY statement](/fabric/data-warehouse/ingest-data-copy).
 
@@ -572,6 +575,7 @@ WITH
     [ , ROWTERMINATOR = 'row_terminator' ]
     [ , FIRSTROW = first_row ]
     [ , ENCODING = { 'UTF8' | 'UTF16' } ]
+    [ , PARSER_VERSION = { '1.0' | '2.0' } ]
 )
 ```
 
@@ -583,7 +587,7 @@ Optional if the current warehouse for the user performing the operation is the w
 
 #### *schema_name*
 
-Optional if the default schema for the user performing the operation is the schema of the specified table. If *schema* isn't specified, and the default schema of the user performing the COPY operation is different from the specified table, COPY fails, and an error message is returned.
+Optional if the default schema for the user performing the operation is the schema of the specified table. If *schema* isn't specified, and the default schema of the user performing the COPY operation is different from the schema of the specified table, COPY is canceled, and an error message is returned.
 
 #### *table_name*
 
@@ -599,7 +603,7 @@ An optional list of one or more columns used to map source data fields to target
 
 -*Column_name* - the name of the column in the target table.
 -*Default_value* - the default value that replaces any NULL value in the input file. Default value applies to all file formats. COPY attempts to load NULL from the input file when a column is omitted from the column list or when there's an empty input file field. Default value is preceded by the keyword 'default'
--*Field_number* - applies only to CSV. The input file field number that is mapped to the target column name. For Parquet, columns are always bound by name.
+-*Field_number* - The input file field number that is mapped to the target column.
 - The field indexing starts at 1.
 
 When *column_list* isn't specified, COPY maps columns based on the source and target order: Input field 1 goes to target column 1, field 2 goes to column 2, etc.
@@ -668,9 +672,12 @@ Multiple file locations can only be specified from the same storage account and
 
 *ERRORFILE* only applies to CSV. Specifies the directory where the rejected rows and the corresponding error file should be written. The full path from the storage account can be specified or the path relative to the container can be specified. If the specified path doesn't exist, one is created on your behalf. A child directory is created with the name "\_rejectedrows". The "\_" character ensures that the directory is escaped for other data processing unless explicitly named in the location parameter.
 
+> [!NOTE]
+> When a relative path is passed to *ERRORFILE*, the path is relative to the container path specified in *external_location*.
+
 Within this directory, there's a folder created based on the time of load submission in the format YearMonthDay -HourMinuteSecond (Ex. 20180330-173205). In this folder a folder with the statement ID is created, and under that folder two types of files are written: an error.Json file containing the reject reasons, and a row.csv file containing the rejected rows.
 
-If ERRORFILE has the full path of the storage account defined, then the ERRORFILE_CREDENTIAL is used to connect to that storage. Otherwise, the value mentioned for CREDENTIAL is used.
+If ERRORFILE has the full path of the storage account defined, then the ERRORFILE_CREDENTIAL is used to connect to that storage. Otherwise, the value mentioned for CREDENTIAL is used. When the same credential that is used for the source data is used for ERRORFILE, restrictions that apply to ERRORFILE_CREDENTIAL also apply.
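The relative-path behavior in the note above can be sketched as follows (account, container, and table names are hypothetical):

```sql
-- 'copyerrors' is resolved relative to the container in FROM, so rejects land at
-- https://myaccount.blob.core.windows.net/mycontainer/copyerrors/_rejectedrows/...
COPY INTO dbo.Trips
FROM 'https://myaccount.blob.core.windows.net/mycontainer/trips/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    ERRORFILE = 'copyerrors'
);
```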
@@ -686,7 +693,7 @@ If ERRORFILE has the full path of the storage account defined, then the ERRORFIL
 
 #### *MAXERRORS = max_errors*
 
-*MAXERRORS* specifies the maximum number of reject rows allowed in the load before the COPY operation is canceled. Each row that the COPY operation can't import is ignored and counted as one error. If max_errors isn't specified, the default is 0.
+*MAXERRORS* specifies the maximum number of reject rows allowed in the load before the COPY operation fails. Each row that the COPY operation can't import is ignored and counted as one error. If max_errors isn't specified, the default is 0.
@@ -701,22 +708,22 @@ The COPY command autodetects the compression type based on the file extension wh
 
 #### *FIELDQUOTE = 'field_quote'*
 
-*FIELDQUOTE* only applies to CSV. Specifies a single character that is used as the quote character (string delimiter) in the CSV file. If not specified, the quote character (") is used as the quote character as defined in the RFC 4180 standard. Extended ASCII and multi-byte characters and aren't supported with UTF-8 for FIELDQUOTE.
+*FIELDQUOTE* only applies to CSV. Specifies a single character that is used as the quote character (string delimiter) in the CSV file. If not specified, the quote character (") is used as the quote character as defined in the RFC 4180 standard. Hexadecimal notation is also supported for FIELDQUOTE. Extended ASCII and multi-byte characters aren't supported with UTF-8 for FIELDQUOTE.
 
 > [!NOTE]
 > FIELDQUOTE characters are escaped in string columns where there is a presence of a double FIELDQUOTE (delimiter).
 
 #### *FIELDTERMINATOR = 'field_terminator'*
 
-*FIELDTERMINATOR* only applies to CSV. Specifies the field terminator that is used in the CSV file. The field terminator can also be specified using hexadecimal notation. The field terminator can be multi-character. The default field terminator is a (,). Extended ASCII and multi-byte characters and aren't supported with UTF-8 for FIELDTERMINATOR.
+*FIELDTERMINATOR* only applies to CSV. Specifies the field terminator that is used in the CSV file. The field terminator can also be specified using hexadecimal notation. The field terminator can be multi-character. The default field terminator is a (,). Extended ASCII and multi-byte characters aren't supported with UTF-8 for FIELDTERMINATOR.
 
-#### ROW TERMINATOR = 'row_terminator'
+#### ROWTERMINATOR = 'row_terminator'
 
-*ROW TERMINATOR* only applies to CSV. Specifies the row terminator that is used in the CSV file. The row terminator can be specified using hexadecimal notation. The row terminator can be multi-character. By default, the row terminator is `\r\n`.
+*ROWTERMINATOR* only applies to CSV. Specifies the row terminator that is used in the CSV file. The row terminator can be specified using hexadecimal notation. The row terminator can be multi-character. The default terminators are `\r\n`, `\n`, and `\r`.
 
 The COPY command prefixes the `\r` character when specifying `\n` (newline) resulting in `\r\n`. To specify only the `\n` character, use hexadecimal notation (`0x0A`). When specifying multi-character row terminators in hexadecimal, don't specify 0x between each character.
 
-Extended ASCII and multi-byte characters and aren't supported with UTF-8 for ROW TERMINATOR.
+Extended ASCII and multi-byte characters aren't supported with UTF-8 for ROWTERMINATOR.
 
 #### *FIRSTROW = First_row_int*
 
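As a sketch of the hexadecimal ROWTERMINATOR notation documented above (names hypothetical), a file whose rows end in a bare `\n`:

```sql
COPY INTO dbo.Trips
FROM 'https://myaccount.blob.core.windows.net/mycontainer/trips.csv'
WITH (
    FILE_TYPE = 'CSV',
    ROWTERMINATOR = '0x0A'  -- hex for \n only; a literal '\n' is prefixed with \r
);
```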
@@ -726,6 +733,23 @@ Extended ASCII and multi-byte characters and aren't supported with UTF-8 for ROW
 
 ENCODING only applies to CSV. Default is UTF8. Specifies the data encoding standard for the files loaded by the COPY command.
 
+#### PARSER_VERSION = { '1.0' | '2.0' }
+
+PARSER_VERSION only applies to CSV. Default is 2.0. Specifies the file parser used for ingestion when the source file type is CSV. The 2.0 parser offers improved performance for ingestion of CSV files.
+
+Parser version 2.0 has the following limitations:
+
+- Compressed CSV files are not supported
+- Files with UTF-16 encoding are not supported
+- Multicharacter or multibyte ROWTERMINATOR, FIELDTERMINATOR, or FIELDQUOTE is not supported. However, '\r\n' is accepted as a default ROWTERMINATOR
+
+When using parser version 1.0 with UTF-8 files, multibyte and multicharacter terminators are not supported for FIELDTERMINATOR.
+
+Parser version 1.0 is available for backward compatibility only, and should be used only when these limitations are encountered.
+
+> [!NOTE]
+> When COPY INTO is used with compressed CSV files or files with UTF-16 encoding, COPY INTO automatically switches to PARSER_VERSION 1.0, without user action required. For multi-character terminators on FIELDTERMINATOR or ROWTERMINATOR, the COPY INTO statement will fail. Use PARSER_VERSION = '1.0' if multi-character separators are needed.
+
 ## Remarks
 
 COPY INTO in [!INCLUDE [fabricdw](../../includes/fabric-dw.md)] doesn't allow setting a date format for interpreting date character strings. By default, all dates are considered to have the month-day-year format. To ingest a CSV file with a different date format, use *SET DATEFORMAT* to specify the desired date format at the session level. For more information, see [SET DATEFORMAT (Transact-SQL)](set-dateformat-transact-sql.md).
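Given the parser limitations described above, a hypothetical load of a UTF-16 encoded CSV would pin the 1.0 parser explicitly (all names are placeholders; per the note, the switch also happens automatically for UTF-16 input):

```sql
-- UTF-16 input requires the 1.0 parser: parser 2.0 does not support UTF-16 files.
COPY INTO dbo.Trips
FROM 'https://myaccount.blob.core.windows.net/mycontainer/trips.csv'
WITH (
    FILE_TYPE = 'CSV',
    ENCODING = 'UTF16',
    PARSER_VERSION = '1.0'
);
```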
@@ -751,7 +775,7 @@ The default values of the COPY command are:
 
 - COMPRESSION default is uncompressed
 
-- FIELDQUOTE = ''
+- FIELDQUOTE = '"'
 
 - FIELDTERMINATOR = ','
 
@@ -838,11 +862,11 @@ WITH (
 
 ### What is the file splitting guidance for the COPY command loading compressed CSV files?
 
-Consider splitting large CSV files, but keep files at a minimum of 4 MB each for better performance.
+Consider splitting large CSV files, especially when the number of files is small, but keep files at a minimum of 4 MB each for better performance.
 
 ### What is the file splitting guidance for the COPY command loading Parquet files?
 
-There's no need to split Parquet because the COPY command splits files automatically. Parquet files in the Azure storage account should be 256 MB or larger for best performance.
+Consider splitting large Parquet files, especially when the number of files is small.
 
 ### Are there any limitations on the number or size of files?