Commit ce78845

committed
Updating for MI
1 parent c60ac15 commit ce78845

3 files changed

Lines changed: 70 additions & 57 deletions


docs/machine-learning/r/converting-r-code-for-use-in-sql-server.md

Lines changed: 55 additions & 51 deletions
Original file line number | Diff line number | Diff line change
@@ -4,12 +4,12 @@ description: Migrate R code to a SQL Server stored procedure for solution deploy
44
ms.prod: sql
55
ms.technology: machine-learning-services
66

7-
ms.date: 08/28/2020
7+
ms.date: 10/06/2020
88
ms.topic: how-to
99
author: dphansen
1010
ms.author: davidph
1111
ms.custom: seo-lt-2019
12-
monikerRange: ">=sql-server-2016||>=sql-server-linux-ver15||=sqlallproducts-allversions"
12+
monikerRange: ">=sql-server-2016||>=sql-server-linux-ver15||=azuresqldb-mi-current||=sqlallproducts-allversions"
1313
---
1414
# Convert R code for execution in SQL Server (In-Database) instances
1515
[!INCLUDE [SQL Server 2016 and later](../../includes/applies-to-version/sqlserver2016.md)]
@@ -24,41 +24,41 @@ However, your code might require substantial changes if any of the following app
2424
+ The code makes separate calls to data sources outside SQL Server, such as Excel worksheets, files on shares, and other databases.
2525
+ You want to run the code in the *\@script* parameter of [sp_execute_external_script](../../relational-databases/system-stored-procedures/sp-execute-external-script-transact-sql.md) and also parameterize the stored procedure.
2626
+ Your original solution includes multiple steps that might be more efficient in a production environment if executed independently, such as data preparation or feature engineering vs. model training, scoring, or reporting.
27-
+ You want to improve optimize performance by changing libraries, using parallel execution, or offloading some processing to SQL Server.
27+
+ You want to optimize performance by changing libraries, using parallel execution, or offloading some processing to SQL Server.
2828

2929
## Step 1. Plan requirements and resources
3030

31-
**Packages**
31+
### Packages
3232

3333
+ Determine which packages are needed and ensure that they work on SQL Server.
3434

3535
+ Install packages in advance, in the default package library used by Machine Learning Services. User libraries are not supported.
3636

37-
**Data sources**
37+
### Data sources
3838

3939
+ If you intend to embed your R code in [sp_execute_external_script](../../relational-databases/system-stored-procedures/sp-execute-external-script-transact-sql.md), identify primary and secondary data sources.
4040

41-
+ **Primary** data sources are large datasets, such as model training data, or input data for predictions. Plan to map your largest dataset to the input parameter of [sp_execute_external_script](../../relational-databases/system-stored-procedures/sp-execute-external-script-transact-sql.md).
41+
+ **Primary** data sources are large datasets, such as model training data, or input data for predictions. Plan to map your largest dataset to the input parameter of [sp_execute_external_script](../../relational-databases/system-stored-procedures/sp-execute-external-script-transact-sql.md).
4242

43-
+ **Secondary** data sources are typically smaller data sets, such as lists of factors, or additional grouping variables.
44-
45-
Currently, sp_execute_external_script supports only a single dataset as input to the stored procedure. However, you can add multiple scalar or binary inputs.
43+
+ **Secondary** data sources are typically smaller data sets, such as lists of factors, or additional grouping variables.
44+
45+
Currently, sp_execute_external_script supports only a single dataset as input to the stored procedure. However, you can add multiple scalar or binary inputs.
4646

47-
Stored procedure calls preceded by EXECUTE cannot be used as an input to [sp_execute_external_script](../../relational-databases/system-stored-procedures/sp-execute-external-script-transact-sql.md). You can use queries, views, or any other valid SELECT statement.
47+
Stored procedure calls preceded by EXECUTE cannot be used as an input to [sp_execute_external_script](../../relational-databases/system-stored-procedures/sp-execute-external-script-transact-sql.md). You can use queries, views, or any other valid SELECT statement.
4848

4949
+ Determine the outputs you need. If you run R code using sp_execute_external_script, the stored procedure can output just one data frame as a result. However, you can also output multiple scalar outputs, including plots and models in binary format, as well as other scalar values derived from R code or SQL parameters.
5050

51-
**Data types**
51+
### Data types
5252

5353
+ Make a checklist of possible data type issues.
5454

55-
All R data types are supported by SQL Server Machine Learning Services. However, [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] supports a greater variety of data types than does R. Therefore, some implicit data type conversions are performed when sending [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] data to R, and vice versa. You might need to explicitly cast or convert some data.
55+
All R data types are supported by SQL Server Machine Learning Services. However, [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] supports a greater variety of data types than does R. Therefore, some implicit data type conversions are performed when sending [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] data to R, and vice versa. You might need to explicitly cast or convert some data.
5656

57-
NULL values are supported. However, R uses the `NA` data construct to represent a missing value, which is similar to a null.
57+
NULL values are supported. However, R uses the `NA` data construct to represent a missing value, which is similar to a null.
5858

5959
+ Consider eliminating dependency on data that cannot be used by R: for example, rowid and GUID data types from SQL Server cannot be consumed by R and generate errors.
6060

61-
For more information, see [R Libraries and Data Types](../r/r-libraries-and-data-types.md).
61+
For more information, see [R Libraries and Data Types](../r/r-libraries-and-data-types.md).
6262

6363
## Step 2. Convert or repackage code
6464

@@ -68,99 +68,103 @@ How much you change your code depends on whether you intend to submit the R code
6868

6969
+ When running R in a stored procedure, you can pass through multiple **scalar** inputs. For any parameters that you want to use in the output, add the **OUTPUT** keyword.
7070

71-
For example, the following scalar input `@model_name` contains the model name, which is also output in its own column in the results:
71+
For example, the following scalar input `@model_name` contains the model name, which is also output in its own column in the results:
7272

73-
```sql
74-
EXEC sp_execute_external_script @model_name="DefaultModel" OUTPUT, @language=N'R', @script=N'R code here'
75-
```
73+
```sql
74+
EXEC sp_execute_external_script @model_name="DefaultModel" OUTPUT, @language=N'R', @script=N'R code here'
75+
```
7676

7777
+ Any variables that you pass in as parameters of the stored procedure [sp_execute_external_script](../../relational-databases/system-stored-procedures/sp-execute-external-script-transact-sql.md) must be mapped to variables in the R code. By default, variables are mapped by name.
7878

79-
All columns in the input dataset must also be mapped to variables in the R script. For example, assume your R script contains a formula like this one:
79+
All columns in the input dataset must also be mapped to variables in the R script. For example, assume your R script contains a formula like this one:
8080

81-
```R
82-
formula <- ArrDelay ~ CRSDepTime + DayOfWeek + CRSDepHour:DayOfWeek
83-
```
84-
85-
An error is raised if the input dataset does not contain columns with the matching names ArrDelay, CRSDepTime, DayOfWeek, and CRSDepHour.
81+
```R
82+
formula <- ArrDelay ~ CRSDepTime + DayOfWeek + CRSDepHour:DayOfWeek
83+
```
84+
85+
An error is raised if the input dataset does not contain columns with the matching names ArrDelay, CRSDepTime, DayOfWeek, and CRSDepHour.
8686

8787
+ In some cases, an output schema must be defined in advance for the results.
8888

89-
For example, to insert the data into a table, you must use the **WITH RESULT SETS** clause to specify the schema.
89+
For example, to insert the data into a table, you must use the **WITH RESULT SETS** clause to specify the schema.
9090

91-
The output schema is also required if the R script uses the argument `@parallel=1`. The reason is that multiple processes might be created by SQL Server to run the query in parallel, with the results collected at the end. Therefore, the output schema must be prepared before the parallel processes can be created.
92-
93-
In other cases, you can omit the result schema by using the option **WITH RESULT SETS UNDEFINED**. This statement returns the dataset from the R script without naming the columns or specifying the SQL data types.
91+
The output schema is also required if the R script uses the argument `@parallel=1`. The reason is that multiple processes might be created by SQL Server to run the query in parallel, with the results collected at the end. Therefore, the output schema must be prepared before the parallel processes can be created.
92+
93+
In other cases, you can omit the result schema by using the option **WITH RESULT SETS UNDEFINED**. This statement returns the dataset from the R script without naming the columns or specifying the SQL data types.
9494

9595
+ Consider generating timing or tracking data using T-SQL rather than R.
9696

97-
For example, you could pass the system time or other information used for auditing and storage by adding a T-SQL call that is passed through to the results, rather than generating similar data in the R script.
97+
For example, you could pass the system time or other information used for auditing and storage by adding a T-SQL call that is passed through to the results, rather than generating similar data in the R script.
9898

99-
**Improve performance and security**
99+
### Improve performance and security
100100

101+
::: moniker range=">=sql-server-2016||>=sql-server-linux-ver15||=sqlallproducts-allversions"
101102
+ Avoid writing predictions or intermediate results to file. Write predictions to a table instead, to avoid data movement.
103+
::: moniker-end
102104

103105
+ Run all queries in advance, and review the SQL Server query plans to identify tasks that can be performed in parallel.
104106

105-
If the input query can be parallelized, set `@parallel=1` as part of your arguments to [sp_execute_external_script](../../relational-databases/system-stored-procedures/sp-execute-external-script-transact-sql.md).
107+
If the input query can be parallelized, set `@parallel=1` as part of your arguments to [sp_execute_external_script](../../relational-databases/system-stored-procedures/sp-execute-external-script-transact-sql.md).
106108

107-
Parallel processing with this flag is typically possible any time that SQL Server can work with partitioned tables or distribute a query among multiple processes and aggregate the results at the end. Parallel processing with this flag is typically not possible if you are training models using algorithms that require all data to be read, or if you need to create aggregates.
109+
Parallel processing with this flag is typically possible any time that SQL Server can work with partitioned tables or distribute a query among multiple processes and aggregate the results at the end. Parallel processing with this flag is typically not possible if you are training models using algorithms that require all data to be read, or if you need to create aggregates.
108110

109111
+ Review your R code to determine if there are steps that can be performed independently, or performed more efficiently, by using a separate stored procedure call. For example, you might get better performance by doing feature engineering or feature extraction separately, and saving the values to a table.
110112

111113
+ Look for ways to use T-SQL rather than R code for set-based computations.
112114

113-
For example, this R solution shows how user-defined T-SQL functions and R can perform the same feature engineering task: [Data Science End-to-End Walkthrough](../tutorials/walkthrough-data-science-end-to-end-walkthrough.md).
115+
::: moniker range=">=sql-server-2016||>=sql-server-linux-ver15||=sqlallproducts-allversions"
116+
For example, this R solution shows how user-defined T-SQL functions and R can perform the same feature engineering task: [Data Science End-to-End Walkthrough](../tutorials/walkthrough-data-science-end-to-end-walkthrough.md).
117+
::: moniker-end
114118

115119
+ If possible, replace conventional R functions with **ScaleR** functions that support distributed execution. For more information, see [Comparison of Base R and Scale R Functions](https://docs.microsoft.com/machine-learning-server/r-reference/revoscaler/revoscaler-compared-to-base-r).
116120

117121
+ Consult with a database developer to determine ways to improve performance by using SQL Server features such as [memory-optimized tables](https://docs.microsoft.com/sql/relational-databases/in-memory-oltp/introduction-to-memory-optimized-tables), or, if you have Enterprise Edition, [Resource Governor](https://docs.microsoft.com/sql/relational-databases/resource-governor/resource-governor).
118122

119-
120-
### Step 3. Prepare for deployment
123+
## Step 3. Prepare for deployment
121124

122125
+ Notify the administrator so that packages can be installed and tested in advance of deploying your code.
123126

124-
In a development environment, it might be okay to install packages as part of your code, but this is a bad practice in a production environment.
127+
In a development environment, it might be okay to install packages as part of your code, but this is a bad practice in a production environment.
125128

126-
User libraries are not supported, regardless of whether you are using a stored procedure or running R code in the SQL Server compute context.
129+
User libraries are not supported, regardless of whether you are using a stored procedure or running R code in the SQL Server compute context.
127130

128-
**Package your R code in a stored procedure**
131+
### Package your R code in a stored procedure
129132

130133
+ If your code is relatively simple, you can embed it in a T-SQL user-defined function without modification, as described in this sample:
131134

132-
+ [Feature engineering using T-SQL and R](../tutorials/r-taxi-classification-create-features.md)
135+
+ [Feature engineering using T-SQL and R](../tutorials/r-taxi-classification-create-features.md)
133136

134137
+ If the code is more complex, use the R package **sqlrutils** to convert your code. This package is designed to help experienced R users write good stored procedure code.
135138

136-
The first step is to rewrite your R code as a single function with clearly defined inputs and outputs.
139+
The first step is to rewrite your R code as a single function with clearly defined inputs and outputs.
137140

138-
Then, use the **sqlrutils** package to generate the input and outputs in the correct format. The **sqlrutils** package generates the complete stored procedure code for you, and can also register the stored procedure in the database.
141+
Then, use the **sqlrutils** package to generate the input and outputs in the correct format. The **sqlrutils** package generates the complete stored procedure code for you, and can also register the stored procedure in the database.
139142

140-
For more information and examples, see [sqlrutils (SQL)](ref-r-sqlrutils.md).
143+
For more information and examples, see [sqlrutils (SQL)](ref-r-sqlrutils.md).
141144

142-
**Integrate with other workflows**
145+
### Integrate with other workflows
143146

144147
+ Leverage T-SQL tools and ETL processes. Perform feature engineering, feature extraction, and data cleansing in advance as part of data workflows.
145148

146-
When you are working in a dedicated R development environment such as [!INCLUDE[rsql_rtvs_md](../../includes/rsql-rtvs-md.md)] or RStudio, you might pull data to your computer, analyze the data iteratively, and then write out or display the results.
147-
148-
However, when standalone R code is migrated to SQL Server, much of this process can be simplified or delegated to other SQL Server tools.
149+
When you are working in a dedicated R development environment such as [!INCLUDE[rsql_rtvs_md](../../includes/rsql-rtvs-md.md)] or RStudio, you might pull data to your computer, analyze the data iteratively, and then write out or display the results.
150+
151+
However, when standalone R code is migrated to SQL Server, much of this process can be simplified or delegated to other SQL Server tools.
149152

150153
+ Use secure, asynchronous visualization strategies.
151154

152-
Users of SQL Server often cannot access files on the server, and SQL client tools typically do not support the R graphics device. If you generate plots or other graphics as part of the solution, consider exporting the plots as binary data and saving to a table, or writing.
155+
Users of SQL Server often cannot access files on the server, and SQL client tools typically do not support the R graphics device. If you generate plots or other graphics as part of the solution, consider exporting the plots as binary data and saving to a table, or writing.
153156

154157
+ Wrap prediction and scoring functions in stored procedures for direct access by applications.
155158

156-
### Other resources
159+
## Next steps
157160

158161
To view examples of how an R solution can be deployed in SQL Server, see these samples:
159162

160-
+ [Build a predictive model for ski rental business using R and SQL Server](https://microsoft.github.io/sql-ml-tutorials/R/rentalprediction/)
163+
+ [Tutorial: Develop a predictive model in R with SQL machine learning](../tutorials/r-predictive-model-introduction.md)
161164

162-
+ [In-Database Analytics for the SQL Developer](../tutorials/r-taxi-classification-introduction.md)
163-
Demonstrates how you can make your R code more modular by wrapping it in stored procedures
165+
+ [R tutorial: Predict NYC taxi fares with binary classification](../tutorials/r-taxi-classification-introduction.md)
164166

167+
::: moniker range=">=sql-server-2016||>=sql-server-linux-ver15||=sqlallproducts-allversions"
165168
+ [End-to-End Data Science Solution](../tutorials/walkthrough-data-science-end-to-end-walkthrough.md)
166169
Includes a comparison of feature engineering in R and T-SQL
170+
::: moniker-end
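
Note on the file above: its guidance on scalar inputs, the single input dataset, and the predefined output schema can be pictured with a short T-SQL call. This is only a minimal sketch under stated assumptions, not code from the commit: the table `dbo.AirlineDemo` and the output column names are hypothetical, and the R body is a placeholder for real modeling code.

```sql
-- Hypothetical example: dbo.AirlineDemo and the result column names are
-- assumptions made for illustration; they do not come from the changed article.
DECLARE @model_name nvarchar(50) = N'DefaultModel';

EXECUTE sp_execute_external_script
    @language = N'R',
    @script = N'
        # InputDataSet is the default R name for the rows returned by @input_data_1.
        # model_name is mapped by name from the T-SQL parameter @model_name.
        OutputDataSet <- data.frame(
            model_name = model_name,
            row_count  = nrow(InputDataSet),
            mean_delay = mean(InputDataSet$ArrDelay, na.rm = TRUE),
            stringsAsFactors = FALSE
        )',
    -- The single input dataset must be a SELECT statement, not an EXECUTE call.
    @input_data_1 = N'SELECT ArrDelay, CRSDepTime, DayOfWeek FROM dbo.AirlineDemo',
    -- Scalar inputs are declared in @params and then passed by name.
    @params = N'@model_name nvarchar(50)',
    @model_name = @model_name
-- Declares the output schema in advance, as needed when inserting into a table.
WITH RESULT SETS ((model_name nvarchar(50), row_count int, mean_delay float));
```

The scalar value is declared once in `@params` and then supplied by name, which is what lets the R script refer to it as `model_name`. Returning one data frame through `OutputDataSet` matches the one-dataset output limit described in the article.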

docs/machine-learning/r/how-to-create-a-stored-procedure-using-sqlrutils.md

Lines changed: 2 additions & 2 deletions
@@ -4,12 +4,12 @@ description: Use the sqlrutils R package in SQL Server to bundle R language code
44
ms.prod: sql
55
ms.technology: machine-learning-services
66

7-
ms.date: 08/31/2020
7+
ms.date: 10/06/2020
88
ms.topic: how-to
99
author: dphansen
1010
ms.author: davidph
1111
ms.custom: seo-lt-2019
12-
monikerRange: ">=sql-server-2016||>=sql-server-linux-ver15||=sqlallproducts-allversions"
12+
monikerRange: ">=sql-server-2016||>=sql-server-linux-ver15||=azuresqldb-mi-current||=sqlallproducts-allversions"
1313
---
1414
# Create a stored procedure using sqlrutils
1515
[!INCLUDE [SQL Server 2016 and later](../../includes/applies-to-version/sqlserver2016.md)]
