This article explains how to use stored procedures in two important SQL Server features, SQL Server Integration Services (SSIS) and SQL Server Reporting Services (SSRS), in a way that combines relational data, data science functions from the Microsoft R libraries, and BI features for coordinated data transformations and visualization. Learn which capabilities of [!INCLUDE[ssISnoversion](../../includes/ssisnoversion-md.md)] lend themselves to a data science solution. This article also reminds you that code and data on SQL Server, such as the embedded R code in stored procedures, are easily consumed in visualizations provided in [!INCLUDE[ssRSnoversion](../../includes/ssrsnoversion-md.md)].
16
+
This article explains how to use embedded R and Python script, built on the language and data science capabilities of SQL Server Machine Learning Services, with two important SQL Server features: SQL Server Integration Services (SSIS) and SQL Server Reporting Services (SSRS). R and Python libraries in SQL Server provide statistical and predictive functions. SSIS and SSRS provide coordinated ETL transformation and visualizations, respectively. This article explains how to put all of these features together in this workflow pattern:
17
17
18
-
## Bring compute power to the data
18
+
> [!div class="checklist"]
19
+
> * Create a stored procedure that contains executable R or Python
20
+
> * Execute the stored procedure from SSIS or SSRS
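
The two checklist steps can be sketched as follows. This is a minimal illustration, not the article's own procedure: the name `dbo.usp_r_smoke_test` is hypothetical, and the sketch assumes an instance with Machine Learning Services (R) installed and the `external scripts enabled` option set to 1.

```T-SQL
-- Hypothetical example: wrap an embedded R script in a stored procedure.
CREATE PROCEDURE dbo.usp_r_smoke_test
AS
BEGIN
    -- The R script returns a one-row data frame as the result set.
    EXEC sp_execute_external_script
          @language = N'R'
        , @script = N'OutputDataSet <- data.frame(answer = 6 * 7);'
    WITH RESULT SETS ((answer float NOT NULL));
END;
GO

-- An SSIS Execute SQL task or an SSRS dataset query would issue this call.
EXEC dbo.usp_r_smoke_test;
```

Keeping the R or Python code inside the procedure means the calling tool only needs to know how to run T-SQL.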
19
21
20
-
A key design goal of integrating R and Python with SQL Server has been to bring analytics close to the data. This provides multiple advantages:
21
-
22
-
+ Data security. Bringing R closer to the source of data avoids wasteful or insecure data movement.
23
-
+ Speed. Databases are optimized for set-based operations. Recent innovations in databases such as in-memory tables make summaries and aggregations lightning fast, and are a perfect complement to data science.
24
-
+ Ease of deployment and integration. [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] is the central point of operations for many other data management tasks and applications. By using data that resides in the database or reporting warehouse, you ensure that the data used by machine learning solutions is consistent and up-to-date.
25
-
+ Efficiency across cloud and on-premises. Rather than process data in R, you can rely on enterprise data pipelines including [!INCLUDE[ssISnoversion](../../includes/ssisnoversion-md.md)] and Azure Data Factory. Reporting of results or analyses is easy via Power BI or [!INCLUDE[ssRSnoversion](../../includes/ssrsnoversion-md.md)].
26
-
27
-
By using the right combination of SQL and R for different data processing and analytical tasks, both data scientists and developers can be more productive.
22
+
The examples in this article are mostly about R and SSIS, but the concepts and steps apply equally to Python. The second section provides guidance and links for SSRS visualizations.
28
23
29
24
<a name="bkmk_ssis"></a>
30
25
31
-
## Use SSIS for data transformation and automation
26
+
## Use SSIS for automation
32
27
33
28
Data science workflows are highly iterative and involve much transformation of data, including scaling, aggregations, computation of probabilities, and renaming and merging of attributes. Data scientists are accustomed to doing many of these tasks in R, Python, or another language; however, executing such workflows on enterprise data requires seamless integration with ETL tools and processes.
34
29
35
-
Because [!INCLUDE[rsql_productname](../../includes/rsql-productname-md.md)] enables you to run complex operations in R via Transact-SQL and stored procedures, you can integrate R-specific tasks with existing ETL processes with minimal re-development work. Rather than perform a chain of memory-intensive tasks in R, data preparation can be optimized using the most efficient tools, including [!INCLUDE[ssISnoversion](../../includes/ssisnoversion-md.md)] and [!INCLUDE[tsql](../../includes/tsql-md.md)].
30
+
Because [!INCLUDE[rsql_productname](../../includes/rsql-productname-md.md)] enables you to run complex operations in R via Transact-SQL and stored procedures, you can integrate data science tasks with existing ETL processes. Rather than perform a chain of memory-intensive tasks, data preparation can be optimized using the most efficient tools, including [!INCLUDE[ssISnoversion](../../includes/ssisnoversion-md.md)] and [!INCLUDE[tsql](../../includes/tsql-md.md)].
36
31
37
32
Here are some ideas for how you can automate your data processing and modeling pipelines using [!INCLUDE[ssISnoversion](../../includes/ssisnoversion-md.md)]:
38
33
39
-
+ Use [!INCLUDE[ssISnoversion](../../includes/ssisnoversion-md.md)] tasks to create necessary data features in the SQL database
40
-
+ Use conditional branching to switch compute context for R jobs
41
-
+ Run R jobs that generate their own data in the database, and share that data with applications
42
-
+ When using [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)], load R script saved in a text variable and run it in SQL Server
34
+
+ Extract data from on-premises or cloud sources to build training data
35
+
+ Build and run R or Python models as part of a data integration workflow
36
+
+ Retrain models on a regular (scheduled) basis
37
+
+ Load results from R or Python script to other destinations such as Excel, Power BI, Oracle, and Teradata, to name a few
38
+
+ Use SSIS tasks to create data features in the SQL database
39
+
+ Use conditional branching to switch compute context for R and Python jobs
43
40
44
41
## SSIS example
45
42
46
-
The following example originates from a now-retired MSDN blog post authored by Jimmy Wong at this URL: `https://blogs.msdn.microsoft.com/ssis/2016/01/11/operationalize-your-machine-learning-project-using-sql-server-2016-ssis-and-r-services/`.
43
+
The following example originates from a now-retired MSDN blog post authored by Jimmy Wong at this URL: `https://blogs.msdn.microsoft.com/ssis/2016/01/11/operationalize-your-machine-learning-project-using-sql-server-2016-ssis-and-r-services/`
47
44
48
45
This example shows you how to automate tasks using SSIS. You create stored procedures with embedded R using SQL Server Management Studio, and then execute those stored procedures from [Execute T-SQL tasks](https://docs.microsoft.com/sql/integration-services/control-flow/execute-t-sql-statement-task) in an SSIS package.
49
46
50
47
To step through this example, you should be familiar with Management Studio, SSIS, SSIS Designer, package design, and T-SQL. The SSIS package uses three [Execute T-SQL tasks](https://docs.microsoft.com/sql/integration-services/control-flow/execute-t-sql-statement-task) that insert training data into a table, model the data, and score the data to get prediction output.
51
48
52
-
### Create tables
49
+
### Load training data
53
50
54
-
Run the following script in SQL Server Management Studio to create a few tables: one to store the data and another to store a model. The role of the ssis_iris table is to act as training data in an operationalization scenario.
51
+
Run the following script in SQL Server Management Studio to create a table for storing the data. You should create and use a test database for this exercise.
55
52
56
53
```T-SQL
57
-
Use irissql
54
+
Use [test-db]
58
55
GO
59
56
60
57
Create table ssis_iris (
@@ -64,17 +61,9 @@ Create table ssis_iris (
64
61
, "Species" varchar(100) null
65
62
);
66
63
GO
67
-
68
-
Create table ssis_iris_models (
69
-
model_name varchar(30) not null default('default model') primary key,
70
-
model varbinary(max) not null
71
-
);
72
-
GO
73
64
```
74
65
75
-
### Create a stored procedure that loads training data
76
-
77
-
This script creates a stored procedure that loads Iris into a data frame using the built-in R data set.
66
+
Create a stored procedure that loads training data into a data frame. This example uses the built-in Iris data set.
78
67
79
68
```T-SQL
80
69
Create procedure load_iris
@@ -89,23 +78,32 @@ begin
89
78
end;
90
79
```
91
80
92
-
### Define an Execute SQL task that refreshes the model
93
-
94
-
In SSIS Designer, create an [Execute SQL task](https://docs.microsoft.com/sql/integration-services/control-flow/execute-sql-task).
The script for SQLStatement is as follows. The script removes existing data and then reloads new data using the **load_iris** stored procedure you created in the previous step.
81
+
In SSIS Designer, create an [Execute SQL task](https://docs.microsoft.com/sql/integration-services/control-flow/execute-sql-task) that executes the stored procedure you just defined. The script for **SQLStatement** removes existing data, specifies which data to insert, and then calls the stored procedure to provide the data.
99
82
100
83
```T-SQL
101
84
truncate table ssis_iris;
102
85
insert into ssis_iris("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species")
103
86
exec dbo.load_iris;
104
87
```
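
As a quick sanity check after this task runs, a query along the following lines can confirm that the table was repopulated. This is only a sketch: it assumes the `ssis_iris` table defined earlier and that quoted identifiers are enabled, as in the scripts above. If the procedure loads the unmodified built-in Iris data set, each of the three species should report 50 rows.

```T-SQL
-- Count the rows loaded per species by the Execute SQL task.
SELECT "Species", COUNT(*) AS row_count
FROM ssis_iris
GROUP BY "Species"
ORDER BY "Species";
```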
105
88
106
-
### Create a stored procedure that generates a model
Run the following script in SQL Server Management Studio to create a table that stores a model.
94
+
95
+
```T-SQL
96
+
Use [test-db]
97
+
GO
98
+
99
+
Create table ssis_iris_models (
100
+
model_name varchar(30) not null default('default model') primary key,
101
+
model varbinary(max) not null
102
+
);
103
+
GO
104
+
```
107
105
108
-
This stored procedure is code that creates a linear model using [rxLinMod](https://docs.microsoft.com/machine-learning-server/r-reference/revoscaler/rxlinmod). RevoScaleR and revoscalepy libraries are loaded automatically in R and Python sessions on SQL Server.
106
+
Create a stored procedure that generates a linear model using [rxLinMod](https://docs.microsoft.com/machine-learning-server/r-reference/revoscaler/rxlinmod). The RevoScaleR and revoscalepy libraries are automatically available in R and Python sessions on SQL Server, so there is no need to import them.
109
107
110
108
```T-SQL
111
109
Create procedure generate_iris_rx_model
@@ -124,25 +122,23 @@ end;
124
122
GO
125
123
```
126
124
127
-
### Define an Execute SQL task that runs the model-generation stored procedure
128
-
129
-
In this step, [Execute SQL task](https://docs.microsoft.com/sql/integration-services/control-flow/execute-sql-task) executes the **generate_iris_rx_model** stored procedure, creating the model and inserting it into the ssis_iris_models table.
130
-
131
-

125
+
In SSIS Designer, create an [Execute SQL task](https://docs.microsoft.com/sql/integration-services/control-flow/execute-sql-task) to execute the **generate_iris_rx_model** stored procedure. The model is serialized and saved to the ssis_iris_models table. The script for **SQLStatement** is as follows:
132
126
133
127
```T-SQL
134
128
insert into ssis_iris_models (model)
135
129
exec generate_iris_rx_model;
136
130
update ssis_iris_models set model_name = 'rxLinMod' where model_name = 'default model';
137
131
```
138
132
139
-
After this task completes, you can query the ssis_iris_models table to see that it contains one binary model.
133
+

134
+
135
+
As a checkpoint, after this task completes, you can query the ssis_iris_models table to see that it contains one binary model.
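
One way to run that checkpoint is sketched below. It assumes only the `ssis_iris_models` table created earlier. The alias `model_size_bytes` is illustrative, and `DATALENGTH` is used so that the query reports the size of the serialized model instead of returning the binary value itself.

```T-SQL
-- List stored models and the size of each serialized payload.
SELECT model_name, DATALENGTH(model) AS model_size_bytes
FROM ssis_iris_models;
```

After the task shown above, a single row named `rxLinMod` with a non-zero size indicates the model was saved.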
140
136
141
137
### Predict (score) outcomes using the "trained" model
142
138
143
-
In this simplistic example, the assumption is that ssis_iris_model is a trained model. Since the purpose of a trained model is to generate predictions, we are now ready to run a prediction using it.
139
+
Now that you have code that loads training data and generates a model, the only step left is using the model to generate predictions.
144
140
145
-
To do this, put the R script in the SQL query to trigger the [rxPredict](https://docs.microsoft.com//machine-learning-server/r-reference/revoscaler/rxpredict) built-in R function on ssis_iris_model. A stored procedure in SQL Server called **predict_species_length** accomplishes this task.
141
+
To do this, put the R script in the SQL query to trigger the [rxPredict](https://docs.microsoft.com//machine-learning-server/r-reference/revoscaler/rxpredict) built-in R function on ssis_iris_model. A stored procedure called **predict_species_length** accomplishes this task.
### Define an Execute SQL task that predicts outcomes
174
-
175
-
Using [Execute SQL task](https://docs.microsoft.com/sql/integration-services/control-flow/execute-sql-task), execute the **predict_species_length** stored procedure to generate predicted petal length.
In SSIS Designer, create an [Execute SQL task](https://docs.microsoft.com/sql/integration-services/control-flow/execute-sql-task) that executes the **predict_species_length** stored procedure to generate predicted petal length.
Real-time scoring uses the [sp_rxPredict](https://docs.microsoft.com//sql/relational-databases/system-stored-procedures/sp-rxpredict-transact-sql) system stored procedure and the CLR extension capabilities in SQL Server for high-performance predictions or scores in forecasting workloads. Real-time scoring is language-agnostic and executes with no dependencies on R or Python runtimes. If a model was created and trained using Microsoft functions, and then serialized to a binary format in SQL Server, you can use real-time scoring to generate predicted outcomes on new data inputs, even on SQL Server instances that do not have the R or Python add-on installed.
18
17
19
18
## How real-time scoring works
20
19
21
-
Real-time scoring is supported in both SQL Server 2017 and SQL Server 2016, on [supported model types](#bkmk_py_supported_algos) for linear and logistic regression and decision tree modeling. It uses native C++ libraries to generate scores, based on user input provided to a machine learning model stored in a special binary format.
20
+
Real-time scoring is supported in both SQL Server 2017 and SQL Server 2016, on specific model types based on RevoScaleR or MicrosoftML functions such as [rxLinMod (RevoScaleR)](https://docs.microsoft.com/machine-learning-server/r-reference/revoscaler/rxlinmod) and [rxNeuralNet (MicrosoftML)](https://docs.microsoft.com/machine-learning-server/r-reference/microsoftml/rxneuralnet). It uses native C++ libraries to generate scores, based on user input provided to a machine learning model stored in a special binary format.
22
21
23
22
Because a trained model can be used for scoring without having to call an external language runtime, the overhead of multiple processes is reduced. This supports much faster prediction performance for production scoring scenarios. Because the data never leaves SQL Server, results can be generated and inserted into a new table without any data translation between R and SQL.
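
A call to real-time scoring might look like the following sketch. Several details are assumptions rather than part of this article: the model name `rxLinMod_realtime` is hypothetical, the model is assumed to have been serialized in the real-time scoring format (for example with `rxSerializeModel`), real-time scoring is assumed to be enabled on the instance and database, and the parameter names follow the example in the `sp_rxPredict` documentation.

```T-SQL
-- Fetch a stored model; 'rxLinMod_realtime' is a hypothetical model name.
DECLARE @model varbinary(max) =
    (SELECT model FROM ssis_iris_models WHERE model_name = 'rxLinMod_realtime');

-- Score new rows without starting an R or Python runtime.
EXEC sp_rxPredict
      @model = @model
    , @inputData = N'SELECT "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species" FROM ssis_iris';
```

The input query must supply the columns the model was trained on. The column list here simply mirrors the `ssis_iris` table.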
docs/advanced-analytics/what-is-sql-server-machine-learning.md
10 additions & 1 deletion
@@ -19,7 +19,16 @@ If you previously used [SQL Server 2016 R Services](r/sql-server-r-services.md),
19
19
20
20
In Azure SQL Database, [Machine Learning Services (with R)](https://docs.microsoft.com/azure/sql-database/sql-database-machine-learning-services-overview) is currently in public preview.
21
21
22
-
The key value proposition of Machine Learning Services is the power of its enterprise R and Python packages to deliver advanced analytics at scale, and the ability to bring calculations and processing to where the data resides, eliminating the need to pull data across the network.
22
+
## Bring compute power to the data
23
+
24
+
The key value proposition of Machine Learning Services is the power of its enterprise R and Python packages to deliver advanced analytics at scale, and the ability to bring calculations and processing to where the data resides, eliminating the need to pull data across the network. This provides multiple advantages:
25
+
26
+
+ Data security. Bringing R and Python execution closer to the source of data avoids wasteful or insecure data movement.
27
+
+ Speed. Databases are optimized for set-based operations. Recent innovations in databases such as in-memory tables make summaries and aggregations lightning fast, and are a perfect complement to data science.
28
+
+ Ease of deployment and integration. [!INCLUDE[ssNoVersion](../includes/ssnoversion-md.md)] is the central point of operations for many other data management tasks and applications. By using data that resides in the database or reporting warehouse, you ensure that the data used by machine learning solutions is consistent and up-to-date.
29
+
+ Efficiency across cloud and on-premises. Rather than process data in R or Python sessions, you can rely on enterprise data pipelines including [!INCLUDE[ssISnoversion](../includes/ssisnoversion-md.md)] and Azure Data Factory. Reporting of results or analyses is easy via Power BI or [!INCLUDE[ssRSnoversion](../includes/ssrsnoversion-md.md)].
30
+
31
+
By using the right combination of SQL and R for different data processing and analytical tasks, both data scientists and developers can be more productive.