docs/advanced-analytics/r/how-to-do-realtime-scoring.md
Lines changed: 27 additions & 49 deletions
@@ -32,86 +32,64 @@ The significance of CLR and C++ extensions is proximity to the database engine i
32
32
33
33
As you might expect, platform support is affected by these runtime environments. Native database engine extensions run anywhere the relational database is supported: Windows, Linux, Azure. CLR extensions, with their .NET Core requirement, are currently Windows only.
34
34
35
-
## Native and real-time scoring compared
36
-
37
-
Starting in SQL Server 2016, Microsoft added an extensibility framework that allows R scripts to be executed from T-SQL. This framework supports any operation you might perform in R, ranging from simple functions to training complex machine learning models. However, the dual-process architecture requires invoking an external R process for every call, regardless of the complexity of the operation. If you are loading a pre-trained model from a table and scoring against it on data already in SQL Server, the overhead of calling the external R process represents an unnecessary performance cost.
38
-
39
-
_Scoring_ is a two-step process. First, you specify an already trained model to load from a table. Second, pass new input data to the function, to generate prediction values (or _scores_). The input can be either tabular or single rows. You can choose to output a single column value representing a probability, or you might output several values, such as a confidence interval, error, or other useful complement to the prediction.
35
+
## Scoring overview
40
36
41
-
When the input includes many rows of data, it is usually faster to insert the prediction values into a table as part of the scoring process. Generating a single score is more typical in a scenario where you get input values from a form or user request, and return the score to a client application. To improve performance when generating successive scores, SQL Server might cache the model so that it can be reloaded into memory.
42
-
43
-
To support fast scoring, SQL Server Machine Learning Services (and Microsoft Machine Learning Server) provide built-in scoring libraries that work in R or in T-SQL. There are different options depending on which version you have.
37
+
_Scoring_ is a two-step process. First, you specify an already trained model to load from a table. Second, pass new input data to the function, to generate prediction values (or _scores_). The input is often a T-SQL query, returning either tabular or single rows. You can choose to output a single column value representing a probability, or you might output several values, such as a confidence interval, error, or other useful complement to the prediction.
44
38
45
-
**Native scoring**
39
+
Taking a step back, the overall process of preparing the model and then generating scores can be summarized this way:
46
40
47
-
+ The PREDICT function in Transact-SQL supports _native scoring_ in any instance of SQL Server 2017. It requires only that you have a model already trained, which you can call using T-SQL. Native scoring using T-SQL has these advantages:
41
+
1. Create a model using a supported algorithm.
42
+
2. Serialize the model using a special binary format.
43
+
3. Make the model available to SQL Server. Typically this means storing the serialized model in a SQL Server table.
44
+
4. Call the function or stored procedure, specifying the model and input data as parameters.
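The four steps above can be sketched end to end in a few lines. This is only an illustration of the pattern, not the SQL Server API: it assumes Python's built-in `sqlite3` as a stand-in for a SQL Server table, `struct` as a stand-in for the special binary format, and a toy one-variable linear model. The names `fit`, `serialize`, `store`, and `score` are hypothetical helpers invented for this example.

```python
import sqlite3
import struct


def fit(xs, ys):
    """Step 1: create a model (least-squares line through the points)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def serialize(model):
    """Step 2: pack the two coefficients into a compact binary format."""
    return struct.pack("<2d", *model)


def store(conn, name, blob):
    """Step 3: make the model available by storing it in a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS models (name TEXT PRIMARY KEY, model BLOB)"
    )
    conn.execute("INSERT OR REPLACE INTO models VALUES (?, ?)", (name, blob))


def score(conn, name, inputs):
    """Step 4: load the named model and score new input values."""
    row = conn.execute(
        "SELECT model FROM models WHERE name = ?", (name,)
    ).fetchone()
    slope, intercept = struct.unpack("<2d", row[0])
    return [slope * x + intercept for x in inputs]


# sqlite3 in memory stands in for the database (assumption for this sketch).
conn = sqlite3.connect(":memory:")
blob = serialize(fit([1.0, 2.0, 3.0], [3.0, 5.0, 7.0]))
store(conn, "demo", blob)
scores = score(conn, "demo", [4.0, 10.0])
print(scores)
```

In SQL Server itself, steps 2 through 4 would instead go through the serialization function, a **varbinary(max)** column, and PREDICT or sp\_rxPredict, as described in the rest of this article.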
48
45
49
-
+ No additional configuration is required.
50
-
+ The R runtime is not called. There is no need to install R.
46
+
When the input includes many rows of data, it is usually faster to insert the prediction values into a table as part of the scoring process. Generating a single score is more typical in a scenario where you get input values from a form or user request, and return the score to a client application. To improve performance when generating successive scores, SQL Server might cache the model so that it can be reloaded into memory.
51
47
52
-
**Real-time scoring**
48
+
## Native and real-time scoring compared
53
49
54
-
+ **sp_rxPredict** is a stored procedure for real-time scoring that can be used to generate scores from any supported model type, without calling the R runtime.
50
+
To preserve the integrity of core database engine processes, support for R and Python is enabled in a dual architecture that isolates language processing from RDBMS processing. Starting in SQL Server 2016, Microsoft added an extensibility framework that allows R scripts to be executed from T-SQL. In SQL Server 2017, Python integration was added.
55
51
56
-
This stored procedure is also available in SQL Server 2016, if you upgrade the R components using the standalone installer of Microsoft R Server. sp_rxPredict is also supported in SQL Server 2017. Therefore, you might use this function when generating scores with a model type not supported by the PREDICT function.
52
+
The extensibility framework supports any operation you might perform in R or Python, ranging from simple functions to training complex machine learning models. However, the dual-process architecture requires invoking an external R or Python process for every call, regardless of the complexity of the operation. When the workload entails loading a pre-trained model from a table and scoring against it on data already in SQL Server, the overhead of calling the external processes adds latency that can be unacceptable in certain circumstances. For example, in a request-response pattern such as fraud detection, scores must be generated very quickly in order to be relevant.
57
53
58
-
+ The rxPredict function can be used for fast scoring within R code.
54
+
To support fast scoring, SQL Server added built-in scoring libraries as C++ and CLR extensions that eliminate the processing overhead of R and Python run times.
59
55
60
-
For all of these scoring methods, you must use a model that was trained using one of the supported RevoScaleR or MicrosoftML algorithms.
56
+
**Real-time scoring** was the first solution for high-performance scoring. Introduced in early versions of SQL Server 2017 and in later updates to SQL Server 2016, real-time scoring relies on CLR libraries that stand in for R and Python processing of Microsoft-controlled functions in RevoScaleR and MicrosoftML (R), and in revoscalepy and microsoftml (Python). The CLR libraries are invoked using the **sp_rxPredict** stored procedure to generate scores from any supported model type, without calling the R or Python runtime.
61
57
62
-
For an example of real-time scoring in action, see [End to End Loan ChargeOff Prediction Built Using Azure HDInsight Spark Clusters and SQL Server 2016 R Service](https://blogs.msdn.microsoft.com/rserver/2017/06/29/end-to-end-loan-chargeoff-prediction-built-using-azure-hdinsight-spark-clusters-and-sql-server-2016-r-service/)
58
+
**Native scoring** is a SQL Server 2017 feature, implemented as a native C++ library, but only for RevoScaleR and revoscalepy models. It is the fastest and most secure approach, but it supports a smaller set of functions than the other methodologies.
63
59
64
60
## Choose a scoring method
65
61
66
-
The following options are supported for fast batch prediction:
67
-
68
-
+ **Native scoring**: T-SQL PREDICT function in SQL Server 2017 Windows, SQL Server 2017 Linux, and Azure SQL Database.
69
-
+ **Real-time scoring**: Using the sp\_rxPredict stored procedure in either SQL Server 2016 or SQL Server 2017 (Windows only).
62
+
Platform requirements dictate which scoring methodologies are available.
70
63
71
-
> [!NOTE]
72
-
> Use of the PREDICT function is recommended in SQL Server 2017.
73
-
> To use sp\_rxPredict requires that you enable SQLCLR integration. Consider the security implications before you enable this option.
64
+
| Product version and platform | Methodology |
65
+
|------------------------------|-------------|
66
+
| SQL Server 2017 on Windows, SQL Server 2017 on Linux, and Azure SQL Database | **Native scoring** with T-SQL PREDICT |
67
+
| SQL Server 2017 (Windows only), SQL Server 2016 R Services at SP1 or higher | **Real-time scoring** with sp\_rxPredict stored procedure |
74
68
75
-
The overall process of preparing the model and then generating scores is similar:
76
-
77
-
1. Create a model using a supported algorithm.
78
-
2. Serialize the model using a special binary format.
79
-
3. Make the model available to SQL Server. Typically this means storing the serialized model in a SQL Server table.
80
-
4. Call the function or stored procedure, and pass the model and input data.
69
+
We recommend native scoring with the PREDICT function. Using sp\_rxPredict requires that you enable SQLCLR integration. Consider the security implications before you enable this option.
81
70
82
71
## Serialization and storage
83
72
84
73
To use a model with either of the fast scoring options, save the model using a special serialized format, which has been optimized for size and scoring efficiency.
85
74
86
-
+ Call `rxSerializeModel` to write a supported model to the **raw** format.
87
-
+ Call `rxUnserializeModel` to reconstitute the model for use in other R code, or to view the model.
88
-
89
-
For more information, see [rxSerializeModel](https://docs.microsoft.com/r-server/r-reference/revoscaler/rxserializemodel).
75
+
+ Call [rxSerializeModel](https://docs.microsoft.com/r-server/r-reference/revoscaler/rxserializemodel) to write a supported model to the **raw** format.
76
+
+ Call [rxUnserializeModel](https://docs.microsoft.com/r-server/r-reference/revoscaler/rxserializemodel) to reconstitute the model for use in other R code, or to view the model.
90
77
91
78
**Using SQL**
92
79
93
-
From SQL code, you can train the model using `sp_execute_external_script`, and directly insert the trained models into a table, in a column of type **varbinary(max)**.
94
-
95
-
For a simple example, see [this tutorial](../tutorials/rtsql-create-a-predictive-model-r.md)
80
+
From SQL code, you can train the model using [sp_execute_external_script](https://docs.microsoft.com/sql/relational-databases/system-stored-procedures/sp-execute-external-script-transact-sql), and directly insert the trained models into a table, in a column of type **varbinary(max)**. For a simple example, see [Create a predictive model in R](../tutorials/rtsql-create-a-predictive-model-r.md).
96
81
97
82
**Using R**
98
83
99
-
From R code, there are two ways to save the model to a table:
100
-
101
-
+ Call the `rxWriteObject` function, from the RevoScaleR package, to write the model directly to the database.
102
-
103
-
The `rxWriteObject()` function can retrieve R objects from an ODBC data source like SQL Server, or write objects to SQL Server. The API is modeled after a simple key-value store.
84
+
From R code, call the [rxWriteObject](https://docs.microsoft.com/machine-learning-server/r-reference/revoscaler/rxwriteobject) function from the RevoScaleR package to write the model directly to the database. The `rxWriteObject()` function can retrieve R objects from an ODBC data source like SQL Server, or write objects to SQL Server. The API is modeled after a simple key-value store.
104
85
105
-
If you use this function, be sure to serialize the model using the new serialization function first. Then, set the *serialize* argument in `rxWriteObject` to FALSE, to avoid repeating the serialization step.
106
-
107
-
+ You can also save the model in raw format to a file and then read from the file into SQL Server. This option might be useful if you are moving or copying models between environments.
86
+
If you use this function, be sure to serialize the model using [rxSerializeModel](https://docs.microsoft.com/r-server/r-reference/revoscaler/rxserializemodel) first. Then, set the *serialize* argument in `rxWriteObject` to FALSE, to avoid repeating the serialization step.
108
87
88
+
Serializing a model to a binary format is useful, but not required if you are scoring predictions using the R and Python runtime environments in the extensibility framework. You can save a model in raw byte format to a file and then read from the file into SQL Server. This option might be useful if you are moving or copying models between environments.
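The file-based option can be pictured as a plain byte round trip. The sketch below is an assumption-laden illustration rather than SQL Server code: `sqlite3` stands in for the target database, the 16 placeholder bytes stand in for a serialized model, and `export_model` and `import_model` are hypothetical helper names.

```python
import os
import sqlite3
import tempfile


def export_model(model_bytes, path):
    """Write already-serialized model bytes to a file in the source environment."""
    with open(path, "wb") as f:
        f.write(model_bytes)


def import_model(conn, name, path):
    """Read the bytes back from the file and store them in a BLOB column."""
    with open(path, "rb") as f:
        blob = f.read()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS models (name TEXT PRIMARY KEY, model BLOB)"
    )
    conn.execute("INSERT OR REPLACE INTO models VALUES (?, ?)", (name, blob))
    return len(blob)


# Placeholder bytes stand in for a serialized model (assumption for this sketch).
model_bytes = bytes(range(16))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.bin")
    export_model(model_bytes, path)
    # A separate in-memory database plays the role of the target environment.
    target = sqlite3.connect(":memory:")
    stored = import_model(target, "demo", path)

restored = target.execute(
    "SELECT model FROM models WHERE name = ?", ("demo",)
).fetchone()[0]
print(stored, restored == model_bytes)
```

The file is opened in binary mode on both sides so that the bytes arrive unchanged; any text-mode conversion would corrupt the serialized model.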
109
89
110
90
## Scoring in related Microsoft products
111
91
112
-
If you are using the standalone server or a Microsoft Machine Learning Server instead of SQL Server in-database analytics, you have other options besides stored procedures and T-SQL functions for generating predictions.
113
-
114
-
Both the standalone server and Machine Learning Server support the concept of a *web service* for code deployment. You can bundle an R or Python pre-trained model as a web service, called at run time to evaluate new data inputs. For more information, see these articles:
92
+
If you are using the [standalone server](r-server-standalone.md) or a [Microsoft Machine Learning Server](https://docs.microsoft.com/machine-learning-server/what-is-machine-learning-server), you have other options besides stored procedures and T-SQL functions for generating predictions quickly. Both the standalone server and Machine Learning Server support the concept of a *web service* for code deployment. You can bundle an R or Python pre-trained model as a web service, called at run time to evaluate new data inputs. For more information, see these articles:
115
93
116
94
+ [What are web services in Machine Learning Server?](https://docs.microsoft.com/machine-learning-server/operationalize/concept-what-are-web-services)
117
95
+ [What is operationalization?](https://docs.microsoft.com/machine-learning-server/operationalize/concept-operationalize-deploy-consume)