
Commit 632e198

cleanup
1 parent 3a79952 commit 632e198

3 files changed

Lines changed: 89 additions & 101 deletions


docs/advanced-analytics/r/how-to-do-realtime-scoring.md

Lines changed: 27 additions & 49 deletions
@@ -32,86 +32,64 @@ The significance of CLR and C++ extensions is proximity to the database engine i
32 32

33 33
As you might expect, platform support is impacted by these run time environments. Native database engine extensions run anywhere the relational database is supported: Windows, Linux, Azure. CLR extensions, with their .NET Core requirement, are currently Windows only.
34 34

35-
## Native and real-time scoring compared
36-
37-
Starting in SQL Server 2016, Microsoft added an extensibility framework that allows R scripts to be executed from T-SQL. This framework supports any operation you might perform in R, ranging from simple functions to training complex machine learning models. However, the dual-process architecture requires invoking an external R process for every call, regardless of the complexity of the operation. If you are loading a pre-trained model from a table and scoring against it on data already in SQL Server, the overhead of calling the external R process represents an unnecessary performance cost.
38-
39-
_Scoring_ is a two-step process. First, you specify an already trained model to load from a table. Second, pass new input data to the function, to generate prediction values (or _scores_). The input can be either tabular or single rows. You can choose to output a single column value representing a probability, or you might output several values, such as a confidence interval, error, or other useful complement to the prediction.
35+
## Scoring overview
40 36

41-
When the input includes many rows of data, it is usually faster to insert the prediction values into a table as part of the scoring process. Generating a single score is more typical in a scenario where you get input values from a form or user request, and return the score to a client application. To improve performance when generating successive scores, SQL Server might cache the model so that it can be reloaded into memory.
42-
43-
To support fast scoring, SQL Server Machine Learning Services (and Microsoft Machine Learning Server) provide built-in scoring libraries that work in R or in T-SQL. There are different options depending on which version you have.
37+
_Scoring_ is a two-step process. First, you specify an already trained model to load from a table. Second, you pass new input data to the function to generate prediction values (or _scores_). The input is often a T-SQL query that returns either tabular data or a single row. You can choose to output a single column value representing a probability, or you might output several values, such as a confidence interval, error, or other useful complement to the prediction.
44 38

45-
**Native scoring**
39+
Taking a step back, the overall process of preparing the model and then generating scores can be summarized this way:
46 40

47-
+ The PREDICT function in Transact-SQL supports _native scoring_ in any instance of SQL Server 2017. It requires only that you have a model already trained, which you can call using T-SQL. Native scoring using T-SQL has these advantages:
41+
1. Create a model using a supported algorithm.
42+
2. Serialize the model using a special binary format.
43+
3. Make the model available to SQL Server. Typically this means storing the serialized model in a SQL Server table.
44+
4. Call the function or stored procedure, specifying the model and input data as parameters.
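The two scoring steps and the four preparation steps above can be traced end to end in a small Python sketch. It illustrates the steps under stated assumptions and does not show how SQL Server implements them:

+ `LinearModel` is a hypothetical stand-in for a model trained with a supported algorithm.
+ Packing two coefficients with `struct` stands in for the special binary format written by `rxSerializeModel`.
+ An in-memory SQLite table with a BLOB column stands in for a SQL Server table with a **varbinary(max)** column.
+ The `models` table and all helper names are hypothetical.

```python
import sqlite3
import struct
from dataclasses import dataclass


@dataclass
class LinearModel:
    """Hypothetical stand-in for a model trained with a supported algorithm."""
    intercept: float
    slope: float

    def predict(self, x: float) -> float:
        return self.intercept + self.slope * x


def train(xs: list[float], ys: list[float]) -> LinearModel:
    """Step 1: fit a one-variable least-squares line."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return LinearModel(mean_y - slope * mean_x, slope)


def serialize(model: LinearModel) -> bytes:
    """Step 2: pack the coefficients into a compact binary format (two doubles)."""
    return struct.pack("<dd", model.intercept, model.slope)


def unserialize(blob: bytes) -> LinearModel:
    """Reconstitute the model from its binary form."""
    return LinearModel(*struct.unpack("<dd", blob))


def store_model(conn: sqlite3.Connection, name: str, model: LinearModel) -> None:
    """Step 3: make the serialized model available in a table."""
    conn.execute("INSERT INTO models (name, model) VALUES (?, ?)", (name, serialize(model)))


def score(conn: sqlite3.Connection, name: str, rows: list[float]) -> list[float]:
    """Step 4: load the model by name, then score the input rows."""
    (blob,) = conn.execute("SELECT model FROM models WHERE name = ?", (name,)).fetchone()
    return [unserialize(blob).predict(x) for x in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE models (name TEXT PRIMARY KEY, model BLOB)")
store_model(conn, "demo", train([1.0, 2.0, 3.0], [3.0, 5.0, 7.0]))
print(score(conn, "demo", [4.0, 10.0]))  # [9.0, 21.0]
```

Separating `serialize` from `store_model` mirrors the separation between steps 2 and 3. The bytes in the table are the only thing the scoring call needs.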
48 45

49-
+ No additional configuration is required.
50-
+ The R runtime is not called. There is no need to install R.
46+
When the input includes many rows of data, it is usually faster to insert the prediction values into a table as part of the scoring process. Generating a single score is more typical in a scenario where you get input values from a form or user request, and return the score to a client application. To improve performance when generating successive scores, SQL Server might cache the model so that it can be reloaded into memory.
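The trade-off between batch and single-row scoring, and the benefit of keeping a loaded model in memory, can be sketched in Python. This is a hypothetical analogy and does not describe SQL Server internals:

+ `load_model` simulates deserializing a stored model, here two coefficients packed with `struct`.
+ `functools.lru_cache` plays the role of a model cache.
+ SQLite tables named `inputs` and `predictions` stand in for SQL Server tables.

```python
import functools
import sqlite3
import struct

# Hypothetical serialized model: intercept and slope packed as two doubles.
SERIALIZED_MODELS = {"demo": struct.pack("<dd", 1.0, 2.0)}
load_count = 0  # counts how often the model is actually deserialized


@functools.lru_cache(maxsize=8)
def load_model(name: str) -> tuple[float, float]:
    """Deserialize a model once; later calls with the same name hit the cache."""
    global load_count
    load_count += 1
    return struct.unpack("<dd", SERIALIZED_MODELS[name])


def score_one(name: str, x: float) -> float:
    """Single-row scoring: return one value to the caller, e.g. a client app."""
    intercept, slope = load_model(name)
    return intercept + slope * x


def score_batch(conn: sqlite3.Connection, name: str) -> None:
    """Batch scoring: write a prediction for every input row into a table."""
    intercept, slope = load_model(name)
    rows = conn.execute("SELECT id, x FROM inputs").fetchall()
    conn.executemany(
        "INSERT INTO predictions (id, score) VALUES (?, ?)",
        [(row_id, intercept + slope * x) for row_id, x in rows],
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inputs (id INTEGER PRIMARY KEY, x REAL)")
conn.execute("CREATE TABLE predictions (id INTEGER PRIMARY KEY, score REAL)")
conn.executemany("INSERT INTO inputs (id, x) VALUES (?, ?)", [(1, 0.5), (2, 3.0)])

score_batch(conn, "demo")
print(conn.execute("SELECT id, score FROM predictions ORDER BY id").fetchall())  # [(1, 2.0), (2, 7.0)]
print(score_one("demo", 4.0), load_count)  # 9.0 1
```

The second call reuses the cached coefficients, so `load_count` stays at 1 even though the model was used twice.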
51 47

52-
**Real-time scoring**
48+
## Native and real-time scoring compared
53 49

54-
+ **sp_rxPredict** is a stored procedure for real-time scoring that can be used to generate scores from any supported model type, without calling the R runtime.
50+
To preserve the integrity of core database engine processes, support for R and Python is enabled in a dual architecture that isolates language processing from RDBMS processing. Starting in SQL Server 2016, Microsoft added an extensibility framework that allows R scripts to be executed from T-SQL. In SQL Server 2017, Python integration was added.
55 51

56-
This stored procedure is also available in SQL Server 2016, if you upgrade the R components using the standalone installer of Microsoft R Server. sp_rxPredict is also supported in SQL Server 2017. Therefore, you might use this function when generating scores with a model type not supported by the PREDICT function.
52+
The extensibility framework supports any operation you might perform in R or Python, ranging from simple functions to training complex machine learning models. However, the dual-process architecture requires invoking an external R or Python process for every call, regardless of the complexity of the operation. When the workload entails loading a pre-trained model from a table and scoring against it on data already in SQL Server, the overhead of calling the external processes adds latency that can be unacceptable in certain circumstances. For example, in a request-response pattern such as fraud detection, scores must be generated very quickly in order to be relevant.
57 53

58-
+ The rxPredict function can be used for fast scoring within R code.
54+
To support fast scoring, SQL Server added built-in scoring libraries as C++ and CLR extensions that eliminate the processing overhead of R and Python run times.
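The per-call cost of the dual-process design can be felt even in a toy setting. The following Python sketch is only an analogy and is not a measurement of SQL Server. It scores the same input twice:

+ once by starting a separate Python interpreter for the call, loosely mirroring an external runtime launched per request;
+ once with an ordinary in-process function call, loosely mirroring a built-in scoring library.

The model formula is hypothetical and timings vary by machine, so no timing output is shown.

```python
import subprocess
import sys
import time


def score_in_process(x: float) -> float:
    """Hypothetical model evaluated directly: intercept 1.0, slope 2.0."""
    return 1.0 + 2.0 * x


def score_external(x: float) -> float:
    """Same hypothetical model, evaluated in a freshly started interpreter."""
    result = subprocess.run(
        [sys.executable, "-c", f"print(1.0 + 2.0 * {x!r})"],
        capture_output=True,
        text=True,
        check=True,
    )
    return float(result.stdout)


start = time.perf_counter()
external = score_external(4.0)
external_seconds = time.perf_counter() - start

start = time.perf_counter()
in_process = score_in_process(4.0)
in_process_seconds = time.perf_counter() - start

print(external == in_process)  # True: the score is identical either way
print(f"external: {external_seconds:.4f}s, in-process: {in_process_seconds:.6f}s")
```

Both paths produce the same score. Only the external path pays the cost of process startup on every call.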
59 55

60-
For all of these scoring methods, you must use a model that was trained using one of the supported RevoScaleR or MicrosoftML algorithms.
56+
**Real-time scoring** was the first solution for high-performance scoring. Introduced in early versions of SQL Server 2017 and later updates to SQL Server 2016, real-time scoring relies on CLR libraries that replace R and Python processing for Microsoft-controlled functions in RevoScaleR, MicrosoftML (R), revoscalepy, and microsoftml (Python). The CLR libraries are invoked through the **sp_rxPredict** stored procedure, which generates scores from any supported model type without calling the R or Python runtime.
61 57

62-
For an example of real-time scoring in action, see [End to End Loan ChargeOff Prediction Built Using Azure HDInsight Spark Clusters and SQL Server 2016 R Service](https://blogs.msdn.microsoft.com/rserver/2017/06/29/end-to-end-loan-chargeoff-prediction-built-using-azure-hdinsight-spark-clusters-and-sql-server-2016-r-service/)
58+
**Native scoring** is a SQL Server 2017 feature, implemented as a native C++ library, but only for RevoScaleR and revoscalepy models. It is the fastest and most secure approach, but supports a smaller set of functions relative to other methodologies.
63 59

64 60
## Choose a scoring method
65 61

66-
The following options are supported for fast batch prediction:
67-
68-
+ **Native scoring**: T-SQL PREDICT function in SQL Server 2017 Windows, SQL Server 2017 Linux, and Azure SQL Database.
69-
+ **Real-time scoring**: Using the sp\_rxPredict stored procedure in either SQL Server 2016 or SQL Server 2017 (Windows only).
62+
Platform requirements dictate which scoring methodologies are available.
70 63

71-
> [!NOTE]
72-
> Use of the PREDICT function is recommended in SQL Server 2017.
73-
> To use sp\_rxPredict requires that you enable SQLCLR integration. Consider the security implications before you enable this option.
64+
| Product version and platform | Methodology |
65+
|------------------------------|-------------|
66+
| SQL Server 2017 on Windows, SQL Server 2017 on Linux, and Azure SQL Database | **Native scoring** with T-SQL PREDICT |
67+
| SQL Server 2017 (Windows only), SQL Server 2016 R Services at SP1 or higher | **Real-time scoring** with sp\_rxPredict stored procedure |
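The platform table can be read as a simple lookup, encoded here in Python. The function name and the product and platform strings are assumptions made for illustration. Treating "SP1 or higher" as an integer `service_pack >= 1` is also an assumption. The table itself remains the authoritative source.

```python
def scoring_methods(product: str, platform: str, service_pack: int = 0) -> list[str]:
    """Return the fast-scoring methodologies the table lists for a product.

    Hypothetical encoding: product is "SQL Server 2016", "SQL Server 2017",
    or "Azure SQL Database"; platform is "Windows" or "Linux" (ignored for
    Azure SQL Database); service_pack only matters for SQL Server 2016.
    """
    methods: list[str] = []
    # Native scoring: SQL Server 2017 on Windows or Linux, and Azure SQL Database.
    if product in ("SQL Server 2017", "Azure SQL Database"):
        methods.append("Native scoring (PREDICT)")
    # Real-time scoring: Windows only, SQL Server 2017 or SQL Server 2016 at SP1+.
    if platform == "Windows" and (
        product == "SQL Server 2017"
        or (product == "SQL Server 2016" and service_pack >= 1)
    ):
        methods.append("Real-time scoring (sp_rxPredict)")
    return methods


print(scoring_methods("SQL Server 2017", "Linux"))  # ['Native scoring (PREDICT)']
print(scoring_methods("SQL Server 2016", "Windows", service_pack=1))  # ['Real-time scoring (sp_rxPredict)']
```

Native scoring is appended first, so when both methods apply it heads the list.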
74 68

75-
The overall process of preparing the model and then generating scores is similar:
76-
77-
1. Create a model using a supported algorithm.
78-
2. Serialize the model using a special binary format.
79-
3. Make the model available to SQL Server. Typically this means storing the serialized model in a SQL Server table.
80-
4. Call the function or stored procedure, and pass the model and input data.
69+
We recommend native scoring with the PREDICT function. Using sp\_rxPredict requires that you enable SQLCLR integration. Consider the security implications before you enable this option.
81 70

82 71
## Serialization and storage
83 72

84 73
To use a model with either of the fast scoring options, save the model using a special serialized format, which has been optimized for size and scoring efficiency.
85 74

86-
+ Call `rxSerializeModel` to write a supported model to the **raw** format.
87-
+ Call `rxUnserializeModel` to reconstitute the model for use in other R code, or to view the model.
88-
89-
For more information, see [rxSerializeModel](https://docs.microsoft.com/r-server/r-reference/revoscaler/rxserializemodel).
75+
+ Call [rxSerializeModel](https://docs.microsoft.com/r-server/r-reference/revoscaler/rxserializemodel) to write a supported model to the **raw** format.
76+
+ Call [rxUnserializeModel](https://docs.microsoft.com/r-server/r-reference/revoscaler/rxserializemodel) to reconstitute the model for use in other R code, or to view the model.
90 77

91 78
**Using SQL**
92 79

93-
From SQL code, you can train the model using `sp_execute_external_script`, and directly insert the trained models into a table, in a column of type **varbinary(max)**.
94-
95-
For a simple example, see [this tutorial](../tutorials/rtsql-create-a-predictive-model-r.md)
80+
From SQL code, you can train the model using [sp_execute_external_script](https://docs.microsoft.com/sql/relational-databases/system-stored-procedures/sp-execute-external-script-transact-sql), and directly insert the trained models into a table, in a column of type **varbinary(max)**. For a simple example, see [Create a predictive model in R](../tutorials/rtsql-create-a-predictive-model-r.md).
96 81

97 82
**Using R**
98 83

99-
From R code, there are two ways to save the model to a table:
100-
101-
+ Call the `rxWriteObject` function, from the RevoScaleR package, to write the model directly to the database.
102-
103-
The `rxWriteObject()` function can retrieve R objects from an ODBC data source like SQL Server, or write objects to SQL Server. The API is modeled after a simple key-value store.
84+
From R code, call the [rxWriteObject](https://docs.microsoft.com/machine-learning-server/r-reference/revoscaler/rxwriteobject) function from the RevoScaleR package to write the model directly to the database. The `rxWriteObject()` function writes R objects to an ODBC data source like SQL Server, and its companion `rxReadObject()` retrieves them. The API is modeled after a simple key-value store.
104 85

105-
If you use this function, be sure to serialize the model using the new serialization function first. Then, set the *serialize* argument in `rxWriteObject` to FALSE, to avoid repeating the serialization step.
106-
107-
+ You can also save the model in raw format to a file and then read from the file into SQL Server. This option might be useful if you are moving or copying models between environments.
86+
If you use this function, be sure to serialize the model using [rxSerializeModel](https://docs.microsoft.com/r-server/r-reference/revoscaler/rxserializemodel) first. Then, set the *serialize* argument in `rxWriteObject` to FALSE, to avoid repeating the serialization step.
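A toy key-value store in Python shows why *serialize* should be FALSE when the model is already serialized. The `KeyValueStore` class and its methods are hypothetical. They only mimic the behavior described for `rxWriteObject`, with `pickle` playing the role of the store's own serialization. If already-serialized model bytes are written with serialization left on, they are wrapped a second time. A scoring consumer that reads the stored bytes directly would then no longer find the raw format it expects.

```python
import pickle
import struct


class KeyValueStore:
    """Hypothetical key-value store loosely mimicking rxWriteObject's serialize flag."""

    def __init__(self) -> None:
        self.table: dict[str, bytes] = {}  # stands in for a varbinary(max) column

    def write_object(self, key: str, value: object, serialize: bool = True) -> None:
        # serialize=True: the store applies its own serialization first.
        # serialize=False: value is assumed to already be bytes and is stored as-is.
        self.table[key] = pickle.dumps(value) if serialize else value

    def raw_value(self, key: str) -> bytes:
        """What a consumer reading the column directly would see."""
        return self.table[key]


# Model already serialized into a compact binary format (two doubles).
model_bytes = struct.pack("<dd", 1.0, 2.0)

store = KeyValueStore()
store.write_object("wrapped", model_bytes)               # serialized a second time
store.write_object("raw", model_bytes, serialize=False)  # stored unchanged

print(store.raw_value("raw") == model_bytes)      # True
print(store.raw_value("wrapped") == model_bytes)  # False
```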
108 87

88+
Serializing a model to a binary format is useful, but not required if you are scoring predictions using the R or Python runtime in the extensibility framework. You can save a model in raw byte format to a file and then read from the file into SQL Server. This option might be useful if you are moving or copying models between environments.
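Moving a model between environments through a file amounts to writing the raw bytes out and reading them back unchanged. The following minimal Python sketch assumes a few stand-ins:

+ the file name `model.bin` is hypothetical;
+ a two-double `struct` payload stands in for the model;
+ an in-memory SQLite BLOB column stands in for the destination SQL Server table.

```python
import sqlite3
import struct
import tempfile
from pathlib import Path

# Hypothetical model already in raw byte form.
model_bytes = struct.pack("<dd", 1.0, 2.0)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.bin"  # hypothetical file name
    path.write_bytes(model_bytes)   # export from the source environment

    # Import into the target environment: read the file, insert into a table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE models (name TEXT PRIMARY KEY, model BLOB)")
    conn.execute("INSERT INTO models (name, model) VALUES (?, ?)", ("demo", path.read_bytes()))

stored = conn.execute("SELECT model FROM models WHERE name = 'demo'").fetchone()[0]
print(stored == model_bytes)  # True
```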
109 89

110 90
## Scoring in related Microsoft products
111 91

112-
If you are using the standalone server or a Microsoft Machine Learning Server instead of SQL Server in-database analytics, you have other options besides stored procedures and T-SQL functions for generating predictions.
113-
114-
Both the standalone server and Machine Learning Server support the concept of a *web service* for code deployment. You can bundle an R or Python pre-trained model as a web service, called at run time to evaluate new data inputs. For more information, see these articles:
92+
If you are using the [standalone server](r-server-standalone.md) or a [Microsoft Machine Learning Server](https://docs.microsoft.com/machine-learning-server/what-is-machine-learning-server), you have other options besides stored procedures and T-SQL functions for generating predictions quickly. Both the standalone server and Machine Learning Server support the concept of a *web service* for code deployment. You can bundle an R or Python pre-trained model as a web service, called at run time to evaluate new data inputs. For more information, see these articles:
115 93

116 94
+ [What are web services in Machine Learning Server?](https://docs.microsoft.com/machine-learning-server/operationalize/concept-what-are-web-services)
117 95
+ [What is operationalization?](https://docs.microsoft.com/machine-learning-server/operationalize/concept-operationalize-deploy-consume)
