Lokasi ngalangkungan proxy:   [ UP ]  
[Ngawartoskeun bug]   [Panyetelan cookie]                
Skip to content

Commit 86fb84f

Browse files
authored
Update get-started-with-columnstore-for-real-time-operational-analytics.md
1 parent 4e8be3c commit 86fb84f

1 file changed

Lines changed: 22 additions & 24 deletions

File tree

docs/relational-databases/indexes/get-started-with-columnstore-for-real-time-operational-analytics.md

Lines changed: 22 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ monikerRange: "=azuresqldb-current||>=sql-server-2016||=sqlallproducts-allversio
1515
# Get started with Columnstore for real-time operational analytics
1616
[!INCLUDE [SQL Server Azure SQL Database](../../includes/applies-to-version/sql-asdb.md)]
1717

18-
SQL Server 2016 introduces real-time operational analytics, the ability to run both analytics and OLTP workloads on the same database tables at the same time. Besides running analytics in real time, you can also eliminate the need for ETL and a data warehouse.
18+
[!INCLUDE[ssSQL15](../../includes/sssql15-md.md)] introduces real-time operational analytics, the ability to run both analytics and OLTP workloads on the same database tables at the same time. Besides running analytics in real time, you can also eliminate the need for ETL and a data warehouse.
1919

2020
## Real-Time Operational Analytics Explained
2121
Traditionally, businesses have had separate systems for operational (that is, OLTP) and analytics workloads. For such systems, Extract, Transform, and Load (ETL) jobs regularly move the data from the operational store to an analytics store. The analytics data is usually stored in a data warehouse or data mart dedicated to running analytics queries. While this solution has been the standard, it has these three key challenges:
@@ -28,13 +28,13 @@ monikerRange: "=azuresqldb-current||>=sql-server-2016||=sqlallproducts-allversio
2828

2929
![real-time operational analytics overview](../../relational-databases/indexes/media/real-time-operational-analytics-overview.png "real-time operational analytics overview")
3030

31-
Real-time operational analytics offers a solution to these challenges.
32-
There is no time delay when analytics and OLTP workloads run on the same underlying table. For scenarios that can use real-time analytics, the costs and complexity are greatly reduced by eliminating the need for ETL and the need to purchase and maintain a separate data warehouse.
31+
Real-time operational analytics offers a solution to these challenges.
32+
There is no time delay when analytics and OLTP workloads run on the same underlying table. For scenarios that can use real-time analytics, the costs and complexity are greatly reduced by eliminating the need for ETL and the need to purchase and maintain a separate data warehouse.
3333

3434
> [!NOTE]
3535
> Real-time operational analytics targets the scenario of a single data source such as an enterprise resource planning (ERP) application on which you can run both the operational and the analytics workload. This does not replace the need for a separate data warehouse when you need to integrate data from multiple sources before running the analytics workload or when you require extreme analytics performance using pre-aggregated data such as cubes.
3636
37-
Real-time analytics uses an updateable columnstore index on a rowstore table. The columnstore index maintains a copy of the data, so the OLTP and analytics workloads run against separate copies of the data. This minimizes the performance impact of both workloads running at the same time. SQL Server automatically maintains index changes so OLTP changes are always up-to-date for analytics. With this design, it is possible and practical to run analytics in real time on up-to-date data. This works for both disk-based and memory-optimized tables.
37+
Real-time analytics uses an updateable columnstore index on a rowstore table. The columnstore index maintains a copy of the data, so the OLTP and analytics workloads run against separate copies of the data. This minimizes the performance impact of both workloads running at the same time. [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] automatically maintains index changes so OLTP changes are always up-to-date for analytics. With this design, it is possible and practical to run analytics in real time on up-to-date data. This works for both disk-based and memory-optimized tables.
3838

3939
## Get Started Example
4040
To get started with real-time analytics:
@@ -43,7 +43,7 @@ monikerRange: "=azuresqldb-current||>=sql-server-2016||=sqlallproducts-allversio
4343

4444
2. For each table, drop all btree indexes that are primarily designed to speed up existing analytics on your OLTP workload. Replace them with a single columnstore index. This can improve the overall performance of your OLTP workload since there will be fewer indexes to maintain.
4545

46-
```
46+
```sql
4747
--This example creates a nonclustered columnstore index on an existing OLTP table.
4848
--Create the table
4949
CREATE TABLE t_account (
@@ -57,12 +57,11 @@ monikerRange: "=azuresqldb-current||>=sql-server-2016||=sqlallproducts-allversio
5757
CREATE NONCLUSTERED COLUMNSTORE INDEX account_NCCI
5858
ON t_account (accountkey, accountdescription, unitsold)
5959
;
60-
6160
```
6261

63-
The columnstore index on an in-memory table allows operational analytics by integrating in-memory OLTP and in-memory columnstore technologies to deliver high performance for both OLTP and analytics workloads. The columnstore index on an in-memory table must include all the columns.
62+
The columnstore index on an in-memory table allows operational analytics by integrating in-memory OLTP and in-memory columnstore technologies to deliver high performance for both OLTP and analytics workloads. The columnstore index on an in-memory table must include all the columns.
6463

65-
```
64+
```sql
6665
-- This example creates a memory-optimized table with a columnstore index.
6766
CREATE TABLE t_account (
6867
accountkey int NOT NULL PRIMARY KEY NONCLUSTERED,
@@ -81,7 +80,7 @@ monikerRange: "=azuresqldb-current||>=sql-server-2016||=sqlallproducts-allversio
8180
You are now ready to run real-time operational analytics without making any changes to your application. Analytics queries will run against the columnstore index and OLTP operations will keep running against your OLTP btree indexes. The OLTP workloads will continue to perform, but will incur some additional overhead to maintain the columnstore index. See the performance optimizations in the next section.
8281

8382
## Blog Posts
84-
Read Sunil Agarwal's blog posts to learn more about real-time operational analytics. It might be easier to understand the performance tips sections if you look at the blog posts first.
83+
Read the following blog posts to learn more about real-time operational analytics. It might be easier to understand the performance tips sections if you look at the blog posts first.
8584

8685
- [Business case for real-time operational analytics](https://blogs.technet.microsoft.com/dataplatforminsider/2015/12/09/real-time-operational-analytics-using-in-memory-technology/)
8786

@@ -102,11 +101,11 @@ monikerRange: "=azuresqldb-current||>=sql-server-2016||=sqlallproducts-allversio
102101
- [Columnstore index and the merge policy for rowgroups](https://blogs.msdn.microsoft.com/sqlserverstorageengine/2016/03/08/columnstore-index-merge-policy-for-reorganize/)
103102

104103
## Performance tip #1: Use filtered indexes to improve query performance
105-
Running real-time operational analytics can impact the performance of the OLTP workload. This impact should be minimal. The example below shows how to use filtered indexes to minimize impact of nonclustered columnstore index on transactional workload while still delivering analytics in real time.
104+
Running real-time operational analytics can impact the performance of the OLTP workload. This impact should be minimal. The example below shows how to use filtered indexes to minimize impact of nonclustered columnstore index on transactional workload while still delivering analytics in real time.
106105

107106
To minimize the overhead of maintaining a nonclustered columnstore index on an operational workload, you can use a filtered condition to create a nonclustered columnstore index only on the *warm* or slowly changing data. For example, in an order management application, you can create a nonclustered columnstore index on the orders that have already been shipped. Once the order has shipped, it rarely changes and therefore can be considered warm data. With Filtered index, the data in nonclustered columnstore index requires fewer updates thereby lowering the impact on transactional workload.
108107

109-
Analytics queries transparently access both warm and hot data as needed to provide real-time analytics. If a significant part of the operational workload is touching the 'hot' data, those operations will not require additional maintenance of the columnstore index. A best practice is to have a rowstore clustered index on the column(s) used in the filtered index definition. SQL Server uses the clustered index to quickly scan the rows that did not meet the filtered condition. Without this clustered index, a full table scan of the rowstore table will be required to find these rows, which can negatively impact the performance of analytics query significantly. In the absence of clustered index, you could create a complementary filtered nonclustered btree index to identify such rows but it is not recommended because accessing large range of rows through nonclustered btree indexes is expensive.
108+
Analytics queries transparently access both warm and hot data as needed to provide real-time analytics. If a significant part of the operational workload is touching the 'hot' data, those operations will not require additional maintenance of the columnstore index. A best practice is to have a rowstore clustered index on the column(s) used in the filtered index definition. [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] uses the clustered index to quickly scan the rows that did not meet the filtered condition. Without this clustered index, a full table scan of the rowstore table will be required to find these rows, which can negatively impact the performance of analytics query significantly. In the absence of clustered index, you could create a complementary filtered nonclustered btree index to identify such rows but it is not recommended because accessing large range of rows through nonclustered btree indexes is expensive.
110109

111110
> [!NOTE]
112111
> A filtered nonclustered columnstore index is only supported on disk-based tables. It is not supported on memory-optimized tables
@@ -117,9 +116,9 @@ monikerRange: "=azuresqldb-current||>=sql-server-2016||=sqlallproducts-allversio
117116
![Combined indexes for warm and hot data](../../relational-databases/indexes/media/de-columnstore-warmhotdata.png "Combined indexes for warm and hot data")
118117

119118
> [!NOTE]
120-
> The query optimizer will consider, but not always choose, the columnstore index for the query plan. When the query optimizer chooses the filtered columnstore index, it transparently combines the rows both from columnstore index as well as the rows that do not meet the filtered condition to allow real-time analytics. This is different from a regular nonclustered filtered index which can be used only in queries that restrict themselves to the rows present in the index.
119+
> The Query Optimizer will consider, but not always choose, the columnstore index for the query plan. When the query optimizer chooses the filtered columnstore index, it transparently combines the rows both from columnstore index as well as the rows that do not meet the filtered condition to allow real-time analytics. This is different from a regular nonclustered filtered index which can be used only in queries that restrict themselves to the rows present in the index.
121120

122-
```
121+
```sql
123122
--Use a filtered condition to separate hot data in a rowstore table
124123
-- from "warm" data in a columnstore index.
125124
@@ -162,18 +161,17 @@ Group By customername
162161
Please refer to the blog for details on [filtered nonclustered columnstore index.](https://blogs.msdn.microsoft.com/sqlserverstorageengine/2016/03/06/real-time-operational-analytics-filtered-nonclustered-columnstore-index-ncci/)
163162

164163
## Performance tip #2: Offload analytics to Always On readable secondary
165-
Even though you can minimize the columnstore index maintenance by using a filtered columnstore index, the analytics queries can still require significant computing resources (CPU, IO, memory) which impact the operational workload performance. For most mission critical workloads, our recommendation is to use the Always On configuration. In this configuration, you can eliminate the impact of running analytics by offloading it to a readable secondary.
164+
Even though you can minimize the columnstore index maintenance by using a filtered columnstore index, the analytics queries can still require significant computing resources (CPU, I/O, memory) which impact the operational workload performance. For most mission critical workloads, our recommendation is to use the Always On configuration. In this configuration, you can eliminate the impact of running analytics by offloading it to a readable secondary.
166165

167166
## Performance Tip #3: Reducing Index fragmentation by keeping hot data in delta rowgroups
168-
Tables with columnstore index may get significantly fragmented (that is, deleted rows) if the workload updates/deletes rows that have been compressed. A fragmented columnstore index leads to inefficient utilization of memory/storage. Besides inefficient use of resources, it also negatively impacts the analytics query performance because of extra IO and the need to filter the deleted rows from the result set.
167+
Tables with columnstore index may get significantly fragmented (that is, deleted rows) if the workload updates/deletes rows that have been compressed. A fragmented columnstore index leads to inefficient utilization of memory/storage. Besides inefficient use of resources, it also negatively impacts the analytics query performance because of extra I/O and the need to filter the deleted rows from the result set.
169168

170-
The deleted rows are not physically removed until you run index defragmentation with REORGANIZE command or rebuild the columnstore index on the entire table or the affected partition(s). Both REORGANIZE and Index REBUILD are expensive operations taking resources away which otherwise could be used for the workload. Additionally, if rows compressed too early, it may need to be re-compressed multiple times due to updates leading to wasted compression overhead.
171-
You can minimize index fragmentation using COMPRESSION_DELAY option.
172-
173-
```
169+
The deleted rows are not physically removed until you run index defragmentation with `REORGANIZE` command or rebuild the columnstore index on the entire table or the affected partition(s). Both index `REORGANIZE` and `REBUILD` are expensive operations taking resources away which otherwise could be used for the workload. Additionally, if rows compressed too early, it may need to be re-compressed multiple times due to updates leading to wasted compression overhead.
170+
You can minimize index fragmentation using `COMPRESSION_DELAY` option.
174171

172+
```sql
175173
-- Create a sample table
176-
create table t_colstor (
174+
CREATE TABLE t_colstor (
177175
accountkey int not null,
178176
accountdescription nvarchar (50) not null,
179177
accounttype nvarchar(50),
@@ -187,9 +185,9 @@ CREATE NONCLUSTERED COLUMNSTORE index t_colstor_cci on t_colstor (accountkey, ac
187185
;
188186
```
189187

190-
Please refer to the blog for details on [compression delay.](https://blogs.msdn.microsoft.com/sqlserverstorageengine/2016/03/06/real-time-operational-analytics-compression-delay-option-for-nonclustered-columnstore-index-ncci/)
188+
Please refer to the blog for details on [compression delay](https://blogs.msdn.microsoft.com/sqlserverstorageengine/2016/03/06/real-time-operational-analytics-compression-delay-option-for-nonclustered-columnstore-index-ncci/).
191189

192-
Here are the recommended best practices
190+
Here are the recommended best practices:
193191

194192
- **Insert/Query workload:**If your workload is primarily inserting data and querying it, the default COMPRESSION_DELAY of 0 is the recommended option. The newly inserted rows will get compressed once 1 million rows have been inserted into a single delta rowgroup.
195193
Some examples of such workload are (a) traditional DW workload (b) click-stream analysis when you need to analyze the click pattern in a web application.
@@ -199,14 +197,14 @@ CREATE NONCLUSTERED COLUMNSTORE index t_colstor_cci on t_colstor (accountkey, ac
199197
We expect most customers do not need to do anything. The default value of COMPRESSION_DELAY option should work for them.
200198
For advance users, we recommend running the query below and collect % of deleted rows over the last 7 days.
201199

202-
```
200+
```sql
203201
SELECT row_group_id,cast(deleted_rows as float)/cast(total_rows as float)*100 as [% fragmented], created_time
204202
FROM sys. dm_db_column_store_row_group_physical_stats
205203
WHERE object_id = object_id('FactOnlineSales2')
206204
AND state_desc='COMPRESSED'
207205
AND deleted_rows>0
208206
AND created_time > GETDATE() - 7
209-
ORDER BY created_time DESC
207+
ORDER BY created_time DESC;
210208
```
211209

212210
If the number of deleted rows in compressed rowgroups > 20%, plateauing in older rowgroups with < 5% variation (referred to as cold rowgroups) set COMPRESSION_DELAY = (youngest_rowgroup_created_time - current_time). Note that this approach works best with a stable and relatively homogeneous workload.

0 commit comments

Comments
 (0)