MCITP

MCITP

Tuesday, September 23, 2014

Execution plan explaination

Execution plan :
-----------------------------
There are two type of execution plan
1. Estimated execution plan :
a. Created without even running the query.
b. Uses statistics of table and indexes to find the detail.
c. Good for long running query tuning.

2. Actual execution plan :
a. It creates execution plan when actual query runs.
b. It displays result after query completes the execution so wait for large data query.
c. It gives exact plan of query with calculated on actual data and indexes.
d. If statistics are out of date result of estimated and actual could be different.

assuming you're not sysadmin, dbcreator or db_owner, you'll need to be granted the ShowPlan permission within the database being tested.
GRANT SHOWPLAN TO [username]

Execution plan uses physical joins for any logical joins (inner, left, right and outer) used by user in a query.
These physical joins are: HASH Join, MERGE Join and NESTED join.

Execution plans internally uses above joins according to the size of tables, indexes used in joined tables and Sorted order on the join column. The query optimizer will be smart and always try to pick up the most optimal physical joins. As we know SQL optimizer creates a plan cost based and depends upon the query cost it will choose the best join.

HASH JOIN: Uses each row from top input to build a hash table, and the each row from below input to probe into the hash table and outputting all matching rows. It creates hash table in available memory or temp database.

MERGE JOIN: Match rows from two suitably sorted input tables exploiting the sort order. Merge join only works with unique clustered index.

NESTED JOIN: For each row in top(outer) input, scan the bottom(inner) input, and output the matching row.

For below conditions with unique clustered index:
Table1 (BIG) and Table2 (BIG)
1. If the both or eitherof table has no index optimizer will use hash join internally.
2. If both the big table has unique clustered index optimizer will use merge join internally.
Table1 (Medium) and Table2 (Medium)
1. If the both the table has no index optimizer will use hash join internally.
2. If the both or eitherof table has unique index optimizer will use merge join internally.
Table1 (small) and Table2 (small)
1. If the both the table has no index optimizer will use hash join internally.
2. If the both or eitherof table has unique index optimizer will use nested join internally.
Table1 (Big) and Table2 (small/medium)
1. If the both the table has no index optimizer will use hash join internally.
2. If the both or eitherof table has unique index optimizer will use nested join internally.
Please note if clustered index is used instead of unique clustered index all joins will use only hash except nested join for small tables.if the index created without unique keyword then SQL will not know its unique (data) so it will create GUID of 4-byte integer for joins.
If no other match suits to optimizer it uses hash match using hash to retrieve output.
Execution plan also displays the type of scan or seek used by table.
They are : Table scan , Index scan and index seek.

Index seeks are generally preferred for the highly selective queries. What that means is that the query is just requesting a fewer number of rows or just retrieving the other 10 (some documents says 15 percent) of the rows of the table.

Table scan means iterate over all table rows.

Index scan means iterate over all index items, when item index meets search condition, table row is retrived through index.

table scan/index scan/index seek possibilities:--

A table with no index will do table scan.·
A table with only clustered index will do index scan when no search criteria used.·
A table with only clustered index will do index seek when clustered index column used in search criteria.·
A table with only clustered index will do index scan when nonindexed column used in search criteria.· A table with only non clustered index will do table scan when non indexed column used in search criteria.·
A table with only non clustered index will do index scan/index seek depending on volume of table when indexed column used in search criteria.·
A table with clustered and non clustered index will do index scan or seek depending on the search column and volume of table in where clause.

Execution plan cache :
It is expensive for the Server to generate execution plans for each query so SQL Server will keep and reuse plans wherever possible. As they are created, plans are stored in a section of memory called the plan cache) and can be checked using DMV sys.dm_exec_cached_plansEach unique query gets an Execution Plan
–Performed by the Query Optimizer
–Identical queries gets the same Execution Plan
–Problematic when used with dynamic T-SQL
Creation takes some time
Stored in the Plan Cache
When a query is submitted to the server, an estimated execution plan is created by the optimizer. Once that plan is created, and before it gets passed to the storage engine, the optimizer compares this estimated plan to actual execution plans that already exist in the plan cache . If an actual plan is found that matches the estimated one, then the optimizer will reuse the existing plan, since it's already been used before by the query engine. This reuse avoids the overhead of creating actual execution plans for large and complex queries or even simple plans for small queries called thousands of times in a minute.
Each plan is stored once, unless the cost of the plan lets the optimizer know that a parallel execution might result in better performance. If the optimizer sees parallelism as an option, then a second plan is created and stored with a different set of operations to support parallelism. In this instance, one query gets two plans.
Execution plans are not kept in memory forever. A plan with a cost of 10 that has been referenced 5 times has an "age" value of 50. The lazy writer process, an internal process that works to free all types of cache (including plan cache ), periodically scans the objects in the cache and decreases this value by one each time.
If the following criteria are met, the plan is removed from memory:
More memory is required by the system
The "age" of the plan has reached zero
The plan isn't currently being referenced by an existing connection
The plan also modifies or recompiles (may be expensive operation) in following event:
· Change in schema or structure of table used in query.
· Changing a index or updating statistics used by query.
· Using sp_recompile function.
To clear the cache use below command:
DBCC FREEPROCCACHE
DBCC FREESYSTEMCACHE
To see the object referenced and query used by optimizaor plan we can use below query.

SELECT [cp].[refcounts]
, [cp].[usecounts]
, [cp].[objtype]
, [st].[dbid]
, [st].[objectid]
, [st].[text]
, [qp].[query_plan]
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text ( cp.plan_handle ) st
CROSS APPLY sys.dm_exec_query_plan ( cp.plan_handle ) qp ;

Above columns in query defines :
· Refcounts : Number of objects referecing this plan
· Usecounts : Usage count
· Objtype : Proc, Ad hoc, View
· Plan_handle : Identifier for this plan in memory (You can use the XML directly or open it as a graphical execution plan.)
Different ways of getting execution plan:
To get the graphical execution plan we can use SSMS icon for estimated and actual execution plan.
To get text execution plan :
To activate the text version of the estimated text execution plan, simply issue the following command at the start of the query:
SET SHOWPLAN_ALL ON/OFF;
In order to activate and deactivate the text version of the Actual execution plan, use:
SET STATISTICS PROFILE ON/OFF;
To get the XML format execution plan:
In order to activate and deactivate the XML version of the Estimated execution plan, use:
SET SHOWPLAN_XML ON/OFF;
For the XML version of the Actual plan, use:
SET STATISTICS XML ON/OFF;
Automating Plan Capture Using SQL Server Profiler

A production system may be subject to tens or hundreds of sessions executing tens or hundreds or queries, each with varying parameter sets and varying plans. In this situation we need a way to automate plan capture so that we can collect a large number of plans simultaneously. In SQL Server 2005 you can use Profiler to capture XML execution plans, as the queries are executing. You can then examine the collected plans, looking for the queries with the highest costs, or simply searching the plans to find, for example, Table Scan operations that you'd like to eliminate.
Showplan All, Showplan Statistics Profile, Showplan XML, Showplan XML Statistics Profile are few events of profilers for execution plan.



Capturing all of the execution plans, using Showplan XML or Showplan XML Statistics Profile, inherently places a sizeable load on the server. These are not lightweight event capture scenarios. Even the use of the less frequent Showplan XML for Query Compile will cause a small performance hit. Use due diligence when running traces of this type against any production machine.

Working with Index and keys


INDEX SCAN VS SEEK:
--------------------------------

Index Scan: Since a scan touches every row in the table whether or not it qualifies, the cost is proportional to the total number of rows in the table. Thus, a scan is an efficient strategy if the table is small or if most of the rows qualify for the predicate.

Index Seek: Since a seek only touches rows that qualify and pages that contain these qualifying rows, the cost is proportional to the number of qualifying rows and pages rather than to the total number of rows in the table.

A table with no index is forced to do table scan. Note that for some queries it will use a table scan even when a useful index is present - this is usually because the data in the table is so small that it is more hassle to traverse the indexes .

Index seeks are generally preferred for the highly selective queries. What that means is that the query is just requesting a fewer number of rows or just retrieving the other 10 (some documents says 15 percent) of the rows of the table.

Table scan means iterate over all table rows.

Index scan means iterate over all index items, when item index meets search condition, table row is retrieved through index.

Usually index scan is less expensive than a table scan because index is more flat than a table.
Clustered Indexes: With a clustered index the table data is stored in the leaf nodes of that index instead of being in a separate heap structure. This means that there will never need to be any extra row lookups after finding rows using that index no matter what columns are needed [unless you have off-page data like TEXT columns or VARCHAR(MAX) columns containing long data]. You can only have one clustered index for this reason, so if you use one chose where you put it carefully in
order to get maximum gain.
table scan/index scan/index seek possibilities
-----------------------------------------------------------
· A table with no index will do table scan.
· A table with only clustered index will do index scan when no search criteria used.
· A table with only clustered index will do index seek when clustered index column used in search criteria.
· A table with only clustered index will do index scan when nonindexed column used in search criteria.
· A table with only nonclustered index will do table scan when nonindexed column used in search criteria.
· A table with only nonclusterd index will do index scan/index seek depending on volume of table when indexed column used in search criteria.
· A table with clustered and non clustered index will do index scan or seek depending on the search column and volume of table in where clause.
INDEX TYPES
================

CREATE [ UNIQUE ] [ CLUSTERED | NONCLUSTERED ] INDEX index_name
ON ( column [ ASC | DESC ] [ ,...n ] )
[ INCLUDE ( column_name [ ,...n ] ) ]
[ WHERE ]
[ WITH ( [ ,...n ] ) ]
[ ON { partition_scheme_name ( column_name )
| filegroup_name
| default
}
]
[ FILESTREAM_ON { filestream_filegroup_name | partition_scheme_name | "NULL" } ]

[ ; ]

::=
{
PAD_INDEX = { ON | OFF }
| FILLFACTOR = fillfactor
| SORT_IN_TEMPDB = { ON | OFF }
| IGNORE_DUP_KEY = { ON | OFF }
| STATISTICS_NORECOMPUTE = { ON | OFF }
| DROP_EXISTING = { ON | OFF }
| ONLINE = { ON | OFF }
| ALLOW_ROW_LOCKS = { ON | OFF }
| ALLOW_PAGE_LOCKS = { ON | OFF }
| MAXDOP = max_degree_of_parallelism
| DATA_COMPRESSION = { NONE | ROW | PAGE}
[ ON PARTITIONS ( { | }
[ , ...n ] ) ]
}

By default a nonclustered index is created if not specified.

UNIQUE :
-----------
Creates a unique index on a table or view. A unique index is one in which no two rows are permitted to have the same index key value. A clustered index on a view must be unique.

The Database Engine does not allow creating a unique index on columns that already include duplicate values, whether or not IGNORE_DUP_KEY is set to ON. If this is tried, the Database Engine displays an error message. Duplicate values must be removed before a unique index can be created on the column or columns. Columns that are used in a unique index should be set to NOT NULL, because multiple null values are considered duplicates when a unique index is created.

CLUSTERED :
-------------
Creates an index in which the logical order of the key values determines the physical order of the corresponding rows in a table. The bottom, or leaf, level of the clustered index contains the actual data rows of the table. A table or view is allowed one clustered index at a time.

A view with a unique clustered index is called an indexed view. Creating a unique clustered index on a view physically materializes the view. A unique clustered index must be created on a view before any other indexes can be defined on the same view.

Create the clustered index before creating any nonclustered indexes. Existing nonclustered indexes on tables are rebuilt when a clustered index is created.

NONCLUSTERED :
------------------
Creates an index that specifies the logical ordering of a table. With a nonclustered index, the physical order of the data rows is independent of their indexed order.
The bottom,or leaf, level of the Nonclustered index contains the pointer address of actual data rows of the table

WHERE :
-----------------------------
Creates a filtered index by specifying which rows to include in the index. The filtered index must be a nonclustered index on a table. Creates filtered statistics for the data rows in the filtered index.


PAD_INDEX = { ON | OFF } :
------------------------
ON : The percentage of free space that is specified by fillfactor is applied to the intermediate-level pages of the index.
OFF or fillfactor is not specified : The intermediate-level pages are filled to near capacity, leaving sufficient space for at least one row of the maximum size the index can have.

FILLFACTOR =fillfactor :
------------------------
Specifies a percentage that indicates how full the Database Engine should make the leaf level of each index page during index creation or rebuild. fillfactor must be an integer value from 1 to 100. If fillfactor is 100, the Database Engine creates indexes with leaf pages filled to capacity.

DROP_EXISTING = { ON | OFF } :
------------------------------
Specifies that the named, preexisting clustered, or nonclustered is dropped and rebuilt. The default is OFF.

ONLINE = { ON | OFF } :
-------------------------

Specifies whether underlying tables and associated indexes are available for queries and data modification during the index operation. The default is OFF.


KEYS:
=====


Super Key :
------------
Super key is a set of one or more than one keys that can be used to identify a record uniquely in a table.Example : Primary key, Unique key, Alternate key are subset of Super Keys.

Candidate Key :
---------------
A Candidate Key is a set of one or more fields/columns that can identify a record uniquely in a table. There can be multiple Candidate Keys in one table. Each Candidate Key can work as Primary Key.

Primary Key :
---------------
Primary key is a set of one or more fields/columns of a table that uniquely identify a record in database table. It can not accept null, duplicate values. Only one Candidate Key can be Primary Key.

Alternate key :
----------------
A Alternate key is a key that can be work as a primary key. Basically it is a candidate key that currently is not primary key.

Composite/Compound Key :
-----------------------
Composite Key is a combination of more than one fields/columns of a table. It can be a Candidate key, Primary key.

Unique Key :
---------------
Uniquekey is a set of one or more fields/columns of a table that uniquely identify a record in database table. It is like Primary key but it can accept only one null value and it can not have duplicate values. For more help refer the article

Foreign Key :
-------------
Foreign Key is a field in database table that is Primary key in another table. It can accept multiple null, duplicate values. For more help refer the article Difference between primary key and foreign key.


Natural key :
--------------
A natural key is a single column or set of columns that uniquely identifies a single record in a table, where the key columns are made up of real data. “real data” means data that has meaning and occurs naturally in the world of data. A natural key is a column value that has a relationship with the rest of the column values in a given data record. Here are some examples of natural keys values: Social Security Number, ISBN, and TaxId.

surrogate key :
----------------
A surrogate key like a natural key is a column that uniquely identifies a single record in a table. But this is where the similarity stops. Surrogate keys are similar to surrogate mothers. They are keys that don’t have a natural relationship with the rest of the columns in a table. The surrogate key is just a value that is generated and then stored with the rest of the columns in a record. The key value is typically generated at run time right before the record is inserted into a table. It is sometimes also referred to as a dumb key, because there is no meaning associated with the value. Surrogate keys are commonly a numeric number.


Identity Key :
---------------
The IDENTITY columns are auto incrementing columns provided by SQL Server.
There can only be one IDENTITY column per table.

SQL Server will take care of incrementing this column automatically.  

Statistics good for performance

You can add statistics on columns that don’t have statistics in order to boost query performance and also in order to create more optimal execution plans.

Statistics can be created different ways- Statistics are automatically created for each index key you create- If the database setting autocreate stats is on, then SQL Server will automatically create statistics for non-indexed columns that are used in queries  

To get Statistics  detail :
DBCC SHOW_STATISTICS('tablename','indexname') WITH HISTOGRAM

- Auto Update Statistics basically means, if there is an incoming query but statistics are stale, SQL Server will update statistics first before it generates an execution plan.
- Auto Update Statistics Asynchronously on the other hand means, if there is an incoming query but statistics are stale, SQL Server uses the stale statistics to generate the execution plan, then updates the statistics afterwards.  

How do we know statistics are being used?

One good check you can do is when you generate execution plans for your queries: check out your “Actual Number of Rows” and “Estimated Number of Rows”.  If these numbers are (consistently) fairly close, then most likely your statistics are up-to-date and used by the optimizer for the query. If
not, time for you to re-check your statistics create/update frequency.

---There may be cases when you may want to disable statistics update temporarily while you’re doing massive updates on a table, and you don’t want it to be slowed down by the autoupdate.

--- to manually update statistics, you can use either
-- sp_updatestats or  UPDATE STATISTICS  

One way to get better statistics is to manually update the statistics for the table using the

UPDATE STATISTICs commandwith the FULLSCAN option or SAMPLE option.
e.g :
USE AdventureWorks;
GO
CREATE STATISTICS Products    ON Production.Product ([Name], ProductNumber)  
WITH SAMPLE 50 PERCENT

UPDATE STATISTICS Production.Product(Products)     WITH SAMPLE 50 PERCENT;

USE AdventureWorks;
GO
UPDATE STATISTICS Production.Product(Products)    WITH FULLSCAN, NORECOMPUTE;
GO