Tuesday, October 8, 2013

SQL Server Architecture



SQL Server architecture is mainly divided into different components.

1. SNI Protocol Layer.
2. Relational Engine.
3. Storage Engine
4. Buffer Pool. 

Majorly classified as two main engines: Relational Engine and the Storage engine.


SNI (SQL Server Network Interface)

The SQL Server Network Interface (SNI) is a protocol layer that establishes the network connection between the client and the server. It consists of a set of APIs that are used by both the database engine and the SQL Server Native Client (SNAC).

SNI isn’t configurable directly; you just need to configure a network protocol on the client and the server. SQL Server has support for the following protocols:

Shared memory: Simple and fast, shared memory is the default protocol used to connect from a client running on the same computer as SQL Server. It can only be used locally, has no configurable properties, and is always tried first when connecting from the local machine.

TCP/IP: TCP/IP is the most commonly used access protocol for SQL Server. It enables you to connect to SQL Server by specifying an IP address and a port number. Typically, this happens automatically when you specify an instance to connect to. Your internal name resolution system resolves the hostname part of the instance name to an IP address, and either you connect to the default TCP port number 1433 for default instances or the SQL Browser service will find the right port for a named instance using UDP port 1434.

Named Pipes: TCP/IP and Named Pipes are comparable protocols in the architectures in which they can be used. Named Pipes was developed for local area networks (LANs) but it can be inefficient across slower networks such as wide area networks (WANs). Named Pipes uses TCP port 445, so ensure that the port is open on any firewalls between the two computers, including the Windows Firewall.

VIA: Virtual Interface Adapter is a protocol that enables high-performance communications between two systems. It requires specialized hardware at both ends and a dedicated connection.
Like Named Pipes, to use the VIA protocol you first need to enable it in SQL Server Configuration  manager and then create a SQL Server alias that connects to the server using VIA as the protocol.

TDS (Tabular Data Stream) Endpoints
TDS is a Microsoft-proprietary protocol originally designed by Sybase that is used to interact with a database server. Once a connection has been made using a network protocol such as TCP/IP, a link is established to the relevant TDS endpoint that then acts as the communication point between the client and the server.
There is one TDS endpoint for each network protocol and an additional one reserved for use by the dedicated administrator connection (DAC). Once connectivity is established, TDS messages are used to communicate between the client and the server.
The SELECT statement is sent to the SQL Server as a TDS message across a TCP/IP connection
(TCP/IP is the default protocol).

Protocol Layer
When the protocol layer in SQL Server receives your TDS packet, it has to reverse the work of the SNI at the client and unwrap the packet to find out what request it contains. The protocol layer is also responsible for packaging up results and status messages to send back to the client as TDS messages.
Our SELECT statement is marked in the TDS packet as a message of type “SQL Command,” so it’s passed on to the next component, the Query Parser, to begin the path toward execution.

At the client, the statement was wrapped in aTDS packet by the SQL Server Network Interface and sent to the protocol layer on the SQL Serverwhere it was unwrapped, identified as a SQL Command, and the code sent to the Command Parserby the SNI.

The Relational Engine:

The Relational Engine is also sometimes called the query processor because its primary function is query optimization and execution. It contains a Command Parser to check query syntax and prepare query trees, a Query Optimizer that is arguably the crown jewel of any database system, and a Query Executor responsible for execution.

1. Command Parser:

The Command Parser’s role is to handle T-SQL language events. It first checks the syntax and returns any errors back to the protocol layer to send to the client. If the syntax is valid, then the next step is to generate a query plan or find an existing plan. A Query plan contains the details about how SQL Server is going to execute a piece of code. It is commonly referred to as an execution plan.

Plan Cache: Creating execution plans can be time consuming and resource intensive, so The Plan Cache, part of SQL Server’s buffer pool, is used to store execution plans in case they are needed later. LRU algorithm is used for all the plan caches in the Buffer Pool, all oldest plans will be flushed out if they cross LRU timeframe.

2. Query Optimizer:

    The Query Optimizer is one of the most complex and secretive parts of the product. It is what’s known as a “cost-based” optimizer, which means that it evaluates multiple ways to execute a query and then picks the method that it deems will have the lowest cost to execute. This “method” of executing is implemented as a query plan and is the output from the optimizer.

3. Query Executor:

    The Query Executor’s job is self-explanatory; it executes the query. To be more specific, it executes the query plan by working through each step it contains and interacting with the Storage Engine to retrieve or modify data.

The Storage Engine:

    The Storage engine is responsible for managing all I/O to the data, and contains the Access Methods code, which handles I/O requests for rows, indexes, pages, allocations and row versions, and a Buffer Manager, which deals with SQL Server’s main memory consumer, the buffer pool. It also contains a Transaction Manager, which handles the locking of data to maintain Isolation (ACID properties) and manages the transaction log.

1. Access Methods:
    Access Methods is a collection of code that provides the storage structures for data and indexes as well as the interface through which data is retrieved and modified. It contains all the code to retrieve data but it doesn’t actually perform the operation itself; it passes the request to the Buffer Manager.

2. Buffer Manager:
    The Buffer Manager manages the buffer pool, which represents the majority of SQL Server’s memory usage. If you need to read some rows from a page the Buffer Manager will check the data cache in the buffer pool to see if it already has the page cached in memory. If the page is already cached, then the results are passed back to the Access Methods. If the page is n’t already in cache, then the Buffer Manager will get the page from the database on disk, put it in the data cache, and pass the results to the Access Methods.
Data Cache: The data cache is usually the largest part of the buffer pool; therefore, it’s the largest memory consumer within SQL Server. It is here that every data page that is read from disk is written to before being used.

3. Transaction Manager:
The Transaction Manager has two components that are of interest here: a Lock Manager and a Log Manager. The Lock Manager is responsible for providing concurrency to the data, and it delivers the configured level of isolation by using locks. The Access Methods code requests that the changes it wants to make are logged, and the Log Manager writes the changes to the transaction log. This is called Write-Ahead Logging.

Buffer Pool:
The buffer pool provides the pages to the Buffer Manager to reduce I/O operations. Buffer pool contains Data Cache and Plan Cache.
1. Data Cache:
The data sent from the Buffer manager is stored in Data Cache as Dirty Page. When a checkpoint is generated this dirty pages will be sent to the Data files. The copy will be saved in Data Cache for future requests.
2. Plan Cache:
Plan Cache stores the plans for the data stored in Data Cache for easy mapping.

How to work SQL Statements in SQL Server Life Cycle:

A Basic select Statement Life Cycle Summary

Figure 1-5 shows the whole life cycle of a SELECT query, described here:

1. The SQL Server Network Interface (SNI) on the client established a connection to the SNI
on the SQL Server using a network protocol such as TCP/IP. It then created a connection to a TDS endpoint over the TCP/IP connection and sent the SELECT statement to SQL Server as a TDS message.

2. The SNI on the SQL Server unpacked the TDS message, read the SELECT statement, and passed a “SQL Command” to the Command Parser.

3. The Command Parser checked the plan cache in the buffer pool for an existing, usable query plan. When it did n’t find one, it created a query tree based on the SELECT statement and passed it to the Optimizer to generate a query plan.

4. The Optimizer generated a “zero cost” or “trivial” plan in the pre-optimization phase because the statement was so simple. The query plan created was then passed to the Query Executor for execution.

5. At execution time, the Query Executor determined that data needed to be read to complete
the query plan so it passed the request to the Access Methods in the Storage Engine via an OLE DB interface.

6. The Access Methods needed to read a page from the database to complete the request from the Query Executor and asked the Buffer Manager to provision the data page.

7. The Buffer Manager checked the data cache to see if it already had the page in cache. It was n’t in cache so it pulled the page from disk, put it in cache, and passed it back to the Access Methods.

8. Finally, the Access Methods passed the result set back to the Relational Engine to send to the client.


A Basic Update Statement Life Cycle Summary (DML Operations)

The process is exactly the same as the process for the SELECT statement you just looked at until you get to the Access Methods.

The Access Methods need to make a data modification this time, so before it passes on the I/O request the details of the change need to be persisted to disk. That is the job of the Transaction Manager. The details about the transaction are stored in the LDF file with the help of Log Manager, also called as WAL.

Lock manager forms an Exclusive lock based on the object scope, Then Page is pulled into the memory and then it is modified, such a modified page is called Dirty Page.


Checkpoint Process

A checkpoint is a point in time created by the checkpoint process at which SQL Server can be sure that any committed transactions have had all their changes written to disk. This checkpoint then becomes the marker from which database recovery can start.

The checkpoint process ensures that any dirty pages associated with a committed transaction will be flushed to disk. Unlike the lazywriter, however, a checkpoint does not remove the page from cache; it makes sure the dirty page is written to disk and then marks the cached paged as clean in the page header.

By default, on a busy server, SQL Server will issue a checkpoint roughly every minute, which is marked in the transaction log. If the SQL Server instance or the database is restarted, then the recovery process reading the log knows that it does n’t need to do anything with log records prior tothe checkpoint.

The time between checkpoints therefore represents the amount of work that needs to be done to roll forward any committed transactions that occurred after the last checkpoint, and to roll back any transactions that hadn’t committed.

By check-pointing every minute, SQL Server is trying to keep the recovery time when starting a database to less than one minute, but it won’t automatically checkpoint unless at least 10MB has been written to the log within the period.

Checkpoints can also be manually called by using the CHECKPOINT T-SQL command, and can occur because of other events happening in SQL Server. For example, when you issue a backup command, a checkpoint will run first.
Trace flag 3502 is an undocumented trace flag that records in the error log when a checkpoint starts and stops. For example,after adding it as a startup trace flag and running a workload with numerous writes, my error log contained the entries showing Figure 1-8, which indicates checkpoints running between 30 and 40 seconds apart.

Lazywriter

The lazywriter is a thread that periodically checks the size of the free buffer list. When it’s low, it scans the whole data cache to age-out any pages that haven’t been used for a while. If it finds any dirty pages that haven’t been used for a while, they are flushed to disk before being marked as free in memory.


The lazywriter also monitors the free physical memory on the server and will release memory from the free buffer list back to Windows in very low memory conditions. When SQL Server is busy, it will also grow the size of the free buffer list to meet demand (and therefore the buffer pool) when there is free physical memory and the configured Max Server Memory threshold has n’t been reached.

1 comment: