SQL Server architecture is mainly divided into different
components.
1. SNI Protocol Layer.
2. Relational Engine.
3. Storage Engine
4. Buffer
Pool.
Majorly classified as two main engines: Relational Engine and the Storage
engine.
SNI (SQL Server Network Interface)
The SQL Server Network Interface (SNI) is a protocol
layer that establishes the network connection between the client and the
server. It consists of a set of APIs that are used by both the database engine
and the SQL Server Native Client (SNAC).
SNI isn’t configurable directly; you just need to configure
a network protocol on the client and the server. SQL Server has support for the
following protocols:
➤Shared memory: Simple
and fast, shared memory is the default protocol used to connect from a client
running on the same computer as SQL Server. It can only be used locally, has no
configurable properties, and is always tried first when connecting from the
local machine.
➤TCP/IP: TCP/IP
is the most commonly used access protocol for SQL Server. It enables you to
connect to SQL Server by specifying an IP address and a port number. Typically,
this happens automatically when you specify an instance to connect to. Your
internal name resolution system resolves the hostname part of the instance name
to an IP address, and either you connect to the default TCP port number 1433
for default instances or the SQL Browser service will find the right port for a
named instance using UDP port 1434.
➤Named Pipes: TCP/IP
and Named Pipes are comparable protocols in the architectures in which they can
be used. Named Pipes was developed for local area networks (LANs) but it can be
inefficient across slower networks such as wide area networks (WANs). Named
Pipes uses TCP port 445, so ensure that the port is open on any firewalls
between the two computers, including the Windows Firewall.
➤VIA: Virtual
Interface Adapter is a protocol that enables high-performance communications between
two systems. It requires specialized hardware at both ends and a dedicated connection.
Like Named Pipes, to use the VIA protocol you first
need to enable it in SQL Server Configuration manager and then create a SQL Server alias
that connects to the server using VIA as the protocol.
TDS (Tabular
Data Stream) Endpoints
TDS is a Microsoft-proprietary
protocol originally designed by Sybase that is used to interact with a database
server. Once a connection has been made using a network protocol such as
TCP/IP, a link is established to the relevant TDS endpoint that then acts as
the communication point between the client and the server.
There is one TDS endpoint for each
network protocol and an additional one reserved for use by the dedicated
administrator connection (DAC). Once connectivity is established, TDS messages
are used to communicate between the client and the server.
The SELECT statement is sent to the SQL Server as a TDS
message across a TCP/IP connection
(TCP/IP is the default protocol).
Protocol Layer
When the protocol layer in SQL
Server receives your TDS packet, it has to reverse the work of the SNI at the
client and unwrap the packet to find out what request it contains. The protocol
layer is also responsible for packaging up results and status messages to send
back to the client as TDS messages.
Our SELECT statement is marked in
the TDS packet as a message of type “SQL Command,” so it’s passed on to the
next component, the Query Parser, to begin the path toward execution.
At the client, the statement was wrapped in aTDS packet
by the SQL Server Network Interface and sent to the protocol layer on the SQL
Serverwhere it was unwrapped, identified as a SQL Command, and the code sent to
the Command Parserby the SNI.
The Relational Engine:
The Relational Engine is also
sometimes called the query processor because its primary function is query
optimization and execution. It contains a Command Parser to check query syntax
and prepare query trees, a Query Optimizer that is arguably the crown jewel of
any database system, and a Query Executor responsible for execution.
1. Command
Parser:
The Command
Parser’s role is to handle T-SQL language events. It first checks the syntax
and returns any errors back to the protocol layer to send to the client. If the
syntax is valid, then the next step is to generate a query plan or find an
existing plan. A Query plan contains the details about how SQL Server is going
to execute a piece of code. It is commonly referred to as an execution plan.
Plan Cache:
Creating execution plans can be time consuming and resource intensive, so The
Plan Cache, part of SQL Server’s buffer pool, is used to store execution plans
in case they are needed later. LRU algorithm is used for all the plan caches in
the Buffer Pool, all oldest plans will be flushed out if they cross LRU
timeframe.
2. Query
Optimizer:
The Query Optimizer is one of the most
complex and secretive parts of the product. It is what’s known as a
“cost-based” optimizer, which means that it evaluates multiple ways to execute
a query and then picks the method that it deems will have the lowest cost to
execute. This “method” of executing is implemented as a query plan and is the
output from the optimizer.
3. Query
Executor:
The Query Executor’s job is
self-explanatory; it executes the query. To be more specific, it executes the
query plan by working through each step it contains and interacting with the
Storage Engine to retrieve or modify data.
The Storage Engine:
The Storage engine is responsible for
managing all I/O to the data, and contains the Access Methods code, which
handles I/O requests for rows, indexes, pages, allocations and row versions,
and a Buffer Manager, which deals with SQL Server’s main memory consumer, the
buffer pool. It also contains a Transaction Manager, which handles the locking
of data to maintain Isolation (ACID properties) and manages the transaction
log.
1. Access
Methods:
Access Methods is a collection of code that
provides the storage structures for data and indexes as well as the interface
through which data is retrieved and modified. It contains all the code to
retrieve data but it doesn’t actually perform the operation itself; it passes
the request to the Buffer Manager.
2. Buffer
Manager:
The Buffer Manager manages the buffer pool,
which represents the majority of SQL Server’s memory usage. If you need to read
some rows from a page the Buffer Manager will check the data cache in the buffer
pool to see if it already has the page cached in memory. If the page is already
cached, then the results are passed back to the Access Methods. If the page
is n’t already in cache, then the Buffer Manager will get the page from the
database on disk, put it in the data cache, and pass the results to the Access
Methods.
Data Cache:
The data cache is usually the largest part of the buffer pool; therefore, it’s
the largest memory consumer within SQL Server. It is here that every data page
that is read from disk is written to before being used.
3.
Transaction Manager:
The
Transaction Manager has two components that are of interest here: a Lock
Manager and a Log Manager. The Lock
Manager is responsible for providing concurrency to the data, and it delivers
the configured level of isolation by using locks. The Access Methods code
requests that the changes it wants to make are logged, and the Log Manager
writes the changes to the transaction log. This is called Write-Ahead Logging.
Buffer Pool:
The buffer
pool provides the pages to the Buffer Manager to reduce I/O operations. Buffer
pool contains Data Cache and Plan Cache.
1. Data
Cache:
The data
sent from the Buffer manager is stored in Data Cache as Dirty Page. When a
checkpoint is generated this dirty pages will be sent to the Data files. The
copy will be saved in Data Cache for future requests.
2. Plan
Cache:
Plan Cache
stores the plans for the data stored in Data Cache for easy mapping.
How to work SQL Statements in SQL Server Life Cycle:
A
Basic select Statement Life Cycle Summary
Figure 1-5 shows the
whole life cycle of a SELECT query, described here:
1.
The SQL Server Network Interface (SNI) on the client established a
connection to the SNI
on the SQL Server using
a network protocol such as TCP/IP. It then created a connection to a TDS endpoint over the
TCP/IP connection and sent the SELECT statement to SQL Server as a TDS message.
2.
The SNI on the SQL Server unpacked the TDS message, read the SELECT
statement, and passed a “SQL Command”
to the Command Parser.
3.
The Command Parser checked the plan cache in the buffer pool for an
existing, usable query plan. When it
did n’t find one, it created a query tree based on the SELECT statement and passed it to the
Optimizer to generate a query plan.
4.
The Optimizer generated a “zero cost” or “trivial” plan in the
pre-optimization phase because the statement
was so simple. The query plan created was then passed to the Query Executor for
execution.
5.
At execution time, the Query Executor determined that data needed to
be read to complete
the query plan so it
passed the request to the Access Methods in the Storage Engine via an OLE DB interface.
6.
The Access Methods needed to read a page from the database to
complete the request from the Query Executor and
asked the Buffer Manager to provision the data page.
7.
The Buffer Manager checked the data cache to see if it already had
the page in cache. It was n’t in cache so it
pulled the page from disk, put it in cache, and passed it back to the Access Methods.
8.
Finally, the Access Methods passed the result set back to the
Relational Engine to send to the client.
A Basic Update Statement Life Cycle Summary (DML Operations)
The process is exactly
the same as the process for the SELECT statement you just looked at until you
get to the Access Methods.
The Access Methods need
to make a data modification this time, so before it passes on the I/O request the details of
the change need to be persisted to disk. That is the job of the Transaction Manager. The details
about the transaction are stored in the LDF file with the help of Log Manager,
also called as WAL.
Lock manager forms an
Exclusive lock based on the object scope, Then Page is pulled into the memory
and then it is modified, such a modified page is called Dirty Page.
Checkpoint
Process
A
checkpoint is a point in time created by the checkpoint process at which SQL
Server can be sure that any committed transactions have had all their changes written to disk. This
checkpoint then becomes the marker from which database recovery can start.
The
checkpoint process ensures that any dirty pages associated with a committed
transaction will be flushed to disk. Unlike the lazywriter, however, a
checkpoint does not remove the page from cache; it makes sure the dirty page is
written to disk and then marks the cached paged as clean in the page header.
By
default, on a busy server, SQL Server will issue a checkpoint roughly every
minute, which is marked in the transaction log. If the SQL Server instance or
the database is restarted, then the recovery process reading the log knows that
it does n’t need to do anything with log records prior tothe checkpoint.
The
time between checkpoints therefore represents the amount of work that needs to
be done to roll forward any committed transactions that occurred after the last
checkpoint, and to roll back any
transactions that hadn’t committed.
By
check-pointing every minute, SQL Server is trying to keep the recovery time
when starting a database to less than one minute, but it won’t automatically
checkpoint unless at least 10MB has been
written to the log within the period.
Checkpoints
can also be manually called by using the CHECKPOINT
T-SQL command, and can occur because of other
events happening in SQL Server. For example, when you issue a backup command, a
checkpoint will run first.
Trace
flag 3502 is an undocumented trace flag that records in the error log when a checkpoint
starts and stops. For example,after adding it as a startup trace flag and running
a workload with numerous writes, my error log contained the entries showing Figure 1-8, which indicates checkpoints running between 30 and 40 seconds apart.
Lazywriter
The
lazywriter is a thread that periodically checks the size of the free buffer
list. When it’s low, it scans the whole data cache to age-out any pages that
haven’t been used for a while. If it finds any dirty pages that haven’t been
used for a while, they are flushed to disk before being marked as free in
memory.
The
lazywriter also monitors the free physical memory on the server and will
release memory from the free buffer list back to Windows in very low memory
conditions. When SQL Server is busy, it will also grow the size of the free
buffer list to meet demand (and therefore the buffer pool) when there is free
physical memory and the configured Max Server Memory threshold has n’t been
reached.
nice blog thank you for sharing
ReplyDeleteSQL Server DBA Online Training