Tuesday, July 28, 2009

Physical Data structures of SQL Server

SQL Server data is organized into files. When we create a database,
we will identify the names and locations of these files. Creating a database requires the
use of two types of files: data files and transaction log files. A database can have multiple data
files and transaction log files, and it is required to have at least one of each.
Data Files Data files store all of the information in the database including data values,
indexes, and even configuration data and programmatic objects such as stored procedures
and functions. The data files are, therefore, the main repository in the database.
Transaction Log Files Log files provide a durable record of all modifications to the database.
SQL Server uses a write-ahead modification process that requires all changes to be
written to transaction logs before they are written to the database. These logs enable a variety
of data recovery and integrity features.

To allow SQL Server to locate any data value quickly and efficiently, these data files must be
organized in a way that gives the SQL Server data engine full control of the specific placement
of any data value. SQL Server accomplishes this by organizing data file structures into 8KB
blocks called pages. It means that SQL Server databases have 128 pages per megabyte. Each page begins with a 96-byte header that is used to store system information about the page. This information includes the page number, page type, the amount of free space on the page, and the allocation unit ID of the object that owns the page.
Data rows are put on the page serially, starting immediately after the header. A row offset table starts at the end of the page, and each row offset table contains one entry for each row on the page. Each entry records how far the first byte of the row is from the start of the page. The entries in the row offset table are in reverse sequence from the sequence of the rows on the page.

Large Row Support
Rows cannot span pages, however portions of the row may be moved off the row's page so that the row can actually be very large. The maximum amount of data and overhead that is contained in a single row on a page is 8,060 bytes (8 KB). However, this does not include the data stored in the Text/Image page type. This restriction is relaxed for tables that contain varchar, nvarchar, varbinary, or sql_variant columns.

Data Pages Data pages store all data values except those typed as large value types such
as text, ntext, xml, varchar(max), etc. The majority of pages in a typical database will be
data pages.
Index Pages Index pages store index entries for both clustered and nonclustered indexes.
These entries typically consist of the index key values themselves plus additional information
used by SQL Server to locate data entries and manage the index process.
Large Object Pages Because SQL Server data rows cannot span multiple data pages, large data
values must be stored in alternative locations in cases where the data value is too large for the
data page. Datatypes such as text, ntext, xml, varchar(max), etc. will typically store a small
pointer on the data page that references the large object page where the data entry begins.
Some datatypes such as text and varchar(max) will always be stored in large object pages.
Others, such as varchar and varbinary, will be moved to these pages dynamically by SQL
Server when the data row size exceeds 8KB. If, in the future, the variable length columns are
edited and the size of the data row falls within the 8KB limit again, the data will be moved
back to the data page.
Other Pages Additionally, SQL Server stores other configuration information in special
pages designed specifically for those configurations. These special page types include the
Global Application Map, Index Application Map, Bulk Changed Map, and others.

Extents
Pages are further organized into 64KB blocks called extents. An extent is eight physically contiguous pages. This means SQL Server databases have 16 extents per megabyte. Typically, pages
in a data file are allocated specifically for only one purpose. For example, you will not see data
values and index entries in the same data page. SQL Server has many kinds of pages.

To make its space allocation efficient, SQL Server does not allocate whole extents to tables with small amounts of data. SQL Server has two types of extents:
Uniform extents are owned by a single object; all eight pages in the extent can only be used by the owning object.
Mixed extents are shared by up to eight objects. Each of the eight pages in the extent can be owned by a different object.

A new table or index is generally allocated pages from mixed extents. When the table or index grows to the point that it has eight pages, it then switches to use uniform extents for subsequent allocations. If you create an index on an existing table that has enough rows to generate eight pages in the index, all allocations to the index are in uniform extents.

2 comments:

  1. Excellent post! I had always wondered about what was in those .mdf and .log files. The database is one of my weaker areas but I at least feel that I've strengthened my weakest link.

    ReplyDelete
  2. suppose i want to see the table of my database in C:. from where i can see the mdb or mdf files.

    ReplyDelete

Share This