Tuesday, July 28, 2009

Physical Data structures of SQL Server

SQL Server data is organized into files. When we create a database,
we will identify the names and locations of these files. Creating a database requires the
use of two types of files: data files and transaction log files. A database can have multiple data
files and transaction log files, and it is required to have at least one of each.
Data Files Data files store all of the information in the database including data values,
indexes, and even configuration data and programmatic objects such as stored procedures
and functions. The data files are, therefore, the main repository in the database.
Transaction Log Files Log files provide a durable record of all modifications to the database.
SQL Server uses a write-ahead modification process that requires all changes to be
written to transaction logs before they are written to the database. These logs enable a variety
of data recovery and integrity features.

To allow SQL Server to locate any data value quickly and efficiently, these data files must be
organized in a way that gives the SQL Server data engine full control of the specific placement
of any data value. SQL Server accomplishes this by organizing data file structures into 8KB
blocks called pages. It means that SQL Server databases have 128 pages per megabyte. Each page begins with a 96-byte header that is used to store system information about the page. This information includes the page number, page type, the amount of free space on the page, and the allocation unit ID of the object that owns the page.
Data rows are put on the page serially, starting immediately after the header. A row offset table starts at the end of the page, and each row offset table contains one entry for each row on the page. Each entry records how far the first byte of the row is from the start of the page. The entries in the row offset table are in reverse sequence from the sequence of the rows on the page.

Large Row Support
Rows cannot span pages, however portions of the row may be moved off the row's page so that the row can actually be very large. The maximum amount of data and overhead that is contained in a single row on a page is 8,060 bytes (8 KB). However, this does not include the data stored in the Text/Image page type. This restriction is relaxed for tables that contain varchar, nvarchar, varbinary, or sql_variant columns.

Data Pages Data pages store all data values except those typed as large value types such
as text, ntext, xml, varchar(max), etc. The majority of pages in a typical database will be
data pages.
Index Pages Index pages store index entries for both clustered and nonclustered indexes.
These entries typically consist of the index key values themselves plus additional information
used by SQL Server to locate data entries and manage the index process.
Large Object Pages Because SQL Server data rows cannot span multiple data pages, large data
values must be stored in alternative locations in cases where the data value is too large for the
data page. Datatypes such as text, ntext, xml, varchar(max), etc. will typically store a small
pointer on the data page that references the large object page where the data entry begins.
Some datatypes such as text and varchar(max) will always be stored in large object pages.
Others, such as varchar and varbinary, will be moved to these pages dynamically by SQL
Server when the data row size exceeds 8KB. If, in the future, the variable length columns are
edited and the size of the data row falls within the 8KB limit again, the data will be moved
back to the data page.
Other Pages Additionally, SQL Server stores other configuration information in special
pages designed specifically for those configurations. These special page types include the
Global Application Map, Index Application Map, Bulk Changed Map, and others.

Extents
Pages are further organized into 64KB blocks called extents. An extent is eight physically contiguous pages. This means SQL Server databases have 16 extents per megabyte. Typically, pages
in a data file are allocated specifically for only one purpose. For example, you will not see data
values and index entries in the same data page. SQL Server has many kinds of pages.

To make its space allocation efficient, SQL Server does not allocate whole extents to tables with small amounts of data. SQL Server has two types of extents:
Uniform extents are owned by a single object; all eight pages in the extent can only be used by the owning object.
Mixed extents are shared by up to eight objects. Each of the eight pages in the extent can be owned by a different object.

A new table or index is generally allocated pages from mixed extents. When the table or index grows to the point that it has eight pages, it then switches to use uniform extents for subsequent allocations. If you create an index on an existing table that has enough rows to generate eight pages in the index, all allocations to the index are in uniform extents.

Wednesday, July 22, 2009

Difference between Primary Key and Unique Index

What is the difference between a Primary Key and a Unique Index? Both can be declared on one or more columns, both can be used to enforce foreign keys (if the unique index is on not null column(s)), both can be declared as clustered/non clustered indexes (SQL Server lingo), both can be used on computed columns as well (SQL Server).

The differences between the two are:

  1. Column(s) that make the Primary Key of a table cannot be NULL since by definition, the Primary Key cannot be NULL since it helps uniquely identify the record in the table. The column(s) that make up the unique index can be nullable. A note worth mentioning over here is that different RDBMS treat this differently –> while SQL Server and DB2 do not allow more than one NULL value in a unique index column, Oracle allows multiple NULL values. That is one of the things to look out for when designing/developing/porting applications across RDBMS.
  2. There can be only one Primary Key defined on the table where as you can have many unique indexes defined on the table (if needed).
  3. Also, in the case of SQL Server, if you go with the default options then a Primary Key is created as a clustered index while the unique index (constraint) is created as a non-clustered index. This is just the default behavior though and can be changed at creation time, if needed.

So, if the unique index is defined on not null column(s), then it is essentially the same as the Primary Key and can be treated as an alternate key meaning it can also serve the purpose of identifying a record uniquely in the table.

Tuesday, July 21, 2009

Sql Server and Database Posts

Post Sql Server and Database related posts here.

Share This