<html>
<head>
<title>Albert van der Sel - BLOBS in SQL Server</title>
</head>
<body bgcolor="#FFFFFF" link="blue" alink="blue" vlink="blue">

<h1>Simple (and incomplete) note about storage of BLOBS in SQL Server</h1>

<B>Version</B>	    : 0.8<br>
<B>Date</B>		: 02/12/2012<br>
<B>By</B>		    : Albert van der Sel<br>
<hr/>

<font face="arial" size=2 color="black">

<br>
<br>

In the past, relational databases were typically used to store <B>traditional</B> "administrative"<br>
and "business-like" data such as customer names, addresses, order numbers, prices, amounts, product IDs, etc.<br>
<br>
So, the common SQL Server <I>datatypes</I> in use are simply character-based or numeric, like "char", "varchar", "int",<br>
"numeric" (or decimal), "datetime" and the like.<br>
These datatypes are still very important (and heavily used, of course), but the later SQL Server versions became<br>
<I>progressively</I> better equipped to handle "binary" data. <br>
<br>
This "binary" data could be very diverse: .pdf files, Excel sheets, images, videos, and completely unstructured<br>
data as well. Usually, people call that type of data "BLOBs" or Binary Large Objects.<br>
<br>
This may sound a bit strange: if we just look at the physical implementation, SQL Server uses pages of 8192 bytes,<br>
in which table rows are stored (details will follow soon!).<br>
So, we can easily imagine that characters and some numeric data fit comfortably in such a page.<br>
Now what happens if a 5MB image is stored? As we will see, SQL Server (in case of "inline" storage) will simply store<br>
a pointer in the first page, which points to a whole separate "tree" of pages storing this blob data.<br>
<br>
So, problem solved? We have not seen any details yet, but the topic is still actively debated among<br>
various SQL Server experts. The fact is that a true relational database does not seem to be very good at<br>
storing and handling traditional data together with BLOBs.<br>
<br>
On the other hand: the business has changed. Many applications want to show their customers images and movies of their<br>
products and services. And the best way to store them seems to be a database.<br>
And, Microsoft responded quite well. What you can store in SQL Server nowadays, using SQL 2008, or even better,<br>
SQL 2012, is amazing.<br>
<br>
There exist at least the following options to store binary data in SQL Server:<br>

<font face="arial" size=2 color="blue">
<ul>
<li>Inline storage, where the blobs really occupy internal database pages.</li>
<li>External storage, where the files just reside on a filesystem, but where metadata exists in SQL Server (like pointers).</li>
<li>External storage using additional (third party) providers, but where metadata exists in SQL Server (like pointers).</li>
<li>Blobs in a so-called "Filestream" filegroup, which is actually a folder on a filesystem (2008).</li>
<li>Usage of so-called "Filetables" (2012)</li>
</ul>
<font face="arial" size=2 color="black">
<br>
In this note, we are trying to explore some stuff on BLOBs: how BLOBs are stored, how to load them,<br>
how to retrieve them (using TSQL and other programmatic interfaces), and hopefully some more interesting facts.<br>
<br>
First, we will explore the "traditional" inline storage, and work our way through the other options later on.<br>
<br>
<B>Hopefully, you will like this simple note.....</B><br>
<br>
<br>
<br>

<font face="arial" size=2 color="blue">

<br>
<B><U>Main Contents:</U></B><br>
<br>

<B>Chapter 1. Inline storage of a blob.</B><br>
&nbsp 1.1 Some preparations first (create a database etc..).<br>
&nbsp 1.2 Adding a blob to a table.<br>
&nbsp 1.3 Analysis of the (inline) blob storage.<br>
<br>
<B>Chapter 2. Pages, extents, and other structures.</B><br>
&nbsp 2.1 Structure of a data page, and special pages.<br>
&nbsp 2.2 Extents.<br>
&nbsp 2.3 System pages.<br>
<br>
<B>Chapter 3. Using Filestream for storage of blobs.</B><br>
&nbsp 3.1 Enabling "Filestream".<br>
&nbsp 3.2 Implementing "Filestream".<br>
&nbsp 3.3 Adding a blob.<br>
&nbsp 3.4 Analysis of blob storage.<br>
<br>
<B>Chapter 4. Some methods for storage and retrieval of blobs. (not ready)</B><br>
<br>

<br>
<br>
<br>
<font face="arial" size=2 color="blue">
<h2>Chapter 1. Inline storage of a blob.</h2>
<font face="arial" size=2 color="black">

Here we take a quick look at how we can "load" a blob (in this case, a .jpg file) into a SQL Server database.<br>
In this chapter, we will investigate "inline" storage, where the blob is stored on internal database pages.<br>
<br>

<h3>1.1 Some preparations first (create a database etc..).</h3>


<br>
Now, we will do a practical example. Hopefully, you have SQL Server installed (like 2008 or 2012), and you<br>
are invited to follow along.<br>
<br>
First, we will create a simple database. However, it is composed of several files. This is because I want<br>
to avoid storing our objects in the "primary" .mdf file.<br>
Having multiple (data) files is good practice anyway, and we should leave the .mdf file for the system catalog (the "dictionary").<br>
<br>
All you need to do is create a folder "c:\mssql\data" on your test system.<br>
Next, using Management Studio, log on to SQL Server as sa or as an Administrator, and run the following script:<br>
<br>


<font face="courier" size=2 color="blue">

<B>-- 1. Create the database:</B><br>
<br>
create database SALES<br>
on PRIMARY<br>
(<br>
name='SALES',<br>
filename='c:\mssql\data\SALES.mdf',<br>
size=40MB,<br>
filegrowth= 10MB,<br>
maxsize= 100MB<br>
),<br>
FILEGROUP SALESDATA01<br>
(<br>
name='SALES_DATA_01',<br>
filename='c:\mssql\data\SALES_DATA_01.ndf',<br>
size= 40MB,<br>
filegrowth= 10MB,<br>
maxsize= 100MB<br>
),<br>
FILEGROUP SALESINDEX01<br>
(<br>
name='SALES_INDEX_01',<br>
filename='c:\mssql\data\SALES_INDEX_01.ndf',<br>
size= 40MB,<br>
filegrowth= 10MB,<br>
maxsize= 100MB<br>
)<br>
LOG ON<br>
(<br>
name='SALES_LOG_001',<br>
filename='c:\mssql\data\SALES_LOG_001.ldf',<br>
size= 40MB,<br>
filegrowth= 10MB,<br>
maxsize= 100MB<br>
)<br>
<br>
ALTER DATABASE SALES<br>
MODIFY FILEGROUP SALESDATA01 DEFAULT<br>
GO<br>
<br>
USE SALES<br>
GO<br>
<br>
<br>
<B>-- Create a sample table:</B><br>
<br>
CREATE TABLE dbo.EMPLOYEE<br>
(<br>
EMPID     INT          NOT NULL,<br>
EMPNAME   VARCHAR(20)  NOT NULL,<br>
EMPSALARY DECIMAL(7,2),<br>
EMPPHOTO  VARBINARY(MAX)<br>

)<br>
ON [SALESDATA01]<br>
<br>
-- Note the traditional datatypes like INT, VARCHAR, and DECIMAL,<br>
-- as well as the datatype "VARBINARY" for BLOBs.<br>
<br>
<br>
<B>-- Insert some character-based data (no BLOBs yet) into the EMPLOYEE table:</B><br>
<br>
insert into EMPLOYEE<br>
(EMPID,EMPNAME,EMPSALARY)<br>
values<br>
(1,'Harry',2000.50)<br>
<br>
insert into EMPLOYEE<br>
(EMPID,EMPNAME,EMPSALARY)<br>
values<br>
(2,'Nadia',3000.00)<br>
<br>
insert into EMPLOYEE<br>
(EMPID,EMPNAME,EMPSALARY)<br>
values<br>
(3,'Albert',5000.00)<br>
<br>
GO
<br>

<br>

<font face="arial" size=2 color="black">

So, we have created a database, and an EMPLOYEE table in that database. This table has three simple<br>
character or numeric columns, for storing data like "employee name" (EMPNAME).<br>
Note however, that the last column is of type VARBINARY, which means that SQL Server is now prepared<br>
to store "binary" data (like .jpg or .pdf etc..) in that column. Soon we will see details on that.<br>
<br>

Then, we inserted 3 rows into that table, filling the first 3 columns, leaving the EMPPHOTO to be 'null' for now.<br>
Obviously, the EMPPHOTO column is supposed to store a photo (a blob) of an Employee.<br>
<br>
<font face="arial" size=2 color="brown">
Note: <br>
<br>
In older SQL Server versions, binary data could be stored using the "image" datatype.<br>
Although this datatype is still available, the "varbinary(n)" or "varbinary(max)" datatypes should be used for storing blobs.<br>
First, "image" is deprecated (although it is still around), and second, varbinary is much better in<br>
scenarios where space must be reclaimed when blobs are deleted or updated. The varbinary datatype is indeed of<br>
variable length.<br>
<br>

<font face="arial" size=2 color="black">

Let's find out how the table is physically stored.<br>
<br>
If you created the database exactly as shown in the script above, then our table should be on "page no 8"<br>
in the file 'c:\mssql\data\SALES_DATA_01.ndf'.<br>
<br>
This is because the first couple of pages of any datafile are used for administrative purposes (by SQL Server itself),<br>
like the fileheader (page 0), the "Page Free Space (PFS) page" (page 1) etc..<br>
And, when the database was created, we told SQL Server that the filegroup "SALESDATA01" (consisting of 'c:\mssql\data\SALES_DATA_01.ndf')<br>
is the DEFAULT filegroup for new objects.<br>
<br>
If you did not use the script, or used an existing test database, the EMPLOYEE table will be on different pages.<br>
Even in that case, you can still follow the next "experiment":<br>
<br>
We are going to use the DBCC PAGE command to dump page contents. Just follow along...<br>
<br>

By default, DBCC PAGE does not return its output to the client session. Trace flag 3604 redirects DBCC output<br>
to our session, so let's switch that on first.<br>
<br>

<font face="courier" size=2 color="blue">

DBCC TRACEON (3604)<br>
GO<br>
<br>

<font face="arial" size=2 color="black">

The DBCC PAGE() statement takes a few parameters. These are purely logical,<br>
since they just tell SQL Server the complete address of the page: that is, <I>which database, the file id in that database,<br>
the page number in that file, and the output mode (print option)</I>.<br>
So, it's like this:<br>
<br>
<font face="courier" size=2 color="blue">
DBCC PAGE (databasename, file id, page no, print option)<br>
<br>
<font face="arial" size=2 color="black">
Now, we already know the database name, the page number, and the print option we want (3 to let it be "verbose").<br>
<br>
However, maybe we are not sure about the file id of the file where the page is stored. Well, let's simply take a look<br>
in the system view "sysfiles".<br>
<br>
<font face="courier" size=2 color="blue">
USE SALES<br>
GO<br>
<br>

select fileid, filename from sysfiles<br>

<br>
fileid filename<br> 
<br> 
1      c:\mssql\data\SALES.mdf<br> 
2      c:\mssql\data\SALES_LOG_001.ldf<br> 
3      c:\mssql\data\SALES_DATA_01.ndf<br> 
4      c:\mssql\data\SALES_INDEX_01.ndf<br> 
<br>
<font face="arial" size=2 color="black">
Since we know that the EMPLOYEE table is stored on the default filegroup, which consists of the<br>
c:\mssql\data\SALES_DATA_01.ndf file, we now know that the file id=3. <br>
<br>
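The "sysfiles" view is kept for backward compatibility. On SQL Server 2005 and later, the catalog view<br>
"sys.database_files" returns the same information. A minimal sketch, assuming the SALES database created above:<br>
<br>
<pre>
```sql
USE SALES;
GO

-- One row per file of the current database.
-- file_id is the value that DBCC PAGE expects as its second parameter.
SELECT file_id, name, physical_name, type_desc
FROM sys.database_files
ORDER BY file_id;
```
</pre>
The type_desc column distinguishes data files (ROWS) from the transaction log file (LOG).<br>
<br>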
The way most entries in logs tell you about pages, is like this example: "3:55", meaning<br>
page 55 in file 3.<br>
<br>
Now, let's dump the page:<br>
<br>
<font face="courier" size=2 color="blue">

DBCC PAGE('sales',3,8,3)<br> 
<br>
output:
<br>
...<br>
some output skipped<br>
...<br>
<B>EMPID = 3</B>                            <br> 
Slot 2 Column 2 Offset 0x14 Length 6 Length (physical) 6<br> 
<B>EMPNAME = Albert</B>                      <br> 
Slot 2 Column 3 Offset 0x8 Length 5 Length (physical) 5<br> 
<B>EMPSALARY = 5000.00</B>                   <br> 
Slot 2 Column 4 Offset 0x0 Length 0 Length (physical) 0<br> 
<B>EMPPHOTO = [NULL] </B><br> 
<br>
<font face="arial" size=2 color="black">
Above, much output was skipped. Here,  we see all fields of the third row.<br>
<br>
Since our table is still extremely small (it only has 3 rows), all of it "sits" in one page.<br>
Let's doublecheck that with the following.<br>
<br>
<font face="courier" size=2 color="blue">
DBCC SHOWCONTIG(EMPLOYEE)<br>
<br>
DBCC SHOWCONTIG scanning 'EMPLOYEE' table...<br>
Table: 'EMPLOYEE' (2105058535); index ID: 0, database ID: 8<br>
TABLE level scan performed.<br>
<B>- Pages Scanned................................: 1</B><br>
- Extents Scanned..............................: 1<br>
- Extent Switches..............................: 0<br>
- Avg. Pages per Extent........................: 1.0<br>
- Scan Density [Best Count:Actual Count].......: 100.00% [1:1]<br>
- Extent Scan Fragmentation ...................: 0.00%<br>
- Avg. Bytes Free per Page.....................: 8014.0<br>
- Avg. Page Density (full).....................: 0.99%<br>
<br>
<font face="arial" size=2 color="black">
As you can see from the output above, only one page needed to be scanned, so the EMPLOYEE table<br>
just sits completely in page 8 of file c:\mssql\data\SALES_DATA_01.ndf.<br>
<br>
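DBCC SHOWCONTIG is marked as deprecated in the Microsoft documentation, which points to the function<br>
"sys.dm_db_index_physical_stats" as its replacement. A sketch of an equivalent check, assuming the SALES database<br>
and the dbo.EMPLOYEE table from the script above:<br>
<br>
<pre>
```sql
USE SALES;
GO

-- 'DETAILED' is requested because avg_page_space_used_in_percent
-- is returned as NULL in the default LIMITED mode.
SELECT index_id,
       alloc_unit_type_desc,
       page_count,
       avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID('SALES'),
         OBJECT_ID('dbo.EMPLOYEE'),
         NULL, NULL, 'DETAILED');
```
</pre>
The page_count column plays the role of "Pages Scanned" in the DBCC SHOWCONTIG output.<br>
<br>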
<br>
<h3>1.2 Adding a blob in the EMPLOYEE table.</h3>

Up to this point, we only have "simple" data in our EMPLOYEE table, like varchar and numeric values,<br>
but no blob yet.<br>
<br>
<font face="courier" size=2 color="blue">
SELECT * FROM EMPLOYEE<br>
<br>
EMPID...EMPNAME...EMPSALARY..EMPPHOTO<br>
1.......Harry.....2000.50....NULL<br>
2.......Nadia.....3000.00....NULL<br>
3.......Albert....5000.00....NULL<br>
<br>
<font face="arial" size=2 color="black">
<br>
Let's update the third record, and store a binary file in the EMPPHOTO column, that is, we will place<br>
a photo of Albert (yuk!) into the EMPLOYEE table, and then see what has changed.<br>
<br>
There are many functions in SQL Server, for import/export of data. The OPENROWSET() function can also<br>
be used to load a blob into a table. So let's try that. Suppose in C:\TEMP, we have the photo "albert.jpg".<br>
<br>
<B>TSQL statement for adding a BLOB using OPENROWSET():</B><br>
<br>
<font face="courier" size=2 color="brown">
<B>
UPDATE EMPLOYEE <br>
SET EMPPHOTO = <br>
   (SELECT * FROM OPENROWSET (BULK 'C:\TEMP\albert.jpg', SINGLE_BLOB) a) <br>
WHERE EMPID=3<br>
</B>
<br>
<font face="courier" size=2 color="blue">
SELECT * FROM EMPLOYEE<br>
<br>
EMPID...EMPNAME...EMPSALARY..EMPPHOTO<br>
1.......Harry.....2000.50....NULL<br>
2.......Nadia.....3000.00....NULL<br>
3.......Albert....5000.00....0xFFD8FFE000104A4649...<br>
<br>
<font face="arial" size=2 color="black">
<br>
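Ok, the EMPPHOTO column where EMPID=3 now shows a long hexadecimal value. This is not a pointer, but the start of the file's own bytes<br>
(0xFFD8 is the marker that every .jpg file starts with). But how is the blob (albert.jpg) physically stored?<br>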
<br>
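To check that the whole file really arrived in the table, the stored byte count can be compared with the file size on disk.<br>
A small sketch, using the standard DATALENGTH() and SUBSTRING() functions on the EMPLOYEE table from above:<br>
<br>
<pre>
```sql
-- DATALENGTH returns the number of bytes stored in the varbinary(max) column
-- (NULL for rows without a photo).
-- SUBSTRING on a varbinary value returns bytes, here the first four of the file.
SELECT EMPID,
       EMPNAME,
       DATALENGTH(EMPPHOTO)      AS photo_bytes,
       SUBSTRING(EMPPHOTO, 1, 4) AS first_bytes
FROM dbo.EMPLOYEE;
```
</pre>
For the row with EMPID=3, photo_bytes should equal the size of "albert.jpg" on the filesystem.<br>
<br>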
<br>
<h3>1.3 Analysis of the (inline) blob storage.</h3>


We know that the EMPLOYEE table is stored in page 8. Now, there are some special pages in any database file,<br>
at various locations, but it's reasonable to expect that SQL Server will not store the blob, say, starting at page 1298,<br>
or page 633 or something. No, it should be quite "close", so let's play with DBCC PAGE('sales',3,x,3), where<br>
we take a look at page no 8,9,10,11 and a few other pages "nearby".<br>
<br>
Now it's getting interesting....<br>
<br>
First let's look at page 8 again, at the data related to EMPID=3 (Albert).<br>
<br>
<font face="courier" size=2 color="blue">
<br>
EMPID = 3   <br>                         
Slot 2 Column 2 Offset 0x16 Length 6 Length (physical) 6<br>
EMPNAME = Albert   <br>                  
Slot 2 Column 3 Offset 0x8 Length 5 Length (physical) 5<br>
EMPSALARY = 5000.00     <br>             
EMPPHOTO = [BLOB Inline Root] Slot 2 Column 4 Offset 0x1c Length 24 Length (physical) 24 (more stuff..)<br>
<br>
<font face="arial" size=2 color="black">
Interesting. At the EMPPHOTO column it now says "[BLOB Inline Root]", meaning that SQL Server stored<br>
the binary file inline, that is, <I>really inside</I> the database. The 24 bytes in the row itself only hold the root of a pointer structure.<br>
Now let's look at some other nearby pages.<br>
<br>
<B>At page no 12, I have a "hit"! Take a look at this:</B><br>
<br>
<font face="courier" size=2 color="blue">

DBCC PAGE('sales',3,<B>12</B>,3)<br>
<br>
partial output..<br>
<br>
Blob Id: 929038336 Level: 0 MaxLinks: 501 CurLinks: 56
<br>
Child 0 at Page (3:16) Slot 0 Size: 8040 Offset: 8040
<br>
Child 1 at Page (3:17) Slot 0 Size: 8040 Offset: 16080
<br>
Child 2 at Page (3:18) Slot 0 Size: 8040 Offset: 24120
<br>
Child 3 at Page (3:19) Slot 0 Size: 8040 Offset: 32160
<br>
Child 4 at Page (3:20) Slot 0 Size: 8040 Offset: 40200
<br>
Child 5 at Page (3:21) Slot 0 Size: 8040 Offset: 48240
<br>
..<br>
(some entries omitted)<br>
..<br>	
Child 45 at Page (3:61) Slot 0 Size: 8040 Offset: 369840

<br>
Child 46 at Page (3:62) Slot 0 Size: 8040 Offset: 377880

<br>
Child 47 at Page (3:63) Slot 0 Size: 8040 Offset: 385920

<br>
Child 48 at Page (3:13) Slot 0 Size: 8040 Offset: 393960

<br>
Child 49 at Page (3:14) Slot 0 Size: 8040 Offset: 402000

<br>
Child 50 at Page (3:15) Slot 0 Size: 8040 Offset: 410040

<br>
Child 51 at Page (3:64) Slot 0 Size: 8040 Offset: 418080

<br>
Child 52 at Page (3:65) Slot 0 Size: 8040 Offset: 426120

<br>
Child 53 at Page (3:66) Slot 0 Size: 8040 Offset: 434160

<br>
Child 54 at Page (3:72) Slot 0 Size: 8040 Offset: 442200

<br>
Child 55 at Page (3:10) Slot 0 Size: 2361 Offset: 444561

<br>
<br>
<font face="arial" size=2 color="black">
<br>
You see that? It clearly shows that the pages from page 16 all the way up to page 66, are dedicated<br>
for storing the blob data (the .jpg file).<br>
Then, from the output, you can see that pages 3:13, 3:14, and 3:15 are used too, as well as page 3:72.<br>
Actually, page 10 is the "tail" of the blob (only 2361 bytes), but above it is shown clearly that<br>
pages 16-66, and pages 13-15, and pages 10 and 72, are used for the blob.<br>
<br>
Evidently, page 12 is a bit "apart" and it looks like the "directory" for the blob.<br>
Actually, it's the root-node (page) and it contains all information on where to find the blob chunks.<br>
<br>
If you would dump, for example, page 3:66 with DBCC PAGE, you would indeed see 8040 bytes of binary data of the photo.<br>
<br>

Except for page 10, each page stores exactly 8040 bytes of that photo.<br>
Indeed, SQL Server "returned" to page 10 (after filling all the way up to page 72) to be economical with pages (and extents).<br>
<br>
So, the last bytes of the blob are stored in page 10. But that does not matter: the directory (so to speak)<br>
in page 12 tells SQL Server exactly <B>on which pages all chunks are located</B>, and which offset within the blob is associated with each chunk,<br>
so any application gets a perfectly rebuilt picture when it requests one.<br>

<br>
<B>Fig 1. Simplified representation of the pages involved in our example.</B><br>
<br>
<img src="sqlserverblob2.jpg" align="centre"/>
<br>
<br>
Maybe figure 1 is helpful in understanding our example. You see that our (small) EMPLOYEE table exists<br>
in page 8. But, we have loaded a blob, and the very first page of the blob is page 12. This is the <B>root node</B><br>
(or root page) of the blob, containing the directory (so to speak) which tells SQL Server <I>which</I> page stores <I>what</I> blob chunk.<br>
In the figure, these are the "blue" pages.<br>
<br>
Now, maybe you think the page distribution is a bit "random". It's not. We have not discussed "extents" yet,<br>
but SQL Server organizes <B>collections of 8 pages into extents</B> (so each extent consists of 8 contiguous pages).<br>
Do you notice, from the "root node", that the actual blob data <B>starts from page 16</B>? This is also the start<br>
of the third extent in the file. So actually, it is pretty clean. SQL Server then fills as many pages from page 16 onwards<br>
as necessary. Only the last portion of the blob data is stored in the second extent, simply in order<br>
not to waste space. So the second extent contains a normal regular table (in page 8), and also some pages<br>
containing blob data.<br>
<br>
Now, the file "albert.jpg" is 444,561 bytes in size.<br>
<br>
How much "space" is then "spent" in SQL Server? Above you can see the answer:<br>
<br>
(child 0 up to child 54) x 8040 + (the bytes in page 10) = 55 x 8040 + 2361 = 444,561 bytes.<br>
<br>
So, from this, you might say that there is hardly any "overhead" in storing a blob, compared to the filesystem.<br>
Not exactly. First, a page is 8192 bytes, and SQL uses 8040 bytes for storing blob data.<br>
But that's not really much overhead.<br>
<br>
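SQL Server keeps the in-row pages and the blob pages of a table in separate "allocation units" (IN_ROW_DATA and LOB_DATA).<br>
The page counts per allocation unit can be read from the catalog views "sys.partitions" and "sys.allocation_units".<br>
The query below is only a sketch, assuming the dbo.EMPLOYEE table from chapter 1; the exact counts depend on your own blob.<br>
<br>
<pre>
```sql
USE SALES;
GO

SELECT au.type_desc,     -- IN_ROW_DATA, LOB_DATA or ROW_OVERFLOW_DATA
       au.total_pages,   -- pages allocated or reserved for the allocation unit
       au.used_pages,    -- pages actually in use
       au.data_pages     -- used pages that hold data (allocation pages excluded)
FROM sys.partitions AS p
JOIN sys.allocation_units AS au
  -- container_id refers to hobt_id for types 1 and 3, and to partition_id for type 2 (LOB_DATA)
  ON (au.type IN (1, 3) AND au.container_id = p.hobt_id)
  OR (au.type = 2       AND au.container_id = p.partition_id)
WHERE p.object_id = OBJECT_ID('dbo.EMPLOYEE');
```
</pre>
Multiplying the LOB_DATA page count by 8192 and comparing it with the file size gives a feeling for the real overhead.<br>
<br>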
The point is, as we will see later, that if many blobs are stored, and over time some are deleted and updated<br>
(using the regular applications), some "gaps" will arise throughout the extents.<br>
Before discussing this, we need to know how SQL Server organizes its pages for various purposes.<br>
<br>
You can easily "play" this example by yourself. Just create a new database, and the EMPLOYEE table as shown above.<br>
Then, just use a ".jpg" file, like a photo, of say a few hundred KB in size.<br>
Next, load the blob into the table (as shown above) and play around a bit with DBCC PAGE().<br>
<br>
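When you repeat the experiment, you do not have to guess which pages belong to the table and its blob.<br>
The undocumented (and therefore unsupported) command DBCC IND lists them. A sketch, assuming the SALES database and<br>
EMPLOYEE table from chapter 1; since the command is undocumented, its output may differ between versions.<br>
<br>
<pre>
```sql
-- Undocumented command: lists the pages that belong to a table.
-- The third argument -1 asks for the pages of all indexes, IAM pages included.
DBCC IND ('SALES', 'EMPLOYEE', -1);
```
</pre>
The columns PageFID and PagePID give the file id and page number that DBCC PAGE needs. The PageType column is commonly<br>
described as 1 for data pages, 3 and 4 for text/blob pages, and 10 for IAM pages.<br>
<br>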
<br>
<br>
<font face="arial" size=2 color="blue">
<h2>Chapter 2. Pages, extents, and other structures.</h2>
<font face="arial" size=2 color="black">

<h3>2.1 Structure of a data page, and special pages.</h3>

A page is a sort of "atomic" structure in a SQL Server database file (except for the Transaction Log files).<br>
Below, you see a very schematic representation of a data page, as used by tables.<br>
<br>
<B>Fig 2. Simplified representation of a data page.</B><br>
<br>
<img src="sqlserverblob1.jpg" align="centre"/>
<br>
<br>
The page header identifies the page, as to which "object id" it belongs, and some further housekeeping info.<br>
At the end of the page is the "row offset table". For each row, it records the distance in bytes<br>
from the very start of the page to the start of that row. So, the start of any row can be found.<br>
<br>
Now, in the figure, you see three example rows, and below that, there exists "free space".<br>
If there is room for new rows, they will simply be added. If, at a certain moment, a new row does not "fit"<br>
anymore, a "page split" will occur: a whole new page will be allocated for this object, and the row<br>
will be stored in that newly allocated page instead.<br>
<br>
Note: sometimes the term "page split" is reserved for the situation where a page was already "quite full",<br>
and some record was then updated with data larger than the former data, with the result that<br>
the records don't "fit" in that page anymore. In this case, SQL Server allocates a new page for the object, and<br>
moves record(s) as necessary.<br>
<br>
<br>
So, if you would just have "heaps" of pages, together forming tables, it would be simple indeed.<br>
But this is not how it is organized. We will see that in section 2.2.<br>
<br>
What <I>types</I> of pages do we have? The most important types are:<br>
<br>
<B>=> System pages: only in the first extent, and a small number distributed throughout the file.</B><br>
<br>
<TABLE border=1 BGCOLOR=#F4FA58>
 
<TR>
 <TD><font face="arial" size=2><B>PFS: Page Free Space page</B></TD>
 <TD><font face="arial" size=2>Information about page allocation and free space available on pages.</TD>
</TR>
<TR>
 <TD><font face="arial" size=2><B>GAM: Global Allocation Map and SGAM pages</B></TD>
 <TD><font face="arial" size=2>Information about whether extents are allocated.</TD>
</TR>
<TR>
 <TD><font face="arial" size=2><B>IAM: Index Allocation Map</B></TD>
 <TD><font face="arial" size=2>Information about extents used by a table or index per allocation unit.</TD>
</TR>
</TABLE>
<br>
<B>=> Special pages: A small number distributed throughout the file.</B><br>
<br>
<TABLE border=1 BGCOLOR=#F4FA58>
 
<TR>
 <TD><font face="arial" size=2><B>Bulk Changed Map pages</B></TD>
 <TD><font face="arial" size=2>Information about extents modified by bulk operations<br>
                              since the last BACKUP or BACKUP LOG statement per allocation unit.</TD>
</TR>
<TR>
 <TD><font face="arial" size=2><B>Differential Changed Map pages</B></TD>
 <TD><font face="arial" size=2>Information about extents that have changed since the last BACKUP DATABASE<br>
                               or BACKUP DATABASE with Differential statement per allocation unit.</TD>
</TR>
</TABLE>
<br>
<B>=> Data pages: these are the common pages in the database file.</B><br>
<br>
<TABLE border=1 BGCOLOR=#F4FA58>
 
<TR>
 <TD><font face="arial" size=2><B>Data (table) page</B></TD>
 <TD><font face="arial" size=2>Used for "normal" tables, with a few exceptions for certain column datatypes like:<br>
                               text/ntext, image, varbinary(max) and a few others.</TD>
</TR>
<TR>
 <TD><font face="arial" size=2><B>Index page</B></TD>
 <TD><font face="arial" size=2>Almost "the same" as a data page, except for a few things like pointers.</TD>
</TR>
<TR>
 <TD><font face="arial" size=2><B>text/image page</B></TD>
 <TD><font face="arial" size=2>Used for text datatypes, or BLOBs</TD>
</TR>
</TABLE>
<br>
So, in general, the "data" and "index" pages are of course <B>the most common pages</B> in a database, unless you have<br>
stored a lot of BLOBs as well.<br>
<br>
About the "system" and "special" pages:<br>
<br>
You know, this is just how Microsoft has implemented the physical structure. Of course, a lot of new terms<br>
are introduced, which we really have to discuss first.<br>
<br>
As shown in figure 1, the first pages in any database file are <B>system related</B>. So, in the first 8 pages (0-7),<br>
you will never find any of your objects (like regular tables, indexes etc...).<br>
<br>
Let's print the second page of the "c:\mssql\data\SALES_DATA_01.ndf" file (this is the file with our table and blob).<br>
This page (page 1) is the "PFS", or "Page Free Space" page.<br>
<br>
You can follow along (if you indeed created the database as shown in chapter 1). Just use the DBCC PAGE command again.<br>
<br>
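For reference, the command follows the same pattern as in chapter 1, assuming file id 3 for SALES_DATA_01.ndf as found there:<br>
<br>
<pre>
```sql
DBCC TRACEON (3604);   -- send DBCC output to the client session
GO

-- database 'sales', file id 3, page 1 (the PFS page), print option 3
DBCC PAGE ('sales', 3, 1, 3);
```
</pre>
<br>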


<font face="courier" size=2 color="blue">

Allocation Status<br>
<br>
GAM (3:2) = ALLOCATED....SGAM (3:3) = NOT ALLOCATED....PFS (3:1) = 0x40 ALLOCATED   0_PCT_FULL<br>
DIFF(3:6) = CHANGED........ML (3:7) = NOT MIN_LOGGED <br>          
<br>
PFS: Page Alloc Status  @0x000000000C95A000<br>
<br>
(3:0)....- (3:3)...=.....ALLOCATED...0_PCT_FULL  <br>                            
(3:4)....- (3:5)...= NOT ALLOCATED...0_PCT_FULL <br>                             
(3:6)....- (3:7)...=.....ALLOCATED...0_PCT_FULL  <br>                            
(3:8)..............=.....ALLOCATED..50_PCT_FULL                     Mixed Ext<br>
(3:9)..............=.....ALLOCATED...0_PCT_FULL           IAM Page  Mixed Ext<br>
(3:10).............=.....ALLOCATED..50_PCT_FULL                     Mixed Ext<br>
(3:11).............=.....ALLOCATED...0_PCT_FULL           IAM Page  Mixed Ext<br>
(3:12)...- (3:15)..=.....ALLOCATED.100_PCT_FULL                     Mixed Ext<br>
(3:16)...- (3:63)..=.....ALLOCATED.100_PCT_FULL<br>                              
(3:64)...- (3:66)..=.....ALLOCATED.100_PCT_FULL                     Mixed Ext<br>
(3:67)...- (3:71)..= NOT ALLOCATED...0_PCT_FULL <br>                             
(3:72).............=.....ALLOCATED.100_PCT_FULL <br>                             
(3:73)...- (3:5119)= NOT ALLOCATED...0_PCT_FULL  <br>   
<br>
<font face="arial" size=2 color="black">
<br>
Here you see the allocation status from page 0 up to page 5119 (we only have used up to page 72).<br>
The last page number is this low because we created the database files<br>
with an initial size of 40MB (which is very, very small): 40MB / 8KB per page = 5120 pages.<br>
<br>
<font face="courier" size=2 color="blue">

- The system pages are:<br>
<br>
page 3:0 The fileheader<br>
page 3:1 The PFS<br>
page 3:2 The GAM page<br>
page 3:3 The SGAM page<br>
page 3:4 and 3:5 are not allocated<br>
page 3:6 The DIFF page (related to register extent changes between backups)<br>
page 3:7 The "Minimally Logged Map (ML Map)" page (related to register extent changes with respect to BULK LOGGED operations)<br>
<br>
- Then, the "second" extent starts, which can hold user objects.<br>
<br>
page 3:8 This happens to be a page of the EMPLOYEE table<br>
page 3:9 An IAM page (Index Allocation Map)<br>
page 3:10 In our case, it happens to contain BLOB data (from albert's photo)<br>
page 3:11 An IAM page (Index Allocation Map)<br>
page 3:12 In our case, it happens to hold the root node page of the BLOB<br>
page 3:13,14,15 blob data.<br>
<br>
- Then, as of page 3:16, the third extent starts.
Here we find pages with blob data.<br>
etc..<br>
<br>
other pages up to page 72: blob chunks.<br>
pages (3:73) up to (3:5119): free pages.<br>
<br>
<font face="arial" size=2 color="black">
 

Ok, let's first discuss a few facts about extents.<br>
<br>
<br>
<h3>2.2 Extents.</h3>

As we already saw in the former section, SQL Server organizes pages in units called "extents".<br>
Each extent consists of 8 contiguous pages.<br>
<br>
Some other numeric facts:<br>
<ul>
<li>Since a page is 8K (8192 bytes), an extent is 64K (65536 bytes) in size.</li>
<li>4GB (4294967296 bytes) space in a database file, can contain 64K (65536) extents</li>
</ul>
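These numbers are easy to verify with a bit of arithmetic in TSQL itself. The sketch below only uses the sizes already<br>
mentioned in this note (8192 bytes per page, 8 pages per extent, and the 40MB files of our SALES database):<br>
<br>
<pre>
```sql
-- The large literal is cast to bigint explicitly, since it does not fit in an int.
SELECT 8 * 8192                                AS bytes_per_extent,    -- 65536
       CAST(4294967296 AS bigint) / (8 * 8192) AS extents_per_4GB,     -- 65536
       (40 * 1024 * 1024) / 8192               AS pages_in_40MB_file;  -- 5120
```
</pre>
The last value matches the PFS dump in section 2.1, where the pages of the 40MB file run from 3:0 up to 3:5119.<br>
<br>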

Two main types of extents exist:<br>

<ul>
<li>Uniform (or dedicated) extent: all pages belong to the same object (like an index).</li>
<li>Mixed (or shared) extent: the pages can belong to two or more objects.</li>
</ul>

<B>Fig 3. Uniform and Mixed extents.</B><br>
<br>
<img src="sqlserverblob3.jpg" align="centre"/>
<br>
<br>
Usually, SQL Server allocates multiple uniform extents for each large table.<br>
However, if a table is small, or "begins" small, SQL Server won't allocate an entire extent for it.<br>
Instead it will allocate one or more data pages from a mixed extent. So, a mixed extent can be thought of<br>
as a pool of pages for small objects.<br>
<br> 
When there are quite a few small tables and indexes in your database, you might expect a certain<br>
number of mixed extents. SQL Server of course tries to use space as economically as possible.<br>
To that end, some quite smart algorithms are in use. Once an object grows larger than 8 pages, SQL Server tries<br>
to allocate uniform extents to that object from then on, as much as possible.<br>
<br>
Also, when you create a new clustered index, or rebuild one, the pages will go on uniform extents as well.<br>
Indexes will be discussed in another section.<br>
<br>
How SQL Server "keeps track" of free and occupied extents, will be discussed in the next section.<br>


<br>
<br>
<h3>2.3 System pages.</h3>

The most important "system" pages (for internal administration) are located on the first 8 pages<br>
of any database file. However, as you will read below, most of them are <B>repeated at certain intervals</B>.<br>
<br>
<B>Fig 4. System pages.</B><br>
<br>
<img src="sqlserverblob4.jpg" align="centre"/>
<br>
<br>
<br>
<B><U>The "GAM" and "SGAM" (Global Allocation Map) pages:</U></B><br>
<br>
A GAM page registers which extents are totally free, or have been allocated.<br>
Each GAM page covers about 64,000 extents, or about 4 GB of data.<br>
<br>
<B>Explanation:</B><br>
<br> 
The page has 8192 bytes. Now, the usual <B>page header</B> and <B>GAM header</B> will take some space,<br>
so let's say that 8000 bytes can be used for tracking extents. Now, if a <B>bitmap</B> is used, something like<br>
8000 x 8 bits can be used, so about 64K bits. Each such bit can be used to indicate whether an extent is totally free,<br>
or already allocated (partly or fully used).<br>
<br>
<ul>
<li>If the bit is 1, the extent is totally free.</li>
<li>If the bit is 0, the extent is (partly) allocated.</li>
</ul>

So, 64K extents can be "covered" by one GAM page, which amounts to about 4GB of data space.<br>
Consequently, if a datafile is larger than 4GB, a new GAM bitmap page is needed at every 4GB interval.<br>
<br>
A similar story holds for the SGAM page. Only here, it tracks the following in the bitmap:<br>
If an extent is a mixed extent with at least one page free, the bit is 1.<br>
If an extent is not a mixed extent, or it is a full mixed extent, then the bit is 0.<br>
<br>
So, this explains how SQL Server can discriminate between free or (partially) used extents.<br>
<br>
As you have seen in the former sections, the first GAM is page 2, and the first SGAM is page 3 in any data file (.mdf or .ndf).<br>
<br>

<B><U>The "Page Free Space" (PFS) pages:</U></B><br>
<br>
This is page 1 in any ordinary .ndf database file, right after the fileheader (page 0).<br>
It registers which pages and page ranges are in use, or are free.<br>
If you have very small database files, then even just one PFS page might be sufficient per file.<br>
This will be explained below.<br>
In our example sales database, we use 40MB sizes, which is ridiculously small, of course.<br>
<br>
But for larger database files, a PFS page needs to be repeated after about 8000 pages.<br>
This is so, because a PFS does not use a bitmap. The PFS uses one byte for each page, which records whether the page<br>
is allocated or not. So, since the PFS has about 8000 usable bytes for this purpose, other PFS pages are needed in (about)<br>
8000 page intervals.<br>
It needs a byte per page, because it tries to describe for each page, the level of "fullness", like<br>
0_PCT_FULL, 50_PCT_FULL, 100_PCT_FULL (and a few others), so to register <B>that</B>, one bit per page<br>
is not sufficient. So, one byte per page is used.<br>
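<br>
As a small sketch of this interval logic in TSQL: the interval of 8088 pages used here is, as far as I know, the exact figure behind<br>
the "about 8000" in the text, so treat it as an assumption. The first PFS page is page 1, and the following ones sit on multiples of the interval.<br>
<br>
<font face="courier" size=2 color="blue">
DECLARE @page_id int = 5119 -- last page of a 40MB file (5120 pages)<br>
DECLARE @interval int = 8088 -- assumption: number of pages covered per PFS page<br>
<br>
-- which PFS page tracks the given page?<br>
SELECT CASE WHEN @page_id &lt; @interval THEN 1<br>
&nbsp;ELSE (@page_id / @interval) * @interval END AS pfs_page_id<br>
GO<br>
</font>
<br>
A 40MB file has 5120 pages (0 up to 5119), so all of its pages are covered by the PFS on page 1.<br>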
<br>
Here again, you can see a dump of the PFS of the "c:\mssql\data\SALES_DATA_01.ndf" database file,<br>
as used in our example SALES database.<br>
<br>

<font face="courier" size=2 color="blue">

Allocation Status<br>
<br>
GAM (3:2) = ALLOCATED....SGAM (3:3) = NOT ALLOCATED....PFS (3:1) = 0x40 ALLOCATED   0_PCT_FULL<br>
DIFF(3:6) = CHANGED........ML (3:7) = NOT MIN_LOGGED <br>          
<br>
PFS: Page Alloc Status  @0x000000000C95A000<br>
<br>
(3:0)....- (3:3)...=.....ALLOCATED...0_PCT_FULL  <br>                            
(3:4)....- (3:5)...= NOT ALLOCATED...0_PCT_FULL <br>                             
(3:6)....- (3:7)...=.....ALLOCATED...0_PCT_FULL  <br>                            
(3:8)..............=.....ALLOCATED..50_PCT_FULL                     Mixed Ext<br>
(3:9)..............=.....ALLOCATED...0_PCT_FULL           IAM Page  Mixed Ext<br>
(3:10).............=.....ALLOCATED..50_PCT_FULL                     Mixed Ext<br>
(3:11).............=.....ALLOCATED...0_PCT_FULL           IAM Page  Mixed Ext<br>
(3:12)...- (3:15)..=.....ALLOCATED.100_PCT_FULL                     Mixed Ext<br>
(3:16)...- (3:63)..=.....ALLOCATED.100_PCT_FULL<br>                              
(3:64)...- (3:66)..=.....ALLOCATED.100_PCT_FULL                     Mixed Ext<br>
(3:67)...- (3:71)..= NOT ALLOCATED...0_PCT_FULL <br>                             
(3:72).............=.....ALLOCATED.100_PCT_FULL <br>                             
(3:73)...- (3:5119)= NOT ALLOCATED...0_PCT_FULL  <br>   
<br>
<font face="arial" size=2 color="black">
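A dump like the one above can be produced with the (undocumented) DBCC PAGE statement. A minimal sketch, assuming that<br>
file id 3 is the SALES_DATA_01 file of our example SALES database:<br>
<br>
<font face="courier" size=2 color="blue">
DBCC TRACEON (3604) -- send DBCC output to the client session<br>
GO<br>
DBCC PAGE ('SALES', 3, 1, 3) -- database, file id, page number (1 = PFS), print option<br>
GO<br>
</font>
<br>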

<B><U>The "ML" (or Bulk Changed Map pages) and "DIFF" (Differential Changed Map pages):</U></B><br>
<br>
=> The Differential Changed Map pages track which extents have been changed since the last full backup.<br>
Ever wondered how SQL Server knows what to back up in the differential backups that follow a Full backup?<br>
The differential backups are generally much smaller compared to the full backup.<br>
This is due to the fact that SQL Server registers which extents have been changed. So, unmodified extents<br>
do not need to be backed up in a differential backup.<br>
<br>
=> The ML pages track which extents are affected by "Bulk logged" operations.<br>
<br>
However, both types of pages are not very relevant for our discussion.<br>
<br>

<B><U>The "IAM" pages:</U></B><br>
<br>
I'd like to postpone this one, until we have covered Btree and index structures in some more detail.<br>
<br>
<br>
<br>
<font face="arial" size=2 color="blue">
<h2>Chapter 3. Using FILESTREAM for storage of blobs (2008/2012).</h2>
<font face="arial" size=2 color="black">

Up to now, we have seen <B>one type</B> of BLOB storage, namely <B>"inline"</B> storage, where the blob<br>
really physically is stored on database pages.<br>
<br>
If you would ask me: I think the (traditional) inline storage is quite "ok", since all information<br>
(metadata+blobs themselves) is collected in one and the same database. For transactional reasons, and<br>
for availability, this is good.<br>
<br>
<B>&#8658; Drawbacks of inline storage:</B><br>
<br>
However, if the number of blobs is very large, DBAs might be confronted with long backup/recovery times.<br>
But that could also be due to the fact that certain appliances have "versioning" switched on, which might result<br>
in the fact that documents (blobs) are stored several times, corresponding to their "versions".<br>
So, if you could throttle that back a little, it might have quite an impact on the database size.<br>
<br>
Secondly, performance might be an issue too. But don't forget that possible "middleware/application" Servers<br>
might be involved as well.<br>
<br>
A commonly heard phrase is that:<I>"for large blobs", the filesystem performs better than inline storage,<br>
while "for smaller blobs", inline storage offers good performance</I>. Now, define "large" and "small"....<br>
It seems that Microsoft takes "over" 1MB as "large", and smaller than 1MB as "small".<br>
<br>
The suggested improvement in performance, is partly attributed to the fast file IO service of the OS, and<br>
the use of the "NT file cache".<br>
<br>
<B>&#8658; Alternatives for inline storage:</B><br>
<br>
It's possible to store the blobs on a filesystem (using block IO, as well as file IO).<br>
This means that SQL Server uses tables and views solely for metadata, but the blobs themselves<br>
are not stored in SQL Server, but are files on disks.<br>
<br>
There exist third-party solutions for this, as well as Microsoft implementations.
<br>
Since SQL 2008, the Microsoft "FILESTREAM" feature has been available.<br>
In this chapter, we are going to take a quick look at its main features.<br>
<br>
<B>&#8658; What is the FILESTREAM feature then?:</B><br>
<br>
In essence: the DBA needs to create a new "filegroup" with the "filestream clause". This filegroup<br>
then physically is <B>a folder on a filesystem</B>, where the blobs are going to be saved and accessed.<br>
From a "transactional viewpoint", the filegroup is just a container, accessible using the SQL Server interface<br>
which then guarantees consistency. This is OK, but if IO is possible using other methods (using the OS for example),<br>
there might be serious risks of inconsistencies.<br>
<br>
Once filestream is enabled and a filestream filegroup is created, <B>you can build tables</B> with a varbinary datatype, using<br>
the <B>"filestream clause" for that column</B>, which makes sure the blobs then are saved to, and accessed from, this special filegroup.<br>
<br>
You can "enable" filestream "globally" on the instance level. By default, it's "off".<br>
However, filestream is a database feature. You can have databases under your instance <I>with no</I> filestream, <br>
and databases <I>with</I> filestream.<br>
<br>
<br>
<h3>3.1 Enabling "Filestream".</h3>

<font face="arial" size=2 color="brown">
<B>=> While installing SQL Server:</B><br>
<font face="arial" size=2 color="black">
<br>
When you install SQL Server, somewhere "halfway", there is an option to configure the database engine.<br>
You have to watch carefully, if you would do an interactive install, because it's easy to overlook this option.<br>
See figure 5.<br>
<br>
<B>Fig 5. Enabling the FILESTREAM feature at the installation of SQL Server.</B><br>
<br>
<img src="sqlserverblob6.jpg" align="centre"/>
<br>
<br>
<font face="arial" size=2 color="brown">
<B>=> After SQL Server already was installed:</B><br>
<font face="arial" size=2 color="black">
<br>
<B>- Using a graphical utility:</B><br>
<br>
If SQL Server was already installed, you can still use the "SQL Server Configuration Manager"<br>
to enable the filestream feature. Start the utility, right-click the "SQL Server service", and choose "properties".<br>
In the dialogbox that will show up, you can choose several settings.<br>
<br>
<B>Fig 6. Enabling the FILESTREAM feature using the "SQL Server Configuration Manager".</B><br>
<br>
<img src="sqlserverblob5.jpg" align="centre"/>
<br>
<br>
<B>- Using TSQL:</B><br>
<br>
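Besides the graphical utility, you can use TSQL to set the filestream access level of the instance.<br>
As far as I know, this complements (rather than replaces) the setting on the SQL Server service as shown in figure 6:<br>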
<br>
<font face="courier" size=2 color="black">

USE master<br>
Go<br>
EXEC sp_configure 'show advanced options'<br>
GO<br>
EXEC sp_configure filestream_access_level, 1<br>
GO<br>
RECONFIGURE WITH OVERRIDE<br>
GO <br>
<br>
-- the possible options 0 (none), or 1 (TSQL access), or 2 (TSQL access, and file I/O streaming access)<br>
-- will be explained below.<br>
<font face="arial" size=2 color="black">
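<br>
To check what the instance actually ended up with, the SERVERPROPERTY function can be queried. This is just a sketch.<br>
The configured level is what was set with sp_configure, while the effective level may be lower, for example<br>
if filestream is not (yet) enabled on the SQL Server service itself.<br>
<br>
<font face="courier" size=2 color="blue">
SELECT SERVERPROPERTY('FilestreamConfiguredLevel') AS configured_level,<br>
&nbsp;SERVERPROPERTY('FilestreamEffectiveLevel') AS effective_level,<br>
&nbsp;SERVERPROPERTY('FilestreamShareName') AS share_name<br>
GO<br>
</font>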
<br>
<br>
<font face="arial" size=2 color="brown">
<B>=> Configuring the Filestream settings:</B><br>
<font face="arial" size=2 color="black">
<br>
Note from figures 5 and 6, that you can configure Filestream for various settings.<br>
Although those figures show the same possible configurations, it's most clear from figure 6.<br>

<ul>
<li><B>Option 1: "Enable FILESTREAM for Transact-SQL access"</B><br>
Here you limit access to the blobs using TSQL only. This is the safest way to go, albeit (seemingly)<br>
not the most flexible option.</li>

<li><B>Option 2: "Enable FILESTREAM for file I/O streaming access"</B><br>
If you want this too, then TSQL access is enabled, <B>and</B> "file i/o access" is enabled.<br>
This is very flexible, but you need to do some further research here.</li>

<li><B>Option 3: "Allow remote clients to have streaming access to FILESTREAM data"</B><br>
If you want this, then TSQL access is enabled, <B>and</B> "share access" is enabled.<br>
Furthermore any client can, in principle, access the share. You really need to do some further research here.</li>
</ul>
<br>
From a "transactional viewpoint", the filegroup is just a container, accessible using the SQL Server interface<br>
which then guarantees consistency, if you would use the first and second options.<br>
In this case, you can use TSQL, and also use Win32 APIs to work with the blobs. For example, the "columnname.PathName()"<br>
method (columnname of the varbinary column) provides a path to the file, from which a handle can be obtained for further operations.<br>
One important consequence is thus that applications can use the streaming APIs and the performance of the file system,<br>
and at the same time maintain transactional consistency between the unstructured data (the files) and the <br>
corresponding structured data, that is, the other fields of the table, and all optionally related tables.<br>

<br>
If you enable the third option, so using share access for remote clients, I would say that fully guaranteed<br>
consistency is "at risk", unless the applications in use are truly "ironclad". You need to do more research (if you are interested).<br>
So, I guess I am trying to say that there <I>might</I> be security issues as well as transactional consistency issues.<br>
<br>
<br>
<h3>3.2 Implementing "Filestream".</h3>

Let's try to add a filestream filegroup to our SALES database.<br>
I suggest we make a suitable folder first. I choose to create a folder <B>"c:\fsblobs"</B>.
<br>
Inside that, I want to have a <B>"c:\fsblobs\documents"</B> folder. But do not create this one.<br>
That then will be the folder/container that's associated with the new filestream filegroup.<br>
<br>
So: create "c:\fsblobs", but do not create the second folder "c:\fsblobs\documents", because SQL Server<br>
will do that for us (it does not expect an <I>existing</I> directory).<br>
<br>
So, here we go:<br>
<br>
<font face="courier" size=2 color="blue">

ALTER DATABASE SALES ADD FILEGROUP Documents CONTAINS FILESTREAM<br>
GO<br>
ALTER DATABASE SALES ADD FILE (NAME='Documents', FILENAME='c:\fsblobs\documents') TO FILEGROUP Documents<br>
GO<br>
<br>
<font face="arial" size=2 color="black">
In my case, it succeeded. Let's see what happened on the filesystem:<br>
<br>
<font face="courier" size=2 color="blue">

C:\> cd fsblobs<br>
C:\fsblobs> cd d*<br>
<br>
C:\fsblobs\documents>dir<br>
<br>
30.11.2012  20:14    DIR          $FSLOG<br>
30.11.2012  20:14               422 filestream.hdr<br>
<br>
C:\fsblobs\documents>cd $*<br>
<br>
C:\fsblobs\documents\$FSLOG>dir<br>

30.11.2012  20:14    DIR          .<br>
30.11.2012  20:14    DIR         ..<br>

<br>
<font face="arial" size=2 color="black">
Of course, the folder is <B>still empty</B>. We have not stored anything yet "in" the filestream filegroup.<br>
<br>
Note the "$FSLOG" directory. This is a sort of a "transaction log" (container) for events on blobs that are going<br>
to be stored in the filestream container.<br>
<br>
As we will see later on, anytime you create a table using filestream for blob storage, we will see a subfolder added<br>
in the form of "C:\fsblobs\documents\guid", like for example "C:\fsblobs\documents\25892e17-80f6-415f-9c65-7395632f0223".<br>
Then, inside such a GUID named container, we will find the blobs associated with that table. More on this later.<br>
<br>
Note:<br>
The "$FSLOG" directory is (of course) very sensitive to "foreign" files. If, on a test system, you place any object there,<br>
it will affect the status of the database with which this filestream container is associated.<br>
You might even end up with a Suspect database. It's easy to recover from this, of course.<br>
But it's an illustration that any interactive access to filestream containers (e.g. using OS commands) must be avoided.<br>
<br>
<br>
Let's now see what SQL Server "thinks" we have as database files:<br>
<br>
<font face="courier" size=2 color="blue">

SELECT file_id, type_desc, name, physical_name FROM sys.database_files<br>
<br>
file_id...type_desc.....name............physical_name<br>
1.........ROWS..........SALES...........c:\mssql\data\SALES.mdf<br>
2.........LOG...........SALES_LOG_001...c:\mssql\data\SALES_LOG_001.ldf<br>
3.........ROWS..........SALES_DATA_01...c:\mssql\data\SALES_DATA_01.ndf<br>
4.........ROWS..........SALES_INDEX_01..c:\mssql\data\SALES_INDEX_01.ndf<br>
65537.....FILESTREAM....Documents.......c:\fsblobs\documents<br>
<br>
<font face="arial" size=2 color="black">
So, the regular database files are always of "type" ROWS, or LOG (in case of transactionlog files).<br>
Indeed, we now have a <B>new</B> type of file, of type FILESTREAM, associated with the physical location "c:\fsblobs\documents".<br>
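<br>
The filegroup itself can be checked in a similar way. A small sketch, using the sys.filegroups catalog view:<br>
as far as I know, a filestream filegroup shows up with type "FD", while ordinary filegroups have type "FG".<br>
<br>
<font face="courier" size=2 color="blue">
SELECT name, type, type_desc FROM sys.filegroups<br>
GO<br>
</font>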
<br>
<br>
<B><U>What storage can be used:</U></B><br>
<br>
The disk(s) that hold the filestream containers do not need to be local disks.<br>
They can easily be LUNs from a SAN as well.<br>
However, all filesystems should be NTFS formatted.<br>
<br>
There are many other considerations, especially for obtaining the best performance.<br>
For example, cluster (block) size can be important, as well as the RAID level, to name a few.<br>



<br>
<br>

<h3>3.3 Adding a blob.</h3>

Let's create a DOCS table, with some regular datatypes, and of course a column of datatype varbinary(max).<br>
<br>

<font face="courier" size=2 color="blue">

CREATE TABLE DOCS<br>
(<br>
&nbsp; <B>FileStreamID UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWSEQUENTIALID()</B>,<br>
&nbsp; DOC_EXTENSION VARCHAR(10),<br>
&nbsp; DOC_NAME VARCHAR(256),<br>
&nbsp; <B>DOCUMENT VARBINARY(MAX) FILESTREAM</B><br>
) FILESTREAM_ON Documents<br>
GO<br>
<br>
<font face="arial" size=2 color="black">

Let's take a look at the columns I took here.<br>
<br>
=> The "DOC_EXTENSION" and "DOC_NAME" columns are simply optional. They are just there for informational purposes. If you want,<br>
you could leave those out. It's only that it might be handy to have the "document name" (blobname), and<br>
the file extension (like .jpg or .xls) registered in the table as well.<br>
<br>
=> The name "FileStreamID" is not a required columnname. It could just as well have been named "id" or another reasonable name.<br>
But we must have such a field, and it <B>must be</B> of a "uniqueidentifier" datatype.<br>
This is a 16-byte binary value that should actually function as a "World Wide globally unique identifier" (GUID).<br>
<br>
As we will see, it is used to <I>uniquely determine</I> the file associated with that record, on the filesystem.<br>
Often, we use the NEWID() function, or NEWSEQUENTIALID() function, to let SQL Server itself automatically generate GUIDs<br>
for any new record. Note that we have used "DEFAULT NEWSEQUENTIALID()" as a default, so SQL Server will handle it<br>
automatically, if we (or an application) do not provide a GUID for a new record.<br>
<br>
As "UNIQUEIDENTIFIER" should result in unique identifiers anyway, you might wonder what the "ROWGUIDCOL" is doing here.<br>
It is in fact required: a table with a FILESTREAM column must have a non-null "uniqueidentifier" column with the ROWGUIDCOL property,<br>
and that column must have a UNIQUE (or PRIMARY KEY) constraint.<br>
 
<br>
=> Lastly, we have a field (DOCUMENT) which is of datatype "varbinary(max)", and this one refers to our blob.<br>
Note that in this column declaration, the clause "FILESTREAM" is necessary to inform SQL Server that<br>
we are going to use the FILESTREAM feature for storage of blobs.<br>
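<br>
If you want to verify how SQL Server registered these columns, the sys.columns catalog view can be used.<br>
This is just a sketch, assuming the DOCS table was created in the default "dbo" schema:<br>
<br>
<font face="courier" size=2 color="blue">
SELECT name, is_rowguidcol, is_filestream<br>
FROM sys.columns<br>
WHERE object_id = OBJECT_ID('dbo.DOCS')<br>
GO<br>
</font>
<br>
The FileStreamID column should show is_rowguidcol = 1, and the DOCUMENT column should show is_filestream = 1.<br>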
<br>
<br>
Suppose that we have the file (or the "blob") c:\temp\sales.xls. We are going to store this as a blob in our<br>
"Documents" filestream filegroup (which actually is the "c:\fsblobs\documents" container).<br>
<br>
Take a look at the following TSQL:<br>
<br>
<font face="courier" size=2 color="blue">

INSERT INTO DOCS (DOC_EXTENSION, DOC_NAME, Document)<br>
SELECT<br>
 'xls' AS DOC_EXTENSION,<br>
 'sales.xls' AS DOC_NAME,<br>
 * FROM OPENROWSET(BULK 'c:\temp\sales.xls', SINGLE_BLOB)  AS Document<br>
GO<br>
<br>

<font face="arial" size=2 color="black">

Now, let's see what record we have in the table DOCS:<br>
<br>
<font face="courier" size=2 color="blue">

SELECT * FROM DOCS<br>
<br>
FileStreamID............................DOC_EXTENSION....DOC_NAME.........DOCUMENT<br>
6423DA82-5523-E211-B687-000AE4B3F060... xls..............sales.xls........0xD0CF11E0A1B11 (etc)<br>
<br>

<font face="arial" size=2 color="black">
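With larger blobs, a "SELECT *" pulls the complete binary content to the client. If you only want to see<br>
what is stored, a sketch like the following, using the DATALENGTH function, shows the size of each blob instead:<br>
<br>
<font face="courier" size=2 color="blue">
SELECT FileStreamID, DOC_NAME, DATALENGTH(DOCUMENT) AS size_in_bytes<br>
FROM DOCS<br>
GO<br>
</font>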

<br>
<br>

<h3>3.4 Analysis of blob storage.</h3>

<font face="arial" size=2 color="red">
<B>&#8658; Let's take a look at the database pages first:</B><br>
<font face="arial" size=2 color="black">
<br>

The "sales.xls" file, which I loaded "into" the DOCS table (in reality into the filestream filegroup),<br>
is 8921 KB (circa 9MB) in size, which qualifies as a large BLOB.<br>
<br>
In the chapters above, we have seen which pages in the SALES database were allocated, after (only) loading<br>
an employee photo (albert.jpg). You know how to do that using the DBCC PAGE statement.<br>
We know that after page 3:72, all pages were "free". Here is a partial output again:<br>
<br>
<B>- Situation before filestream and before loading "sales.xls":</B><br>
<br>
<font face="courier" size=2 color="blue">

Allocation Status<br>
<br>
GAM (3:2) = ALLOCATED....SGAM (3:3) = NOT ALLOCATED....PFS (3:1) = 0x40 ALLOCATED   0_PCT_FULL<br>
DIFF(3:6) = CHANGED........ML (3:7) = NOT MIN_LOGGED <br>          
<br>
PFS: Page Alloc Status  @0x000000000C95A000<br>
<br>
(3:0)....- (3:3)...=.....ALLOCATED...0_PCT_FULL  <br>                            
(3:4)....- (3:5)...= NOT ALLOCATED...0_PCT_FULL <br>                             
(3:6)....- (3:7)...=.....ALLOCATED...0_PCT_FULL  <br>                            
(3:8)..............=.....ALLOCATED..50_PCT_FULL                     Mixed Ext<br>
(3:9)..............=.....ALLOCATED...0_PCT_FULL           IAM Page  Mixed Ext<br>
(3:10).............=.....ALLOCATED..50_PCT_FULL                     Mixed Ext<br>
(3:11).............=.....ALLOCATED...0_PCT_FULL           IAM Page  Mixed Ext<br>
(3:12)...- (3:15)..=.....ALLOCATED.100_PCT_FULL                     Mixed Ext<br>
(3:16)...- (3:63)..=.....ALLOCATED.100_PCT_FULL<br>                              
(3:64)...- (3:66)..=.....ALLOCATED.100_PCT_FULL                     Mixed Ext<br>
(3:67)...- (3:71)..= NOT ALLOCATED...0_PCT_FULL <br>                             
(3:72).............=.....ALLOCATED.100_PCT_FULL <br>                             
(3:73)...- (3:5119)= NOT ALLOCATED...0_PCT_FULL  <br>   
<br>
<font face="arial" size=2 color="black">

Now, we have "loaded" a large blob, <I> but it should NOT have been loaded</I> into database pages.<br>
The blob "sales.xls" is supposed to live on the filesystem. So let's dump the PFS page again:<br>

<br>
<B>- Situation after enabling filestream and after loading "sales.xls":</B><br>

<font face="courier" size=2 color="blue">

<br>
Allocation Status<br>
<br>
GAM (3:2) = ALLOCATED....SGAM (3:3) = NOT ALLOCATED....PFS (3:1) = 0x40 ALLOCATED   0_PCT_FULL<br>
DIFF(3:6) = CHANGED........ML (3:7) = NOT MIN_LOGGED <br>          
<br>
PFS: Page Alloc Status  @0x000000000C95A000<br>
<br>
(3:0)....- (3:3)...=.....ALLOCATED...0_PCT_FULL  <br>                            
(3:4)....- (3:5)...= NOT ALLOCATED...0_PCT_FULL <br>                             
(3:6)....- (3:7)...=.....ALLOCATED...0_PCT_FULL  <br>                            
(3:8)..............=.....ALLOCATED..50_PCT_FULL                     Mixed Ext<br>
(3:9)..............=.....ALLOCATED...0_PCT_FULL           IAM Page  Mixed Ext<br>
(3:10).............=.....ALLOCATED..50_PCT_FULL                     Mixed Ext<br>
(3:11).............=.....ALLOCATED...0_PCT_FULL           IAM Page  Mixed Ext<br>
(3:12)...- (3:15)..=.....ALLOCATED.100_PCT_FULL                     Mixed Ext<br>
(3:16)...- (3:63)..=.....ALLOCATED 100_PCT_FULL   <br>                           
(3:64)...- (3:66)..=.....ALLOCATED 100_PCT_FULL                     Mixed Ext<br>
(3:67).............=.....ALLOCATED..50_PCT_FULL                     Mixed Ext<br>
(3:68).............=.....ALLOCATED...0_PCT_FULL           IAM Page  Mixed Ext<br>
(3:69).............=.....ALLOCATED...0_PCT_FULL                     Mixed Ext<br>
(3:70).............=.....ALLOCATED...0_PCT_FULL           IAM Page  Mixed Ext<br>
(3:71).............= NOT ALLOCATED...0_PCT_FULL<br>                              
(3:72).............=.....ALLOCATED.100_PCT_FULL <br>                             
(3:73)...- (3:5119)= NOT ALLOCATED...0_PCT_FULL  <br>   
<br>
<font face="arial" size=2 color="black">

From this, it is easy to conclude that our sales.xls blob (about 9MB in size), is not stored inside the database.<br>
If it were, then more than 1000 database pages would have been allocated, which is not the case.<br>
You can see from the output, that the range of pages (3:73) - (3:5119) is still free, as it was before.<br>
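<br>
The same conclusion can be cross-checked without dumping pages. A sketch, using the sys.dm_db_partition_stats view<br>
(again assuming the DOCS table lives in the "dbo" schema). It returns one row per heap or index partition of the table:<br>
<br>
<font face="courier" size=2 color="blue">
SELECT index_id, in_row_used_page_count, lob_used_page_count, used_page_count, row_count<br>
FROM sys.dm_db_partition_stats<br>
WHERE object_id = OBJECT_ID('dbo.DOCS')<br>
GO<br>
</font>
<br>
I would expect only a handful of pages here, nowhere near the more than 1000 pages that a blob of 8921 KB would need.<br>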
<br>
So, the blob is not in the database. Now let's take a look how it is organized in "c:\fsblobs\documents",<br>
which is the container of our filestream filegroup.<br>
<br>                            
Note: there are just a few changes though, like the storage of the DOCS table in page 3:67, and<br>
the two new IAM pages. But there are no new pages due to blob storage. We will come back to the IAM later on.<br>
<br>
<br>
<font face="arial" size=2 color="red">
<B>&#8658; Let's take a look at the filesystem:</B><br>
<font face="arial" size=2 color="black">
<br>
On a Test system, you could browse around through the subdirs within the "c:\fsblobs\documents" directory.<br>
Don't do that on anything labeled "production".<br>
<br>
In the figure below, you see how the storage is organized.<br>
<br>
<B>Fig 7. Folder structure of the FILESTREAM container/filegroup after storing one blob.</B><br>
<br>
<img src="sqlserverblob7.jpg" align="centre"/>
<br>
<br> 
Here you see my "sales.xls" blob, with exactly the same size as the original file.<br>
Note the different levels of the subdirectories, each named as a GUID identifier.<br>
<br>
The "upper" GUID represents the DOCS table.<br>
The "lower" GUID represents the "DOCUMENT" column of the DOCS table.<br>
<br>
Let's see what happens if we load a second blob "into" the DOCS table.<br>
This time, we use the "sales2.xls" file, with a size of 5283KB, stored in "c:\temp".<br>
To load it into the filestream container, we can use:<br>
<br>
<font face="courier" size=2 color="blue">

INSERT INTO DOCS (DOC_EXTENSION, DOC_NAME, Document)<br>
SELECT<br>
 'xls' AS DOC_EXTENSION,<br>
 'sales2.xls' AS DOC_NAME,<br>
 * FROM OPENROWSET(BULK 'c:\temp\sales2.xls', SINGLE_BLOB)  AS Document<br>
GO<br>
<br>
<font face="arial" size=2 color="black">
<br>
<B>Fig 8. After storing the second blob.</B><br>
<br>
<img src="sqlserverblob8.jpg" align="centre"/>
<br>
<br>
<br> 
<B><U>TSQL and Win32 API:</U></B><br>
<br>
<B>- TSQL:</B><br>
<br>
Using TSQL, you have full control on columns with blob data. You can use INSERT, UPDATE, DELETE<br>
on tables, in the "usual" way. However, if you would delete a row with binary data,<br>
you probably will not see that the object is removed from the filestream folder immediately.<br>
First, the object is "tombstoned" and a sort of "garbage collector" will remove it permanently later.<br>
Sometimes, it is observed that this can take quite a while.<br>
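<br>
On a test system, you can try to watch this behaviour with a sketch like the one below. The assumptions here are that<br>
the garbage collector does its work around checkpoints (and, with the FULL recovery model, probably only after a log backup),<br>
and that you run SQL Server 2012 or later for the last statement:<br>
<br>
<font face="courier" size=2 color="blue">
DELETE FROM DOCS WHERE DOC_NAME = 'sales2.xls'<br>
GO<br>
CHECKPOINT -- give the garbage collector a chance to run<br>
GO<br>
-- SQL Server 2012 and later only: explicitly request a garbage collection run<br>
EXEC sp_filestream_force_garbage_collection @dbname = N'SALES'<br>
GO<br>
</font>
<br>
Even then, the file might not disappear from the filestream folder right away.<br>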
<br>
There are some "best practices" in using TSQL in relation to tables with blobs.<br>
See chapter 4.<br>
<br>
<B>- Programmatic API's to Filestream:</B><br>
<br>
Using C#, VB.NET, C++ etc., you can access the blobs "in a neat way" using SQL Server<br>
services. This means that manipulating blobs, using the filestream file API, is possible.<br>
This also means that authentication to SQL Server has occurred.<br>
<br>
Usually, a handle to the object is acquired. Such a handle can be derived from a sort of UNC path<br>
to the object. Take a look at this example. Although it's SQL, it shows how the path for such a handle<br>
can be obtained. Similar code can be placed in other development environments.<br>
<br>
<font face="courier" size=2 color="blue">

DECLARE @uncpath nvarchar(max)<br>
<br>
SELECT @uncpath = Document.PathName()<br>
FROM DOCS<br>
WHERE DOC_NAME = 'sales.xls'<br>
<br>
PRINT @uncpath<br>
<br>
\\W2K8srv1\MSSQLSERVER\v1\SALES\dbo\DOCS\DOCUMENT\FB30ECE5-A33C-E211-8FDC-F04DA2915E69<br> 

<font face="arial" size=2 color="black">
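<br>
The path alone is not enough for a client program: the file access has to take place inside a transaction.<br>
The sketch below shows the TSQL side of that, using the GET_FILESTREAM_TRANSACTION_CONTEXT() function.<br>
The client part (the call to the OpenSqlFilestream API) is only indicated in the comments, and is not worked out here:<br>
<br>
<font face="courier" size=2 color="blue">
BEGIN TRANSACTION<br>
<br>
SELECT Document.PathName() AS blob_path,<br>
&nbsp;GET_FILESTREAM_TRANSACTION_CONTEXT() AS tx_context<br>
FROM DOCS<br>
WHERE DOC_NAME = 'sales.xls'<br>
<br>
-- here, a client program would pass both values to the OpenSqlFilestream API,<br>
-- read or write the file through the returned handle, and close the handle again<br>
<br>
COMMIT TRANSACTION<br>
GO<br>
</font>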
<br>
<br> 
<B><U>Some Further remarks:</U></B><br>
<br>
Up to now, we have seen two types of blob "storage":<br>
<br>
- "Inline", that is, the blobs are really stored inside the database, in database pages.<br>
- "Filestream", where blobs are stored in the filesystem.<br>
<br>
Another option is the SQL 2012 "FileTable" option, which is actually a nifty refinement of the Filestream feature.<br>
<br>
Other options to store blobs on the filesystem exist as well, like Microsoft's RBS (Remote Blob Store), or third-party proprietary solutions<br>
which sometimes can be observed in certain document workflow appliances.<br>
<br>
If you would be in a situation <I>to select a storage model</I>, then ultimately the <B>Application</B> that will<br>
be used to access the objects, should be <B>"primary"</B> in your decision.<br>
Anyway, if you are in such a situation, you have a lot of research to do.<br>
<br>
However, in many documents and blogs you will find the "general" advice to store objects over 1MB<br>
on the filesystem, and smaller objects inline.<br>
<br>
Of course, what a certain "Albert" says is not very relevant, but my two cents are:<br>
<br>
I was never very disappointed with inline storage, with respect to general performance,<br>
and throughput of backup/recovery. So, if the database is not expected to grow very large,<br>
the inline option stays appealing to me.<br>
Furthermore, I was never too keen on a "split" situation where one part exists on the filesystem<br>
and other parts inside the database.<br>
<br>
But, an application can favour one model over the other, so it's probably best to follow the application's favourite.<br>
<br>
However, many well-known SQL authorities advise this:<br>
Use the filesystem for large objects, and inline for smaller blobs.<br>  
<br>
<br>
<br>
<font face="arial" size=2 color="blue">
<h2>Chapter 4. Some methods for storage and retrieval of blobs.</h2>
<font face="arial" size=2 color="black">


<br>

<br>

<br>
<br>
<br>
<br>
<br>


</body>
</html>