<html> <head> <title>Albert van der Sel : Micro introduction NetApp</title> </head> <body bgcolor="#FFFFFF" link="blue" alink="blue" vlink="blue"> <h1>A microscopic small note on Netapp.</h1> <B>Version</B> : 0.1<br> <B>Date</B> : 25/12/2012<br> <B>By</B> : Albert van der Sel<br> <hr/> <font face="arial" size=2 color="black"> <br> <br> <font face="arial" size=2 color="blue"> <h2 id="section8">1. A Microscopic small note on Netapp.</h2> <font face="arial" size=2 color="black"> <font face="arial" size=2 color="red"> <h3>1.1 A quick overview.</h3> <font face="arial" size=2 color="black"> NetApp is the name of a company delivering a range of popular small to large SAN solutions.<br> <br> It's not really possible to "capture" the solution in just a few pages. People attend trainings for a good reason:<br> the product is very broad, and technically complex. Implementing an optimally configured SAN is a real challenge.<br> So, this note does not even scratch the surface, I am afraid. However, to get a high-level impression, it should be OK.<br> <br> Essentially, a high-level description of "NetApp" goes like this:<br> <ul> <li>A controller, called the <B>"Filer"</B> or <B>"FAS"</B> (NetApp Fabric-Attached Storage), functions as the managing device for the SAN.</li> <li>The Filer runs the "Ontap" Operating System, a unix-like system, which has its roots in FreeBSD.</li> <li>The Filer manages "disk arrays", which are also called "shelves".</li> <li>It uses a "unified" architecture, that is, from small to large SANs, it's the same Ontap software, with the<br> same CL, tools, and methodology.</li> <li>Many features in NetApp/Ontap must be separately licensed, and the list of features is very impressive.</li> <li>There is a range of SNAP* methodologies which allow for very fast backups, replication of Storage data to another controller and its shelves,<br> and much more, not mentioned here. 
But we will discuss Snapshot backup Technology in section 1.4.</li> <li>The storage itself uses the WAFL filesystem, which is more than just a "filesystem". It was probably inspired by "FFS/Episode/LFS",<br> resulting in "a sort of" Filesystem with "very" extended LVM capabilities.</li> </ul> <br> Fig. 1. SAN: Very simplified view on the connection of the NetApp Filer (controller) to disk shelves.<br> <br> <img src="diskdevices8.jpg" align="center"/> <br> <br> In the "sketch" above, we see a simplified model of a NetApp SAN.<br> Here, the so-called "Filer", or the "controller" (or "FAS"), is connected to two disk shelves (disk arrays).<br> Most SANs, like NetApp, support FC disks, SAS disks, and (slower) SATA disks.<br> For quite some time now, NetApp has favoured SAS disks in its shelves.<br> <br> If the Storage Admin wants, he or she can configure the system to act as a SAN and/or as a NAS, so that it can provide storage using either<br> file-based or block-based protocols.<br> <br> The picture above is extremely simple. Often, two Filers are arranged in a clustered solution, with multiple paths<br> to multiple disk shelves. This would then be an HA solution using a "Failover" technology. <br> So, suppose "netapp1" and "netapp2" are two Filers, each controlling their own shelves. If netapp1 were to fail for some reason,<br> the ownership of its shelves would go to the netapp2 filer.<br> <br> <br> <font face="arial" size=2 color="red"> <h3>1.2 A conceptual view on NetApp Storage.</h3> <font face="arial" size=2 color="black"> Note from figure 1 that, if a shelf is on port "0a", the Ontap software identifies individual disks by the port number and the disk's SCSI ID, for example "0a.10", "0a.11", "0a.12", etc.<br> <br> <br> These identifiers are used in many Ontap prompt (CL) commands.<br> <br> But first it's very important to get a notion of how NetApp organizes its storage. Here we will show a very high-level<br> conceptual model.<br> <br> Fig. 2. 
NetApp's organization of Storage.<br> <br> <img src="diskdevices12.jpg" align="center"/> <br> <br> The most fundamental level is the <B>"Raid Group" (RG)</B>. NetApp uses "RAID4", or "RAID-DP" (a RAID6-like scheme with double parity on two disks),<br> which is the most robust option of course. It's possible to have one or more Raid Groups.<br> <br> An <B>"Aggregate"</B> is a logical entity, composed of one or more Raid Groups.<br> Once created, it fundamentally represents <I>the</I> storage unit.<br> <br> If you want, you might say that an aggregate "sort of" virtualizes the real physical implementation of the RG's.<br> <br> Ontap will create RG's for you "behind the scenes" when you create an aggregate. It uses certain rules for this,<br> depending on disk type, disk capacities, and the number of disks chosen for the aggregate. So, you could end up with one or more RG's<br> when creating a certain aggregate.<br> <br> As an example, for a certain default setup:<br> <br> - if you create a 16 disk aggregate, you end up with one RG.<br> - if you create a 32 disk aggregate, you end up with two RG's.<br> <br> It's quite an art to get the arithmetic right. How large do you create an aggregate initially? What happens if additional spindles<br> become available later? Can you then still expand the aggregate? What is the ratio of usable space compared to what gets reserved?<br> <br> You see? When architecting these structures, you need a lot of detailed knowledge and must do a large amount of planning.<br> <br> A <B>FlexVol</B> is the next level of storage, "carved out" from the aggregate. The FlexVol forms the basis for the "real" usable stuff:<br> from a FlexVol, CIFS/NFS shares or LUNs (for FC or iSCSI) are created.<br> <br> A LUN is a logical representation of storage. 
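To make the layering (Raid Group &#8658; aggregate &#8658; FlexVol &#8658; LUN) a bit more concrete, here is a sketch using 7-mode style CL commands.<br> Note that this is just an illustration: the names "aggr1", "vol10", "lun0", and the sizes, are invented.<br> <br> <font face="courier" size=2 color="blue"> FAS1> aggr create aggr1 16 <br> FAS1> vol create vol10 aggr1 500g <br> FAS1> lun create -s 100g -t windows /vol/vol10/lun0 <br> <font face="arial" size=2 color="black"> <br> Here, an aggregate is created over 16 disks, a FlexVol of 500GB is carved out of it, and a 100GB LUN is created inside that volume.<br> <br>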
As we have seen before, it "just looks" like a hard disk to the client.<br> From a NetApp perspective, it looks like a file inside a volume.<br> The true physical implementation of a LUN on the aggregate is a "stripe" over N physical disks in RAID-DP.<br> <br> Why would you choose CIFS/NFS shares or (FC/iSCSI) LUNs? That depends on the application. If you need a large share, then the answer is obvious.<br> Also, some Hosts really need storage that acts like a local disk, on which SCSI <B>reservations</B> can be placed (as in clustering).<br> In that case, you obviously need to create a LUN.<br> <br> Since, using NetApp tools, LUNs are sometimes represented (or shown) as "files", the entity "qtree" gets meaning too.<br> It's analogous to a folder/subdirectory. So, it's possible to "associate" LUNs with a qtree.<br> Since it has the properties of a folder, you can apply NTFS or Unix-like permissions to all<br> objects associated with that qtree.<br> <br> <br> <font face="arial" size=2 color="red"> <h3>1.3 A note on tools.</h3> <font face="arial" size=2 color="black"> There are a few very important <B>GUI or Web-based tools</B> for a Storage Admin, for configuring and monitoring their Filers and Storage.<br> Once, "FilerView" (deprecated as of Ontap 8) was great, and follow-up tools like "OnCommand System Manager" are probably indispensable too.<br> <br> These types of GUI tools allow for monitoring, and for creating/modifying all entities as discussed in section 1.2.<br> <br> It's also possible to set up an "ssh" session through a network to the Filer, and it also has a serial "console" port for direct communication.<br> <br> <br> There is a very strong "command line" (CL) available too, which has a respectable "learning curve".<br> <br> Even if you have a very strong background in IT, nothing in handling a SAN of a specific Vendor is "easy".<br> Since, if a SAN is in full production, almost <B>all vital data</B> of your Organization is centered on the SAN, 
you cannot afford any mistakes.<br> Being careful and not taking any risks is a good quality.<br> <br> There are hundreds of commands. Some are "pure" unix shell-like, like "df" and many others. But most are specific to Ontap, like "aggr create"<br> and many others, to create and modify the entities as discussed in section 1.2.<br> <br> If you want to be "impressed", here are some links to "Ontap CL" references:<br> <br> <a href="http://support.netapp.com/NOW/public/knowledge/docs/ontap/rel732/pdfs/ontap/210-04499.pdf">Ontap 7.x mode CL Reference</a><br> <a href="http://contourds.com/uploads/file/Netapp_ResourceCenter2.pdf">Ontap 8.x mode CL Reference</a><br> <br> <br> <font face="arial" size=2 color="red"> <h3>1.4 A note on SNAPSHOT Backup Technology.</h3> <font face="arial" size=2 color="black"> One attractive feature of NetApp's storage is the range of SNAP technologies, like the usage of SNAPSHOT backups.<br> You can't talk about NetApp without dealing with this one.<br> <br> From Raid Groups, an aggregate is created. From an aggregate, FlexVols are created. From a FlexVol, a NAS share might be created,<br> or LUNs might be created (accessible via FCP/iSCSI).<br> <br> Now, we know that NetApp uses the WAFL "filesystem", and it has its own "overhead", which will diminish your total usable space.<br> This overhead is estimated to be about 10% per disk (not reclaimable). It's partly used for <B>WAFL metadata</B>.<br> <br> Apart from "overhead", several additional <B>"reservations"</B> are in effect.<br> <br> When an aggregate is created, per default "reserved space" is defined to hold optional future "snapshot" copies.<br> The Storage Admin has a certain degree of freedom in the size of this reserved space, but in general it is advised<br> not to set it too low. 
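For example, such an aggregate-level snapshot reserve can be inspected or set from the CL. The following is a hedged sketch in 7-mode style syntax (the aggregate name "aggr1" is invented):<br> <br> <font face="courier" size=2 color="blue"> FAS1> snap reserve -A aggr1 <br> FAS1> snap reserve -A aggr1 5 <br> <font face="arial" size=2 color="black"> <br> The first command shows the current reserve percentage for the aggregate; the second sets it to 5%.<br> <br>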
As a guideline (and default), often a value of 5% is "postulated".<br> <br> Next, it's possible to create a "snapshot reserve" for a FlexVol too.<br> Here the Storage Admin has a certain degree of freedom as well. NetApp generally seems to indicate that a snapshot<br> reserve of 20% should be applied. However, numbers seem to vary somewhat when reading various recommendations.<br> Also, there is a big difference between NAS and SAN (LUN-based) Volumes.<br> <br> Here is an example of manipulating the reserved space on the volume level, setting it to 15%, using the Ontap CL:<br> <br> <font face="courier" size=2 color="blue"> FAS1> snap reserve vol10 15 <br> <font face="arial" size=2 color="black"> <br> <br> <B><U>Snapshot Technologies:</U></B><br> <br> There are a few different "Snapshot" technologies around.<br> <br> One popular implementation uses the <B>"Copy On Write"</B> technology, which is fully block or page based. NetApp does not use that.<br> In fact, NetApp writes "a new block" on any change, and then sort of cleverly "remembers" inode pointers.<br> <br> To understand this, let's review "Copy On Write" first, and then return to NetApp Snapshots.<br> <br> <B>&#8658; "Copy On Write" Snapshot:</B><br> <br> Fig. 3. "Copy on Write" Snapshot (not used by NetApp).<br> <br> <img src="diskdevices13.jpg" align="center"/> <br> <br> Let's say we have a NAS volume, where a number of disk blocks are involved. "Copy on Write" is really easy to understand.<br> Just before <I>any block</I> gets modified, the <B>original</B> block gets copied to a reserved space area.<br> You see? Only the "deltas", as of a certain t=t<sub>0</sub> (when the snapshot was activated), of a Volume (or file, or whatever)<br> get copied. 
This is great, but it involves multiple "writes": first, write the original block to a safe place, then write the<br> block with the new data.<br> <br> In effect, you have a backup of the entity (the Volume, the file, the "whatever") as it was at t=t<sub>0</sub>.<br> <br> If, later on, at t=t<sub>1</sub>, you need to restore, or go back to t=t<sub>0</sub>, you take the primary block space, and copy all the reserved<br> (saved) blocks "over" the modified blocks.<br> Note that the reserved space does NOT contain a full backup. It's only a collection of blocks frozen at t=t<sub>0</sub>, before they<br> were modified between t=t<sub>0</sub> and t=t<sub>1</sub>.<br> Normally, the reserved space will contain far fewer blocks than the primary (usable, writable) space, which means a large saving<br> of disk space compared to a traditional "full" copy of blocks.<br> <br> <B>&#8658; "NetApp" Snapshot copy: general description (1)</B><br> <br> You can schedule a Snapshot backup of a Volume, or you can make one interactively using an Ontap command or GUI tool.<br> So, a NetApp Snapshot backup is not an "ongoing process". You start it (or it is scheduled), then it runs until it is done.<br> <br> The mechanics of a snapshot backup are pretty "unusual", but it sure is <I>fast</I>.<br> <br> Fig. 4. NetApp Snapshot copy.<br> <br> <img src="diskdevices14.jpg" align="center"/> <br> <br> It's better to speak of a "Snapshot copy" than of a "Snapshot backup", but most of us do not care too much about that.<br> It's an exact state of the Volume as it was at t=t<sub>0</sub>, when it started.<br> <br> With a snapshot running, WAFL takes a completely different approach from what many of us are used to. 
If an existing "block" (that already contained data)<br> is going to be modified while the backup runs, WAFL just takes a new free block, and puts the modified data there.<br> The original block stays the same, and the inode (pointer) to that block becomes part of the Snapshot!<br> So, there is only one write (the one to the new block).<br> <br> It explains why snapshots are so incredibly fast.<br> <br> <B>&#8658; "NetApp" Snapshot copy: the open file problem (2)</B><br> <br> From Ontap's perspective, there is no problem at all. However, many programs run on Hosts (Servers) and not on the Filer, of course.<br> So, applications like Oracle, SQL Server etc. have a <B>completely different perspective</B>.<br> <br> The Snapshot copy might thus be inconsistent. This is not caused by NetApp. NetApp only produced a state image of pointers at t=t<sub>0</sub>.<br> And that is actually a good backup.<br> <br> The potential problem is this: NetApp created the snapshot at t=t<sub>0</sub>, but in the t=t<sub>0</sub> to t=t<sub>1</sub> interval<br> the database files are in flux, meaning that processes might have updated records in the database files.<br> Typical of databases is that their <B>own checkpoint system process</B> flushes dirty blocks to disk, and updates<br> file headers accordingly with a new "sequence number". If all files are in sync, the database engine considers the database<br> as "consistent". If that's not done, the database is "inconsistent" (or so the database engine thinks).<br> <br> By the way, it's not databases alone that behave in that manner. 
Also, all sorts of workflow, messaging, and queuing programs etc.<br> show similar behaviour.<br> <br> Although the Snapshot copy is, from a filesystem view, perfectly consistent, Server programs might think differently.<br> That thus poses a problem.<br> <br> NetApp fixed that by letting you install additional programs on any sort of Database Server.<br> These are "SnapDrive" and "SnapManager for xyz" (like SnapManager for SQL Server).<br> <br> In effect, just before the Snapshot starts, the SnapManager asks the Database to checkpoint and to "shut up" for a short while (to freeze, as it were).<br> SnapDrive will do the same for any other open filesystem processes.<br> The result is good, consistent backups at all times.<br> <br> <br> <br> <br> <br> </body> </html>