/***************************************************************************/ /* Document : Quick Intro: Oracle 10g/11g RAC */ /* Doc. Versie : 3 */ /* File : ora10g11gRAC.txt */ /* Date : 11/10/2008 */ /* Content : Elementary Introduction on Real Application Clusters */ /* Compiled by : Albert van der Sel */ /***************************************************************************/ ----------------------------------------------------- REMARK: This is a very simple, and incomplete, document, providing a birds-eye view on Oracle RAC technology. ----------------------------------------------------- An "Oracle Real Application Cluster" (RAC), is about a clustered Oracle database. If a RAC is properly set up, all the nodes (Servers) are active at the same time, acting on the same one Database. This is very different from a failover Cluster. Let's first take on a discussion about Cluster systems in general. See Section 1. The example system used here is Linux, but the discussion about Cluster systems (Not targeted at RAC) here is "general" enough to be of use. As from section 2, we will discuss RAC. =============================================== 1. Discussion about Cluster systems in general: =============================================== 1.1 Cluster Overview (in general): ---------------------------------- To set up a cluster, an administrator must connect the cluster systems (often referred to as member systems) to the cluster hardware, and configure the systems into the cluster environment. The foundation of a cluster is an advanced host membership algorithm. This algorithm ensures that the cluster maintains complete data integrity at all times by using the following methods of inter-node communication: • Quorum partitions on shared disk storage to hold system status • Ethernet (and optional serial or other type of connections) between the cluster systems for heartbeat channels To make an application and data highly available in a cluster, the administrator must configure a "cluster service" — a discrete group of service properties and resources, such as an application and shared disk storage. A service can be assigned an IP address to provide transparent client access to the service. For example, an administrator can set up a cluster service that provides clients with access to highly-available database application data. Both cluster systems can run any service and access the service data on shared disk storage. However, each service can run on only one cluster system at a time, in order to maintain data integrity. Administrators can set up - an "active-active" configuration in which both cluster systems run different services, or - an "active-passive" (hot-standby) configuration in which a primary cluster system runs all the services, and a backupcluster system takes over only if the primary system fails. NOTE: So this is actually a difference from Oracle 10g Real Application Cluster (RAC), where both instances, or multiple instances (from 2 - 100), accesses the single database on shared storage, at the same time ! 
Scetch of a 2-node Linux cluster ------------------------------------------ public network | | | | ------------ ------------- |cluster | |cluster | |system |Ethernet |system | | |--------------------| | | |heartbeat | | | | | | | |____________ | | |ServiceA | ----- -|--- | | |ServiceB |--|PWR| |PWR|----|ServiceC | | | ----- ----- | | | | |_______________| | | | | | ------------ ------------- | SCSI bus or Fible Channel | ------------------ -------------- Interconnect | | | | Fig 1.1 ----------- |Shared | - has Quorum partitions (or disks) |Disk | - has partitions (or disks) for ServiceA, B, C |Storage | ----------- Figure 1–1, shows an example of a cluster in an active-active configuration. If a hardware or software failure occurs, the cluster will automatically restart the failed system’s services on the functional cluster system. This service failover capability ensures that no data is lost, and there is little disruption to users. When the failed system recovers, the cluster can re-balance the services across the two systems. In addition, a cluster administrator can cleanly stop the services running on a cluster system and then restart them on the other system. This service relocation capability enables the administrator to maintain application and data availability when a cluster system requires maintenance. -- Service configuration framework: Clusters enable an administrator to easily configure individual services to make data and applications highly available. To create a service, an administrator specifies the resources used in the service and properties for the service, including the service name, application start and stop script, disk partitions, mount points, and the cluster system on which an administrator prefers to run the service. After the administrator adds a service, the cluster enters the information into the cluster database on shared storage, where it can be accessed by both cluster systems. The cluster provides an easy-to-use framework for database applications. For example, a database service serves highly-available data to a database application. The application running on a cluster system provides network access to database client systems, such as Web servers. If the service fails over to another cluster system, the application can still access the shared database data. A network-accessible database service is usually assigned an IP address, which is failed over along with the service to maintain transparent access for clients. The cluster service framework can be easily extended to other applications, as well. -- Multiple cluster communication methods: To monitor the health of the other cluster system, each cluster system monitors the health of the remote power switch, if any, and issues heartbeat pings over network and serial channels to monitor the health of the other cluster system. In addition, each cluster system periodically writes a timestamp and cluster state information to two (or more) quorum partitions located on shared disk storage. System state information includes whether the system is an active cluster member. Service state information includes whether the service is running and which cluster system is running the service. Each cluster system checks to ensure that the other system’s status is up to date. To ensure correct cluster operation, if a system is unable to write to both quorum partitions at startup time, it will not be allowed to join the cluster. 
In addition, if a cluster system is not updating its timestamp, and if heartbeats to the system fail, the cluster system will be removed from the cluster. If a hardware or software failure occurs, the cluster will take the appropriate action to maintain application availability and data integrity. For example, if a cluster system completely fails, the other cluster system will restart its services. Services already running on this system are not disrupted. When the failed system reboots and is able to write to the quorum partitions, it can rejoin the cluster and run services. Depending on how the services are configured, the cluster can re-balance the services across the two cluster systems. -- Manual service relocation capability: In addition to automatic service failover, a cluster enables administrators to cleanly stop services on one cluster system and restart them on the other system. This allows administrators to perform planned maintenance on a cluster system, while providing application and data availability. -- Event logging facility: To ensure that problems are detected and resolved before they affect service availability, the cluster daemons log messages by using the conventional Linux syslog subsystem. Administrators can customize the severity level of the logged messages. -- Application Monitoring: The cluster services infrastructure can optionally monitor the state and health of an application. In this manner, should an application-specific failure occur, the cluster will automatically restart the application. In response to the application failure, the application will attempt to be restarted on the member it was initially running on; failing that, it will restart on the other cluster member. -- Status Monitoring Agent: A cluster status monitoring agent is used to gather vital cluster and application state information. This information is then accessible both locally on the cluster member as well as remotely. A graphical user interface can then display status information from multiple clusters in a manner which does not degrade system performance. 
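Just to make the event logging facility mentioned above a bit more concrete: on a typical Linux member system the cluster daemons end up in the normal syslog files, and the severity that gets recorded is governed by the ordinary syslog rules. This is only a generic sketch; the daemon names and file names below are examples, not specific to any particular cluster product:

  # Look for recent messages from the cluster daemons (names vary per product).
  grep -iE 'cluster|quorum|heartbeat' /var/log/messages | tail -20

  # Example /etc/syslog.conf line that sends daemon messages of severity
  # "info" and higher to a separate file (restart syslogd afterwards):
  #   daemon.info                                   /var/log/cluster.log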
1.2 Just an example of a more detailed view of an almost "No single point of failure" 2-Node Clustered System: -------------------------------------------------------------------------------------------------------------- ---------- |NETWORK | -------------------|SWITCH |----------------------- | ---------- | public network | | | --------------------- | --------------------- |network interface | ---------- |network interface | |-------------------- |terminal| |-------------------- |serial port |------|server |--------------|serial port | |-------------------- ---------- |-------------------- |CLUSTER | |CLUSTER | |SYSTEM | |SYSTEM | |-------------------- private network |-------------------- |network interface |------------------------------|network interface | |-------------------- |-------------------- |serial port |------------------------------|serial port | |-------------------- |-------------------- |serial port |-----------------\ | | |-------------------- ----- ----- |-------------------- |power plug |---|PWR| |PWR|----------|power plug | |-------------------- ----- ----- |-------------------- | | | |-------------------| | | -------------------------|serial port | |-------------------- |-------------------- |SCSI adapter (T) | |SCSI adapter (T) | --------------------- --------------------- | | | | ----------- ----------------------------------- | | | | (T) (T) ------------------------------------------------------- | Port A/in | Port B/in | | Port A/Out| Port B/Out | |------------------------------------------------------ | | | | | | | ------------------- -------------------- | | |controller 1 | |controller 2 | | | ------------------- -------------------- | | | | | | | | | RAID | ( ) ( ) | | | | | | ( ) ( ) | | | | mirrored shared disks | ------------------------------------------------------- ===================================================================================== 2. Overview of the architecture of a Single Oracle Instance compared to a RAC system: ===================================================================================== Let's first take a birds-eye overview of a single Instance architecture, compared to RAC architecture. 2.1 Single Instance: -------------------- If you look at a (traditional) Single Server where a single Oracle Instance (8i, 9i, 10g, 11g) is involved, you would see the following situation. >>>>> Files: You can find a number of database files, residing on a disksystem, amongst others are: . system.dbf: # this contains the dictionary (users, grants, table properties, packages etc..) . undo.dbf: # this contains "undo/rollback" information about all modifying SQL statements, and thus containing the "former situation" before transactions are committed to the DB. . redo logs: # in case of a crash, these write ahead logs can be used to redo committed transactions that which were not written to the datafiles yet, but were logged in the redo logs. . user defined # These are data files, organized in the logical concept of "tablespaces". data and index These tablespaces contain the tables (and indexes) tablespaces: Note: a tablespace consist of one or more files. To the Operating system, there are only files to be concerned of, but from the Database perspective, the DBA can create a logical entity called "tablespace", consisting of possibly multiple files, possibly distributed over multiple disks. 
Then, if the DBA then creates a table (or index), he or she should specify a tablespace, and thereby distributing the (future) tablecontent over multiple files, which might increase I/O performance. So, the DBA might create tablespaces with names like for example "DATA_BIG", "DATA_SMALL", "INDEX_BIG" etc.. >>>>> Memory structure and processes: The Instance gets created in memory when the DBA (or the system) "starts the database". Starting the database means that a number of processes gets active, and that a rather complex shared memory area gets created. This memory area is called the SGA (System Global Area) and contains some buffers and other pools, of which the following are most noticable: SGA contains: buffer cache : datablocks from disk, are cached in this buffer. Most of this cached data are blocks from tables and indexes. log buffer : small memory area which contains modified data which is about to be written to the redologs Shared pool : All used SQL queries and procedures are cached in this pool Library cache : The systems metadata is cached in this structure By the way, an Oracle Instance can be highly configured by a configuration file (traditionally that is the file "init.ora" which is an ascii file and can be edited to adjust values. The modern variant of "init.ora" is a binary file "spfile.ora".). Some of the parameters in that file, determine the sizes of the different caches and pools. For example, here is a section that determines the SGA of a small database: db_cache_size =268435456 java_pool_size = 67108864 shared_pool_size = 67108864 java_pool_size = 67108864 streams_pool_size = 67108864 So, "an instance" is ofcourse not synonym to the database files on disk, but is really the "stuff" that gets loaded or get created in memory. After an Oracle Database has started, a number of processes are running, among which the most notable are: pmon : process monitor smon : system monitor chkpt : checkpoint process dbwr : database writer process lgwr : the process that writes the redologs 2.2 Scetch of a 2-node RAC Architecture: ---------------------------------------- +: network (with example IP addresses in picture) user pc's/terminals -: Fiber, or other Storage connection [ ] [ ] [ ] + + + + + + (subnet 1: 192.168.1 ) Public Network + + + ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + + +192.168.1.1 (vip=192.168.1.100) 192.168.1.2 + (vip=192.168.1.101) + + Server Node: + + ================== Server Node: + |Server A | ================== |--> SGA, and |--------------- | |Server B | | processes like |Oracle | Fiber |--------------- | | pmon, smon etc.. |Instance A |----- --|Oracle |<-- |----------------| | | |Instance B | |Listener A | | | |----------------| |--------------- | | | |Listener B | If ASM is used as a shared |-local rdbms | | | |--------------- | storage for the database, | binaries etc.in| | | |-local rdbms | each node also has an | (ORACLE_HOME) | | (subnet 2: 192.168.2) | | binaries etc.in| ASM Instance. 
--|-CRS: Cluster | Private Network | (ORACLE_HOME) | | | ready services |++++++++++++++++++++++++++++++++++++++|-CRS: Cluster |--- | | (CRS_HOME) | 192.168.2.1 192.168.2.2 | ready services | | | ----------------- | (CRS_HOME) | | | |possibly vendor | | | |----------------- | | |Clusterware | | | |possibly vendor | | | ================== | Shared Disks: | |Clusterware | | | vendor | ---------------------- | ================== | | clusterware |--<>-- ( 1 Shared Database: )<>| | | is not needed | (system.dbf ) | | | but might be in | (temp.dbf ) | | | Extended | (users.dbf ) | | | (long distance) RAC | (data tablespaces ) | | | | (index tablespaces ) | | | | (etc.. ) | | | | (------------------- ) | | | |---<>--(private redolog(s) A) | | | |---<>--(private undo.dbf A) | | | (private redolog(s) B)<>| | | (private undo.dbf B)<>| | | (====================) | ----------------------<>--------( OCR:Oracle Cluster )--------<>-------------- ( Registry ) (- Voting Disk ) ---------------------- 2.3 Overview RAC: ----------------- RAC Architecture Overview Let's begin by reviewing the structure of a Real Applications Cluster. Physically, a RAC consists of several nodes (servers), connected to each other by a private interconnect, which most of the time will be a "private" Ethernet. The database files are kept on a shared storage subsystem, where they're accessible to all nodes. And each node has a public network connection. A cluster is a set of 2 or more machines (nodes) that share or coordinate resources to perform the same task. A RAC system is 2 or more instances running on a set of clustered nodes, with all instances accessing a shared set of database files (one Database). Depending on the O/S platform, a RAC database may be deployed on a cluster that uses vendor clusterware plus Oracle's own clusterware (Cluster Ready Services, CRS), or on a cluster that solely uses Oracle's own clusterware. Thus, every RAC sits on a cluster that is running Cluster Ready Services. srvctl is the primary tool DBAs use to configure CRS for their RAC database and processes. -- Cache Fushion: Each cluster database instance in an Oracle RAC cluster uses its own memory structures and background processes. Oracle RAC uses Cache Fusion to synchronize the data stored in the buffer cache of each cluster database instance. Cache Fusion moves current data blocks (which reside in memory) between database instances, rather than having one database instance write the data blocks to disk and requiring another database instance to reread the data blocks from disk. When a data block located in the buffer cache of one instance is required by another instance, Cache Fusion transfers the data block directly between the instances using the interconnect, enabling the Oracle RAC database to access and modify data as if the data resided in a single buffer cache. The CRS processes deal with cluster management, failover, services, OCR, Voting Disk etc.. But "the real cache fushion", that is Inter-instance block management, and lock management, that is dealt with by a number of specialized Oracle background processes. So, if an Instance reads a block from disk, then maybe a usersession will modify rows within that block. Another Instance may request the same block, because an other user session connected to this second Instance, wants to read (or modify) rows from that same block as well. Single Instance: > You probably know that a Single Instance Oracle database, consists of shared memory, where we can distinquish certain areas. 
This shared memory is called the SGA (System Global Area), which "contains" the buffer cache (disk blocks cached from disk), the shared pool (parsed SQL and PL/SQL code), and a number of other areas, like the large pool and the java pool.

> Next to this shared memory, a number of background processes make up the Instance, most notably pmon, smon, dbw0, lgwr and ckpt; a number of other processes run in a Single Instance as well.

If you have installed a RAC database, you have two or more Instances running on two or more Nodes. In this case, a number of additional structures and background processes deal with all the aspects of resource and lock management that arise when two or more instances want to access the same blocks. So, it is NOT the CRS processes that deal with this; here we are talking about processes belonging to the Oracle kernel itself. In RAC, because more than one instance is accessing the same resources, the instances require coordination at the resource level (database objects, blocks, locks). After all, in RAC the buffer cache of one Instance may contain data that is requested by an Instance on another node.

In former versions of RAC, namely Oracle Parallel Server (e.g. OPS in Oracle 8i), this management framework was called DLM (Distributed Lock Management). In RAC 10g and 11g, this framework is made up of the Global Cache Service (GCS), the Global Enqueue Service (GES), and the Global Resource Directory (GRD). The in-memory data structures distributed over all nodes (the GRD), along with the specialized RAC background processes, collaborate to enable "Cache Fusion".

In RAC Instances, we can expect, among others, to see the following additional background processes:

lms    Global Cache Service process
lmon   Global Enqueue Service Monitor
lmd    Global Enqueue Service daemon
lck0   Instance Enqueue process

When data blocks are requested by users attached to different instances, a lot of "tracking" and "synchronization" needs to be done by, or with the aid of, the GCS, GES and GRD.

GCS:
The Global Cache Service (GCS) is the main component of Oracle Cache Fusion technology. It is represented by the background processes LMSn; there can be at most 10 LMS processes per instance. The main function of GCS is to track the status and location of data blocks, where "status" means the mode and role of a data block. GCS is the main mechanism by which cache coherency among the multiple buffer caches is maintained, and it is also responsible for block transfers between the instances.

GES:
The Global Enqueue Service (GES) tracks the status of all Oracle enqueuing mechanisms. This involves all non-Cache Fusion inter-instance operations. GES performs concurrency control on dictionary cache locks, library cache locks and transactions. It performs this work for resources that are accessed by more than one instance. Enqueue services are also present in a single-instance database, where they are responsible for locking rows in a table using the different locking modes.

GRD:
GES and GCS together maintain the Global Resource Directory (GRD). The GRD is like an in-memory database which contains details about all the blocks that are present in the caches: the location of the latest version of a block, the mode of the block, the role of the block, and so on. Whenever a user asks for a data block, GCS gets the required information from the GRD. The GRD is a distributed resource, meaning that each instance maintains a part of it. This distributed nature of the GRD is key to the fault tolerance of RAC. The GRD is stored in the SGA of each Instance.
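If you want to actually see these extra background processes, and the instances they belong to, you can query the standard dynamic performance views from any RAC instance. The following is only a sketch (run as SYSDBA on one of the nodes); the exact list of LM* processes you get back can differ per version and configuration:

  sqlplus -s "/ as sysdba" <<'EOF'
  -- All instances that currently make up this RAC database.
  SELECT inst_id, instance_name, host_name, status
  FROM   gv$instance;

  -- RAC-related background processes on the local instance:
  -- LMSn (Global Cache Service), LMON/LMDn (Global Enqueue Service), LCK0.
  SELECT name, description
  FROM   v$bgprocess
  WHERE  paddr <> '00'
  AND    (name LIKE 'LMS%' OR name IN ('LMON','LMD0','LCK0'));
  EOF

At the operating system level the same processes are visible as ora_lms0_<SID>, ora_lmon_<SID>, and so on, in the output of ps -ef.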
-- Cluster Ready Services and the OCR

Cluster Ready Services, or CRS, is a new feature for 10g RAC. Essentially, it is Oracle's own clusterware. On most platforms, Oracle supports vendor clusterware; in these cases, CRS interoperates with the vendor clusterware, providing high availability support and service and workload management. On Linux and Windows clusters, CRS serves as the sole clusterware. In all cases, CRS provides a standard cluster interface that is consistent across all platforms.

CRS consists of four processes (crsd, ocssd, evmd, and evmlogger) and two disks (partitions): the Oracle Cluster Registry (OCR), and the voting disk. The CRSD manages the HA functionality by starting, stopping, and failing over the application resources, and by maintaining the profiles and current states in the Oracle Cluster Registry (OCR), whereas the OCSSD manages the participating nodes in the cluster by using the voting disk. The OCSSD also protects against the data corruption potentially caused by "split brain" syndrome by forcing a machine to reboot.

-- CRS Processes

About those processes, we can show how they run, and how they are started, on a unix system. CRS consists of four processes (on most platforms: oprocd, crsd, ocssd, evmd) and two disks: the Oracle Cluster Registry (OCR), and the voting disk. On most platforms, you may see the following processes:

oprocd  the Process Monitor Daemon
crsd    Cluster Ready Services Daemon (CRSD)
ocssd   Oracle Cluster Synchronization Services Daemon (OCSSD)
evmd    Event Manager Daemon (EVMD)

Oracle CRS is Oracle's own clusterware, tightly coupled with Oracle Real Application Clusters (RAC). CRS must be installed prior to the installation of Oracle RAC. It can also work on top of third-party clustering software, but there is no longer a requirement to buy and deploy such software. In short, Oracle CRS is primarily responsible for managing the high-availability (HA) architecture of Oracle RAC with the help of the Cluster Ready Services Daemon (CRSD), the Oracle Cluster Synchronization Services Daemon (OCSSD) and the Event Manager Daemon (EVMD), as described above.

Although Oracle CRS replaces the Oracle Cluster Manager (ORACM) of Oracle9i RAC, it continues to support the Global Services Daemon (GSD), which in Oracle9i is responsible for communicating with the Oracle RAC database. In Oracle 10g, GSD's sole purpose is to serve Oracle9i clients (such as SRVCTL, Database Configuration Assistant, and Oracle Enterprise Manager). Financially, this is a very positive benefit since one is not bound to buy new client licenses and hardware to support an Oracle 10g database.

To start and stop CRS when the machine starts or shuts down, on unix there are rc scripts in place. You can also, as root, manually start, stop, enable or disable the services with:

/etc/init.d/init.crs start
/etc/init.d/init.crs stop
/etc/init.d/init.crs enable
/etc/init.d/init.crs disable

Or with

# crsctl start crs
# crsctl stop crs
# crsctl enable crs
# crsctl disable crs

On a unix system, you may find the following in the /etc/inittab file.
# cat /etc/inittab | grep crs
h3:35:respawn:/etc/init.d/init.crsd run > /dev/null 2>&1 < /dev/null

# cd /etc/init.d
# ls -al *init*
init.crs
init.crsd
init.cssd
init.evmd

# cat /etc/inittab
..
..
h1:35:respawn:/etc/init.d/init.evmd run > /dev/null 2>&1 < /dev/null
h2:35:respawn:/etc/init.d/init.cssd fatal > /dev/null 2>&1 < /dev/null
h3:35:respawn:/etc/init.d/init.crsd run > /dev/null 2>&1 < /dev/null

>> The log files for the CRS daemon, crsd, can be found in the following directory:
CRS_home/log/hostname/crsd/

>> The log files for the CSS daemon, cssd, can be found in the following directory:
CRS_home/log/hostname/cssd/

>> The log files for the EVM daemon, evmd, can be found in the following directory:
CRS_home/log/hostname/evmd/

>> The log files for the Oracle Cluster Registry (OCR) can be found in the following directory:
CRS_home/log/hostname/client/

>> The log files for the Oracle RAC high availability component can be found in the following directories:
CRS_home/log/hostname/racg/
$ORACLE_HOME/log/hostname/racg

-- Enabling Debugging for an Oracle Clusterware Resource

You can use crsctl commands to enable resource debugging using the following syntax, where resource_name is the name of an Oracle Clusterware resource, such as ora.docrac1.vip, and debugging_level is a number from 1 to 5:

# crsctl debug log res resource_name:debugging_level

-- Running the Oracle Clusterware Diagnostics Collection Script

Run the diagcollection.pl script as the root user to collect diagnostic information from an Oracle Clusterware installation. The diagnostics provide additional information so that Oracle Support Services can resolve problems. Run this script from the operating system prompt as follows, where CRS_home is the home directory of your Oracle Clusterware installation:

# CRS_home/bin/diagcollection.pl --collect

To check the health of the daemons themselves, you can run "crsctl check crs", which displays the status of the Cluster Synchronization Services (CSS), Event Manager (EVM), and Cluster Ready Services (CRS) daemons.

-- CRS manages the following resources:

. The ASM instances on each node (for an explanation of ASM, see section 5)
. Databases
. The instances on each node
. Oracle Services on each node
. The cluster nodes themselves, including the following processes, or "nodeapps":
  . VIP
  . GSD
  . The listener
  . The ONS daemon

CRS stores information about these resources in the OCR. If the information in the OCR for one of these resources becomes damaged or inconsistent, then CRS is no longer able to manage that resource. Fortunately, the OCR automatically backs itself up regularly and frequently.

10g RAC (10.2) uses, or depends on:

- Oracle Clusterware (10.2), formerly referred to as CRS "Cluster Ready Services" (10.1).
- Optionally, Oracle's Cluster File System OCFS; alternatively ASM or raw devices can be used.
- Oracle Database extensions

RAC is "scale out" technology: just add commodity nodes to the system. The key component is "cache fusion". Data is transferred from one node to another via very fast interconnects. Essential to 10g RAC is this "shared cache" technology.

The Automatic Workload Repository (AWR) plays a role as well. The Fast Application Notification (FAN) mechanism that is part of RAC publishes events that describe the current service level being provided by each instance to AWR. The load balancing advisory information is then used to determine the best instance to serve a new request.

. With RAC, ALL Instances on ALL nodes in a cluster access a SINGLE database.
. But every instance has its own UNDO tablespace and its own REDO logs (see the short sketch just below).
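To verify those last two points on a running RAC database, you can check which redo thread and which undo tablespace each instance uses. A minimal sketch, run from any of the instances as SYSDBA; the views are standard, the output of course depends on your own configuration:

  sqlplus -s "/ as sysdba" <<'EOF'
  -- Redo: one thread of redo log groups per instance; all groups are
  -- visible from any node because they are recorded in the controlfile.
  SELECT thread#, group#, bytes/1024/1024 AS mb, status
  FROM   v$log
  ORDER  BY thread#, group#;

  -- Undo: every instance points to its own undo tablespace.
  SELECT inst_id, value AS undo_tablespace
  FROM   gv$parameter
  WHERE  name = 'undo_tablespace';
  EOF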
The Oracle Clusterware comprises several background processes that facilitate cluster operations. The Cluster Synchronization Service (CSS), Event Management (EVM), and Oracle Cluster components communicate with the corresponding cluster component layers in the other instances within the same cluster database environment.

Per implementation, questions arise on the following points:

. Storage
. Computer Systems / Storage-Interconnect
. Database
. Application Server
. Public and Private networks
. Application Control & Display

On the Storage level, it can be said that 10g RAC supports

- Automatic Storage Management (ASM)
- Oracle Cluster File System (OCFS)
- Network File System (NFS) - limited (only theoretical actually, except for 11g)
- Disk raw partitions
- Third party cluster file systems, like GPFS

For application control and tools, it can be said that 10g RAC supports

- OEM Grid Control     http://hostname:5500/em
  OEM Database Control http://hostname:1158/em
- "srvctl", a command line interface to manage the cluster configuration, for example, starting and stopping all nodes in one command.
- The Cluster Verification Utility (cluvfy), which can be used for installation and sanity checks.

Failure in Client connections:
Depending on the Net configuration, type of connection, type of transaction etc., Oracle Net Services provides a feature called "Transparent Application Failover" which can fail over a client session to a backup connection.

About HA and DR:
- RAC is HA, High Availability: it keeps things up and running within one site.
- Data Guard is DR, Disaster Recovery: it is able to mirror one site to another, remote, site.

2.4 Storage with RAC:
---------------------

We have the following Database storage options:

Raw              Raw devices, no filesystem present
ASM              Automatic Storage Management
Third party CFS  Vendor's Cluster File System
OCFS             Oracle Cluster File System
LVM              Logical Volume Manager
NFS              Network File System (must be on a certified NAS device)

Storage                              Oracle Clusterware          Database   Recovery area
                                     (OCR and Voting Disk)
-----------------------------------  --------------------------  ---------  -------------
Automatic Storage Management         No                          Yes        Yes
Cluster file system (OCFS or other)  Yes                         Yes        Yes
Shared raw storage                   Yes                         Yes        No

Here is a description of the file types involved. A regular single-instance database has three basic types of files:

1. database software and dump files (alert log, trace files and so on);
2. datafiles, spfile, control files and log files, often referred to as "database files";
3. and it may have recovery files, if using RMAN.

and, in case of RAC:

4. A RAC database has an additional type of file referred to as "CRS files". These consist of the Oracle Cluster Registry (OCR) and the voting disk.

Not all of these files have to be on the shared storage subsystem. The database files and CRS files must be accessible to all instances, so these *must be* on the shared storage subsystem. The database software can be on the shared subsystem and shared between nodes; or each node can have its own ORACLE_HOME. The flash recovery area must be shared by all instances, if used.

Some storage options can't handle all of these file types. To take an obvious example, the database software and dump files can't be stored on raw devices. This isn't important for the dump files, but it does mean that choosing raw devices precludes having a shared ORACLE_HOME on the shared storage device.

Remarks:

1. On a particular platform, there might exist a vendor specific solution for shared storage. For example, on AIX it is usually IBM GPFS that is used as a shared file system. But for this platform you might also use SFRAC of Veritas.
VERITAS Storage Foundation for Oracle Real Application Clusters (SFRAC) provides an integrated solution stack for using clustered filesystems with Oracle RAC on AIX, as an alternative to using raw logical volumes, Automatic Storage Management (ASM) or the AIX General Parallel Filesystem (GPFS). If your OS is Linux Redhat, then investigate your options with the Redhat Global FileSystem GFS. 2. SAN solutions: And as far as SAN, there's no inherent SAN protocol that allows for block-level locking between hosts. Your clustered filesystem is responsible for providing that. ======================================================== 3. Oracle 10g RAC Installation example on Redhat Linux: ======================================================== This section shows how to install 10g RAC on a couple of Linux machines. But the method used, represents an installation on any platform. For the most part, on all platforms the installation is the same. 3.1 Prepare your nodes: ----------------------- 3.1.1 Scetch of a 2-node Linux cluster 192.168.2.0 ---------------------------------------------- public network | | Server A | Server B | ------------ ------------- |InstanceA |Private network |InstanceB | | |Ethernet (interconnect)| | | |-----------------------| | | |192.168.1.0 | | | | | | | |____________ | | | | ----- -|--- | | | |--|PWR| |PWR|-------| | | | ----- ----- | | | | |__________________| | | | | | ------------ ------------- | SCSI bus or Fible Channel | ------------------ --------------- | | | | ----------- |Shared | - has Single DB on ASM, or OCFS (or other Cluster FS), or RAW |Disk | - has OCR and Voting disk on OCFS (or other Cluster FS), or RAW |Storage | ----------- 3.1.2 Storage Options Storage Oracle Clusterware Database Recovery area -------------- ------------------ -------- ------------- Automatic Storage Management No Yes Yes Cluster file system (OCFS) Yes Yes Yes Shared raw storage Yes Yes No In the following, we will do an example installation on 3 nodes. 3.1.3 Install Redhat on all nodes with all options. 3.1.4 create oracle user and groups dba, oinstall on all nodes. Make sure they all have the same UID and GUI. 3.1.5 Make sure the user oracle has an appropriate .profile or .bash_profile 3.1.6 Every node needs a private network connection and a public network connection (at least two networkcards). 3.1.7 Linux kernel parameters: Most out of the box kernel parameters (of RHELS 3,4,5) are set correctly for Oracle except a few. You should have the following minimal configuration: net.ipv4.ip_local_port_range 1024 65000 kernel.sem 250 32000 100 128 kernel.shmmni 4096 kernel.shmall 2097152 kernel.shmmax 2147483648 fs.file-max 65536 You can check the most important parameters using the following command: # /sbin/sysctl -a | egrep 'sem|shm|file-max|ip_local' net.ipv4.ip_local_port_range = 1024 65000 kernel.sem = 250 32000 100 128 kernel.shmmni = 4096 kernel.shmall = 2097152 kernel.shmmax = 2147483648 fs.file-max = 65536 If some value should be changed, you can change the "/etc/sysctl.conf" file and run the "/sbin/sysctl -p" command to change the value immediately. Every time the system boots, the init program runs the /etc/rc.d/rc.sysinit script. This script contains a command to execute sysctl using /etc/sysctl.conf to dictate the values passed to the kernel. Any values added to /etc/sysctl.conf will take effect each time the system boots. 3.1.8 make sure ssh and scp are working on all nodes without asking for a password. Use shh-keygen to arrange that. 
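A minimal sketch of step 3.1.8, assuming the example node names oc1, oc2 and oc3 used further below, and the oracle user on every node (the key type and file names are simply the ssh defaults):

  # As oracle, on EVERY node: generate a key pair without a passphrase.
  ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

  # Then, from one node (here oc1), gather all public keys into one
  # authorized_keys file and distribute it to the other nodes.
  cat ~/.ssh/id_rsa.pub         >> ~/.ssh/authorized_keys
  ssh oc2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  ssh oc3 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  scp ~/.ssh/authorized_keys oc2:~/.ssh/authorized_keys
  scp ~/.ssh/authorized_keys oc3:~/.ssh/authorized_keys
  chmod 600 ~/.ssh/authorized_keys

  # Test from each node; these should return a date without prompting:
  ssh oc2 date
  ssh oc3 date

Also test the private hostnames (poc1, poc2, poc3) and each node's own name, because the installer will use ssh and scp over those names as well.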
3.1.9 Example "/etc/host" on the nodes: Suppose you have the following 3 hosts, with their associated public and private names: public private oc1 poc1 oc2 poc2 oc3 poc3 Then this could be a valid "/etc/hosts" file on the nodes: 127.0.0.1 localhost.localdomain localhost 192.168.2.99 rhes30 192.168.2.166 oltp 192.168.2.167 mw 192.168.2.101 oc1 #public1 192.168.2.179 voc1 #virtual1 192.168.1.101 poc1 #private1 192.168.2.102 oc2 #public2 192.168.2.177 voc2 #virtual2 192.168.1.102 poc2 #private2 192.168.2.103 oc3 #public3 192.168.2.178 voc3 #virtual3 192.168.1.103 poc3 #private3 3.1.10 Example disk devices On all nodes, the shared disk devices should be accessible through the same devices names. Raw Device Name Physical Device Name Purpose /dev/raw/raw1 /dev/sda1 ASM Disk 1: +DATA1 /dev/raw/raw2 /dev/sdb1 ASM Disk 1: +DATA1 /dev/raw/raw3 /dev/sdc1 ASM Disk 2: +RECOV1 /dev/raw/raw4 /dev/sdd1 ASM Disk 2: +RECOV1 /dev/raw/raw5 /dev/sde1 OCR Disk (on RAW device) /dev/raw/raw6 /dev/sdf1 Voting Disk (on RAW device) So as you can see, we use a combination of ASM (for database files and recovery area), and RAW (for the OCR and Voting Disk). 3.2 CRS installation: --------------------- 3.2.1 First install CRS in its own home directory First install CRS in its own home directory, e.g. CRS10gHome, apart from the Oracle home dir. In fact, you NEED to install CRS first, before installing any Oacle RDBMS software. The special "thing" here, is that you perform the installation from one node, and that the setup program will in fact install CRS on all three nodes. Ofcourse, given that scp and shh works OK on all nodes, and that the accounts are all the same. But, you still need to run a few scripts on all individual nodes. As Oracle user: ./runInstaller --------------------------------------------------- | | Screen 1 |Specify File LOcations | | | |Source | |Path: /install/crs10g/Disk1/stage/products.xml | | | |Destination | |Name: CRS10gHome | |Path: /u01/app/oracle/product/10.1.0/CRS10gHome | | | --------------------------------------------------- --------------------------------------------------- | | Screen 2 |Cluster Configuration | | | |Cluster Name: lec1 | | | | Public Node Name Private Node Name | | --------------------------------------------- | | |oc1 | p0c1 | | | |-------------------------------------------- | | |oc2 | p0c2 | | | |-------------------------------------------- | | |oc3 | poc3 | | | |-------------------------------------------- | --------------------------------------------------- In the next screen, you specify which of your networks is to be used as the public interface (to connect to the public network) and which will be used for the private interconnect to support cache fushion and the cluster heartbeat. --------------------------------------------------- | | Screen 3 |Private Interconnect Enforcement | | | | | | | | Interface Name Subnet Interface type | | --------------------------------------------- | | |eth0 |192.168.2.0 |Public | | | |-------------------------------------------- | | |eth1 |192.168.1.0 |Private | | | |-------------------------------------------- | | | --------------------------------------------------- In the next screen, you specify /dev/raw/raw5 as the raw disk for the Oracle Cluster Registry. 
--------------------------------------------------- | | Screen 4 |Oracle Cluster Registry | | | |Specify OCR Location: /dev/raw/raw5 | (you are able to specify one extra mirror location of the OCR) | | --------------------------------------------------- In a similar fashion you specify the location of the Voting Disk. --------------------------------------------------- | | Screen 5 |Voting Disk | | | |Specify Voting Disk: /dev/raw/raw6 | (you are able to specify two extra locations | | for copies of the VD) --------------------------------------------------- You now have to execute the /u01/app/oracle/orainventory/orainstRoot.sh script on all Cluster Nodes as the root user. After this, you can continue with the other window, and see an "Install Summary" screen. No you click "Install" and the installation begins. Apart from the node you work on, the software will also be copied to the other nodes as well. After the installation is complete, you are once again prompted to run a script as root on each node of the Cluster. This is the script "/u01/app/oracle/product/10.1.0/CRS10gHome/root.sh". -- The olsnodes command. After finishing the CSR installation, you can verify that the installation completed successfully by running on any node the following command: # cd /u01/app/oracle/product/10.1.0/CRS10gHome/bin # olsnodes -n oc1 1 oc2 2 oc3 3 3.3 Database software installation: ----------------------------------- You can install the database software into the same directory in each node. With OCFS2, you might do one install in a common shared directory for all nodes. Because CSR is already running, the OUI detects that, and because its cluster aware, it provides you with the options to install a clustered implementation. You start the installation by running ./runInstaller as the oracle user on one node. For most part, it looks the same as a single-instance installation. After the file location screen, that is source and destination, you will see this screen: --------------------------------------------------- | | |Specify Hardware Cluster Installation Mode | | | | o Cluster installation mode | | | | Node name | | --------------------------------------------- | | | [] oc1 | | | | [] oc2 | | | | [] oc3 | | | --------------------------------------------- | | | | o Local installation (non cluster) | | | |-------------------------------------------------| Most of the time, you will do a "software only" installation, and create the database later with the DBCA. For the first node only, after some time, the Virtual IP Configuration Assistant, VIPCA, will start. Here you can configure the Virtual IP adresses you will use for application failover and the Enterprise Manager Agent. Here you will select the Virtual IP's for all nodes. VIPCA only needs to run once per Cluster. 3.4 Creating the RAC database with DBCA: ---------------------------------------- Launching the DBCA for installing a RAC database is much the same as launching DBCA for a single instance. If DBCA detects cluster software installed, it gives you the option to install a RAC database or a single instance. 
as oracle user: % dbca & --------------------------------------------------- | | |Welcome to the database configuration assistant | | | | | | | | o Oracle Real Application Cluster database | | | | o Oracle single instance database | | | |-------------------------------------------------| After selecting RAC, the next screen gives you the option to select nodes: --------------------------------------------------- | | |Select the nodes on which you want to create | |the cluster database. The local node oc1 will | |always be used whether or not it is selected. | | | | Node name | | --------------------------------------------- | | | [] oc1 | | | | [] oc2 | | | | [] oc3 | | | --------------------------------------------- | | | | | |-------------------------------------------------| In the next screens, you can choose the type of database (oltp, dw etc..), and all other items, just like a single instance install. At a cetain point, you can choose to use ASM diskgroups or RAW etc.., choose a flash-recovery area etc.. The way you install the database really resembles a normal single instance install, so we won't discuss that here. ============================== 5. ASM and RAC in Oracle 10g: ============================== A number of notes will explain ASM, and the integration of ASM into a RAC system. ======== Note 1: ======== Automatic Storage Management (ASM) in Oracle Database 10g With ASM, Automatic Storage Management, there is a separate lightweight 10g database involved. This ASM database (+ASM), contains all metadata about the ASM system. It also acts as the interface between the regular database and the filesystems. ASM will provide for presentation and implementation of a special filesystem, on which a number of redundancy/availability and performance features are implemented. In addition to the normal database background processes like CKPT, DBWR, LGWR, SMON, and PMON, an ASM instance uses at least two additional background processes to manage data storage operations. The Rebalancer process, RBAL, coordinates the rebalance activity for ASM disk groups, and the Actual ReBalance processes, ARBn, handle the actual rebalance of data extent movements. There are usually several ARB background processes (ARB0, ARB1, and so forth). Every database instance that uses ASM for file storage, will also need the two new processes. The Rebalancer background process (RBAL) handles global opens of all ASM disks in the ASM Disk Groups, while the ASM Bridge process (ASMB) connects as a foreground process into the ASM instance when the regular database instance starts. ASMB facilitates communication between the ASM instance and the regular database, including handling physical file changes like data file creation and deletion. ASMB exchanges messages between both servers for statistics update and instance health validation. These two processes are automatically started by the database instance when a new Oracle file type - for example, a tablespace's datafile -- is created on an ASM disk group. When an ASM instance mounts a disk group, it registers the disk group and connect string with Group Services. The database instance knows the name of the disk group, and can therefore use it to locate connect information for the correct ASM instance. ======== Note 2: ======== Some terminology in RAC: CRS cluster ready services - Clusterware: For Oracle10g on Linux and Windows-based platforms, CRS co-exists with, but does not inter-operate, with vendor clusterware. 
You may use vendor clusterware for all UNIX-based operating systems except for Linux. Even though, many of the Unix platforms have their own clusterware products, you *need to use* the CRS software to provide the RAC HA support services. CRS (cluster ready services) supports services and workload management and helps to maintain the continuous availability of the services. CRS also manages resources such as virtual IP (VIP) address for the node and the global services daemon. Note that the "Voting disks" and the "Oracle Cluster Registry", are regarded as part of the CRS. OCR: The Oracle Cluster Registry (OCR) contains cluster and database configuration information for Real Application Clusters Cluster Ready Services (CRS), including the list of nodes in the cluster database, the CRS application, resource profiles, and the authorizations for the Event Manager (EVM). The OCR can reside in a file on a cluster file system or on a shared raw device. When you install Real Application Clusters, you specify the location of the OCR. OCFS (not used often): OCFS is a shared disk cluster filesystem. Version 1 released for Linux is specifically designed to alleviate the need for manag-ing raw devices. It can contain all the oracle datafiles, archive log files and controlfiles. It is however not designed as a general purpose filesystem. OCFS2 is the next generation of the Oracle Cluster File System for Linux. It is an extent based, POSIX compliant file system. Unlike the previous release (OCFS), OCFS2 is a general-purpose file system that can be used for shared Oracle home installations making management of Oracle Real Application Cluster (RAC) installations even easier. Among the new features and benefits are: Node and architecture local files using Context Dependent Symbolic Links (CDSL) Network based pluggable DLM Improved journaling / node recovery using the Linux Kernel "JBD" subsystem Improved performance of meta-data operations (space allocation, locking, etc). Improved data caching / locking (for files such as oracle binaries, libraries, etc) - OCFS1 does NOT support a shared Oracle Home - OCFS2 does support a shared Oracle Home Though ASM appears to be the intended replacement for Oracle Cluster File System (OCFS) for the Real Applications Cluster (RAC). ASM supports Oracle Real Application Clusters (RAC), so there is no need for a separate Cluster LVM or a Cluster File System. So it boils down to: - You use or OCFS2, or RAW, or ASM (preferrably) for your database files. Storage Option Oracle Clusterware Database Recovery area -------------- ------------------ -------- ------------- Automatic Storage Management No Yes Yes Cluster file system (OCFS) Yes Yes Yes Shared raw storage Yes Yes No ======== Note 3: ======== Automatic Storage Management (ASM) simplifies database administration. It eliminates the need for you, as a DBA, to directly manage potentially thousands of Oracle database files. It does this by enabling you to create disk groups, which are comprised of disks and the files that reside on them. You only need to manage a small number of disk groups. In the SQL statements that you use for creating database structures such as tablespaces, redo log and archive log files, and control files, you specify file location in terms of disk groups. Automatic Storage Management then creates and manages the associated underlying files for you. Automatic Storage Management extends the power of Oracle-managed files. 
With Oracle-managed files, files are created and managed automatically for you, but with Automatic Storage Management you get the additional benefits of features such as mirroring and striping. The primary component of Automatic Storage Management is the disk group. You configure Automatic Storage Management by creating disk groups, which, in your database instance, can then be specified as the default location for files created in the database. Oracle provides SQL statements that create and manage disk groups, their contents, and their metadata. A disk group consists of a grouping of disks that are managed together as a unit. These disks are referred to as ASM disks. Files written on ASM disks are ASM files, whose names are automatically generated by Automatic Storage Management. You can specify user-friendly alias names for ASM files, but you must create a hierarchical directory structure for these alias names. You can affect how Automatic Storage Management places files on disks by specifying failure groups. Failure groups define disks that share components, such that if one fails then other disks sharing the component might also fail. An example of what you might define as a failure group would be a set of SCSI disks sharing the same SCSI controller. Failure groups are used to determine which ASM disks to use for storing redundant data. For example, if two-way mirroring is specified for a file, then redundant copies of file extents must be stored in separate failure groups. If you would take a look at the v$datafile, v$logfile, and v$controlfile of the regular Database, you would see information like in the following example: SQL> select file#, name from v$datafile; 1 +DATA1/rac0/datafile/system.256.1 2 +DATA1/rac0/datafile/undotbs.258.1 3 +DATA1/rac0/datafile/sysaux.257.1 4 +DATA1/rac0/datafile/users.259.1 5 +DATA1/rac0/datafile/example.269.1 SQL> select name from v$controlfile; +DATA1/rac0/controlfile/current.261.3 +DATA1/rac0/controlfile/current.260.3 -- Initialization Parameters (init.ora or SPFILE) for ASM Instances The following initialization parameters relate to an ASM instance. Parameters that start with ASM_ cannot be set in database instances. Name Description INSTANCE_TYPE Must be set to INSTANCE_TYPE = ASM. Note: This is the only required parameter. All other parameters take suitable defaults for most environments. DB_UNIQUE_NAME Unique name for this group of ASM instances within the cluster or on a node. Default: +ASM (Needs to be modified only if trying to run multiple ASM instances on the same node) ASM_POWER_LIMIT The maximum power on an ASM instance for disk rebalancing. Default: 1 Can range from 1 to 11. 1 is the lowest priority. See Also: "Tuning Rebalance Operations" ASM_DISKSTRING Limits the set of disks that Automatic Storage Management considers for discovery. Default: NULL (This default causes ASM to find all of the disks in a platform-specific location to which it has read/write access.). Example: /dev/raw/* ASM_DISKGROUPS Lists the names of disk groups to be mounted by an ASM instance at startup, or when the ALTER DISKGROUP ALL MOUNT statement is used. Default: NULL (If this parameter is not specified, then no disk groups are mounted.) Note: This parameter is dynamic and if you are using a server parameter file (SPFILE), then you should rarely need to manually alter this value. 
Automatic Storage Management automatically adds a disk group to this parameter when a disk group is successfully mounted, and automatically removes a disk group that is specifically dismounted. However, when using a traditional text initialization parameter file, remember to edit the initialization parameter file to add the name of any disk group that you want automatically mounted at instance startup, and remove the name of any disk group that you no longer want automatically mounted. -- ASM Views: The ASM configuration can be viewed using the V$ASM_% views, which often contain different information depending on whether they are queried from the ASM instance, or a dependant database instance. Viewing ASM Instance Information Via SQL Queries There are several dynamic and data dictionary views available to view an ASM configuration from within the ASM instance itself: ASM Dynamic Views: FROM ASM Instance Information View Name Description V$ASM_ALIAS Shows every alias for every disk group mounted by the ASM instance V$ASM_CLIENT Shows which database instance(s) are using any ASM disk groups that are being mounted by this ASM instance V$ASM_DISK Lists each disk discovered by the ASM instance, including disks that are not part of any ASM disk group V$ASM_DISKGROUP Describes information about ASM disk groups mounted by the ASM instance V$ASM_FILE Lists each ASM file in every ASM disk group mounted by the ASM instance V$ASM_OPERATION Like its counterpart, V$SESSION_LONGOPS, it shows each long-running ASM operation in the ASM instance V$ASM_TEMPLATE Lists each template present in every ASM disk group mounted by the ASM instance -- Managing disk groups The SQL statements introduced in this section are only available in an ASM instance. You must first start the ASM instance. Creating disk group examples: Example 1: ---------- Creating a Disk Group: Example The following examples assume that the ASM_DISKSTRING is set to '/devices/*'. Assume the following: ASM disk discovery identifies the following disks in directory /devices. /devices/diska1 /devices/diska2 /devices/diska3 /devices/diska4 /devices/diskb1 /devices/diskb2 /devices/diskb3 /devices/diskb4 The disks diska1 - diska4 are on a separate SCSI controller from disks diskb1 - diskb4. The following SQL*Plus session illustrates starting an ASM instance and creating a disk group named dgroup1. % SQLPLUS /NOLOG SQL> CONNECT / AS SYSDBA SQL> CREATE DISKGROUP dgroup1 NORMAL REDUNDANCY 2 FAILGROUP controller1 DISK 3 '/devices/diska1', 4 '/devices/diska2', 5 '/devices/diska3', 6 '/devices/diska4', 7 FAILGROUP controller2 DISK 8 '/devices/diskb1', 9 '/devices/diskb2', 10 '/devices/diskb3', 11 '/devices/diskb4'; In this example, dgroup1 is composed of eight disks that are defined as belonging to either failure group controller1 or controller2. Since NORMAL REDUNDANCY level is specified for the disk group, then Automatic Storage Management provides redundancy for all files created in dgroup1 according to the attributes specified in the disk group templates. For example, in the system default template shown in the table in "Managing Disk Group Templates", normal redundancy for the online redo log files (ONLINELOG template) is two-way mirroring. This means that when one copy of a redo log file extent is written to a disk in failure group controller1, a mirrored copy of the file extent is written to a disk in failure group controller2. You can see that to support normal redundancy level, at least two failure groups must be defined. 
Since no NAME clauses are provided for any of the disks being included in the disk group, the disks are assigned the names of dgroup1_0001, dgroup1_0002, ..., dgroup1_0008. Example 2: ---------- CREATE DISKGROUP disk_group_1 NORMAL REDUNDANCY FAILGROUP failure_group_1 DISK '/devices/diska1' NAME diska1, '/devices/diska2' NAME diska2, FAILGROUP failure_group_2 DISK '/devices/diskb1' NAME diskb1, '/devices/diskb2' NAME diskb2; Example 3: ---------- At some point in using OUI in installing the software, and creating a database, you will see the following screen: ---------------------------------------------------- |SPECIFY Database File Storage Option | | | | o File system | | Specify Database file location: ######### | | | | o Automatic Storage Management (ASM) | | | | o Raw Devices | | | | Specify Raw Devices mapping file: ########## | ---------------------------------------------------- Suppose that you have on a Linux machine the following raw disk devices: /dev/raw/raw1 8GB /dev/raw/raw2 8GB /dev/raw/raw3 6GB /dev/raw/raw4 6GB /dev/raw/raw5 6GB /dev/raw/raw6 6GB Then you can choose ASM in the upper screen, and see the following screen, where you can create the initial diskgroup and assign disks to it: ----------------------------------------------------- | Configure Automatic Storage Management | | | | Disk Group Name: data1 | | | | Redundancy | | o High o Normal o External | | | | Add member Disks | | |-------------------------------- | | | select Disk Path | | | |[#] /dev/raw/raw1 | | | |[#] /dev/raw/raw2 | | | |[ ] /dev/raw/raw3 | | | |[ ] /dev/raw/raw4 | | | -------------------------------- | | | ----------------------------------------------------- -- Mounting and Dismounting Disk Groups Disk groups that are specified in the ASM_DISKGROUPS initialization parameter are mounted automatically at ASM instance startup. This makes them available to all database instances running on the same node as Automatic Storage Management. The disk groups are dismounted at ASM instance shutdown. Automatic Storage Management also automatically mounts a disk group when you initially create it, and dismounts a disk group if you drop it. There may be times that you want to mount or dismount disk groups manually. For these actions use the ALTER DISKGROUP ... MOUNT or ALTER DISKGROUP ... DISMOUNT statement. You can mount or dismount disk groups by name, or specify ALL. If you try to dismount a disk group that contains open files, the statement will fail, unless you also specify the FORCE clause. Example The following statement dismounts all disk groups that are currently mounted to the ASM instance: ALTER DISKGROUP ALL DISMOUNT; The following statement mounts disk group dgroup1: ALTER DISKGROUP dgroup1 MOUNT; ======== Note 4: ======== -- Installing Oracle ASMLib for Linux: ASMLib is a support library for the Automatic Storage Management feature of Oracle Database 10g. This document is a set of tips for installing the Linux specific ASM library and its assocated driver. This library is provide to enable ASM I/O to Linux disks without the limitations of the standard Unix I/O API. The steps below are steps that the system administrator must follow. The ASMLib software is available from the Oracle Technology Network. Go to ASMLib download page and follow the link for your platform. You will see 4-6 packages for your Linux platform. -The oracleasmlib package provides the actual ASM library. -The oracleasm-support package provides the utilities used to get the ASM driver up and running. 
Both of these packages need to be installed.
-The remaining packages provide the kernel driver for the ASM library. Each package provides the driver
 for a different kernel. You must install the appropriate package for the kernel you are running.
 Use the "uname -r" command to determine the version of the kernel. The oracleasm kernel driver package
 will have that version string in its name. For example, if you were running Red Hat Enterprise Linux 4 AS,
 and the kernel you were using was the 2.6.9-5.0.5.ELsmp kernel, you would choose the
 oracleasm-2.6.9-5.0.5-ELsmp package.

So, for example, to install these packages on RHEL4 on an Intel x86 machine, you might use the command:

rpm -Uvh oracleasm-support-2.0.0-1.i386.rpm \
         oracleasmlib-2.0.0-1.i386.rpm \
         oracleasm-2.6.9-5.0.5-ELsmp-2.0.0-1.i686.rpm

Once the command completes, ASMLib is installed on the system.

-- Configuring ASMLib:

Now that the ASMLib software is installed, a few steps have to be taken by the system administrator to
make the ASM driver available. The ASM driver needs to be loaded, and the driver filesystem needs to be
mounted. This is taken care of by the initialization script, "/etc/init.d/oracleasm".

Run the "/etc/init.d/oracleasm" script with the "configure" option. It will ask for the user and group
that default to owning the ASM driver access point. If the database was running as the 'oracle' user and
the 'dba' group, the output would look like this:

[root@ca-test1 /]# /etc/init.d/oracleasm configure
Configuring the Oracle ASM library driver.

This will configure the on-boot properties of the Oracle ASM library driver. The following questions
will determine whether the driver is loaded on boot and what permissions it will have. The current
values will be shown in brackets ('[]'). Hitting ENTER without typing an answer will keep that current
value. Ctrl-C will abort.

Default user to own the driver interface []: oracle
Default group to own the driver interface []: dba
Start Oracle ASM library driver on boot (y/n) [n]: y
Fix permissions of Oracle ASM disks on boot (y/n) [y]: y
Writing Oracle ASM library driver configuration [ OK ]
Creating /dev/oracleasm mount point [ OK ]
Loading module "oracleasm" [ OK ]
Mounting ASMlib driver filesystem [ OK ]
Scanning system for ASM disks [ OK ]

This should load the oracleasm.o driver module and mount the ASM driver filesystem. By selecting
enabled = 'y' during the configuration, the system will always load the module and mount the filesystem
on boot.

The automatic start can be enabled or disabled with the 'enable' and 'disable' options to
/etc/init.d/oracleasm:

[root@ca-test1 /]# /etc/init.d/oracleasm disable
Writing Oracle ASM library driver configuration [ OK ]
Unmounting ASMlib driver filesystem [ OK ]
Unloading module "oracleasm" [ OK ]

[root@ca-test1 /]# /etc/init.d/oracleasm enable
Writing Oracle ASM library driver configuration [ OK ]
Loading module "oracleasm" [ OK ]
Mounting ASMlib driver filesystem [ OK ]
Scanning system for ASM disks [ OK ]

-- Making Disks Available to ASMLib:

The system administrator has one last task. Every disk that ASMLib is going to be accessing needs to be
made available. This is accomplished by creating an ASM disk. The /etc/init.d/oracleasm script is again
used for this task:

[root@ca-test1 /]# /etc/init.d/oracleasm createdisk VOL1 /dev/sdg1
Creating Oracle ASM disk "VOL1" [ OK ]

Disk names are ASCII capital letters, numbers, and underscores. They must start with a letter.
Disks that are no longer used by ASM can be unmarked as well:

[root@ca-test1 /]# /etc/init.d/oracleasm deletedisk VOL1
Deleting Oracle ASM disk "VOL1" [ OK ]

Any operating system disk can be queried to see if it is used by ASM:

[root@ca-test1 /]# /etc/init.d/oracleasm querydisk /dev/sdg1
Checking if device "/dev/sdg1" is an Oracle ASM disk [ OK ]
[root@ca-test1 /]# /etc/init.d/oracleasm querydisk /dev/sdh1
Checking if device "/dev/sdh1" is an Oracle ASM disk [FAILED]

Existing disks can be listed and queried:

[root@ca-test1 /]# /etc/init.d/oracleasm listdisks
VOL1
VOL2
VOL3
[root@ca-test1 /]# /etc/init.d/oracleasm querydisk VOL1
Checking for ASM disk "VOL1" [ OK ]

When a disk is added to a RAC setup, the other nodes need to be notified about it. Run the 'createdisk'
command on one node, and then run 'scandisks' on every other node:

[root@ca-test1 /]# /etc/init.d/oracleasm scandisks
Scanning system for ASM disks [ OK ]

-- Discovery Strings for Linux ASMLib:

ASMLib uses discovery strings to determine what disks ASM is asking for. The generic Linux ASMLib uses
glob strings. The string must be prefixed with "ORCL:". Disks are specified by name. A disk created with
the name "VOL1" can be discovered in ASM via the discovery string "ORCL:VOL1". Similarly, all disks that
start with the string "VOL" can be queried with the discovery string "ORCL:VOL*".

Disks cannot be discovered with path names in the discovery string. If the prefix is missing, the
generic Linux ASMLib will ignore the discovery string completely, expecting that it is intended for a
different ASMLib. The only exception is the empty string (""), which is considered a full wildcard.
This is precisely equivalent to the discovery string "ORCL:*".

NOTE: Once you mark your disks with Linux ASMLib, Oracle Database 10g R1 (10.1) OUI will not be able to
discover your disks. It is recommended that you complete a Software Only install and then use DBCA to
create your database (or use the custom install).

========
Note 5:
========

Automatic Storage Management (ASM) is a new feature that has been introduced in Oracle 10g to simplify
the storage of Oracle datafiles, controlfiles and logfiles.

- Overview of Automatic Storage Management (ASM)
- Initialization Parameters and ASM Instance Creation
- Startup and Shutdown of ASM Instances
- Administering ASM Disk Groups
- Disks
- Templates
- Directories
- Aliases
- Files
- Checking Metadata
- ASM Filenames
- ASM Views
- SQL and ASM
- Migrating to ASM Using RMAN

Overview of Automatic Storage Management (ASM)

Automatic Storage Management (ASM) simplifies administration of Oracle related files by allowing the
administrator to reference disk groups rather than individual disks and files, which are managed by ASM.
The ASM functionality is an extension of the Oracle Managed Files (OMF) functionality that also includes
striping and mirroring to provide balanced and secure storage. The new ASM functionality can be used in
combination with existing raw and cooked file systems, along with OMF and manually managed files.

The ASM functionality is controlled by an ASM instance. This is not a full database instance, just the
memory structures, and as such it is very small and lightweight.

The main components of ASM are disk groups, each of which comprises several physical disks that are
controlled as a single unit. The physical disks are known as ASM disks, while the files that reside on
the disks are known as ASM files.
The locations and names for the files are controlled by ASM, but user-friendly aliases and directory
structures can be defined for ease of reference. The level of redundancy and the granularity of the
striping can be controlled using templates. Default templates are provided for each file type stored by
ASM, but additional templates can be defined as needed.

Failure groups are defined within a disk group to support the required level of redundancy. For two-way
mirroring you would expect a disk group to contain two failure groups, so individual files are written
to two locations.

In summary, ASM provides the following functionality:
- Manages groups of disks, called disk groups.
- Manages disk redundancy within a disk group.
- Provides near-optimal I/O balancing without any manual tuning.
- Enables management of database objects without specifying mount points and filenames.
- Supports large files.

Initialization Parameters and ASM Instance Creation

The init.ora / spfile initialization parameters that are of specific interest for an ASM instance are:

INSTANCE_TYPE   - Set to ASM or RDBMS depending on the instance type. The default is RDBMS.
DB_UNIQUE_NAME  - Specifies a globally unique name for the database. This defaults to +ASM but must be
                  altered if you intend to run multiple ASM instances.
ASM_POWER_LIMIT - The maximum power for a rebalancing operation on an ASM instance. The valid values
                  range from 1 to 11, with 1 being the default. The higher the limit, the more resources
                  are allocated, resulting in faster rebalancing operations. This value is also used as
                  the default when the POWER clause is omitted from a rebalance operation.
ASM_DISKGROUPS  - The list of disk groups that should be mounted by an ASM instance during instance
                  startup, or by the ALTER DISKGROUP ALL MOUNT statement. ASM configuration changes are
                  automatically reflected in this parameter.
ASM_DISKSTRING  - Specifies a value that can be used to limit the disks considered for discovery.
                  Altering the default value may improve the speed of disk group mount time and the
                  speed of adding a disk to a disk group. Changing the parameter to a value which
                  prevents the discovery of already mounted disks results in an error. The default value
                  is NULL, allowing all suitable disks to be considered.

Incorrect usage of parameters in ASM or RDBMS instances results in ORA-15021 errors.

To create an ASM instance, first create a file called init+ASM.ora in the /tmp directory containing the
following information.

INSTANCE_TYPE=ASM

Next, using SQL*Plus, connect to the idle instance.

export ORACLE_SID=+ASM
sqlplus / as sysdba

Create an spfile using the contents of the init+ASM.ora file.

SQL> CREATE SPFILE FROM PFILE='/tmp/init+ASM.ora';

File created.

Finally, start the instance with the NOMOUNT option.

SQL> startup nomount
ASM instance started

Total System Global Area  125829120 bytes
Fixed Size                  1301456 bytes
Variable Size             124527664 bytes
Database Buffers                  0 bytes
Redo Buffers                      0 bytes
SQL>

The ASM instance is now ready to use for creating and mounting disk groups. To shut down the ASM
instance, issue the following command.
SQL> shutdown
ASM instance shutdown
SQL>

Once an ASM instance is present, disk groups can be used for the following parameters in database
instances (INSTANCE_TYPE=RDBMS) to allow ASM file creation:

DB_CREATE_FILE_DEST
DB_CREATE_ONLINE_LOG_DEST_n
DB_RECOVERY_FILE_DEST
CONTROL_FILES
LOG_ARCHIVE_DEST_n
LOG_ARCHIVE_DEST
STANDBY_ARCHIVE_DEST

Here is an example of how to create a datafile using a default disk group specified by an initialization
parameter setting. Suppose the database initialization parameter file is set as follows:

DB_CREATE_FILE_DEST = '+dskgrp01'

If you now create a tablespace

SQL> CREATE TABLESPACE SALESDATA;

it will be stored in +dskgrp01.

Startup and Shutdown of ASM Instances

ASM instances are started and stopped in a similar way to normal database instances.
The options for the STARTUP command are:

FORCE   - Performs a SHUTDOWN ABORT before restarting the ASM instance.
MOUNT   - Starts the ASM instance and mounts the disk groups specified by the ASM_DISKGROUPS parameter.
NOMOUNT - Starts the ASM instance without mounting any disk groups.
OPEN    - This is not a valid option for an ASM instance.

The options for the SHUTDOWN command are:

NORMAL        - The ASM instance waits for all connected ASM instances and SQL sessions to exit, then
                shuts down.
IMMEDIATE     - The ASM instance waits for any SQL transactions to complete, then shuts down. It doesn't
                wait for sessions to exit.
TRANSACTIONAL - Same as IMMEDIATE.
ABORT         - The ASM instance shuts down instantly.

Administering ASM Disk Groups

Disk groups are created using the CREATE DISKGROUP statement. This statement allows you to specify the
level of redundancy:

NORMAL REDUNDANCY   - Two-way mirroring, requiring two failure groups.
HIGH REDUNDANCY     - Three-way mirroring, requiring three failure groups.
EXTERNAL REDUNDANCY - No mirroring for disks that are already protected using hardware mirroring or RAID.

In addition, failure groups and preferred names for disks can be defined. If the NAME clause is omitted,
the disks are given a system generated name like "disk_group_1_0001". The FORCE option can be used to
move a disk from another disk group into this one.

CREATE DISKGROUP disk_group_1 NORMAL REDUNDANCY
  FAILGROUP failure_group_1 DISK
    '/devices/diska1' NAME diska1,
    '/devices/diska2' NAME diska2,
  FAILGROUP failure_group_2 DISK
    '/devices/diskb1' NAME diskb1,
    '/devices/diskb2' NAME diskb2;

Disk groups can be deleted using the DROP DISKGROUP statement.

DROP DISKGROUP disk_group_1 INCLUDING CONTENTS;

Disks can be added or removed from disk groups using the ALTER DISKGROUP statement. Remember that the
wildcard "*" can be used to reference disks, so long as the resulting string does not match a disk
already used by an existing disk group.

-- Add disks.
ALTER DISKGROUP disk_group_1 ADD DISK
  '/devices/disk*3',
  '/devices/disk*4';

-- Drop a disk.
ALTER DISKGROUP disk_group_1 DROP DISK diska2;

Disks can be resized using the RESIZE clause of the ALTER DISKGROUP statement. The statement can be used
to resize individual disks, all disks in a failure group, or all disks in the disk group. If the SIZE
clause is omitted, the disks are resized to the size of the disk returned by the OS.

-- Resize a specific disk.
ALTER DISKGROUP disk_group_1 RESIZE DISK diska1 SIZE 100G;

-- Resize all disks in a failure group.
ALTER DISKGROUP disk_group_1 RESIZE DISKS IN FAILGROUP failure_group_1 SIZE 100G;

-- Resize all disks in a disk group.
ALTER DISKGROUP disk_group_1 RESIZE ALL SIZE 100G;

The UNDROP DISKS clause of the ALTER DISKGROUP statement allows pending disk drops to be undone.
It will not revert drops that have completed, or disk drops associated with the dropping of a disk group.

ALTER DISKGROUP disk_group_1 UNDROP DISKS;

Disk groups can be rebalanced manually using the REBALANCE clause of the ALTER DISKGROUP statement.
If the POWER clause is omitted, the ASM_POWER_LIMIT parameter value is used. Manual rebalancing is only
needed when the speed of the automatic rebalancing is not appropriate.

ALTER DISKGROUP disk_group_1 REBALANCE POWER 5;

Disk groups are mounted at ASM instance startup and unmounted at ASM instance shutdown. Manual mounting
and dismounting can be accomplished using the ALTER DISKGROUP statement as seen below.

ALTER DISKGROUP ALL DISMOUNT;
ALTER DISKGROUP ALL MOUNT;
ALTER DISKGROUP disk_group_1 DISMOUNT;
ALTER DISKGROUP disk_group_1 MOUNT;

Templates

Templates are named groups of attributes that can be applied to the files within a disk group.
The following examples show how templates can be created, altered and dropped.

-- Create a new template.
ALTER DISKGROUP disk_group_1 ADD TEMPLATE my_template ATTRIBUTES (MIRROR FINE);

-- Modify template.
ALTER DISKGROUP disk_group_1 ALTER TEMPLATE my_template ATTRIBUTES (COARSE);

-- Drop template.
ALTER DISKGROUP disk_group_1 DROP TEMPLATE my_template;

Available attributes include:

UNPROTECTED - No mirroring or striping regardless of the redundancy setting.
MIRROR      - Two-way mirroring for normal redundancy and three-way mirroring for high redundancy.
              This attribute cannot be set for external redundancy.
COARSE      - Specifies lower granularity for striping. This attribute cannot be set for external
              redundancy.
FINE        - Specifies higher granularity for striping. This attribute cannot be set for external
              redundancy.

Directories

A directory hierarchy can be defined using the ALTER DISKGROUP statement to support ASM file aliasing.
The following examples show how ASM directories can be created, modified and deleted.

-- Create a directory.
ALTER DISKGROUP disk_group_1 ADD DIRECTORY '+disk_group_1/my_dir';

-- Rename a directory.
ALTER DISKGROUP disk_group_1 RENAME DIRECTORY '+disk_group_1/my_dir' TO '+disk_group_1/my_dir_2';

-- Delete a directory and all its contents.
ALTER DISKGROUP disk_group_1 DROP DIRECTORY '+disk_group_1/my_dir_2' FORCE;

Aliases

Aliases allow you to reference ASM files using user-friendly names, rather than the fully qualified ASM
filenames.

-- Create an alias using the fully qualified filename.
ALTER DISKGROUP disk_group_1 ADD ALIAS '+disk_group_1/my_dir/my_file.dbf'
  FOR '+disk_group_1/mydb/datafile/my_ts.342.3';

-- Create an alias using the numeric form filename.
ALTER DISKGROUP disk_group_1 ADD ALIAS '+disk_group_1/my_dir/my_file.dbf'
  FOR '+disk_group_1.342.3';

-- Rename an alias.
ALTER DISKGROUP disk_group_1 RENAME ALIAS '+disk_group_1/my_dir/my_file.dbf'
  TO '+disk_group_1/my_dir/my_file2.dbf';

-- Delete an alias.
ALTER DISKGROUP disk_group_1 DELETE ALIAS '+disk_group_1/my_dir/my_file.dbf';

Attempting to drop a system alias results in an error.

Files

Files are not deleted automatically if they are created using aliases, as they are not Oracle Managed
Files (OMF), or if a recovery is done to a point-in-time before the file was created. For these
circumstances it is necessary to manually delete the files, as shown below.

-- Drop file using an alias.
ALTER DISKGROUP disk_group_1 DROP FILE '+disk_group_1/my_dir/my_file.dbf';

-- Drop file using a numeric form filename.
ALTER DISKGROUP disk_group_1 DROP FILE '+disk_group_1.342.3';

-- Drop file using a fully qualified filename.
ALTER DISKGROUP disk_group_1 DROP FILE '+disk_group_1/mydb/datafile/my_ts.342.3'; Checking Metadata The internal consistency of disk group metadata can be checked in a number of ways using the CHECK clause of the ALTER DISKGROUP statement. -- Check metadata for a specific file. ALTER DISKGROUP disk_group_1 CHECK FILE '+disk_group_1/my_dir/my_file.dbf' -- Check metadata for a specific failure group in the disk group. ALTER DISKGROUP disk_group_1 CHECK FAILGROUP failure_group_1; -- Check metadata for a specific disk in the disk group. ALTER DISKGROUP disk_group_1 CHECK DISK diska1; -- Check metadata for all disks in the disk group. ALTER DISKGROUP disk_group_1 CHECK ALL; ASM Views The ASM configuration can be viewed using the V$ASM_% views, which often contain different information depending on whether they are queried from the ASM instance, or a dependant database instance. Viewing ASM Instance Information Via SQL Queries Finally, there are several dynamic and data dictionary views available to view an ASM configuration from within the ASM instance itself: -- ASM Dynamic Views: FROM ASM Instance Information View Name Description V$ASM_ALIAS Shows every alias for every disk group mounted by the ASM instance V$ASM_CLIENT Shows which database instance(s) are using any ASM disk groups that are being mounted by this ASM instance V$ASM_DISK Lists each disk discovered by the ASM instance, including disks that are not part of any ASM disk group V$ASM_DISKGROUP Describes information about ASM disk groups mounted by the ASM instance V$ASM_FILE Lists each ASM file in every ASM disk group mounted by the ASM instance V$ASM_OPERATION Like its counterpart, V$SESSION_LONGOPS, it shows each long-running ASM operation in the ASM instance V$ASM_TEMPLATE Lists each template present in every ASM disk group mounted by the ASM instance I was also able to query the following dynamic views against my database instance to view the related ASM storage components of that instance: -- ASM Dynamic Views: FROM Database Instance Information View Name Description V$ASM_DISKGROUP Shows one row per each ASM disk group that's mounted by the local ASM instance V$ASM_DISK Displays one row per each disk in each ASM disk group that are in use by the database instance V$ASM_CLIENT Lists one row per each ASM instance for which the database instance has any open ASM files ASM Filenames There are several ways to reference ASM file. Some forms are used during creation and some for referencing ASM files. The forms for file creation are incomplete, relying on ASM to create the fully qualified name, which can be retrieved from the supporting views. The forms of the ASM filenames are summarised below. Filename Type Format Fully Qualified ASM Filename +dgroup/dbname/file_type/file_type_tag.file.incarnation Numeric ASM Filename +dgroup.file.incarnation Alias ASM Filenames +dgroup/directory/filename Alias ASM Filename with Template +dgroup(template)/alias Incomplete ASM Filename +dgroup Incomplete ASM Filename with Template +dgroup(template) SQL and ASM ASM filenames can be used in place of conventional filenames for most Oracle file types, including controlfiles, datafiles, logfiles etc. For example, the following command creates a new tablespace with a datafile in the disk_group_1 disk group. 
CREATE TABLESPACE my_ts DATAFILE '+disk_group_1' SIZE 100M AUTOEXTEND ON;

Migrating to ASM Using RMAN

The following method shows how a primary database can be migrated to ASM from a disk based backup:

Disable change tracking (only available in Enterprise Edition) if it is currently being used.

SQL> ALTER DATABASE DISABLE BLOCK CHANGE TRACKING;

Shutdown the database.

SQL> SHUTDOWN IMMEDIATE

Modify the parameter file of the target database as follows:
Set the DB_CREATE_FILE_DEST and DB_CREATE_ONLINE_LOG_DEST_n parameters to the relevant ASM disk groups.
Remove the CONTROL_FILES parameter from the spfile so the control files will be moved to the
DB_CREATE_* destination and the spfile gets updated automatically. If you are using a pfile, the
CONTROL_FILES parameter must be set to the appropriate ASM files or aliases.

Start the database in nomount mode.

RMAN> STARTUP NOMOUNT

Restore the controlfile into the new location from the old location.

RMAN> RESTORE CONTROLFILE FROM 'old_control_file_name';

Mount the database.

RMAN> ALTER DATABASE MOUNT;

Copy the database into the ASM disk group.

RMAN> BACKUP AS COPY DATABASE FORMAT '+disk_group';

Switch all datafiles to the new ASM location.

RMAN> SWITCH DATABASE TO COPY;

Open the database.

RMAN> ALTER DATABASE OPEN;

Create new redo logs in ASM and delete the old ones. Enable change tracking if it was being used.

SQL> ALTER DATABASE ENABLE BLOCK CHANGE TRACKING;

For more information see: Using Automatic Storage Management, Migrating a Database into ASM.

Note 6:
=======

How to Use Oracle10g release 2 ASM on Linux:

[root@danaly etc]# fdisk /dev/cciss/c0d0

The number of cylinders for this disk is set to 8854.
There is nothing wrong with that, but this is larger than 1024, and could in certain setups cause
problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/cciss/c0d0: 72.8 GB, 72833679360 bytes
255 heads, 63 sectors/track, 8854 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot        Start       End      Blocks   Id  System
/dev/cciss/c0d0p1   *      1        33      265041   83  Linux
/dev/cciss/c0d0p2         34       555     4192965   82  Linux swap
/dev/cciss/c0d0p3        556       686    1052257+   83  Linux
/dev/cciss/c0d0p4        687      8854    65609460    5  Extended
/dev/cciss/c0d0p5        687      1730    8385898+   83  Linux
/dev/cciss/c0d0p6       1731      2774    8385898+   83  Linux
/dev/cciss/c0d0p7       2775      3818    8385898+   83  Linux
/dev/cciss/c0d0p8       3819      4601     6289416   83  Linux

Command (m for help): n
First cylinder (4602-8854, default 4602):
Using default value 4602
Last cylinder or +size or +sizeM or +sizeK (4602-8854, default 8854): +20000M

Command (m for help): n
First cylinder (7035-8854, default 7035):
Using default value 7035
Last cylinder or +size or +sizeM or +sizeK (7035-8854, default 8854): +3000M

Command (m for help): n
First cylinder (7401-8854, default 7401):
Using default value 7401
Last cylinder or +size or +sizeM or +sizeK (7401-8854, default 8854): +3000M

Command (m for help): p

Disk /dev/cciss/c0d0: 72.8 GB, 72833679360 bytes
255 heads, 63 sectors/track, 8854 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot        Start       End      Blocks   Id  System
/dev/cciss/c0d0p1   *      1        33      265041   83  Linux
/dev/cciss/c0d0p2         34       555     4192965   82  Linux swap
/dev/cciss/c0d0p3        556       686    1052257+   83  Linux
/dev/cciss/c0d0p4        687      8854    65609460    5  Extended
/dev/cciss/c0d0p5        687      1730    8385898+   83  Linux
/dev/cciss/c0d0p6       1731      2774    8385898+   83  Linux
/dev/cciss/c0d0p7       2775      3818    8385898+   83  Linux
/dev/cciss/c0d0p8 3819 4601 6289416 83 Linux /dev/cciss/c0d0p9 4602 7034 19543041 83 Linux /dev/cciss/c0d0p10 7035 7400 2939863+ 83 Linux /dev/cciss/c0d0p11 7401 7766 2939863+ 83 Linux Command (m for help): w The partition table has been altered! Calling ioctl() to re-read partition table. WARNING: Re-reading the partition table failed with error 16: Device or resource busy. The kernel still uses the old table. The new table will be used at the next reboot. Syncing disks. [root@danaly data1]# /etc/init.d/oracleasm createdisk VOL5 /dev/cciss/c0d0p10 Marking disk "/dev/cciss/c0d0p10" as an ASM disk: [ OK ] [root@danaly data1]# /etc/init.d/oracleasm createdisk VOL6 /dev/cciss/c0d0p11 Marking disk "/dev/cciss/c0d0p11" as an ASM disk: [ OK ] [root@danaly data1]# /etc/init.d/oracleasm listdisks VOL1 VOL2 VOL3 VOL4 VOL5 VOL6 (THE FOLLOWING QUERIES ARE ISSUED FROM THE ASM INSTANCE.) [oracle@danaly ~]$ export ORACLE_SID=+ASM [oracle@danaly ~]$ sqlplus "/ as sysdba" SQL*Plus: Release 10.2.0.1.0 - Production on Sun Sep 3 00:28:09 2006 Copyright (c) 1982, 2005, Oracle. All rights reserved. Connected to an idle instance. SQL> startup ASM instance started Total System Global Area 83886080 bytes Fixed Size 1217836 bytes Variable Size 57502420 bytes ASM Cache 25165824 bytes ASM diskgroups mounted SQL> select group_number,disk_number,mode_status from v$asm_disk; GROUP_NUMBER DISK_NUMBER MODE_STATUS ------------ ----------- -------------- 0 4 ONLINE 0 5 ONLINE 1 0 ONLINE 1 1 ONLINE 1 2 ONLINE 1 3 ONLINE 6 rows selected. SQL> select group_number,disk_number,mode_status,name from v$asm_disk; GROUP_NUMBER DISK_NUMBER MODE_STATUS NAME ------------ ----------- -------------- --------------------------------- 0 4 ONLINE 0 5 ONLINE 1 0 ONLINE VOL1 1 1 ONLINE VOL2 1 2 ONLINE VOL3 1 3 ONLINE VOL4 6 rows selected. SQL> create diskgroup orag2 external redundancy disk 'ORCL:VOL5'; Diskgroup created. SQL> select group_number,disk_number,mode_status,name from v$asm_disk; GROUP_NUMBER DISK_NUMBER MODE_STATUS NAME ------------ ----------- -------------- ------------------------------------- 0 5 ONLINE 1 0 ONLINE VOL1 1 1 ONLINE VOL2 1 2 ONLINE VOL3 1 3 ONLINE VOL4 2 0 ONLINE VOL5 6 rows selected. (THE FOLLOWING QUERIES ARE ISSUED FROM THE DATABASE INSTANCE.) [oracle@danaly ~]$ export ORACLE_SID=danaly [oracle@danaly ~]$ sqlplus "/ as sysdba" SQL*Plus: Release 10.2.0.1.0 - Production on Sun Sep 3 00:47:04 2006 Copyright (c) 1982, 2005, Oracle. All rights reserved. Connected to an idle instance. SQL> startup ORACLE instance started. Total System Global Area 943718400 bytes Fixed Size 1222744 bytes Variable Size 281020328 bytes Database Buffers 654311424 bytes Redo Buffers 7163904 bytes Database mounted. Database opened. SQL> select name from v$datafile; NAME -------------------------------------------------------------------------------- +ORADG/danaly/datafile/system.264.600016955 +ORADG/danaly/datafile/undotbs1.265.600016969 +ORADG/danaly/datafile/sysaux.266.600016977 +ORADG/danaly/datafile/users.268.600016987 SQL> create tablespace eygle datafile '+ORAG2' ; Tablespace created. 
SQL> select name from v$datafile; NAME --------------------------------------------------------------------------------- +ORADG/danaly/datafile/system.264.600016955 +ORADG/danaly/datafile/undotbs1.265.600016969 +ORADG/danaly/datafile/sysaux.266.600016977 +ORADG/danaly/datafile/users.268.600016987 +ORAG2/danaly/datafile/eygle.256.600137647 oracle@danaly log]$ export ORACLE_SID=+ASM [oracle@danaly log]$ sqlplus "/ as sysdba" SQL*Plus: Release 10.2.0.1.0 - Production on Sun Sep 3 01:36:37 2006 Copyright (c) 1982, 2005, Oracle. All rights reserved. Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production With the Partitioning, Oracle Label Security, OLAP and Data Mining Scoring Engine options SQL> alter diskgroup orag2 add disk 'ORCL:VOL6'; Diskgroup altered. ============ 7: OMF ============ In a way, OMF is partly a predeccessor for ASM, on Oracle 9i. The OMF feature only provides for easy management of tablespace files, like in create statements, where you do not have to "worry" anymore on filelocations. Using Oracle-managed files simplifies the administration of an Oracle database. Oracle-managed files eliminate the need for you, the DBA, to directly manage the operating system files comprising an Oracle database. You specify operations in terms of database objects rather than filenames. Oracle internally uses standard file system interfaces to create and delete files as needed for the following database structures: Tablespaces Online redo log files Control files The following initialization parameters init.ora/spfile.ora allow the database server to use the Oracle Managed Files feature: - DB_CREATE_FILE_DEST Defines the location of the default file system directory where Oracle creates datafiles or tempfiles when no file specification is given in the creation operation. Also used as the default file system directory for online redo log and control files if DB_CREATE_ONLINE_LOG_DEST_n is not specified. - DB_CREATE_ONLINE_LOG_DEST_n Defines the location of the default file system directory for online redo log files and control file creation when no file specification is given in the creation operation. You can use this initialization parameter multiple times, where n specifies a multiplexed copy of the online redo log or control file. You can specify up to five multiplexed copies Example: DB_CREATE_FILE_DEST = '/u01/oradata/payroll' DB_CREATE_ONLINE_LOG_DEST_1 = '/u02/oradata/payroll' DB_CREATE_ONLINE_LOG_DEST_2 = '/u03/oradata/payroll' ============================================= 8: Installation notes 10g RAC on Windows ============================================== 8.1 Before you install: ----------------------- Each node in a cluster requires the following: > One private internet protocol (IP) address for each node to serve as the private interconnect. The following must be true for each private IP address: -It must be separate from the public network -It must be accessible on the same network interface on each node -It must have a unique address on each node The private interconnect is used for inter-node communication by both Oracle Clusterware and RAC. If the private address is available from a network name server (DNS), then you can use that name. Otherwise, the private IP address must be available in each node's C:\WINNT\system32\drivers\etc\hosts file. > One public IP address for each node, to be used as the Virtual IP (VIP) address for client connections and for connection failover. 
The name associated with the VIP must be different from the default host name. This VIP must be associated with the same interface name on every node that is part of your cluster. In addition, the IP addresses that you use for all of the nodes that are part of a cluster must be from the same subnet. > One public fixed hostname address for each node, typically assigned by the system administrator during operating system installation. If you have a DNS, then register both the fixed IP and the VIP address with DNS. If you do not have DNS, then you must make sure that the public IP and VIP addresses for all nodes are in each node's host file. For example, with a two node cluster where each node has one public and one private interface, you might have the configuration shown in the following table for your network interfaces, where the hosts file is %SystemRoot%\system32\drivers\etc\hosts: Node Interface Name Type IP Address Registered In rac1 rac1 Public 143.46.43.100 DNS (if available, else the hosts file) rac1 rac1-vip Virtual 143.46.43.104 DNS (if available, else the hosts file) rac1 rac1-priv Private 10.0.0.1 Hosts file rac2 rac2 Public 143.46.43.101 DNS (if available, else the hosts file) rac2 rac2-vip Virtual 143.46.43.105 DNS (if available, else the hosts file) rac2 rac2-priv Private 10.0.0.2 Hosts file The virtual IP addresses are assigned to the listener process. To enable VIP failover, the configuration shown in the preceding table defines the public and VIP addresses of both nodes on the same subnet, 143.46.43. When a node or interconnect fails, then the associated VIP is relocated to the surviving instance, enabling fast notification of the failure to the clients connecting through that VIP. If the application and client are configured with transparent application failover options, then the client is reconnected to the surviving instance. To disable Windows Media Sensing for TCP/IP, you must set the value of the DisableDHCPMediaSense parameter to 1 on each node. Disable Media Sensing by completing the following steps on each node of your cluster: Use Registry Editor (Regedt32.exe) to view the following key in the registry: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters Add the following registry value: Value Name: DisableDHCPMediaSense Data Type: REG_DWORD -Boolean Value: 1 - External shared disks for storing Oracle Clusterware and database files. The disk configuration options available to you are described in Chapter 3, "Storage Pre-Installation Tasks". Review these options before you decide which storage option to use in your RAC environment. However, note that when Database Configuration Assistant (DBCA) configures automatic disk backup, it uses a database recovery area which must be shared. The database files and recovery files do not necessarily have to be located on the same type of storage. Determine the storage option for your system and configure the shared disk. Oracle recommends that you use Automatic Storage Management (ASM) and Oracle Managed Files (OMF), or a cluster file system. If you use ASM or a cluster file system, then you can also take advantage of OMF and other Oracle Database 10g storage features. If you use RAC on Oracle Database 10g Standard Edition, then you must use ASM. If you use ASM, then Oracle recommends that you install ASM in a separate home from the Oracle Clusterware home and the Oracle home. Oracle Database 10g Real Application Clusters installation is a two-phase installation. 
In phase one, use Oracle Universal Installer (OUI) to install Oracle Clusterware. In phase two, install the database software using OUI. When you install Oracle Clusterware or RAC, OUI copies the Oracle software onto the node from which you are running it. If your Oracle home is not on a cluster file system, then OUI propagates the software onto the other nodes that you have selected to be part of your OUI installation session. - Shared Storage for Database Recovery Area When you configure a database recovery area in a RAC environment, the database recovery area must be on shared storage. When Database Configuration Assistant (DBCA) configures automatic disk backup, it uses a database recovery area that must be shared. If the database files are stored on a cluster file system, then the recovery area can also be shared through the cluster file system. If the database files are stored on an Automatic Storage Management (ASM) disk group, then the recovery area can also be shared through ASM. If the database files are stored on raw devices, then you must use either a cluster file system or ASM for the recovery area. Note: ASM disk groups are always valid recovery areas, as are cluster file systems. Recovery area files do not have to be in the same location where datafiles are stored. For instance, you can store datafiles on raw devices, but use ASM for the recovery area. Data files are not placed on NTFS partitions, because they cannot be shared. Data files can be placed on Oracle Cluster File System (OCFS), on raw disks using ASM, or on raw disks. - Oracle Clusterware You must provide OUI with the names of the nodes on which you want to install Oracle Clusterware. The Oracle Clusterware home can be either shared by all nodes, or private to each node, depending on your responses when you run OUI. The home that you select for Oracle Clusterware must be different from the RAC-enabled Oracle home. Versions of cluster manager previous to Oracle Database 10g were sometimes referred to as "Cluster Manager". In Oracle Database 10g, this function is performed by a Oracle Clusterware component known as Cluster Synchronization Services (CSS). The OracleCSService, OracleCRService, and OracleEVMService replace the service known previous to Oracle Database 10g as OracleCMService9i. 8.2 cluvfy or runcluvfy.bat: ---------------------------- Once you have installed Oracle Clusterware, you can use CVU by entering cluvfy commands on the command line. To use CVU before you install Oracle Clusterware, you must run the commands using a command file available on the Oracle Clusterware installation media. Use the following syntax to run a CVU command run from the installation media, where media is the location of the Oracle Clusterware installation media and options is a list of one or more CVU command options: media\clusterware\cluvfy\runcluvfy.bat options You do not have to be the root user to use the CVU and the CVU assumes that the current user is the oracle user. 
The following code example is of a CVU help command, run from a staged copy of the Oracle Clusterware directory downloaded from OTN into a directory called stage on your C: drive: C:\stage\clusterware\cluvfy> runcluvfy.bat comp nodereach -n node1,node2 -verbose For a quick test, you can run the following CVU command that you would normally use after you have completed the basic hardware and software configuration: prompt> media\clusterware\cluvfy\runcluvfy.bat stage –post hwos –n node_list Use the location of your Oracle Clusterware installation media for the media value and a list of the nodes, separated by commas, in your cluster for node_list. Expect to see many errors if you run this command before you or your system administrator complete the cluster pre-installation steps. On Oracle Real Application Clusters systems, each member node of the cluster must have user equivalency for the Administrative privileges account that installs the database. This means that the administrative privileges user account and password must be the same on all nodes. - Checking the Hardware and Operating System Setup with CVU You can use two different CVU commands to check your hardware and operating system configuration. The first is a general check of the configuration, and the second specifically checks for the components required to install Oracle Clusterware. The syntax of the more general CVU command is: cluvfy stage –post hwos –n node_list [-verbose] where node_list is the names of the nodes in your cluster, separated by commas. However, because you have not yet installed Oracle Clusterware, you must execute the CVU command from the installation media using a command like the one following. In this example, the command checks the hardware and operating system of a two-node cluster with nodes named node1 and node2, using a staged copy of the installation media in a directory called stage on the C: drive: C:\stage\clusterware\cluvfy> runcluvfy.bat stage –post hwos –n node1,node2 -verbose You can omit the -verbose keyword if you do not wish to see detailed results listed as CVU performs each individual test. The following example is a command, without the -verbose keyword, to check for the readiness of the cluster for installing Oracle Clusterware: C:\stage\clusterware\cluvfy> runcluvfy.bat comp sys -n node1,node2 -p crs - Checking the Network Setup Enter a command using the following syntax to verify node connectivity between all of the nodes for which your cluster is configured: cluvfy comp nodecon -n node_list [-verbose] - Verifying Cluster Privileges Before running Oracle Universal Installer, from the node where you intend to run the Installer, verify that you have administrative privileges on the other nodes. To do this, enter the following command for each node that is a part of the cluster: net use \\node_name\C$ where node_name is the node name. If your installation will access drives in addition to the C: drive, repeat this command for every node in the cluster, substituting the drive letter for each drive you plan to use. For the installation to be successful, you must use the same user name and password on each node in a cluster or use a domain user name. If you use a domain user name, then log on under a domain with a user name and password to which you have explicitly granted local administrative privileges on all nodes. 
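As an illustration (node2 is just a placeholder for one of your other node names, not a name used
elsewhere in this document), a successful privilege check on the C$ share would look roughly like this:

C:\> net use \\node2\C$
The command completed successfully.

If the command prompts for credentials or returns an access error instead, the account you are using
does not yet have the required administrative privileges on that node.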
To verify your configuration BEFORE installing CRS: [oracle] $ cd /staging_area/clusterware/cluvfy [oracle] $ ./runcluvfy.sh stage -pre crsinst -n docrac1,docrac2 -verbose To verify your configuration AFTER installing CRS: $ ./runcluvfy.sh stage -post crsinst -n docrac1,docrac2 8.3 Shared disk considerations: ------------------------------- Preliminary Shared Disk Preparation Complete the following steps to prepare shared disks for storage: -- Disabling Write Caching You must disable write caching on all disks that will be used to share data between nodes in your cluster. To disable write caching, perform these steps: Click Start, then click Settings, then Control Panel, then Administrative Tools, then Computer Management, then Device Manager, and then Disk drives Expand the Disk drives and double-click the first drive listed Under the Disk Properties tab for the selected drive, uncheck the option that enables the write cache Double-click each of the other drives listed in the Disk drives hive and disable the write cache as described in the previous step Caution: Any disks that you use to store files, including database files, that will be shared between nodes, must have write caching disabled. -- Enabling Automounting for Windows 2003 If you are using Windows 2003, then you must enable disk automounting, depending on the Oracle products you are installing and on other conditions. You must enable automounting when using: Raw partitions for Oracle Real Application Clusters (RAC) Cluster file system for Oracle Real Application Clusters Oracle Clusterware Raw partitions for a single-node database installation Logical drives for Automatic Storage Management (ASM) To enable automounting: Enter the following commands at a command prompt: c:\> diskpart DISKPART> automount enable Automatic mounting of new volumes enabled. Type exit to end the diskpart session Repeat steps 1 and 2 for each node in the cluster. 8.4 Reviewing Storage Options for Oracle Clusterware, Database, and Recovery Files: ----------------------------------------------------------------------------------- This section describes supported options for storing Oracle Clusterware files, Oracle Database software, and database files. -- Overview of Oracle Clusterware Storage Options Note that Oracle Clusterware files include the Oracle Cluster Registry (OCR) and the Oracle Clusterware voting disk. There are two ways to store Oracle Clusterware files: 1. Oracle Cluster File System (OCFS): The cluster file system Oracle provides for the Windows and Linux communities. If you intend to store Oracle Clusterware files on OCFS, then you must ensure that OCFS volume sizes are at least 500 MB each. 2. Raw storage: Raw logical volumes or raw partitions are created and managed by Microsoft Windows disk management tools or by tools provided by third party vendors. Note that you must provide disk space for one mirrored Oracle Cluster Registry (OCR) file, and two mirrored voting disk files. -- Overview of Oracle Database and Recovery File Options There are three ways to store Oracle Database and recovery files on shared disks: 1. Automatic Storage Management (database files only): Automatic Storage Management (ASM) is an integrated, high-performance database file system and disk manager for Oracle files. Because ASM requires an Oracle Database instance, it cannot contain Oracle software, but you can use ASM to manage database and recovery files. 2. 
Oracle Cluster File System (OCFS): Note that if you intend to use OCFS for your database files, then you
   should create partitions large enough for the database files when you create partitions for Oracle
   Clusterware.
   Note: If you want to have a shared Oracle home directory for all nodes, then you must use OCFS.
3. Raw storage: Note that you cannot use raw storage to store Oracle database recovery files.

The storage option that you choose for recovery files can be the same as, or different from, the option
you choose for the database files.

Storage Option                Oracle Clusterware   Database   Recovery area
--------------                ------------------   --------   -------------
Automatic Storage Management  No                   Yes        Yes
Cluster file system (OCFS)    Yes                  Yes        Yes
Shared raw storage            Yes                  Yes        No

-- Checking for Available Shared Storage with CVU

To check for all shared file systems available across all nodes on the cluster, use the following CVU
command:

cluvfy comp ssa -n node_list

Remember to use the full path name and the runcluvfy.bat command on the installation media and include
the list of nodes in your cluster, separated by commas, for the node_list. The following example is for
a system with two nodes, node1 and node2, and the installation media on drive F:

F:\clusterware\cluvfy> runcluvfy.bat comp ssa -n node1,node2

If you want to check the shared accessibility of a specific shared storage type to specific nodes in
your cluster, then use the following command syntax:

cluvfy comp ssa -n node_list -s storageID_list

In the preceding syntax, the variable node_list is the list of nodes you want to check, separated by
commas, and the variable storageID_list is the list of storage device IDs for the storage devices
managed by the file system type that you want to check.

===========================================
9. RAC tools and utilities.
===========================================

9.1: removing and adding a failed node:
=======================================

Suppose, using the above example, that instance rac3 on node oc3 fails. Suppose that you need to repair
the node (e.g. a harddisk crash).

-- Remove the instance:

% srvctl remove instance -d rac -i rac3
Remove instance rac3 for the database rac (y/n)? y

or use DBCA to delete the rac3 instance.

-- Remove the node from the cluster:

# cd /u01/app/oracle/product/10.1.0/CRS10gHome/bin
# ./olsnodes -n
oc1 1
oc2 2
oc3 3
# cd ../install
# ./rootdeletenode.sh oc3,3
# cd ../bin
# ./olsnodes -n
oc1 1
oc2 2
#

Suppose that you have repaired host oc3. We now want to add it back into the cluster. Host oc3 has the
OS newly installed, and its /etc/hosts file is just like it is on the other nodes.

-- Add the node at the clusterware layer:

From oc1 or oc2, go to the $CRS_Home/oui/bin directory, and run

# ./addNode.sh

A graphical screen pops up, and you are able to add oc3 to the cluster. All CRS files are copied to the
new node. To start the services on the new node, you are then prompted to run "rootaddnode.sh" on the
active node and "root.sh" on the new node.

# ./rootaddnode.sh
# ssh oc3
# cd /u01/app/oracle/product/10.1.0/CRS10gHome
# ./root.sh

9.2 CRSCTL, OCRDUMP, OCRCONFIG, OCRCHECK:
=========================================

These are the primary tools to view status and to manipulate the OCR and the Voting Disk.

Example: Checking the status of the Voting Disks:

$ crsctl query css votedisk

[root@node1-pub ~]# crsctl query css votedisk
0. 0 /u02/ocfs2/vote/VDFile_0
1. 0 /u02/ocfs2/vote/VDFile_1
2. 0 /u02/ocfs2/vote/VDFile_2
Located 3 voting disk(s).
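Section 9.2 names OCRCHECK and OCRDUMP but gives no examples for them; as a minimal sketch (the space
figures and device name below are only illustrative, not taken from a real system), you can inspect the
OCR like this:

Example: Checking the integrity and space usage of the OCR:

# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3848
         Available space (kbytes) :     258272
         Device/File Name         : /dev/raw/raw1
                                    Device/File integrity check succeeded
         Cluster registry integrity check succeeded

Example: Dumping the OCR contents to a text file for inspection:

# ocrdump /tmp/ocr_dump.txt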
Example:
You can dynamically add and remove voting disks after installing Oracle RAC. Do this using the following
commands, where path is the fully qualified path for the additional voting disk.
Run the following command as the root user to add a voting disk:

# crsctl add css votedisk /dev/raw/raw9

Example:
Run the following command as the root user to remove a voting disk:

# crsctl delete css votedisk path

Example: Adding an extra OCR file:

# ocrconfig -replace ocr /dev/raw/raw2

Example: show the automatic OCR backups:

# ocrconfig -showbackup

9.3 Some other commandline tools:
=================================

- olsnodes

Example: If you want to list the nodes in the cluster running CRS

$ olsnodes
oranode1
oranode2

- The "crs_" tools:

There are a few other tools that provide information about your cluster. These tools are targeted at
registered applications like the nodeapps or other apps. For example, to list the status of the apps in
the cluster, use crs_stat:

/home/oracle-->$CRS_HOME/bin/crs_stat -t
Name           Type         Target    State     Host
------------------------------------------------------------
ora....SM1.asm application  ONLINE    ONLINE    aix1
ora....x1.lsnr application  ONLINE    ONLINE    aix1
ora....ix1.gsd application  ONLINE    ONLINE    aix1
ora....ix1.ons application  ONLINE    ONLINE    aix1
ora....ix1.vip application  ONLINE    ONLINE    aix1
ora....SM2.asm application  ONLINE    ONLINE    aix2
ora....x2.lsnr application  ONLINE    ONLINE    aix2
ora....ix2.gsd application  ONLINE    ONLINE    aix2
ora....ix2.ons application  ONLINE    ONLINE    aix2
ora....ix2.vip application  ONLINE    ONLINE    aix2
ora....test.db application  ONLINE    ONLINE    aix1
ora....x1.inst application  ONLINE    ONLINE    aix1
ora....x2.inst application  ONLINE    ONLINE    aix2
/home/oracle-->

- Showing all nodes from a node:

# lsnodes -v

# cd /u01/app/oracle/product/10.1.0/CRS10gHome/bin
# ./olsnodes -n
oc1 1
oc2 2
oc3 3

9.4: using srvctl:
==================

The Server Control (SRVCTL) utility is installed on each node by default. You can use SRVCTL to start
and stop the database and instances, manage configuration information, and to move or remove instances
and services. Some SRVCTL operations store configuration information in the OCR. SRVCTL performs other
operations, such as starting and stopping instances, by sending requests to the Oracle Clusterware
process CRSD, which then starts or stops the Oracle Clusterware resources.

srvctl must be run from the $ORACLE_HOME of the RAC you are administering. The basic format of a srvctl
command is

srvctl command target [options]

where command is one of
enable|disable|start|stop|relocate|status|add|remove|modify|getenv|setenv|unsetenv|config
and the target, or object, can be a database, instance, service, ASM instance, or the nodeapps.

1. SRVCTL:
srvctl is the primary tool DBAs use to configure and manipulate instances, RAC databases and processes.

Example 1. Bring up the MYSID1 instance of the MYSID database.

$ srvctl start instance -d MYSID -i MYSID1

Example 2. Stop the MYSID database and all its instances, on all nodes:

$ srvctl stop database -d MYSID
$ srvctl stop database -d MYSID -o immediate

Start the MYSID database and all instances:

$ srvctl start database -d MYSID

Example 3. Stop the orcl3 and orcl4 instances, associated with the orcl database, using the immediate
option.

$ srvctl stop instance -d orcl -i "orcl3,orcl4" -o immediate -c "sysback/oracle as sysoper"

Example 4. Stop the nodeapps (listener and other apps) on node1

$ srvctl stop nodeapps -n node1

Example 5.
Check the status of a RAC database:

$ srvctl status database -d orcl

srvctl [options]
command: enable|disable|start|stop|relocate|status|add|remove|modify|getenv|setenv|unsetenv|config
target : database, instance, service, ASM instance, or the nodeapps

srvctl commands will affect the state information in the OCR.

-- Example 1: To view help:

% srvctl -h
% srvctl command -h

-- Example 2: To see the SRVCTL version number, enter

% srvctl -V

-- Example 3. Bring up the MYSID1 instance of the MYSID database.

% srvctl start instance -d MYSID -i MYSID1

-- Example 4. Stop the MYSID database: all its instances and all its services, on all nodes.

% srvctl stop database -d MYSID

The following command mounts all of the non-running instances, using the default connection information:

% srvctl start database -d orcl -o mount

-- Example 5. Stop the nodeapps on the myserver node. NB: Instances and services also stop.

% srvctl stop nodeapps -n myserver

-- Example 6. Add the MYSID3 instance, which runs on the myserver node, to the MYSID clustered database.

% srvctl add instance -d MYSID -i MYSID3 -n myserver

-- Example 7. Add a new node, the mynewserver node, to a cluster.

% srvctl add nodeapps -n mynewserver -o $ORACLE_HOME -A 149.181.201.1/255.255.255.0/eth1

(The -A flag precedes an address specification.)

-- Example 8. To change the VIP (virtual IP) on a RAC node, use the command

% srvctl modify nodeapps -A new_address

-- Example 9. Status of components

. Find out whether the nodeapps on mynewserver are up.

% srvctl status nodeapps -n mynewserver
VIP is running on node: mynewserver
GSD is running on node: mynewserver
Listener is not running on node: mynewserver
ONS daemon is running on node: mynewserver

. Find out whether the ASM instance is running:

% srvctl status asm -n docrac1
ASM instance +ASM1 is running on node docrac1.

. Find the status of a cluster database

% srvctl status database -d EOPP
Instance EOPP1 is running on node dbq0201
Instance EOPP2 is running on node dbq0102

% srvctl config database -d EOPP
dbq0201 EOPP1 /ora/product/10.2.0/db
dbq0102 EOPP2 /ora/product/10.2.0/db

% srvctl config service -d EOPP
opp.et.supp PREF: EOPP1 AVAIL: EOPP2
opp.et.grid PREF: EOPP1 AVAIL: EOPP2

-- Example 10. The following command and output show the expected configuration for a three node
database called ORCL.

% srvctl config database -d ORCL
server01 ORCL1 /u01/app/oracle/product/10.1.0/db_1
server02 ORCL2 /u01/app/oracle/product/10.1.0/db_1
server03 ORCL3 /u01/app/oracle/product/10.1.0/db_1

-- Example 11. Disable the ASM instance on myserver for maintenance.

% srvctl disable asm -n myserver

-- Example 12. Debugging srvctl

Debugging srvctl in 10g couldn't be easier. Simply set the SRVM_TRACE environment variable.

% export SRVM_TRACE=true

-- Example 13. Question (10g RAC)

Q: How do you add a listener to the nodeapps using the srvctl command, or can it be added using srvctl
   at all?
A: Just edit listener.ora on all concerned nodes and add the entries (the usual way). srvctl will
   automatically make use of it. For example

% srvctl start database -d SAMPLE

will start database SAMPLE and its associated listener LSNR_SAMPLE.

-- Example 14. Adding services.

% srvctl add database -d ORCL -o /u01/app/oracle/product/10.1.0/db_1
% srvctl add instance -d ORCL -i ORCL1 -n server01
% srvctl add instance -d ORCL -i ORCL2 -n server02
% srvctl add instance -d ORCL -i ORCL3 -n server03

% srvctl remove instance -d rac -i rac3
% srvctl disable instance -d orcl -i orcl2
% srvctl enable instance -d orcl -i orcl2

-- Example 15.
Administering ASM Instances with SRVCTL in RAC

You can use SRVCTL to add, remove, enable, and disable an ASM instance, as described in the following
procedure:

Use the following to add configuration information about an existing ASM instance:

% srvctl add asm -n node_name -i asm_instance_name -o oracle_home

Use the following to remove an ASM instance:

% srvctl remove asm -n node_name [-i asm_instance_name]

-- Example 16. Stop multiple instances.

The following command provides its own connection information to shut down the two instances orcl3 and
orcl4 using the IMMEDIATE option:

% srvctl stop instance -d orcl -i "orcl3,orcl4" -o immediate -c "sysback/oracle as sysoper"

-- Example 17. Showing policies.

Clusterware can automatically start your RAC database when the system restarts. You can use the
Automatic or Manual "policies" to control whether clusterware restarts RAC.

To display the current policy:
% srvctl config database -d database_name -a

To change to another policy:
% srvctl modify database -d database_name -y policy_name

-- Example 18.

% srvctl start service -d DITOB

-- Example 19. Relocate a service from one instance to another

% srvctl relocate service -d ORACLE -s CRM -i RAC04 -t RAC01

-- Example 20.

Suppose you defined the HR service using the following command:

% srvctl add service -d RACDB -s HR -r RAC02,RAC03 -a RAC01

After a while you realize that there is a workload peak on the HR service, and you decide to temporarily
start HR on the RAC04 instance.

% srvctl modify service -d RACDB -s HR -i RAC02,RAC03,RAC04 -a RAC01
% srvctl stop service -d RACDB -s HR
% srvctl start service -d RACDB -s HR

-- Example 21. How to disable the oracle autostart function in RAC?

You should be able to modify the database with:

srvctl modify database -d database_name -y MANUAL

9.6: crsctl:
============

crsctl is the primary tool to manipulate CRS.

Example 1: Where is the Voting Disk (part of CRS) located?

$ crsctl query css votedisk
0. 0 /dev/raw/raw2

Example 2: Do you want to check the health of the Clusterware?

$ crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

Example 3: start or stop or enable or disable CRS

$ crsctl start crs
$ crsctl stop crs
$ crsctl enable crs
$ crsctl disable crs

Example 4: Checking CRS Status on the local node:

[root@node1]# crsctl check crs
Cluster Synchronization Services appears healthy
Cluster Ready Services appears healthy
Event Manager appears healthy

Example 5: Checking CRS Status in the cluster

[root@node1]# crsctl check cluster
node1-pub ONLINE
node2-pub ONLINE

Use CRSCTL to Control Your Clusterware

Oracle Clusterware enables servers in an Oracle Real Application Clusters database to coordinate
simultaneous workload on the same database files. The crsctl command provides administrators many useful
capabilities. For example, with crsctl, you can check Clusterware health, disable/enable Oracle
Clusterware startup on boot, find information on the voting disk, check the Clusterware version, and
more.

>>> 1. Check the health of the Clusterware

# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

You can also check the status of an individual daemon using the following syntax, where daemon is one of
crsd, cssd, or evmd:

# crsctl check daemon

>>> 2. Do you want to reboot a node for maintenance without Clusterware coming up on boot?
## Disable clusterware on machine2 bootup:
# crsctl disable crs

## Stop the database, then stop the clusterware processes:
# srvctl stop instance -d db -i db2
# crsctl stop crs
# reboot

## Enable clusterware on machine2 bootup:
# crsctl enable crs
# crsctl start crs
# srvctl start instance -d db -i db2

>>> 3. Check the location of the voting disk:

# crsctl query css votedisk
0. 0 /dev/raw/raw2

>>> 4. Do you need to find out what clusterware version is running on a server?

# crsctl query crs softwareversion
CRS software version on node [db2] is [10.2.0.2.0]

9.7: Adding and Removing Voting Disks:
======================================

You can dynamically add and remove voting disks after installing Oracle RAC. Do this using the following commands, where path is the fully qualified path for the additional voting disk.

Run the following command as the root user to add a voting disk:

# crsctl add css votedisk path

Run the following command as the root user to remove a voting disk:

# crsctl delete css votedisk path

9.8 CLUVFY:
===========

The Cluster Verification Utility (CVU) validates an Oracle Clusterware environment or configuration, either before ("pre") or after ("post") a given stage. We found the CVU utility to be very useful for checking a cluster server environment for RAC. The CVU can check shared storage, interconnects, server systems and user permissions. The Universal Installer runs the verification utility at the end of the Clusterware install.

The utility can also be run from the command line with parameters and options to validate components. For example, a script that verifies a cluster using cluvfy is named runcluvfy.sh and is located in the /clusterware/cluvfy directory in the installation area. This script unpacks the utility, sets environment variables and executes the verification command. The following command verifies that the hosts atlanta1, atlanta2 and atlanta3 are ready for a clustered database install of release 2:

./runcluvfy.sh stage -pre dbinst -n atlanta1,atlanta2,atlanta3 -r 10gR2 -osdba dba -verbose

The results of the command above check user and group equivalence across machines, connectivity, interface settings, system requirements like memory, disk space, kernel settings and versions, required Linux package existence, and so on. Any problems are reported as errors; all successful checks are marked as passed. Many other aspects of the cluster can be verified with this utility, for Release 2 or Release 1.

Some more examples:

-- Checking for Available Shared Storage with CVU

To check for all shared file systems available across all nodes on the cluster, use the following CVU command:

% cluvfy comp ssa -n node_list

Remember to use the full path name (use the runcluvfy.bat command when running from the installation media on Windows) and include the list of nodes in your cluster, separated by commas, for the node_list. The following example is for a Windows system with two nodes, node1 and node2, and the installation media on drive F:

% runcluvfy.bat comp ssa -n node1,node2

If you want to check the shared accessibility of a specific shared storage type to specific nodes in your cluster, then use the following command syntax:

% cluvfy comp ssa -n node_list -s storageID_list

In the preceding syntax, the variable node_list is the list of nodes you want to check, separated by commas, and the variable storageID_list is the list of storage device IDs for the storage devices managed by the file system type that you want to check.
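As an illustration of the ssa checks above, here is a small shell sketch (not taken from the Oracle documentation) that runs the shared-storage verification from one node and keeps the output. The node names node1 and node2, the example device /dev/sdc, and the log location are assumptions; substitute your own values.

#!/bin/bash
# Hedged sketch: run the CVU shared-storage checks and keep a timestamped log.
# Assumptions: the nodes are node1 and node2, cluvfy is in the PATH of the
# oracle user, /tmp is writable, and /dev/sdc is only a placeholder device name.

NODES="node1,node2"                                   # comma-separated, no spaces
LOG=/tmp/cluvfy_ssa_$(date +%Y%m%d_%H%M%S).log

{
  echo "=== shared storage discovery on: $NODES ==="
  cluvfy comp ssa -n "$NODES" -verbose

  echo "=== shared accessibility of /dev/sdc ==="
  cluvfy comp ssa -n "$NODES" -s /dev/sdc -verbose
} 2>&1 | tee "$LOG"

echo "cluvfy output saved in $LOG"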
More on Using CLUVFY: --------------------- Using the Cluster Verification Utility to Diagnose Problems: The Cluster Verification Utility (CVU) can assist you in diagnosing a wide variety of configuration problems. This section contains the following topics: -- Enabling Tracing -- Checking the Settings for the Interconnect -- Troubleshooting a Node with Status of UNKNOWN -- Verifying the Existence of Node Applications -- Verifying the Integrity of Oracle Clusterware Components -- Verifying the Integrity of the Oracle Cluster Registry -- Verifying the Integrity of Your Entire Cluster Enabling Tracing: You can enable tracing by setting the environment variable SRVM_TRACE to true. After setting this variable to true, run the command that you want to trace. The CVU trace files are created in the CRS_HOME/cv/log directory. Oracle RAC automatically rotates the log files, and the most recently created log file has the name cvutrace.log.0. You should remove unwanted log files or archive them to reclaim disk space, if needed. The CVU does not generate trace files unless you enable tracing. Checking the Settings for the Interconnect: Cache Fusion enhances the performance of Oracle RAC by utilizing a high-speed interconnect to send data blocks to another instance's buffer cache. The high-speed interconnect should be a private network with the highest bandwidth to maximize performance. For network connectivity verification, the CVU discovers all the available network interfaces if you do not specify an interface on the CVU command line. To verify the accessibility of the cluster nodes from the local node or from any other cluster node, use the component verification command nodereach as follows: $ cluvfy comp nodereach -n node_list [ -srcnode node ] [-verbose] To verify that the other cluster nodes can be reached from the local node through all the available network interfaces or through specific network interfaces, use the component verification command nodecon as follows: $ cluvfy comp nodecon -n node_list [ -i interface_list ] [-verbose] You can also use the nodecon command without the -i option, as shown in the following example: $ cluvfy comp nodecon -n all [-verbose] When you issue the nodecom command as shown in the previous example, it instructs the CVU to perform the following tasks: . Discover all the network interfaces that are available on the cluster nodes. . Review the corresponding IP addresses and subnets for the interfaces. . Obtain the list of interfaces that are suitable for use as VIPs and the list of interfaces to private interconnects. . Verify the connectivity among all the nodes through those interfaces. You can run the nodecon command in verbose mode to identify the mappings between the interfaces, IP addresses, and subnets. To verify the connectivity among the nodes through specific network interfaces, use the comp nodecon command with the -i option. For example, you can verify the connectivity among the nodes docrac1, docrac2, and docrac3, through interface eth0 by running the following command: $ cluvfy comp nodecon -n docrac1, docrac2, docrac3 -i eth0 -verbose Troubleshooting a Node with Status of UNKNOWN: If you run the cluvfy command using the -verbose argument and the CVU responds with UNKNOWN for a particular node, then this is because the CVU cannot determine whether a check passed or failed. The cause of this could be because a node is not reachable, or as a result of any system problem that was occurring on that node at the time that the CVU was performing a check. 
The following is a list of possible causes for an UNKNOWN response:

. The node is down.
. Executable files that the CVU requires are missing in the CRS_home/bin directory or the $ORACLE_HOME/bin directory.
. The user account that ran the CVU does not have privileges to run common operating system executable files on the node.
. The node is missing an operating system patch or required package.
. The kernel parameters on that node were not configured correctly and the CVU cannot obtain the operating system resources required to perform its checks.

Verifying the Existence of Node Applications:

To verify the existence of node applications, namely the virtual IP (VIP), Oracle Notification Services (ONS), and Global Service Daemon (GSD), on all the nodes, use the CVU comp nodeapp command, using the following syntax:

$ cluvfy comp nodeapp [-n node_list] [-verbose]

Verifying the Integrity of Oracle Clusterware Components:

To verify the existence of all the Oracle Clusterware components, use the component verification comp crs command, using the following syntax:

$ cluvfy comp crs [-n node_list] [-verbose]

Verifying the Integrity of the Oracle Cluster Registry:

To verify the integrity of the Oracle Cluster Registry, use the component verification comp ocr command, using the following syntax:

$ cluvfy comp ocr [-n node_list] [-verbose]

Verifying the Integrity of Your Entire Cluster:

To verify that all nodes in the cluster have the same view of the cluster configuration, use the component verification comp clu command, as follows:

$ cluvfy comp clu

-- cluvfy: The Cluster Verification Utility

This tool has a very extended syntax and usage. You can use it to verify your systems before, during, and after installations as a sanity check. Some examples that should give you an impression:

$ ./runcluvfy.sh stage -post hwos -n node1,node2 -verbose
$ ./runcluvfy.sh stage -pre crsinst -n docrac1,docrac2 -verbose

You could easily spend a whole day studying all the ins and outs of this tool.
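To tie the component checks above together, the following shell sketch simply loops over the checks listed in this section. It is only one possible way of scripting it; the node names docrac1 and docrac2 are taken from the examples above, and cluvfy is assumed to be in the PATH.

#!/bin/bash
# Hedged sketch: run a battery of CVU component checks after an install or a change.
# Assumption: the cluster consists of the nodes docrac1 and docrac2.

NODES="docrac1,docrac2"

for comp in nodereach nodecon nodeapp crs ocr; do
  echo "=== cluvfy comp $comp ==="
  cluvfy comp "$comp" -n "$NODES" -verbose
done

# Finally, check that all nodes have the same view of the cluster configuration.
echo "=== cluvfy comp clu ==="
cluvfy comp clu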
9.9 Further checking:
=====================

Use the following command to obtain component names, where module_name is crs, evm, css or the name of the module:

# crsctl lsmodules module_name

For example, viewing the components of the css module might return the following results:

# crsctl lsmodules css
The following are the CSS modules :: CSSD COMMCRS COMMNS

>>> Enabling Debugging of Oracle Clusterware Components:

You can enable debugging for the Oracle Clusterware daemons, Event Manager (EVM), and their modules by running crsctl commands as follows, where component_name is the name of an Oracle Clusterware component for which you want to enable debugging (such as crs, evm, or css), module is the name of the module as it appears in the output of the crsctl lsmodules command, and debugging_level is a number from 1 to 5:

# crsctl debug log component module:debugging_level

For example, to enable tracing for the CSSD module of the css component, you could use the following command:

# crsctl debug log css CSSD:1

To enable extra debugging on the currently running CRS daemons as well as those that will run in the future:

# crsctl debug log crs

>>> Enabling Debugging for an Oracle Clusterware Resource:

You can use crsctl commands to enable resource debugging using the following syntax, where resource_name is the name of an Oracle Clusterware resource, such as ora.docrac1.vip, and debugging_level is a number from 1 to 5:

# crsctl debug log res resource_name:debugging_level

To obtain a list of the resources available for debugging, use the following command:

# crs_stat

By default, Oracle enables traces for DBCA and the Database Upgrade Assistant (DBUA). For the CVU, GSDCTL, SRVCTL, and VIPCA, you can set the SRVM_TRACE environment variable to TRUE to make Oracle generate traces. Oracle writes traces to log files: for example, in Oracle_home/cfgtoollogs/dbca and Oracle_home/cfgtoollogs/dbua for DBCA and DBUA, respectively.

=================================
10: Example tnsnames.ora in RAC
=================================

Example 1:
----------

Let's take an example of a RAC service called SVC with two instances SVC1 and SVC2 running on host1 and host2 (with virtual addresses host1_vip and host2_vip). The client tnsnames would look something like this:

SVC =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = host1_vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = host2_vip)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = SVC)
      (FAILOVER_MODE=(TYPE=select)(METHOD=basic)(RETRIES=10)(DELAY=1))))

Example 2:
----------

tnsnames.ora file:

RAC =
  (DESCRIPTION =
    (LOAD_BALANCE = ON)
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = linux1_vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = linux2_vip)(PORT = 1521)))
    (CONNECT_DATA = (SERVICE_NAME = RAC)))

RAC1 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = linux1_vip)(PORT = 1521)))
    (CONNECT_DATA = (SERVICE_NAME = TEST)(INSTANCE_NAME = RAC1)))

RAC2 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = linux2_vip)(PORT = 1521)))
    (CONNECT_DATA = (SERVICE_NAME = TEST)(INSTANCE_NAME = RAC2)))

The entries RAC1 and RAC2 are optional.
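If you want to see the client-side load balancing of the RAC alias from Example 2 in action, the sketch below opens a handful of connections and prints which instance each session landed on. This is only an illustration: the scott/tiger account is a placeholder (use any account that can query v$instance), and the alias name must match your own tnsnames.ora.

#!/bin/bash
# Hedged sketch: check name resolution, then watch sessions spread over the instances.
ALIAS=RAC                      # the load-balanced alias defined above

tnsping "$ALIAS"               # verify the alias resolves and a listener answers

for i in 1 2 3 4 5 6; do
  echo "connect attempt $i:"
  sqlplus -S scott/tiger@"$ALIAS" <<'EOF'
set heading off feedback off
select instance_name || ' on ' || host_name from v$instance;
exit
EOF
done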
Example 3: ---------- TEST = (DESCRIPTION = (LOAD_BALANCE = ON) (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = testlinux1)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = testlinux2)(PORT = 1521))) (CONNECT_DATA = (SERVICE_NAME = TEST)))) TEST1 = (DESCRIPTION = (ADDRESS_LIST = (LOAD_BALANCE = ON) (ADDRESS = (PROTOCOL = TCP)(HOST = testlinux1)(PORT = 1521))) (CONNECT_DATA = (SERVICE_NAME = TEST)(INSTANCE_NAME = TEST1))) TEST2 = (DESCRIPTION = (ADDRESS_LIST = (LOAD_BALANCE = ON) (ADDRESS = (PROTOCOL = TCP)(HOST = testlinux2)(PORT = 1521))) (CONNECT_DATA = (SERVICE_NAME = TEST)(INSTANCE_NAME = TEST2))) EXTPROC_CONNECTION_DATA = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))) (CONNECT_DATA = (SID=PLSExtProc)(PRESENTATION = RO))) LISTENERS_TEST = (ADDRESS = (PROTOCOL = TCP)(HOST = testlinux1)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = testlinux2)(PORT = 1521)) Example 4: ---------- TAF (Transparent Application Failover) Transparent Application Failover actually refers to a failover that occurs when a node or instance is unavailable due to an outage or other reason that prohibits a connection to be established on that node. This can be set to on with the following parameter FAILOVER. Setting it to ON will activate the TAF. It is turned on by default unless you set it to OFF to disable it. Now, when you turn it on you have two types of connections available by the means of the FAILOVER_MODE parameter. The type can be session, which is default or select. When the type is SESSION, if the instance fails, then the user is automatically connected to the next available node without the user’s manual intervention. The SQL statements need to be carried out again on the next node. However, when you set the TYPE to SELECT, then if you are connected and are in the middle of your query, then your query will be restarted after you have been failed over to the next available node. Take this example of our tnsnames.ora file, (go to the section beginning with CONNECT_DATA): (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = fokerac.wolga.com) (FAILOVER_MODE = (TYPE = SELECT) (METHOD = BASIC) (RETRIES = 180) (DELAY = 5) ) ) ============================================== 11: Notes about Backup and Restore of RAC ============================================== 11.1 Backing up Voting Disk: --------------------------- Run the following command to backup a voting disk. Perform this operation on every voting disk as needed where 'voting_disk_name' is the name of the active voting disk, and 'backup_file_name' is the name of the file to which you want to backup the voting disk contents: # dd if=voting_disk_name of=backup_file_name When you use the dd command for making backups of the voting disk, the backup can be performed while the Cluster Ready Services (CRS) process is active; you do not need to stop the crsd.bin process before taking a backup of the voting disk. -- Adding and Removing Voting Disks You can dynamically add and remove voting disks after installing Oracle RAC. Do this using the following commands where path is the fully qualified path for the additional voting disk. 
Run the following command as the root user to add a voting disk:

# crsctl add css votedisk path

Run the following command as the root user to remove a voting disk:

# crsctl delete css votedisk path

11.2 Recovering Voting Disk:
----------------------------

Run the following command to recover a voting disk, where 'backup_file_name' is the name of the voting disk backup file, and 'voting_disk_name' is the name of the active voting disk:

# dd if=backup_file_name of=voting_disk_name

11.3 Backup and Recovery OCR:
-----------------------------

Oracle Clusterware includes two important components: the voting disk and the OCR. The voting disk is a file that manages information about node membership, and the OCR is a file that manages cluster and Oracle RAC database configuration information.

Oracle Clusterware automatically creates OCR backups every 4 hours. At any one time, Oracle Clusterware always retains the latest 3 backup copies of the OCR, which are 4 hours old, 1 day old, and 1 week old. These automatic backups are made on only one node of the cluster (in this installation, the second node) and are stored in the CRS_HOME/cdata directory. You cannot customize the backup frequencies or the number of files that Oracle Clusterware retains.

You can use any backup software to copy the automatically generated backup files at least once daily to a different device from where the primary OCR file resides. The default location for generating backups on Red Hat Linux systems is "CRS_home/cdata/cluster_name", where cluster_name is the name of your cluster and CRS_home is the home directory of your Oracle Clusterware installation.

-- Viewing Available OCR Backups

To find the most recent backup of the OCR, on any node in the cluster, use the following command:

# ocrconfig -showbackup

-- Backing Up the OCR

Because of the importance of OCR information, Oracle recommends that you use the ocrconfig tool to make copies of the automatically created backup files at least once a day.

In addition to using the automatically created OCR backup files, you should also export the OCR contents to a file before and after making significant configuration changes, such as adding or deleting nodes from your environment, modifying Oracle Clusterware resources, or creating a database. Exporting the OCR contents to a file lets you restore the OCR if your configuration changes cause errors. For example, if you have unresolvable configuration problems, or if you are unable to restart your cluster database after such changes, then you can restore your configuration by importing the saved OCR content from the valid configuration.

>>> Export the contents of the OCR to a file:
---------------------------------------------

To export the contents of the OCR to a file, use the following command, where backup_file_name is the name of the OCR backup file you want to create:

# ocrconfig -export backup_file_name

>>> Recovering the OCR:
-----------------------

This section describes two methods for recovering the OCR. The first method uses automatically generated OCR file copies, and the second method uses manually created OCR export files.

In the event of a failure, before you attempt to restore the OCR, ensure that the OCR is unavailable. Run the following command to check the status of the OCR:

# ocrcheck

This command performs an integrity check of the OCR. If this command does not display the message 'Device/File integrity check succeeded' for at least one copy of the OCR, then both the primary OCR and the OCR mirror have failed. You must restore the OCR from a backup.
Example: CRS > ocrcheck Status of Oracle Cluster Registry is as follows : Version : 2 Total space (kbytes) : 102508 Used space (kbytes) : 5360 Available space (kbytes) : 97148 ID : 1255672694 Device/File Name : /dev/dba_ocr_disk Device/File integrity check succeeded Device/File not configured Cluster registry integrity check succeeded >>> Restoring the Oracle Cluster Registry from Automatically Generated OCR Backups: ----------------------------------------------------------------------------------- When restoring the OCR from automatically generated backups, you first have to determine which backup file you will use for the recovery. To restore the OCR from an automatically generated backup on a Red Hat Linux system: Identify the available OCR backups using the ocrconfig command: # ocrconfig -showbackup Note: You must be logged in as the root user to run the ocrconfig command. Review the contents of the backup using the following ocrdump command, where file_name is the name of the OCR backup file: # ocrdump -backupfile file_name As the root user, stop Oracle Clusterware on all the nodes in your Oracle RAC cluster by executing the following command: # crsctl stop crs Repeat this command on each node in your Oracle RAC cluster. As the root user, restore the OCR by applying an OCR backup file that you identified in step 1 using the following command, where file_name is the name of the OCR that you want to restore. Make sure that the OCR devices that you specify in the OCR configuration exist, and that these OCR devices are valid before running this command. # ocrconfig -restore file_name As the root user, restart Oracle Clusterware on all the nodes in your cluster by restarting each node, or by running the following command: # crsctl start crs Repeat this command on each node in your Oracle RAC cluster. Use the Cluster Verify Utility (CVU) to verify the OCR integrity. Run the following command, where the -n all argument retrieves a list of all the cluster nodes that are configured as part of your cluster: $ cluvfy comp ocr -n all [-verbose] >>> Recovering the OCR from an OCR Export File: ----------------------------------------------- Using the ocrconfig -export command enables you to restore the OCR using the -import option if your configuration changes cause errors. To restore the previous configuration stored in the OCR from an OCR export file: Place the OCR export file that you created previously with the ocrconfig -export command in an accessible directory on disk. As the root user, stop Oracle Clusterware on all the nodes in your Oracle RAC cluster by executing the following command: # crsctl stop crs Repeat this command on each node in your Oracle RAC cluster. As the root user, restore the OCR data by importing the contents of the OCR export file using the following command, where file_name is the name of the OCR export file: # ocrconfig -import file_name As the root user, restart Oracle Clusterware on all the nodes in your cluster by restarting each node, or by running the following command: # crsctl start crs Repeat this command on each node in your Oracle RAC cluster. >> >Replacing an OCR: --------------------- If you need to change the location of an existing OCR, or change the location of a failed OCR to the location of a working one, you can use the following procedure as long as one OCR file remains online. 
To change the location of an OCR: Use the OCRCHECK utility to verify that a copy of the OCR other than the one you are going to replace is online using the following command: $ ocrcheck Note: The OCR that you are replacing can be either online or offline. Verify that Oracle Clusterware is running on the node on which the you are going to perform the replace operation using the following command: $ crsctl check crs Run the following command to replace the OCR using either destination_file or disk to indicate the target OCR: # ocrconfig -replace ocr destination_file # ocrconfig -replace ocr disk Run the following command to replace an OCR mirror location using either destination_file or disk to indicate the target OCR: # ocrconfig -replace ocrmirror destination_file # ocrconfig -replace ocrmirror disk If any node that is part of your current Oracle RAC environment is shut down, then run the following command on the stopped node to let that node rejoin the cluster after the node is restarted: # ocrconfig -repair You may need to repair an OCR configuration on a particular node if your OCR configuration changes while that node is stopped. For example, you may need to repair the OCR on a node that was not up while you were adding, replacing, or removing an OCR. To repair an OCR configuration, run the following command on the node on which you have stopped the Oracle Clusterware daemon: ocrconfig -repair ocrmirror device_name This operation only changes the OCR configuration on the node from which you run this command. For example, if the OCR mirror device name is /dev/raw1, then use the command syntax ocrconfig -repair ocrmirror /dev/raw1 on this node to repair its OCR configuration. >>> Add or remove Voting Disk: ------------------------------ You can dynamically add and remove voting disks after installing Oracle RAC. Do this using the following commands where path is the fully qualified path for the additional voting disk. Run the following command as the root user to add a voting disk: # crsctl add css votedisk path Run the following command as the root user to remove a voting disk: # crsctl delete css votedisk path Use the CVU to verify the OCR integrity. Run the following command, where the -n all argument retrieves a list of all the cluster nodes that are configured as part of your cluster: $ cluvfy comp ocr -n all [-verbose] =================================== 12. Adding Nodes and Instances: =================================== 12.1 Adding a node: =================== This chapter describes how to add nodes and instances in Oracle Real Application Clusters (Oracle RAC) environments. You can use these methods when configuring a new Oracle RAC cluster, or when scaling up an existing Oracle RAC cluster. This chapter includes the following sections: A: Preparing Access to the New Node B: Extending the Oracle Clusterware Home Directory C: Extending the Oracle Automatic Storage Management Home Directory D: Extending the Oracle RAC Software Home Directory E: Creating a Listener on the New Node F: Adding a New Cluster Instance on the New Node For this chapter, it is very important that you perform each step in the order shown. Suppose we want to add the node docrac3 to the existing cluster of the systems docrac1 en docrac2. >>> A: Preparing Access to the New Node To prepare the new node prior to installing the Oracle software, refer to Chapter 2, "Preparing Your Cluster". It is critical that you follow the configuration steps for the following procedures to work. 
These steps include, but are not limited to the following: Adding the public and private node names for the new node to the /etc/hosts file on the existing nodes, docrac1 and docrac2 Verifying the new node can be accessed (using the ping command) from the existing nodes Running the following command on either docrac1 or docrac2 to verify the new node has been properly configured: # cluvfy stage -pre crsinst -n docrac3 >>> B: Extending the Oracle Clusterware Home Directory Now that the new node has been configured to support Oracle Clusterware, you use Oracle Universal Installer (OUI) to add an Oracle Clusterware home to the node being added to your Oracle RAC cluster. This chapter assumes that you are adding a node named docrac3 and that you have already successfully installed Oracle Clusterware on docrac1 in a nonshared home, where CRS_home represents the successfully installed Oracle Clusterware home. To extend the Oracle Clusterware installation to include the new node: 1. Verify the $ORACLE_HOME environment variable on docrac1 directs you to the successfully installed Oracle Clusterware home on that node. 2. Go to CRS_home/oui/bin and run the addNode.sh script. cd /opt/oracle/crs/oui/bin ./addNode.sh OUI starts and first displays the Welcome window. 3. Click Next. The Specify Cluster Nodes to Add to Installation window appears. 4. Select the node or nodes that you want to add. After selecting docrac3, click Next. 5. Verify the entries that OUI displays on the Summary Page and click Next. 6. Run the rootaddNode.sh script from the CRS_home/install/ directory on docrac1 when prompted to do so. Basically, this script adds the node applications of the new node to the OCR configuration. 7. Run the orainstRoot.sh script on the node docrac3 if OUI prompts you to do so. 8. Run the CRS_home/root.sh script on the node docrac3 to start Oracle Clusterware on the new node. 9. Add the new node's Oracle Notification Services (ONS) configuration information to the shared Oracle Cluster Registry (OCR). Obtain the ONS port identifier used by the new node, which you need to know for the next step, by running the following command from the CRS_home/opmn/conf directory on the docrac1 node: cat ons.config After you locate the ONS port number for the new node, you must make sure that the ONS on docrac1 can communicate with the ONS on the new node, docrac3. 10. From the CRS_home/bin directory on the node docrac1, run the Oracle Notification Services configuration utility as shown in the following example, where remote_port is the port number from step 9, and docrac3 is the name of the node that you are adding: ./racgons add_config docrac3:remote_port At the end of the cloning process, you should have Oracle Clusterware running on the new node. To verify the installation of Oracle Clusterware on the new node, you can run the following command as the root user on the newly configured node, docrac3: CRS_home/bin/cluvfy stage -post crsinst -n docrac3 -verbose >>> C: Extending the Oracle Automatic Storage Management Home Directory To extend an existing Oracle RAC database to a new node, you must configure the shared storage for the new database instances that will be created on new node. You must configure access to the same shared storage that is already used by the existing database instances in the cluster. For example, the sales cluster database in this guide uses Oracle Automatic Storage Management (ASM) for the database shared storage, so you must configure ASM on the node being added to the cluster. 
Because you installed ASM in its own home directory, you must configure an ASM home on the new node using OUI. The procedure for adding an ASM home to the new node is very similar to the procedure you just completed for extending Oracle Clusterware to the new node. To extend the ASM installation to include the new node: 1. Ensure that you have successfully installed the ASM software on at least one node in your cluster environment. To use these procedures as shown, your $ASM_HOME environment variable must identify your successfully installed ASM home directory. 2. Go to the $ASM_HOME/oui/bin directory on docrac1 and run the addNode.sh script. 3. When OUI displays the Node Selection window, select the node to be added (docrac3), then click Next. 4. Verify the entries that OUI displays on the Summary window, then click Next. 5. Run the root.sh script on the new node, docrac3, from the ASM home directory on that node when OUI prompts you to do so. You now have a copy of the ASM software on the new node. >>> D: Extending the Oracle RAC Software Home Directory Now that you have extended the Oracle Clusterware and ASM homes to the new node, you must extend the Oracle Database home on docrac1 to docrac3. The following steps assume that you have already completed the previous tasks described in this chapter, and that docrac3 is already a member node of the cluster to which docrac1 belongs. The procedure for adding an Oracle RAC home to the new node is very similar to the procedure you just completed for extending ASM to the new node. To extend the Oracle RAC installation to include the new node: 1. Ensure that you have successfully installed the Oracle RAC software on at least one node in your cluster environment. To use these procedures as shown, your $ORACLE_HOME environment variable must identify your successfully installed Oracle RAC home directory. 2. Go to the $ORACLE_HOME/oui/bin directory on docrac1 and run the addNode.sh script. 3. When OUI displays the Specify Cluster Nodes to Add to Installation window, select the node to be added (docrac3), then click Next. 4. Verify the entries that OUI displays in the Cluster Node Addition Summary window, then click Next. 5. Run the root.sh script on the new node, docrac3, from the $ORACLE_HOME directory on that node when OUI prompts you to do so. After completing these steps, you should have an installed Oracle RAC home on the new node. >>> E: Creating a Listener on the New Node To service database instance connection requests on the new node, you must create a Listener on that node. Use the Oracle Net Configuration Assistant (NETCA) to create a Listener on the new node. Before beginning this procedure, ensure that your existing nodes have the $ORACLE_HOME environment variable set correctly. To create a new Listener on the new node using Oracle Net Configuration Assistant: 1. Start the Oracle Net Configuration Assistant by entering netca at the system prompt from the $ORACLE_HOME/bin directory. NETCA displays the Welcome window. Click Help on any NETCA window for additional information. 2. Select Listener configuration, and click Next. NETCA displays the Listener Configuration, Listener window. 3. Select Add to create a new Listener, then click Next. NETCA displays the Listener Configuration, Listener Name window. 4. Accept the default value of LISTENER for the Listener name by clicking Next. NETCA displays the Listener Configuration, Select Protocols window. 5. Choose TCP and move it to the Selected Protocols area, then click Next. 
NETCA displays the Listener Configuration, TCP/IP Protocol window. 6. Choose Use the standard port number of 1521, then click Next. NETCA displays the Real Application Clusters window. 7. Select Cluster configuration for the type of configuration to perform, then click Next. NETCA displays the Real Application Clusters, Active Nodes window. 8. Select the name of the node you are adding, for example docrac3, then click Next. NETCA creates a Listener using the configuration information provided. You can now exit NETCA. You should now have a Listener named LISTENER running on the new node. >>> F: Adding a New Cluster Instance on the New Node You can use the Oracle Database Configuration Assistant (DBCA) to add database instances to new nodes. Before beginning this procedure, ensure that your existing nodes have the $ORACLE_HOME environment variable set correctly. To create a new cluster instance on the new node using DBCA: 1. Start DBCA by entering dbca at the system prompt from the $ORACLE_HOME/bin directory. DBCA displays the Welcome window for Oracle RAC. Click Help on any DBCA page for additional information. 2. Select Oracle Real Application Clusters database, and then click Next. DBCA displays the Operations window. 3. Select Instance Management, and then click Next. DBCA displays the Instance Management window. 4. Select Add an Instance, then click Next. DBCA displays the List of Cluster Databases window, which shows the databases and their current status, such as ACTIVE or INACTIVE. 5. In the List of Cluster Databases window, select the active Oracle RAC database to which you want to add an instance, for example sales. Enter the user name and password for the database user that has SYSDBA privileges. Click Next. DBCA will spend a few minutes performing tasks in the background, then it will display the Instance naming and node selection window. 6. In the Instance naming and node selection window, enter the instance name in the field at the top of this window if the default instance name provided by DBCA does not match your existing instance naming scheme. For example, instead of the sales3 instance, you might want to create the sales_03 instance. Click Next to accept the default instance name of sales3. DBCA displays the Instance Storage window. 7. In the Instance Storage window, you have the option of changing the default storage options and file locations for the new database instance. In this example, you accept all the default values and click Finish. DBCA displays the Summary window. 8. Review the information in the Summary window, then click OK to start the database instance addition operation. DBCA displays a progress dialog box showing DBCA performing the instance addition operation. 9. During the instance addition operation, if you are using ASM for your cluster database storage, DBCA detects the need for a new ASM instance on the new node. When DBCA displays a dialog box, asking if you want to ASM to be extended, click Yes. After DBCA extends ASM on the new node and completes the instance addition operation, DBCA displays a dialog box asking whether or not you want to perform another operation. Click No to exit DBCA. You should now have a new cluster database instance and ASM instance running on the new node. 
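Before moving on, it can be useful to confirm with srvctl that everything you just created is actually running. Here is a minimal sketch using the sales database and the docrac3 node from this chapter; adjust the names for your environment and run it as the Oracle software owner.

#!/bin/bash
# Hedged sketch: post-addition sanity check for the new node.
# "sales" and "docrac3" are the database and node names used in this chapter.
DB=sales
NODE=docrac3

echo "--- instances of $DB ---"
srvctl status database -d "$DB"

echo "--- ASM on $NODE ---"
srvctl status asm -n "$NODE"

echo "--- nodeapps (VIP, GSD, ONS, listener) on $NODE ---"
srvctl status nodeapps -n "$NODE"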
After you terminate your DBCA session, you should run the following command to verify the administrative privileges on the new node and obtain detailed information about these privileges: # CRS_home/bin/cluvfy comp admprv -o db_config -d oracle_home -n docrac3 -verbose ============================== 13: Starting and Stopping RAC: ============================== 13.1 Stopping the Cluster: ========================== Before you shut down any processes that are monitored by Enterprise Manager Grid Control, set a blackout in Grid Control for the processes that you intend to shut down. This is necessary so that the availability records for these processes indicate that the shutdown was planned downtime, rather than an unplanned system outage. Shut down all Oracle RAC instances on all nodes. To shut down all Oracle RAC instances for a database, enter the following command, where db_name is the name of the database: $ oracle_home/bin/srvctl stop database -d db_name Shut down all ASM instances on all nodes. To shut down an ASM instance, enter the following command, where node is the name of the node where the ASM instance is running: $ oracle_home/bin/srvctl stop asm -n node Stop all node applications on all nodes. To stop node applications running on a node, enter the following command, where node is the name of the node where the applications are running $ oracle_home/bin/srvctl stop nodeapps -n node Log in as the root user, and shut down the Oracle Clusterware or CRS process by entering the following command on all nodes: # CRS_home/bin/crsctl stop crs # as root 13.2 Starting the Cluster: ========================== # CRS_home/bin/crsctl start crs # as root $ oracle_home/bin/srvctl start nodeapps -n node $ oracle_home/bin/srvctl start asm -n node $ oracle_home/bin/srvctl start database -d db_name # will start all instances of the Database Example of stopping or starting Services on a RAC node: # shutdown services on a RAC node sudo -u oracle /opt/oracle/product/10.2.0/crs/bin/srvctl stop instance -d p01cfd -i pl01cfd2 -o immediate sudo -u oracle /opt/oracle/product/10.2.0/crs/bin/srvctl stop asm -n t-prod-oranode-11 sudo -u oracle /opt/oracle/product/10.2.0/crs/bin/srvctl stop nodeapps -n t-prod-oranode-11 # startup services on a RAC node sudo -u oracle /opt/oracle/product/10.2.0/crs/bin/srvctl start nodeapps -n t-prod-oranode-11 sudo -u oracle /opt/oracle/product/10.2.0/crs/bin/srvctl start asm -n t-prod-oranode-11 sudo -u oracle /opt/oracle/product/10.2.0/crs/bin/srvctl start instance -d p01cfd -i pl01cfd2 ==================================== 14: Other Noticable items in 10g RAC ==================================== 14.1 SPFILE: ============ If an initialization parameter applies to all instances, use *. notation, otherwise prefix the parameter with the name of the instance. For example: *.OPEN_CURSORS=500 prod1.OPEN_CURSORS=1000 Assume that you start an instance with an SPFILE containing the following entries: *.OPEN_CURSORS=500 prod1.OPEN_CURSORS=1000 For the instance with the Oracle system identifier (sid) prod1, the OPEN_CURSORS parameter remains set to 1000 even though it has a database-wide setting of 500. The instance-specific parameter setting in the parameter file for an instance prevents database-wide alterations of the setting. This gives you control over parameter settings for instance prod1. These two types of settings can appear in any order in the parameter file. 
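As a side note, the *.OPEN_CURSORS and prod1.OPEN_CURSORS entries shown above are normally created with ALTER SYSTEM ... SCOPE=SPFILE rather than by editing the server parameter file (an SPFILE is binary and should never be edited by hand). A hedged sketch, connecting locally as SYSDBA and reusing the values from this example:

#!/bin/bash
# Hedged sketch: create the database-wide and the instance-specific setting,
# then show what is recorded in the SPFILE per sid.
sqlplus -S / as sysdba <<'EOF'
-- database-wide default
ALTER SYSTEM SET open_cursors=500 SID='*' SCOPE=SPFILE;
-- override for instance prod1 only
ALTER SYSTEM SET open_cursors=1000 SID='prod1' SCOPE=SPFILE;

-- what is now stored in the SPFILE, per sid
SET LINESIZE 100
COLUMN sid   FORMAT a8
COLUMN name  FORMAT a20
COLUMN value FORMAT a15
SELECT sid, name, value FROM v$spparameter WHERE name = 'open_cursors';
EXIT
EOF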
If another DBA runs the following statement, then Oracle updates the setting on all instances except the instance with sid prod1: ALTER SYSTEM SET OPEN_CURSORS=1500 sid='*' SCOPE=MEMORY; In the example instance with sid prod1, the parameter begins accepting ALTER SYSTEM values set by other instances if you change the parameter setting by running the following statement: ALTER SYSTEM RESET OPEN_CURSORS SCOPE=MEMORY sid='prod1'; Then if you execute the following statement on another instance, the instance with sid prod1 also assumes the new setting of 2000: ALTER SYSTEM SET OPEN_CURSORS=2000 sid='*' SCOPE=MEMORY; 14.2 Rolling upgrades, opatch : =============================== >>> Patch: Oracle issues product fixes for its software called patches. When you apply the patch to your Oracle software installation, a small collection of files are replaced to fix certain bugs. OPatch is an Oracle supplied utility that facilitates Oracle software patching. The opatch binary file is located in the $ORACLE_HOME/OPatch directory. You can either specify this path when executing OPatch, or you can update the PATH environment variable to include the OPatch directory. For example, on RedHat Linux systems you would use a shell command similar to the following: $ export PATH=$PATH:/opt/oracle/10gR2/db_1/OPatch >>> Patchset: A group of patches form a patch set. When you apply a patch set, many different files and utilities are modified. This results in a version change for your Oracle software, for example, from Oracle Database 10.2.0.1.0 to Oracle Database 10.2.0.2.0. To apply a patch set you use the Oracle Universal Installer (OUI). >>> RAC Rolling Upgrade "One-off" patches or interim patches to database software are usually applied to implement known fixes for software problems an installation has encountered or to apply diagnostic patches to gather information regarding a problem. Such patch application is often carried out during a scheduled maintenance outage. Oracle now provides the capability to do rolling patch upgrades with Real Application Clusters with little or no database downtime. The tool used to achieve this is the "opatch" command-line utility. The advantage of a RAC rolling upgrade is that it enables at least some instances of the RAC installation to be available during the scheduled outage required for patch upgrades. Only the RAC instance that is currently being patched needs to be brought down. The other instances can continue to remain available. This means that the impact on the application downtime required for such scheduled outages is further minimized. Oracle's opatch utility enables the user to apply the patch successively to the different instances of the RAC installation. Rolling upgrade is available only for patches that have been certified by Oracle to be eligible for rolling upgrades. Typically, patches that can be installed in a rolling upgrade include: -Patches that do not affect the contents of the database such as the data dictionary -Patches not related to RAC internode communication -Patches related to client-side tools such as SQL*PLUS, Oracle utilities, development libraries, and Oracle Net -Patches that do not change shared database resources such as datafile headers, control files, and common header definitions of kernel modules Rolling upgrade of patches is currently available for one-off patches only. It is not available for patch sets. Rolling patch upgrades are not available for deployments where the Oracle Database software is shared across the different nodes. 
This is the case where the Oracle home is on Cluster File System (CFS) or on shared volumes provided by file servers or NFS-mounted drives. The feature is only available where each node has its own copy of the Oracle Database software. The opatch utility applies a patch successively to nodes of the RAC cluster. The nature of the patch enables a RAC installation to run in a mixed environment. Different instances of the database may be operating at the same time, and the patch may have been applied to some instances and not others. The opatch utility automatically detects the nodes of the cluster on which a specific RAC deployment has been implemented. The patch is applied to each node, one at a time. For each node, the DBA is prompted to shut down the instance. The patch is applied to the database software install on that node. After the current node has been patched, the instance can be restarted. After the patch is applied on the current node, the DBA is allowed to choose the next RAC node to apply the patch to. The cycle of instance shutdown, patch application, and instance startup is repeated. Thus, at any time during the patch application, only one node needs to be down. To check if a patch is a rolling patch, execute the following on UNIX platforms: $ opatch query -is_rolling You can check to see whether a patch has been marked rolling upgradeable by running "$ opatch query –is_rolling" or by checking the online_rac_installable flag for a value of TRUE within "etc/config/inventory" directory. Usually, the patch README file should detail whether the patch is rolling upgradeable, but when in doubt, you can use one of the aforementioned methods. Patches can be rolled back with the opatch utility. This enables the DBA to remove a troublesome patch or a patch that is no longer required. This can be done as a rolling procedure. To roll back a patch across all nodes of a RAC cluster, execute the following command: $ opatch rollback -id patch_id -ph patch_location To roll back a patch on the local node only, enter the following command: $ opatch rollback -local -id patch_id -ph patch_location To check the results of a patch rollback, check the logs in the following location: $ORACLE_HOME/.patch_storage/patch_id/patch_id_RollBack_timestamp.log 14.3 Redo Thread in RAC: ======================== Example: -------- Suppose you have one RAC database, and two instances: RAC01 and RAC02. Right now, you have 3 enabled threads of 2 group each: - thread 3 consisting of log groups 5 and 6 is currently being used by instance RAC01 - thread 2 consisting of log groups 3 and 4 is currently being used by instance RAC02 - thread 1 consisting of log groups 1 and 2 is not being used Now we want RAC01 to use thread 1 with log groups 1 and 2: SQL> alter system set thread=1 scope=spfile sid='RAC01'; ============================================ 15. TROUBLESHOOTING RAC ============================================ Note 1: VIP on wrong Interface: =============================== I was installing RAC, and during the clusterware install I picked up the wrong interfaces for public and private. I had 10.x.x.x for the public IP on eth0 and 192.x.x.x for the private IP (interconnect) on eth1. I also had 10.x.x.x for the VIP. During the install I choose eth1 to be the public interface. Right after the install I lost connection to the machine via the 10.x.x.x IP. What had happened was I had a 10.x.x.x IP on both eth0 and eth1, which was messing up the routing. The solution? Simply modify the VIP in the cluster configuration. 
There's actually a Metalink article about this. Here are the essential commands:

srvctl stop nodeapps -n NODE1
srvctl stop nodeapps -n NODE2
srvctl modify nodeapps -n NODE1 -A 10.5.5.101/255.255.255.0/eth0
srvctl modify nodeapps -n NODE2 -A 10.5.5.102/255.255.255.0/eth0
srvctl start nodeapps -n NODE1
srvctl start nodeapps -n NODE2

They worked just fine. So if you ever mess up the interfaces, this is how you fix it.

If you need to change the private interface, then you need to use oifcfg. To verify your current settings, use:

oifcfg getif
eth0 10.5.5.0 global public
eth1 192.168.0.0 global cluster_interconnect

And use delif/addif to remove and re-create your private interface.

Note 2: Troubleshooting and debugging:
======================================

You can enable debugging for the CRS, OCR, CSS, and EVM modules and their components by setting environment variables or by issuing crsctl debug commands using the following syntax:

# crsctl debug log module_name component:debugging_level

Run the following command to obtain component names, where module_name is the name of the module (crs, evm, or css):

# crsctl lsmodules module_name

You must issue the crsctl debug command as the root user, and supply the following information:

- module_name: the name of the module: CRS, EVM, or CSS.
- component: the name of a component for the CRS, OCR, EVM, or CSS module (see the Oracle Clusterware documentation for a list of all of the components).
- debugging_level: a number from 1 to 5 to indicate the level of detail you want the debug command to return, where 1 is the least amount of debugging output and 5 provides the most detailed debugging output.

For example, to enable tracing for the CSSD module of the css component, you could use the following command:

# crsctl debug log css CSSD:1

Other examples:

# crsctl debug log crs "CRSRTI:1,CRSCOMM:2"
# crsctl debug log evm "EVMCOMM:1"
# crsctl debug log crs "CRSRTI:1,CRSCOMM:2,OCRSRV:4"

You can use crsctl commands to enable resource debugging using the following syntax:

# crsctl debug log res "ora.node1.vip:1"

This has the effect of setting the environment variable USER_ORA_DEBUG to 1 before running the start, stop, or check action scripts for the ora.node1.vip resource.

# crsctl debug statedump evm   - dumps state info for evm objects
# crsctl debug statedump crs   - dumps state info for crs objects
# crsctl debug statedump css   - dumps state info for css objects

To enable extra debugging on the currently running CRS daemons as well as those that will run in the future, use the following command. Debugging information remains in the Oracle Cluster Registry (OCR) for use during the next startup.

# crsctl debug log crs

-- Running the Oracle Clusterware Diagnostics Collection Script

Run the diagcollection.pl script as the root user to collect diagnostic information from an Oracle Clusterware installation. The diagnostics provide additional information so that Oracle Support Services can resolve problems. Run this script from the operating system prompt as follows, where CRS_home is the home directory of your Oracle Clusterware installation:

# CRS_home/bin/diagcollection.pl --collect

This script collects diagnostic information from the Cluster Synchronization Services (CSS), Event Manager (EVM), and Cluster Ready Services (CRS) daemons.

Debugging srvctl in 10g couldn't be easier. Simply set the SRVM_TRACE environment variable.

% export SRVM_TRACE=true

============================================
16. Most important wait events in 10gR2 RAC:
============================================