Friday, 28 May 2010

11g: Data Recovery Advisor

The Data Recovery Advisor automatically diagnoses corruption or loss of persistent data on disk, determines the appropriate repair options, and executes repairs at the user's request. This reduces the complexity of recovery process, thereby reducing the Mean Time To Recover (MTTR).

Create a Failure:

Before we can start identifying and repairing failures, we need to create one. Probably the easiest way to do this is to corrupt a datafile using the "echo" command. The following script navigates to the directory holding the datafiles, checks the current size of the users01.dbf file, echos nothing to it, then checks the file size once more.

$ cd /u01/app/oracle/oradata/DB11G

$ ls -l users01.dbf
-rw-r----- 1 oracle oinstall 57745408 Jan 3 11:42 users01.dbf

$ echo > users01.dbf

$ ls -l users01.dbf
-rw-r----- 1 oracle oinstall 1 Jan 3 13:26 users01.dbf

You can see the file size has been reduced to a single byte.Next, we connect to the database and try to create a table in the USERS tablespace.

SQL> CREATE TABLE test_tab (id NUMBER) TABLESPACE USERS;
CREATE TABLE test_tab (id NUMBER) TABLESPACE USERS
*
ERROR at line 1:
ORA-01115: IO error reading block from file 4 (block # 3)
ORA-01110: data file 4: '/u01/app/oracle/oradata/DB11G/users01.dbf'
ORA-27072: File I/O error
Additional information: 4
Additional information: 2

The resulting error triggers a data integrity check that searches the database for failures related to the error and records them in the Automatic Diagnostic Repository (ADR).

Relevant RMAN Functionality:

LIST FAILURE:

The LIST FAILURE command displays any failures with a status OPEN and a priority of CRITICAL or HIGH in order of importance. If no such failures exist it will list LOW priority failures.

RMAN> LIST FAILURE;

List of Database Failures
=========================

Failure ID Priority Status Time Detected Summary
---------- -------- --------- ------------- -------
202 HIGH OPEN 03-JAN-08

One or more non-system datafiles are corrupt

The LIST FAILURE command has a number of options that can be combined to alter the failures listed.

ADVISE FAILURE:

The ADVISE FAILURE command, as the name implies, provides repair advice for failures listed by the LIST FAILURE command, as well as closing all open failures that are already repaired.

RMAN> ADVISE FAILURE;

List of Database Failures
=========================

Failure ID Priority Status Time Detected Summary
---------- -------- --------- ------------- -------
202 HIGH OPEN 03-JAN-08 One or more non-system datafiles are corrupt


analyzing automatic repair options; this may take some time
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=124 device type=DISK
analyzing automatic repair options complete

Mandatory Manual Actions
========================
no manual actions available

Optional Manual Actions
=======================
no manual actions available

Automated Repair Options
========================

Option Repair Description
------ ------------------
1 Restore and recover datafile 4
Strategy: The repair includes complete media recovery with no data loss
Repair script: /u01/app/oracle/diag/rdbms/db11g/DB11G/hm/reco_3657335472.hm

If manual repair actions are produced you should attempt them first, as they are likely to be less disruptive. If manual repair actions aren't present, or they do not fix the problem, you can use the automated repair option.

REPAIR FAILURE:

The REPAIR FAILURE command applies the repair scripts produced by the ADVISE FAILURE command. Using the PREVIEW option lists the contents of the repair script without applying it.

RMAN> REPAIR FAILURE PREVIEW;

Strategy: The repair includes complete media recovery with no data loss
Repair script: /u01/app/oracle/diag/rdbms/db11g/DB11G/hm/reco_2408143298.hm
contents of repair script:
# restore and recover datafile
sql 'alter database datafile 4 offline';
restore datafile 4;
recover datafile 4;
sql 'alter database datafile 4 online';

By default, the REPAIR FAILURE command prompts the user to confirm the repair, but this can be prevented using the NOPROMPT keyword.

RMAN> REPAIR FAILURE NOPROMPT;

Strategy: The repair includes complete media recovery with no data loss
Repair script: /u01/app/oracle/diag/rdbms/db11g/DB11G/hm/reco_2408143298.hm
contents of repair script:
# restore and recover datafile
sql 'alter database datafile 4 offline';
restore datafile 4;
recover datafile 4;
sql 'alter database datafile 4 online';
executing repair script

sql statement: alter database datafile 4 offline

Starting restore at 03-JAN-08
using channel ORA_DISK_1

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00004 to /u01/app/oracle/oradata/DB11G/users01.dbf
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/flash_recovery_area/DB11G/backupset/
2008_01_03/o1_mf_nnndf_BACKUP_DB11G.WORLD_0_3qsl2hy4_.bkp
channel ORA_DISK_1: piece handle=/u01/app/oracle/flash_recovery_area/DB11G/backupset/2008_01_03/
o1_mf_nnndf_BACKUP_DB11G.WORLD_0_3qsl2hy4_.bkp tag=BACKUP_DB11G.WORLD_010308113407
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07
Finished restore at 03-JAN-08

Starting recover at 03-JAN-08
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:01

Finished recover at 03-JAN-08

sql statement: alter database datafile 4 online
repair failure complete

CHANGE FAILURE:

The CHANGE FAILURE command allows you to change the priority of a failure or close an open failure. You may wish to change the priority of a failure if it does not represent a problem to you. For example, a failure associated with a tablespace you know longer use may be listed as a high priority, when in fact it has no effect on the normal running of your system.

RMAN> CHANGE FAILURE 202 PRIORITY LOW;

It is unlikely you will need to close an open failure, as even manually repaired failures are closed implicitly, but the option is there should you need it.

VALIDATE:

The VALIDATE command initiates data integrity checks, logging physical, and optionally logical, block corruptions of database files and backups in the V$DATABASE_BLOCK_CORRUPTION view and the Automatic Diagnostic Repository as one or more failures.

The following code shows some of the possible syntax variations.
# Check for physical corruption of all database files.
VALIDATE DATABASE;

# Check for physical and logical corruption of a tablespace.
VALIDATE CHECK LOGICAL TABLESPACE USERS;

# Check for physical and logical corruption of a datafile.
VALIDATE CHECK LOGICAL DATAFILE 4;

# Check for physical corruption of all archived redo logs files.
VALIDATE ARCHIVELOG ALL;

# Check for physical and logical corruption of the controlfile.
VALIDATE CHECK LOGICAL CURRENT CONTROLFILE;

# Check for physical and logical corruption of a specific backupset.
VALIDATE CHECK LOGICAL BACKUPSET 3;

The BACKUP VALIDATE and RESTORE VALIDATE commands perform the same checks as the VALIDATE command for the files targeted by the backup or restore command, but they don't actually perform the specified backup or restore operation. This allows you to check the integrity of a backup or restore operation before actually performing it.

The following code shows some of the possible syntax variations:

# Check for physical corruption of files to be backed up.
BACKUP VALIDATE DATABASE ARCHIVELOG ALL;

# Check for physical and logical corruption of files to be backed up.
BACKUP VALIDATE CHECK LOGICAL DATABASE ARCHIVELOG ALL;

# Check for physical corruption of files to be restored.
RESTORE VALIDATE DATABASE;

# Check for physical and logical corruption of files to be restored.
RESTORE VALIDATE CHECK LOGICAL DATABASE;

No comments:

Post a Comment