Monday, December 2, 2013

How to Improve RMAN backup performance

Backup and recovery tuning requires a good understanding of the hardware and software involved,
such as disk speed, I/O, buffering, and the media management layer (MML) used for net backup.

Many factors can affect backup performance, so finding the solution to a slow backup is often a
process of trial and error. To get the best performance for a backup, follow the steps
suggested below:

Step 1: Remove RATE Parameters from Configured and Allocated Channels
=========================================================
The RATE parameter on a channel is intended to reduce, rather than increase, backup throughput, so
that more disk bandwidth is available for other database operations.

If your backup is not streaming to tape, then make sure that the RATE parameter is not set on the
ALLOCATE CHANNEL or CONFIGURE CHANNEL commands.


Step 2: Consider Using I/O Slaves
========================

 -  If You Use Synchronous Disk I/O, Set DBWR_IO_SLAVES

If and only if your disk does not support asynchronous I/O, then try setting the DBWR_IO_SLAVES initialization parameter to a nonzero value. Any nonzero value for DBWR_IO_SLAVES causes a fixed number (four) of disk I/O slaves to be used for backup and restore, which simulates asynchronous I/O.
If I/O slaves are used, I/O buffers are obtained from the SGA. The large pool is used, if configured. Otherwise, the shared pool is used.

Note: By setting DBWR_IO_SLAVES, the database writer processes will use slaves as well.
You may need to increase the value of the PROCESSES initialization parameter.

-  Use Tape I/O Slaves to Keep the Tape Streaming (Continually Moving) by Simulating
Asynchronous I/O

Set the "init.ora" parameter:
BACKUP_TAPE_IO_SLAVES = true

This causes one tape I/O slave to be assigned to each channel server process.

In Oracle 8i/9i/10g, if the DUPLEX option is specified, then tape I/O slaves must be enabled.
In this case, for DUPLEX=<n>, there are <n> tape slaves per channel. These <n> slaves
all operate on the same four output buffers. Consequently, a buffer is not freed
up until all <n> slaves have finished writing to tape.


Step 3: If You Fail to Allocate Shared Memory, Set LARGE_POOL_SIZE
=========================================================
Set this initialization parameter if the database reports an error in the alert.log stating that it does not have enough memory and that it will not start I/O slaves.

The message should resemble the following:
ksfqxcre: failure to allocate shared memory means sync I/O will be used whenever async I/O to file not supported natively


When attempting to get shared buffers for I/O slaves, the database does the following:

* If LARGE_POOL_SIZE is set, then the database attempts to get memory from the large pool. If this value is not large enough, then an error is recorded in the alert log, the database does not try to get buffers from the shared pool, and asynchronous I/O is not used.
* If LARGE_POOL_SIZE is not set, then the database attempts to get memory from the shared pool.
* If the database cannot get enough memory, then it obtains I/O buffer memory from the PGA and writes a message to the alert.log file indicating that synchronous I/O is used for this backup.
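
The fallback order described above can be modeled as a small decision function. This is only an illustrative sketch of one reading of the three rules, not Oracle's internal logic; the function name, the BufferDecision type, and the byte figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BufferDecision:
    source: str         # "large pool", "shared pool", or "PGA"
    async_io: bool      # True when I/O slaves (simulated async I/O) are used
    alert_logged: bool  # True when a message is written to the alert log


def choose_buffer_source(needed: int,
                         large_pool_free: Optional[int],
                         shared_pool_free: int) -> BufferDecision:
    """Model the rules above. large_pool_free is None when LARGE_POOL_SIZE is not set."""
    if large_pool_free is not None:
        if large_pool_free >= needed:
            return BufferDecision("large pool", True, False)
        # Large pool is configured but too small: the shared pool is NOT tried.
        return BufferDecision("PGA", False, True)
    if shared_pool_free >= needed:
        return BufferDecision("shared pool", True, False)
    # Not enough shared memory anywhere: fall back to PGA and synchronous I/O.
    return BufferDecision("PGA", False, True)


# Hypothetical case: large pool configured but smaller than the request.
print(choose_buffer_source(needed=64, large_pool_free=32, shared_pool_free=512))
```

Note that a configured but undersized large pool ends in synchronous I/O even when the shared pool has room, which is why sizing LARGE_POOL_SIZE correctly matters.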

The memory from the large pool is used for many features, including the shared server (formerly called multi-threaded server), parallel query, and RMAN I/O slave buffers. Configuring the large pool prevents RMAN from competing with other subsystems for the same memory.

Requests for contiguous memory allocations from the shared pool are usually small (under 5 KB) in size. However, it is possible that a request for a large contiguous memory allocation can either fail or require significant memory housekeeping to release the required amount of contiguous memory. Although the shared pool may be unable to satisfy this memory request, the large pool is able to do so. The large pool does not have a least recently used (LRU) list; the database does not attempt to age memory out of the large pool.

Use the LARGE_POOL_SIZE initialization parameter to configure the large pool. To see in which pool (shared pool or large pool) the memory for an object resides, query V$SGASTAT.POOL.

The formula for setting LARGE_POOL_SIZE is as follows:

LARGE_POOL_SIZE = number_of_allocated_channels *
(16 MB + ( 4 * size_of_tape_buffer ) )
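
As a worked example of the formula, the sketch below computes the value for two channels. The channel count and the 256 KB tape buffer size are assumed figures chosen for illustration; substitute the values from your own configuration.

```python
MB = 1024 * 1024
KB = 1024


def large_pool_size(channels: int, tape_buffer_bytes: int) -> int:
    """LARGE_POOL_SIZE = channels * (16 MB + 4 * size_of_tape_buffer), in bytes."""
    return channels * (16 * MB + 4 * tape_buffer_bytes)


# Assumed inputs: 2 allocated channels, 256 KB tape buffers.
size = large_pool_size(channels=2, tape_buffer_bytes=256 * KB)
print(size // MB, "MB")  # prints "34 MB"
```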


Step 4: Tune RMAN Tape Streaming Performance Bottlenecks
================================================
There are several tasks you can perform to identify and remedy bottlenecks that affect RMAN's performance on tape backups.

Using BACKUP... VALIDATE to Distinguish Between Tape and Disk Bottlenecks

One reliable way to determine whether the tape streaming or disk I/O is the bottleneck in a given backup job is to compare the time required to run backup tasks with the time required to run BACKUP VALIDATE of the same tasks.
BACKUP VALIDATE of a backup to tape performs the same disk reads as a real backup but performs no tape I/O. If the time required for the BACKUP VALIDATE to tape is significantly less than the time required for a real backup to tape, then writing to tape is the likely bottleneck.
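
The comparison can be reduced to a simple ratio of the two elapsed times. The sketch below is one possible way to express it; the function name and the 0.8 cutoff for "significantly less" are assumptions, not values from Oracle. Because BACKUP VALIDATE performs the same disk reads, a validate run that takes about as long as the real backup points at disk reads instead.

```python
def likely_bottleneck(backup_seconds: float,
                      validate_seconds: float,
                      threshold: float = 0.8) -> str:
    """Return "tape" when BACKUP VALIDATE is much faster than the real backup,
    otherwise "disk". The threshold is an assumed cutoff, tune it to taste."""
    if backup_seconds <= 0 or validate_seconds < 0:
        raise ValueError("durations must be positive")
    ratio = validate_seconds / backup_seconds
    return "tape" if ratio < threshold else "disk"


# Hypothetical timings: real backup 60 minutes, validate 20 minutes.
print(likely_bottleneck(backup_seconds=3600, validate_seconds=1200))
```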

Using Multiplexing to Improve Tape Streaming with Disk Bottlenecks

In some situations when performing a backup to tape, RMAN may not be able to send data blocks to the tape drive fast enough to support streaming.

For example, during an incremental backup, RMAN only backs up blocks changed since a previous datafile backup as part of the same strategy. If you do not turn on change tracking, RMAN must scan entire datafiles for changed blocks, and fill output buffers as it finds such blocks. If there are not many changed blocks, RMAN may not fill output buffers fast enough to keep the tape drive streaming.

You can improve performance by increasing the degree of multiplexing used for backing up. This increases the rate at which RMAN fills tape buffers, which makes it more likely that buffers are sent to the media manager fast enough to maintain streaming.

Using Incremental Backups to Improve Backup Performance With Tape Bottlenecks

If writing to tape is the source of a bottleneck for your backups, consider using incremental backups as part of your backup strategy. Incremental level 1 backups write only the changed blocks from datafiles to tape, so that any bottleneck on writing to tape has less impact on your overall backup strategy. In particular, if tape drives are not locally attached to the node running the database being backed up, then incremental backups can be faster.

Step 5: Query V$ Views to Identify Bottlenecks
=====================================
If none of the previous steps improves backup performance, then try to determine the exact source of the bottleneck. Use the V$BACKUP_SYNC_IO and V$BACKUP_ASYNC_IO views to determine the source of backup or restore bottlenecks and to see detailed progress of backup jobs.

V$BACKUP_SYNC_IO contains rows when the I/O is synchronous to the process (or thread on some platforms) performing the backup.
V$BACKUP_ASYNC_IO contains rows when the I/O is asynchronous.
Asynchronous I/O is obtained either with I/O processes or because it is supported by the underlying operating system.


To determine whether your tape is streaming when the I/O is synchronous, query the EFFECTIVE_BYTES_PER_SECOND column in the V$BACKUP_SYNC_IO or V$BACKUP_ASYNC_IO view.
If EFFECTIVE_BYTES_PER_SECOND is less than the raw capacity of the hardware, then the tape is not streaming. If EFFECTIVE_BYTES_PER_SECOND is greater than the raw capacity of the hardware, the tape may or may not be streaming.

Compression may cause the EFFECTIVE_BYTES_PER_SECOND to be greater than the speed of real I/O.
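
The rule above can be written as a two-way check. This is a minimal sketch with a hypothetical function name and made-up figures; the raw capacity has to come from the tape drive's specification, and an equal rate is treated here as "possibly streaming", which is an assumption.

```python
def streaming_status(effective_bps: float, raw_capacity_bps: float) -> str:
    """Compare EFFECTIVE_BYTES_PER_SECOND with the drive's raw capacity."""
    if effective_bps < raw_capacity_bps:
        return "not streaming"
    # At or above raw capacity the result is inconclusive, because
    # compression can inflate the effective rate above the real I/O rate.
    return "possibly streaming"


# Hypothetical drive rated at 120 MB/s, backup observed at 80 MB/s.
print(streaming_status(80_000_000, 120_000_000))
```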

Identifying Bottlenecks with Synchronous I/O

With synchronous I/O, it is difficult to identify specific bottlenecks because all synchronous I/O is a bottleneck to the process. The only way to tune synchronous I/O is to compare the rate (in bytes/second) with the device's maximum throughput rate. If the rate is lower than the rate that the device specifies, then consider tuning this aspect of the backup and restore process. The DISCRETE_BYTES_PER_SECOND column in the V$BACKUP_SYNC_IO view displays the I/O rate. If you see data in V$BACKUP_SYNC_IO, then the problem is that you have not enabled asynchronous I/O or you are not using disk I/O slaves.

Identifying Bottlenecks with Asynchronous I/O

Long waits are the number of times the backup or restore process told the operating system to wait until an I/O was complete. Short waits are the number of times the backup or restore process made an operating system call to poll for I/O completion in a nonblocking mode. Ready indicates the number of times when I/O was already ready for use and so there was no need to make an operating system call to poll for I/O completion.

The simplest way to identify the bottleneck is to query V$BACKUP_ASYNC_IO for the datafile that has the largest ratio for LONG_WAITS divided by IO_COUNT.
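
Once the rows have been fetched from V$BACKUP_ASYNC_IO, the ranking is a one-line maximum over the ratio. The sketch below works on rows already loaded into dictionaries; the file names and counts are invented sample data, and how the rows are fetched is left out.

```python
from typing import Optional

# Hypothetical sample rows with the FILENAME, LONG_WAITS and IO_COUNT columns.
rows = [
    {"filename": "/u01/oradata/users01.dbf", "long_waits": 120, "io_count": 4000},
    {"filename": "/u01/oradata/system01.dbf", "long_waits": 900, "io_count": 6000},
    {"filename": "/u01/oradata/undotbs01.dbf", "long_waits": 50, "io_count": 5000},
]


def worst_file(rows: list) -> Optional[dict]:
    """Return the row with the largest LONG_WAITS / IO_COUNT ratio, or None."""
    candidates = [r for r in rows if r["io_count"] > 0]  # avoid division by zero
    return max(candidates,
               key=lambda r: r["long_waits"] / r["io_count"],
               default=None)


worst = worst_file(rows)
print(worst["filename"])  # prints "/u01/oradata/system01.dbf"
```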

Note:
If you have synchronous I/O but you have set BACKUP_DISK_IO_SLAVES, then the I/O will be displayed in V$BACKUP_ASYNC_IO.


Recommendations for Improving RMAN Performance on AIX 5L Based Systems
===================================================================
IBM suggests the following AIX-related settings:

1. Set AIXTHREAD_SCOPE=S in /etc/environment.

2. Run "ioo -o maxpgahead=256" to set the maxpgahead parameter.
Initial settings were: minpgahead=2, maxpgahead=16

3. Run "vmo -o minfree=360 -o maxfree=1128" to set the minfree and maxfree parameters.
Initial settings were: minfree=240, maxfree=256

With these settings, RMAN backup performance on AIX 5L based systems improved by 15-20%.

Note: Document source: Oracle Support