Welcome to IWETHEY!

Recovering Oracle (and other files) via Veritas checkpoints
Here's a little report that I whipped up
that may interest some people here. I may
flesh it out and present it at Sage.
-----------------------------------------


Oracle restore via checkpoints

I attempted to do a single tablespace checkpoint rollback.
It was BAD. I ended up trashing my base.

The good: I got my 330G Oracle base back without resorting
to tape.

The bad: It took about 4 hours to copy the files around
when it should have taken 10 seconds of rolling back the
checkpoints.

The ugly: This notice is part of the first paragraph of the
Veritas vxckptadm man page, and vxckptadm is the command needed
to roll back from a checkpoint.

DESCRIPTION
The vxckptadm utility is not an end-user supported utility
and should not be run by users. The VxDBA utility interfaces
to this command, allowing management of Storage Checkpoints.

If the utility WORKED, I wouldn't need to deal with this command.

There was a bug in the initial Veritas checkpoint save script that
put bad entries in the Veritas list of files to restore, which then
caused the restore step to abort.

The stupid: If I had spent a few more minutes rereading and
testing, I could have avoided the copy time. After finishing the
"copy restore", I reread the man page, checked the failure log,
and realised that the rollback needed file names, and would
not work on directories or file systems. So I then rolled it
all back via a large list of file names, and it worked fine.
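Because the rollback would only take individual file names, every directory or file system has to be expanded into an explicit file list first. Here is a minimal Python sketch of that step. expand_to_file_list is a hypothetical helper, and it deliberately does not call vxckptadm, since the rollback syntax isn't shown here.

```python
import os
import sys
from pathlib import Path


def expand_to_file_list(paths):
    """Expand files and directories into a sorted list of regular files.

    Hypothetical helper: the rollback only accepted file names, so
    directories (or whole mount points) are walked and flattened here.
    """
    files = []
    for p in map(Path, paths):
        if p.is_file():
            files.append(str(p))
        elif p.is_dir():
            # os.walk descends into every subdirectory; keep regular files only.
            for root, _dirs, names in os.walk(p):
                for name in names:
                    full = Path(root) / name
                    if full.is_file():
                        files.append(str(full))
        else:
            # A missing or odd entry is the kind of "bad entry" that made the
            # original restore abort, so refuse it up front.
            raise FileNotFoundError(f"not a file or directory: {p}")
    return sorted(set(files))


if __name__ == "__main__":
    # Print one file name per line, ready to hand to a rollback tool.
    for f in expand_to_file_list(sys.argv[1:]):
        print(f)
```

Failing on bad entries before anything runs is cheaper than finding out from an aborted restore.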

On the other hand, this is why we test.

It seems that all the user-level checkpoint tools deal with the
creation and mounting of checkpoints, not rolling back. fsckptadm
is the command for most of the work, but it has no way of rolling
back. The vxckptadm man page talks about rolling back Oracle instances
(which failed) and individual files, but NOT the entire file system,
which is what I wanted but could not have.

Lessons learned:

Never trust Veritas stuff without testing. We always knew that,
but it is worth repeating. In this case, the Veritas scripts use 'sqlplus'
for some system setup, but they get confused by the output if settings
such as "set timing on" are in the glogin.sql file.
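A cheap pre-flight check is to scan glogin.sql (normally under $ORACLE_HOME/sqlplus/admin) for the setting that bit us before letting the Veritas scripts run. This is a sketch only: risky_glogin_lines is hypothetical, and treating the abbreviations SET TIMI ON and SET TIMIN ON as equivalent is an assumption based on SQL*Plus's usual command-abbreviation rules.

```python
import os
import re

# Matches "set timing on" and, by assumption, its abbreviations TIMI/TIMIN.
# Commented-out lines ("-- set timing on") do not match because of the ^.
TIMING_ON = re.compile(r"^\s*set\s+timi(?:n|ng)?\s+on\b", re.IGNORECASE)


def risky_glogin_lines(text):
    """Return (line_number, line) pairs that switch SQL*Plus timing on."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if TIMING_ON.match(line):
            hits.append((lineno, line.strip()))
    return hits


if __name__ == "__main__":
    path = os.path.join(os.environ.get("ORACLE_HOME", "."),
                        "sqlplus", "admin", "glogin.sql")
    if os.path.exists(path):
        with open(path) as fh:
            for lineno, line in risky_glogin_lines(fh.read()):
                print(f"{path}:{lineno}: {line}")
```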

Keep a percentage of every file system empty for checkpoint
data. Checkpoints work great, but you need to know what you are doing. The
amount of free space required depends on the amount of data being changed.
Checkpoints survive a system reboot, so you can keep a LONG history
around if you are willing to spend the disk. Checkpoints become stale and
unusable when you run out of disk.
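To put a number on "a percentage of every file system", here is a rough worst-case estimate in Python. It assumes copy-on-write checkpoints, where a block's old contents are saved the first time it changes after the checkpoint, and that each day touches a different set of blocks. The change rate, retention, and 25% headroom in the example are hypothetical, not measured.

```python
def checkpoint_reserve(fs_size_gb, changed_gb_per_day, days_kept, headroom=1.25):
    """Worst-case free space for checkpoints, as (GB, percent of file system).

    Assumes copy-on-write checkpoints and that no two days change the
    same blocks, so nothing is shared between days.
    """
    if fs_size_gb <= 0:
        raise ValueError("file system size must be positive")
    needed_gb = changed_gb_per_day * days_kept * headroom
    return needed_gb, 100.0 * needed_gb / fs_size_gb


if __name__ == "__main__":
    # Hypothetical inputs: 330GB base, 4GB changed per day, a week of history.
    gb, pct = checkpoint_reserve(330, 4, 7)
    print(f"reserve about {gb:.0f}GB ({pct:.1f}% of the file system)")
```

Rewriting the same blocks over and over costs less than this, so treat the result as an upper bound for planning.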

Before any major operation, take a cold Oracle checkpoint. This takes
about 2 minutes. You could take a hot one, but the restore is far more
complex, and since the data warehousing instance is not in archivelog mode,
hot backups (even via checkpoint) are chancy.
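A cold checkpoint boils down to three steps: shut the instance down, checkpoint every file system holding database files, then start it back up. The sketch below only builds that plan and runs nothing. The "fsckptadm create <name> <mount_point>" form and the mount point names are assumptions to verify against your local man pages.

```python
def cold_checkpoint_plan(ckpt_name, mount_points):
    """Return the ordered steps for a cold checkpoint as (kind, payload) pairs.

    kind "sql": payload is a statement to run in SQL*Plus as SYSDBA.
    kind "cmd": payload is an argv list for a shell command (assumed syntax).
    """
    if not mount_points:
        raise ValueError("need at least one mount point")
    steps = [("sql", "SHUTDOWN IMMEDIATE")]
    # Every file system with datafiles, redo logs or control files is
    # checkpointed while the database is down, so the set is consistent.
    steps += [("cmd", ["fsckptadm", "create", ckpt_name, mp]) for mp in mount_points]
    steps.append(("sql", "STARTUP"))
    return steps


if __name__ == "__main__":
    # Hypothetical checkpoint name and mount points.
    for kind, payload in cold_checkpoint_plan("pre_upgrade", ["/oradata1", "/oradata2"]):
        print(kind, payload if kind == "sql" else " ".join(payload))
```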

It takes 5 hours to back up 345GB to EZ17 via mounted checkpoints. You
can do this while the base is up and active.

It takes 5 hours to restore 345GB from EZ17. To make restores easier,
it is better to have a single large partition to bring the files back into,
rather than dealing with multiple partitions and the associated links.

It takes 4 hours of copying from a mounted checkpoint via disk to restore
those same files. You can mount writable checkpoints to test program changes,
without fear of damaging your original base. The docs say you can
automatically generate a test instance with a new SID, but I haven't tested
that yet.

It takes 10 seconds of rollback via checkpoint to restore those same
files.
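Here are the timings above converted to effective rates for the same 345GB. The rollback "rate" is not a real transfer speed. It only shows how much copying the rollback avoids.

```python
SIZE_GB = 345  # base size quoted above

# Elapsed time of each restore path, in seconds, from the figures above.
TIMINGS = {
    "restore from EZ17": 5 * 3600,
    "copy from mounted checkpoint": 4 * 3600,
    "checkpoint rollback": 10,
}


def gb_per_hour(size_gb, seconds):
    """Effective throughput for moving size_gb in the given elapsed time."""
    return size_gb / (seconds / 3600.0)


if __name__ == "__main__":
    for method, secs in TIMINGS.items():
        print(f"{method:30s} {gb_per_hour(SIZE_GB, secs):12.1f} GB/hour")
```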
Well... Oracle, RMAN and TSM, know them and love them...
Okies here goes:

Aw Crap I just don't care any more... Sorry Barry.

Just look it up... it has fewer problems (I believe) than anything I've ever heard of or seen with Veritas.

On the TSM side, the RMAN interface is called Tivoli Data Protection for Oracle.

It can do nearly anything you want with any tablespace live... If you use another file system instead of allocating a tape channel, you can just keep it going forever... provided you can keep adding disk for it. We opted for TSM with DRM (Disaster Recovery Manager) so we can get copies of all the tapes we use offsite. Therefore we are able to take "checkpoints" anytime, anywhere we want. RMAN will restore the DB to any checkpoint the DBA chooses until he "expires" it. It is quite an involved process if you want to keep ONLY one year's worth of transactions and archives for Oracle.
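Part of what makes one-year retention involved is that you cannot simply expire everything older than 365 days. Restoring to a point just inside the window still needs the last full backup taken before that point, plus everything after it. Here is a hypothetical Python sketch of that rule. It is not RMAN's or TSM's own expiry logic, and the (name, date, kind) catalog format is made up for illustration.

```python
from datetime import date, timedelta


def backups_to_expire(backups, today, keep_days=365):
    """Names of backups that can go while keeping keep_days recoverable.

    backups: iterable of (name, completion_date, kind), where kind is "full"
    or anything else (incrementals, archived logs). Hypothetical format.
    """
    backups = list(backups)
    cutoff = today - timedelta(days=keep_days)
    # The oldest restore point to support is the cutoff, which depends on
    # the newest full backup completed on or before it.
    anchors = [done for _name, done, kind in backups
               if kind == "full" and done <= cutoff]
    if not anchors:
        return []  # no full backup that old yet: conservatively keep everything
    anchor = max(anchors)
    return sorted(name for name, done, _kind in backups if done < anchor)


if __name__ == "__main__":
    catalog = [  # hypothetical catalog entries
        ("full_jan", date(2001, 1, 1), "full"),
        ("arch_mar", date(2001, 3, 1), "arch"),
        ("full_jun", date(2001, 6, 1), "full"),
        ("arch_sep", date(2001, 9, 1), "arch"),
    ]
    print(backups_to_expire(catalog, today=date(2002, 8, 1)))
```

Note that full_jun survives even though it is more than a year old, because the oldest point in the window still depends on it.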

It does work... we have even been able to restore a corrupted tablespace, disabling all access to it while we restored a good version. The culprit was a malformed/mistyped "peoplesoft upgrade script" that did an Extend, then added columns, then moved data from one column to another, multiple times, then deleted the old columns. Well, it actually did everything right until it did the delete: it deleted the NEW columns... and since it was one of those seldom-used-"until now" tables, it didna make much difference until new versions of SQR and SQL and such started reporting against it. It is a "static table", so it worked to restore an old version and then rerun the script without the delete.

So you may see use for this. If you do, OK, if not, OK.

Enterprise tape storage is STILL significantly cheaper than enterprise disk storage... so we opted for the fastest tape drives we could find... and got several. We get about 250GB an hour when we force TSM to use multiple tape drives and multiple SCSI paths to those drives.
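Here is the 250GB/hour figure broken down per drive, assuming the load is spread evenly over the four 3590 drives listed later in the thread (an assumption, since this post only says "several"). It also estimates how long the 345GB base from the Veritas post would take at that rate.

```python
def per_drive_rate(aggregate_gb_per_hour, drives):
    """Split an aggregate rate evenly across drives.

    Returns (GB/hour per drive, MB/s per drive) in decimal units.
    """
    gb_h = aggregate_gb_per_hour / drives
    mb_s = gb_h * 1000 / 3600
    return gb_h, mb_s


def backup_hours(size_gb, aggregate_gb_per_hour):
    """Hours to stream size_gb at a sustained aggregate rate."""
    return size_gb / aggregate_gb_per_hour


if __name__ == "__main__":
    gb_h, mb_s = per_drive_rate(250, 4)  # 4 drives is an assumption
    print(f"per drive: {gb_h:.1f} GB/hour, about {mb_s:.1f} MB/s")
    print(f"345GB at 250GB/hour: {backup_hours(345, 250):.2f} hours")
```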

P.S. I was gonna go into how we have tested it in greater detail but... I just well don't wanna. If you want more info ping me.

greg, curley95@attbi.com -- REMEMBER ED CURRY!!!
Impressive tape driving timing
What are they, how many, driven by what equipment?
Machine and types of Equipment
AIX 4.3.3 Fixpack 7
Oracle v8.1.7.3
TSM for AIX v4.2.1
RMAN (v??) for Oracle on AIX

RS6000 - S7A
8 - 233MHz Risc Mode PPC Processors
32GB Memory
8 - Ultra-Wide SCSI Channels
4 - 3590K Tape Drives
    2 - UW-SCSI interfaces per drive for multiple SCSI paths
2 - 3494 Tape Library Cabinets
Multiple Gigabit Ethernet links, trunked (the number varies)

Also, this machine is our primary Oracle machine.

Re: Machine and types of Equipment
Holy shit.

Hmm, rough guess, hhmmmm.

$400K in equipment.

Where's the disk?
Re: Machine and types of Equipment
The disk is in an IBM ESS, with multiple Fibre Channel (fiber SCSI) paths using vpaths.

$93K for the controller cabinet with 2 drives, media and SCSI cables for them.
$82K for another storage cabinet with 2 drives, media and SCSI cables for them.
The SCSI adapters came with the drives.

We had to have a second backplane for the S7A anyway, so we put 4 SCSI adapters in one backplane and 4 in the other. We also put 2 Fibre Channel (fiber SCSI) cards in each backplane to connect the ESS.

TSM upgrade from ADSM... Dunno.

ESS with 540GB of disk, $330K with straight UWSCSI connections
ESS upgrade to Fibre Channel connections once they finally became real: $55K, including adapters
ESS SSA Disk upgrade for another 1080GB $85K
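Here is a tally of the quoted prices, in thousands of dollars. The TSM upgrade cost is unknown and is left out.

```python
# Prices in thousands of US dollars, as quoted above.
TAPE_K = {
    "controller cabinet + 2 drives, media, cables": 93,
    "second storage cabinet + 2 drives, media, cables": 82,
}
DISK_K = {
    "ESS, 540GB, UW-SCSI": 330,
    "ESS Fibre Channel upgrade incl. adapters": 55,
    "ESS SSA disk upgrade, +1080GB": 85,
}
DISK_GB = 540 + 1080  # total ESS capacity after the upgrade


def total(items):
    """Sum the prices in a {description: price} mapping."""
    return sum(items.values())


if __name__ == "__main__":
    print(f"tape: ${total(TAPE_K)}K, disk: ${total(DISK_K)}K, "
          f"combined: ${total(TAPE_K) + total(DISK_K)}K")
    print(f"disk cost per GB: ${total(DISK_K) * 1000 / DISK_GB:.0f}")
```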

The S7A is actually a big box to run this stuff on... but it was available, and it has never had to breathe very hard.

But, we can't count all this money for just the backup solution. Sure some of it can but not all of it.

     Recovering Oracle (and other files) via Veritas checkpoints - (broomberg) - (5)
         Well... Oracle, RMAN and TSM, know them and love them... - (folkert) - (4)
             Impressive tape driving timing - (broomberg) - (3)
                 Machine and types of Equipment - (folkert) - (2)
                     Re: Machine and types of Equipment - (broomberg) - (1)
                         Re: Machine and types of Equipment - (folkert)
