Recovering Oracle (and other files) via Veritas checkpoints

Post #37,541 by broomberg 5/4/02 11:50:39 PM
Recovering Oracle (and other files) via Veritas checkpoints

Here's a little report that I whipped up that may interest some people here. I may flesh it out and present it at Sage.

-----------------------------------------
Oracle restore via checkpoints

I attempted to do a single tablespace checkpoint rollback. It was BAD. I ended up trashing my base.

The good: I got my 330G Oracle base back without resorting to tape.

The bad: It took about 4 hours to copy the files around, when it should have taken 10 seconds of rolling back the checkpoints.

The ugly: This notice is part of the 1st paragraph of the Veritas vxckptadm man page, and vxckptadm is the command needed to roll back from a checkpoint:

    DESCRIPTION
    The vxckptadm utility is not an end-user supported utility and should not be run by users. The VxDBA utility interfaces to this command, allowing management of Storage Checkpoints.

If the VxDBA utility WORKED, I wouldn't need to deal with this command. There was a bug in the initial Veritas checkpoint save script that put bad entries in the Veritas list of files to restore, which then caused the restore step to abort.

The stupid: If I had spent a few more minutes rereading and testing, I could have avoided the copy time. After finishing the "copy restore", I reread the man page, checked the failure log, and realised that the rollback needed file names and would not work on directories or file systems. So I then rolled it all back via a large list of file names, and it worked fine. On the other hand, this is why we test.

It seems that all the user-level checkpoint tools deal with the creation and mounting of checkpoints, not rolling back. fsckptadm is the command for most of the work, but it has no way of rolling back. The vxckptadm man page talks about rolling back Oracle instances (which failed) and individual files, but NOT the entire file system, which is what I wanted but could not have.
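Since the rollback would only accept individual file names, not directories or file systems, the "large list of file names" has to be generated somehow. A minimal sketch that expands one or more directories into a sorted list of regular files, one path per line; the script name and mount point in the usage comment are hypothetical, and how the list is then fed to vxckptadm is deliberately not shown, since its options aren't described here.

```python
import os
import sys


def list_regular_files(roots):
    """Return every regular file under the given directories, sorted.

    The rollback described above wanted individual file names, so each
    directory is expanded into the files it contains. Symlinked
    directories are not followed (the os.walk default), and symlinks,
    sockets and FIFOs are skipped so only plain files are listed.
    """
    files = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path) and not os.path.islink(path):
                    files.append(path)
    return sorted(files)


def main(argv):
    # Hypothetical usage: python make_rollback_list.py /oradata01 > files.txt
    for path in list_regular_files(argv[1:]):
        print(path)


if __name__ == "__main__":
    main(sys.argv)
```

Sorting is only there so the list is stable and easy to diff against the checkpoint's contents before committing to a rollback.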
Lessons learned:

- Never trust Veritas stuff without testing. We always knew that, but it is worth repeating. In this case they use 'sqlplus' for some system setup, but the scripts get confused by the results if settings such as "set timing on" are in the glogin.sql file.
- Keep a percentage of every file system empty to be used for checkpoint data. Checkpoints work great, but you need to know what you are doing. The amount of free space required depends on the amount of data being changed.
- Checkpoints survive a system reboot, so you can keep a LONG history around if you are willing to use the disk.
- Checkpoints become stale and unusable as you run out of disk.
- Before any major operation, take a cold Oracle checkpoint. This takes about 2 minutes. You could take a hot one, but the restore is far more complex, and since the data warehousing instance is not in ARCHIVELOG mode, hot backups (even via checkpoint) are chancy.
- It takes 5 hours to back up 345GB to EZ17 via mounted checkpoints. You can do this while the base is up and active.
- It takes 5 hours to restore 345GB from EZ17. To make restores easier, it is better to have a single large partition to bring the data back into, rather than dealing with multiple partitions and the associated links.
- It takes 4 hours of copying from a mounted checkpoint via disk to restore those same files.
- You can mount writable checkpoints to test program changes without fear of damaging your original base. The docs say you can automatically generate a test instance with a new SID, but I haven't tested that yet.
- It takes 10 seconds of rollback via checkpoint to restore those same files.
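The "keep free space based on the amount of data being changed" rule can be made a little more concrete. Storage checkpoints are copy-on-write, so a checkpoint grows as blocks in the live file system change after it is taken. A rough sizing sketch under a worst-case assumption (every changed block is a different block); the change rate and retention period below are hypothetical placeholders, not figures from this report, and several overlapping checkpoints are not modeled.

```python
def estimate_checkpoint_space_gb(data_gb, changed_gb_per_day, days_kept):
    """Worst-case space a copy-on-write checkpoint may need, in GB.

    Assumes (hypothetically) that every block changed during the
    retention period is a different block, so the checkpoint keeps one
    old copy per changed block. It can never need more than a full copy
    of the data that existed when the checkpoint was taken.
    """
    return min(data_gb, changed_gb_per_day * days_kept)


def checkpoint_fits(free_gb, data_gb, changed_gb_per_day, days_kept):
    """True if the estimate fits in the free space reserved for checkpoints.

    If it does not, expect checkpoints to go stale as the disk fills.
    """
    need = estimate_checkpoint_space_gb(data_gb, changed_gb_per_day, days_kept)
    return need <= free_gb


# Hypothetical numbers: a 345 GB base changing 10 GB/day, kept for 14 days.
need = estimate_checkpoint_space_gb(345, 10, 14)
print(f"reserve at least {need} GB")  # prints "reserve at least 140 GB"
print(checkpoint_fits(free_gb=100, data_gb=345, changed_gb_per_day=10, days_kept=14))  # prints False
```

The cap at the data size reflects that, at worst, a checkpoint holds one old copy of every block; real usage will usually be lower because the same hot blocks tend to change repeatedly.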
Post #37,545 by folkert 5/5/02 1:30:58 AM
Well... Oracle, RMAN and TSM, know them and love them...

Okies, here goes:

Aw crap, I just don't care any more... Sorry Barry. Just look it up... it has had fewer problems (I believe) than I've ever heard of or seen with Veritas. RMAN's interface to TSM is called Tivoli Data Protection for Oracle. It can do nearly anything you want to do with any tablespace live... if you use another file system instead of allocating a tape channel, you can just keep it going forever... provided you can keep adding disk for it.

We opted for TSM with DRM (Disaster Recovery Manager) so we can get copies of all the tapes we use offsite. Therefore we are able to do "checkpoints" anytime, anywhere we want. RMAN will restore the DB to any checkpoint the DBA chooses until he "expires" it. It's quite an involved process if you want to keep ONLY 1 year's worth of transactions and archives for Oracle.

It does work... we have even been able to restore a corrupted tablespace, disabling all access to it while we restored a good version of it. The culprit was a malformed/mistyped "peoplesoft upgrade script" that did an extend, then added columns, then moved data from one column to another, multiple times, then deleted the old columns. Well, it actually did everything right till it did the delete. It actually deleted the new columns... and since it was one of those seldom-used ("until now") tables, it didna make much difference until new versions of SQR and SQL and such started reporting against it. It is a "static table", so it worked to restore an old version and then run the script without the delete.

So you may see use for this. If you do, OK; if not, OK.

Enterprise tape storage is STILL significantly cheaper than enterprise disk storage... so we opted for the fastest tape drives we could find... and got multiple... We're getting about 250GB an hour when we force TSM to use multiple tape drives and SCSI paths to those drives.

P.S. I was gonna go into how we have tested it in greater detail, but... I just, well, don't wanna. If you want more info, ping me.
`greg, curley95@attbi.com -- REMEMBER ED CURRY!!!`
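To put the thread's throughput numbers side by side: the first post reports 5 hours to move 345GB to or from EZ17 and 4 hours to copy the same files back from a mounted checkpoint, while this post reports about 250GB an hour from TSM driving multiple tape drives. A small sketch converting these to a common GB/hour figure; reusing the 345GB base size for the TSM line is an assumption for comparison only, since no base size is given for greg's setup.

```python
BASE_GB = 345  # base size quoted in the first post


def rate_gb_per_hour(gigabytes, hours):
    """Average throughput for moving `gigabytes` in `hours`."""
    return gigabytes / hours


def hours_needed(gigabytes, gb_per_hour):
    """Time to move `gigabytes` at a sustained `gb_per_hour`."""
    return gigabytes / gb_per_hour


reported = {
    "EZ17 tape backup/restore": rate_gb_per_hour(BASE_GB, 5),
    "copy from mounted checkpoint": rate_gb_per_hour(BASE_GB, 4),
    "TSM, multiple tape drives": 250.0,  # as reported, not derived
}

for method, rate in reported.items():
    hours = hours_needed(BASE_GB, rate)
    print(f"{method:30s} {rate:7.2f} GB/h -> {hours:5.2f} h for {BASE_GB} GB")
```

That works out to roughly 69 GB/h for EZ17 and about 86 GB/h for the disk copy, versus 250 GB/h, or about 1.4 hours for a 345GB base if that rate held. The setups differ a lot in hardware, so treat this as a rough comparison only.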
Post #37,572 by broomberg 5/5/02 2:46:33 PM
Impressive tape driving timing
What are they, how many, driven by what equipment?
Post #37,583 by folkert 5/5/02 6:14:06 PM
Machine and types of Equipment

AIX 4.3.3 Fixpack 7
Oracle v8.1.7.3
TSM for AIX v4.2.1
RMAN (v??) for Oracle on AIX
RS6000 - S7A
8 - 233MHz RISC-mode PPC processors
32GB memory
8 - Ultra-Wide SCSI channels
4 - 3590K tape drives
2 - UW-SCSI interfaces per drive for multiple SCSI paths
2 - 3494 tape library cabinets
Multiple GB Ethernet, trunked (varies on how many)

Also, this machine is our primary Oracle machine.
Post #37,599 by broomberg 5/5/02 10:38:30 PM
Re: Machine and types of Equipment
Holy shit. Hmm, rough guess, hhmmmm. $400K in equipment. Where's the disk?
Post #37,667 by folkert 5/6/02 4:55:00 PM
Re: Machine and types of Equipment
The disk is in an IBM ESS, with multiple fiber SCSI paths using vpaths.

$93K for the controller cabinet with 2 drives, media and SCSI cables for them.
$82K for another storage cabinet with 2 drives, media and SCSI cables for them.
The SCSI adapters came with the drives. We had to have a second backplane for the S7A anyway, so we put 4 SCSI adapters in one backplane and 4 in the other. We also put 2 fiber SCSI cards in each backplane to connect the ESS.
TSM upgrade from ADSM... Dunno.
ESS with 540GB of disk: $330K, with straight UW-SCSI connections.
ESS upgrade to fiber SCSI connections, when they finally became real: $55K, including adapters.
ESS SSA disk upgrade for another 1080GB: $85K.

The S7A is actually a big box to have this stuff running on... but it was available, and it has never had to breathe real hard. But we can't count all this money for just the backup solution. Sure, some of it can, but not all of it.
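Tallying the prices quoted in that post gives a sense of scale. A minimal sketch; it leaves out the S7A itself and the unpriced TSM upgrade from ADSM, and grouping the two cabinets as "tape" and the ESS items as "disk" is my reading of the post, not something it states.

```python
# Prices quoted in the post above, in thousands of US dollars.
# The tape/disk grouping is an interpretation, not stated in the post.
TAPE_K = {
    "controller cabinet, 2 drives, media, SCSI cables": 93,
    "second storage cabinet, 2 drives, media, SCSI cables": 82,
}
DISK_K = {
    "ESS with 540GB, UW-SCSI": 330,
    "ESS fiber SCSI upgrade, incl. adapters": 55,
    "ESS SSA disk upgrade, +1080GB": 85,
}


def subtotal(items):
    """Sum the prices in a {description: price_in_k} mapping."""
    return sum(items.values())


tape = subtotal(TAPE_K)
disk = subtotal(DISK_K)
print(f"tape ${tape}K, disk ${disk}K, total ${tape + disk}K")
```

That comes to $175K for the tape side and $470K for the ESS, $645K in all; as the post says, not all of it can be charged to the backup solution.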
