and Disaster Recovery
The diagram below shows the architecture of a proposed disaster recovery solution. At the production site, DR Prophet® uses a local storage server to create a thinly provisioned mirror volume (“SAN Mirror”) for each protected disk. An application-aware agent running on the application server ensures the consistency and integrity of the mirrored data. All subsequent backup and disaster recovery activities, such as snapshots and DR rehearsals, are performed against the SAN Mirror so that they have no performance impact on the production system.
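“Thinly provisioned” means the mirror can advertise the full size of the protected disk while consuming physical space only for blocks that have actually been written. The sketch below models that idea under stated assumptions: `ThinVolume`, `BLOCK_SIZE` and the dictionary-backed block store are hypothetical illustrations, not DR Prophet® internals.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # assumed block size in bytes (hypothetical, for illustration)


@dataclass
class ThinVolume:
    """Hypothetical thinly provisioned volume: it advertises a large logical
    size but only stores blocks that have actually been written."""
    logical_blocks: int
    blocks: dict[int, bytes] = field(default_factory=dict)

    def _check(self, index: int) -> None:
        if not 0 <= index < self.logical_blocks:
            raise IndexError(f"block {index} is outside the volume")

    def write_block(self, index: int, data: bytes) -> None:
        self._check(index)
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block long")
        # Physical space is consumed only when a block is first written.
        self.blocks[index] = data

    def read_block(self, index: int) -> bytes:
        self._check(index)
        # Never-written blocks read back as zeros without using storage.
        return self.blocks.get(index, bytes(BLOCK_SIZE))

    @property
    def logical_bytes(self) -> int:
        return self.logical_blocks * BLOCK_SIZE

    @property
    def allocated_bytes(self) -> int:
        return len(self.blocks) * BLOCK_SIZE
```

In this model, a 1,000,000-block mirror that has received a single write holds the data of only one block, which is why a thin mirror per protected disk need not reserve the full capacity of every production disk up front.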
DR Prophet® also creates a block-level remote replica (“Remote Replica”) of the SAN Mirror at the DR site. All IP traffic sent over the Internet is WAN-optimized. The DR Prophet® instance running at the DR site takes snapshots and performs DR rehearsals against the Remote Replica entirely independently of the production site.
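A block-level replica is kept in sync by shipping changed disk blocks rather than whole files. The following is a minimal sketch, assuming the SAN Mirror tracks a set of dirty block numbers between replication passes. The function name `replicate_changed_blocks` is hypothetical, and zlib compression merely stands in for WAN optimization, whose actual techniques the text does not specify.

```python
import zlib


def replicate_changed_blocks(
    source: dict[int, bytes],
    replica: dict[int, bytes],
    dirty: set[int],
) -> int:
    """Hypothetical block-level replication pass.

    Only blocks marked dirty since the previous pass are sent. Each payload is
    compressed before "transmission" as a stand-in for WAN optimization.
    Returns the number of compressed bytes that crossed the (simulated) WAN.
    """
    sent = 0
    for index in sorted(dirty):
        payload = zlib.compress(source[index])     # sender side: compress
        sent += len(payload)
        replica[index] = zlib.decompress(payload)  # receiver side: apply block
    dirty.clear()  # every dirty block is now in sync on the replica
    return sent
```

Because unchanged blocks are never resent, the WAN traffic per pass scales with the amount of data modified since the last pass rather than with the size of the disk.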
This architecture also includes a complete set of standby servers that replicate the exact configuration of the production site. In the event of a catastrophic failure of the production servers, business operations can quickly resume at the DR site using these standby servers and the data on the Remote Replica.
With traditional tape-based systems, backups are typically performed once, at the end of each business day, to avoid slowing down the production system through resource contention. In addition, a tape backup takes hours to complete once started, which makes backing up during business hours infeasible.
Unlike tape systems, DR Prophet® is based on a near-real-time architecture. An agent on each protected application server duplicates all disk writes and sends them over the LAN to the SAN Mirror at the production site and over the WAN to the Remote Replica at the DR site. In effect, the production disk is mirrored both locally (SAN Mirror) and remotely (Remote Replica). On a configurable schedule, periodic snapshots preserve point-in-time images of the mirrored disk, which allow the disk to be rolled back to a previous known-good state in case of a virus attack or file corruption. Disk space allowing, DR Prophet® can take as many as 1,024 snapshots, giving the system administrator very granular control over the available recovery points. Because snapshots are taken on the mirrored disk, they can be taken as frequently as necessary to improve RPO without affecting the normal operation of the production system.
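The write path and snapshot behavior described above can be illustrated with a toy in-memory model. `MirroredDisk` and its methods are hypothetical; writes are applied synchronously for simplicity, and each snapshot is a full copy of the mirror, whereas real snapshot implementations typically use copy-on-write. Only the 1,024-snapshot limit comes from the text.

```python
MAX_SNAPSHOTS = 1024  # limit stated in the text ("disk space allowing")


class MirroredDisk:
    """Hypothetical model of agent write splitting plus point-in-time snapshots."""

    def __init__(self) -> None:
        self.production: dict[int, bytes] = {}
        self.san_mirror: dict[int, bytes] = {}      # local copy, fed over the LAN
        self.remote_replica: dict[int, bytes] = {}  # remote copy, fed over the WAN
        self.snapshots: list[tuple[str, dict[int, bytes]]] = []

    def write(self, index: int, data: bytes) -> None:
        # The agent duplicates every write to all three copies.
        for target in (self.production, self.san_mirror, self.remote_replica):
            target[index] = data

    def take_snapshot(self, label: str) -> None:
        # Snapshots are taken from the mirror, never from the production disk.
        if len(self.snapshots) >= MAX_SNAPSHOTS:
            raise RuntimeError("snapshot limit reached; delete an old snapshot first")
        self.snapshots.append((label, dict(self.san_mirror)))  # full copy for brevity

    def rollback(self, label: str) -> None:
        # Restore a known-good image; this simplified model resets all copies.
        for name, image in reversed(self.snapshots):
            if name == label:
                self.production = dict(image)
                self.san_mirror = dict(image)
                self.remote_replica = dict(image)
                return
        raise KeyError(label)
```

Note that a corrupting write reaches the mirror and the replica just as quickly as a good one, which is why point-in-time snapshots are needed on top of mirroring. For such logical errors the newest clean recovery point is at most one snapshot interval old: with snapshots every 15 minutes, for example, that is 15 minutes, compared with up to a full day for a nightly tape backup, and 1,024 snapshots at that interval span 1,024 × 15 min = 256 hours, about 10.7 days.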
Snapshots taken without coordinating with the application often leave common business applications such as Oracle or Microsoft SQL Server in a crash-consistent state in the snapshot image, as if the server had lost power at that moment. The DR Prophet® agent actively communicates with the application and puts it into a quiescent (online backup) state before a snapshot is taken. This ensures that the database is consistent in the snapshot, so the point-in-time image can be used immediately to resume the application without tedious, time-consuming steps such as verification or replaying database log files.
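The essential pattern is quiesce, snapshot, resume, with the resume step guaranteed even if the snapshot fails. The sketch below assumes hypothetical `freeze` and `thaw` hooks supplied per application; for Oracle, for instance, such hooks might issue `ALTER DATABASE BEGIN BACKUP` and `ALTER DATABASE END BACKUP`. This is an illustration of the pattern, not necessarily how the DR Prophet® agent implements it.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


@contextmanager
def quiesced(freeze: Callable[[], None], thaw: Callable[[], None]) -> Iterator[None]:
    """Keep an application in a quiescent (online backup) state inside the block."""
    freeze()  # hypothetical hook: flush buffers and enter backup mode
    try:
        yield
    finally:
        thaw()  # always resume the application, even if the snapshot fails


def consistent_snapshot(
    freeze: Callable[[], None],
    thaw: Callable[[], None],
    snapshot: Callable[[], T],
) -> T:
    """Take a snapshot only while the application is quiesced."""
    with quiesced(freeze, thaw):
        return snapshot()
```

Using a context manager keeps the quiescent window as short as the snapshot call itself, which matters because the application's normal processing is held while in backup mode.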
To recover from a minor, non-catastrophic failure such as file corruption or the accidental deletion of a folder, select a known-good point-in-time snapshot and mount it on the application server. The snapshot appears as a new disk containing data the application can use immediately.
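Choosing the recovery point usually means finding the newest snapshot taken before the problem began. Assuming the time of the corruption or deletion is known and snapshot timestamps are kept in ascending order, a binary search finds it. The function name is hypothetical, and mounting the chosen snapshot is left to the product.

```python
from bisect import bisect_left
from datetime import datetime
from typing import List, Optional


def last_good_snapshot(snapshot_times: List[datetime], incident: datetime) -> Optional[datetime]:
    """Return the newest snapshot taken strictly before the incident, or None.

    snapshot_times must be sorted in ascending order.
    """
    # bisect_left gives the first snapshot at or after the incident,
    # so the entry just before it is the last one known to be clean.
    i = bisect_left(snapshot_times, incident)
    return snapshot_times[i - 1] if i > 0 else None
```

A snapshot taken at exactly the incident time is treated as suspect, since the damage may already be captured in it.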
For catastrophic server or hard disk failures, instead of having to wait for the hardware vendor to bring the failed hardware back online, a virtualized copy of the server can be spun up on the standby virtualization server to rapidly resume the service using the SAN Mirror. When the failed hardware is finally replaced, data can be transferred back from the SAN Mirror.
| Failure Scenario | DR Prophet® Action | Local Recovery | RTO |
|---|---|---|---|
| Lost or corrupted files or folders | Validation | Yes | 10 minutes |
| Database corruption | Mount last known-good snapshot | Yes | 20 minutes |
| OS/application crash | Full system recovery including boot driver, or P2V/V2V on standby VM | Yes | 15 minutes |
| Failed storage hardware | Full data mirror available (LVM*) | Yes | 0 minutes |
| Site loss | Launch DR site into production within 1 hour | No | 1–2 hours |