Disaster Recovery: Why DR is important and how to build it

Many people confuse the terms Disaster Recovery and Business Continuity. Whilst it’s true that in both cases you’re recovering from a disaster, they’re not the same.

According to ISO standards, Disaster Recovery (DR) ensures the recovery of IT systems and is data-centric, while business continuity ensures the recovery of the business itself in case of a disaster or disruptive incident.

In the modern business landscape, both terms are often intertwined, specifically with digital services.

As an example, Netflix relies heavily on servers and other cloud services to provide streaming all year round. However, IT is not the only aspect of a “disaster”; one of the many aspects of business continuity is ensuring that people are safe and that another location is available for staff to work from if the main office is reduced to ashes.

Imagine a retail store whose entire POS system is down: the DB server is offline and customers are unable to make purchases. Do staff members tell customers they can’t buy the items, or should the business have a process for using pen and paper instead, recording every purchase and price and entering them into the systems when they’re back online?

Disaster recovery applies only when IT systems are down; a head office on fire would be a business continuity matter. Our focus here is to outline what makes Disaster Recovery distinct from business continuity.

 

What is Disaster Recovery?

Disaster Recovery is the recovery of IT systems (hardware, virtual machines, applications, DBs and so on), and it’s not all about the Recovery Point Objective (RPO), the Recovery Time Objective (RTO), or defining business critical applications.

Although RPO, RTO, and business critical applications are quite important, what’s even more important is a Disaster Recovery Plan (DRP).

Before going into the DRP and strategy, however, let us first define RPO and RTO in a little more detail.

  • Recovery Point Objective (RPO) defines how much data the business can afford to lose in a disaster, measured as time. In other words, how far back in time can a business go and still be able to resume its work? Take a database server, for example: can the business afford to lose 24 hours of data, 8 hours, 2 hours? This is where the RPO gets defined.
  • Recovery Time Objective (RTO) defines the amount of time a business can operate without its IT systems being available. Can a DB server, or an Exchange server be down for 2 hours before the panic button is hit? This is where we determine how soon a system should be back online and running again.
  • Business critical applications will need to be defined to determine the priority of systems that will need to be recovered. Should a business recover its printing server first or the Domain Controller? Or should they recover their SCCM server or SQL server?

Many organisations think that’s the holy trinity of Disaster Recovery, but it isn’t.
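
To make the two objectives concrete, here is a minimal Python sketch, not a definitive implementation. The function names, timestamps and thresholds are all hypothetical, chosen only for illustration: the RPO check compares the gap between the last good backup and the failure, and the RTO check compares the downtime against the target.

```python
from datetime import datetime, timedelta


def meets_rpo(last_good_backup: datetime, failure_time: datetime, rpo: timedelta) -> bool:
    """Data written after the last good backup is lost; that window must fit the RPO."""
    return failure_time - last_good_backup <= rpo


def meets_rto(failure_time: datetime, restored_time: datetime, rto: timedelta) -> bool:
    """Downtime runs from the failure until the system is back online."""
    return restored_time - failure_time <= rto


# Illustrative values only (assumptions, not taken from any real incident).
failure = datetime(2024, 1, 10, 14, 0)
last_backup = datetime(2024, 1, 10, 8, 0)   # 6 hours of data at risk
restored = datetime(2024, 1, 10, 17, 0)     # 3 hours of downtime

rpo_ok = meets_rpo(last_backup, failure, rpo=timedelta(hours=8))
rto_ok = meets_rto(failure, restored, rto=timedelta(hours=2))
print(f"RPO met: {rpo_ok}, RTO met: {rto_ok}")
```

In this made-up scenario the 8-hour RPO is met, because only 6 hours of data are lost, while the 2-hour RTO is missed, because the system was down for 3 hours.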

 

Disaster Recovery Plan and Strategy

Equally important to the RPO, RTO and applications, is having a Disaster Recovery Plan and a strategy.

The most important part of any plan is a drill. Every now and then, the IT department should run a Disaster Recovery drill to exercise the plan and, after it, the strategy or process. This is important for many reasons.

For example, imagine you’re recovering a VM, or even an entire DB, from a given date, but that backup is actually corrupt. You need to ensure:

  • Enterprise backups are accessible and healthy
  • Staff are assigned to handle backups when necessary
  • Staff responsible for backups actually know the plan and have a pre-defined process to follow

You don’t want to execute the wrong plan; otherwise, not only will the business suffer for longer, but you may also miss your RPO and RTO and eventually end up in a non-recoverable situation.
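
One small part of such a drill can be sketched in Python: comparing a backup file against a checksum recorded when the backup was taken. This is an illustrative sketch under assumptions; the file name `db_backup.bak` and the helper names are hypothetical, and a matching checksum only shows the file has not changed, so a real drill should still perform an actual test restore.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_healthy(path: Path, expected_sha256: str) -> bool:
    """A backup passes if it is accessible and still matches its recorded checksum."""
    return path.is_file() and sha256_of(path) == expected_sha256


# Demonstration with a throwaway file standing in for a real backup.
with tempfile.TemporaryDirectory() as tmp:
    backup = Path(tmp) / "db_backup.bak"            # hypothetical file name
    backup.write_bytes(b"pretend database dump")
    recorded = sha256_of(backup)                    # stored at backup time
    healthy_before = backup_is_healthy(backup, recorded)

    backup.write_bytes(b"pretend database dumX")    # simulate corruption
    healthy_after = backup_is_healthy(backup, recorded)

print(healthy_before, healthy_after)
```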

 

Determine a DR Plan

 

A DR plan has multiple elements to it, to name a few:

  • Role assignment: No business or department should rely on one person to do it all. Since Disaster Recovery is the recovery of IT systems, each person within the team should have a role assigned to them. Note, however, that in business continuity the roles are quite different. Back to DR: you couldn’t expect a business analyst to perform a restore of files, applications or VMs (unless trained).
  • Inventory: You will need to determine what the business requires while performing a recovery, and this is part of the DR plan (we will discuss the plan later in the post). There are always physical devices in the IT server room (routers, switches, hypervisors, etc.). Defining your inventory for each scenario helps you determine what you need to recover. Recovering a VM is entirely different from recovering after your hypervisor catches fire, which means getting a new server and installing the hypervisor; delivery of the server alone can take time.
  • Backup/Restore check: Performing a backup and restore check every now and then is important to make sure that the data that is backed up is usable; otherwise, you will miss the RPO. We can’t imagine any business surviving a 2TB loss of data.

There are other elements to this of course, such as vendor communication, having updated documentation of the environment and so on, but these three are the most important.

 

Disaster Recovery Plan and Strategy

The more effective your plan and your strategy (or process) for responding to a disaster, and the more you have rehearsed them, the better you will recover. There’s a simple yet very effective DR recovery plan that organisations usually follow, outlining:

  • Critical System: Define your critical systems so you know how to respond to each.
  • RTO/RPO (hours): Define both RPO and RTO in hours, or minutes.
  • Threat: Define the threat, e.g. AD object deletion.
  • Prevention Strategy: An example would be protecting objects against accidental deletion.
  • Response Strategy: Restore the deleted object using a restore solution.
  • Recovery Strategy: How to approach the recovery, who needs to be involved, etc.

That plan is then transformed into a process, or strategy, for how to respond. It maps out the steps to take in case of a disaster, and the following should be outlined:

  • Critical System: Your defined critical systems.
  • Threat: What was/is the threat.
  • Response Strategy: The defined response strategy.
  • Response Action Steps: Document step by step how to respond.
  • Recovery Strategy: The defined recovery strategy.
  • Recovery Strategy steps: Map out step by step the recovery strategy and future prevention.
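
The fields listed above can be captured as a simple record. The Python sketch below is one hypothetical way to do it: the class name `DRPlanEntry`, the example systems, the RTO/RPO numbers and the idea of ordering recovery by RTO are all assumptions made for illustration, not a prescribed format.

```python
from dataclasses import dataclass, field


@dataclass
class DRPlanEntry:
    """One row of the DR plan plus the step-by-step process derived from it."""
    critical_system: str
    rto_hours: float
    rpo_hours: float
    threat: str
    prevention_strategy: str
    response_strategy: str
    recovery_strategy: str
    response_steps: list[str] = field(default_factory=list)
    recovery_steps: list[str] = field(default_factory=list)

    def is_actionable(self) -> bool:
        """A plan without documented steps cannot be followed during a drill."""
        return bool(self.response_steps) and bool(self.recovery_steps)


# Example values are illustrative assumptions.
entry = DRPlanEntry(
    critical_system="Active Directory",
    rto_hours=2,
    rpo_hours=8,
    threat="AD object deletion",
    prevention_strategy="Protect objects against accidental deletion",
    response_strategy="Restore deleted object using restore solution",
    recovery_strategy="Directory admin restores; service owner verifies",
    response_steps=["Identify the deleted object", "Restore it from backup"],
    recovery_steps=["Verify logons work", "Review deletion permissions"],
)
print_server = DRPlanEntry(
    critical_system="Print server", rto_hours=24, rpo_hours=24,
    threat="Hardware failure", prevention_strategy="Spare hardware",
    response_strategy="Rebuild from image", recovery_strategy="Reinstall drivers",
)

# One possible prioritisation: recover the system with the tightest RTO first.
recovery_order = sorted([print_server, entry], key=lambda p: p.rto_hours)
print([p.critical_system for p in recovery_order])
```

Keeping the plan in a structured form like this makes it easy to spot entries that have a strategy but no documented steps, which is exactly the gap a drill tends to expose.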

 

High Availability is not DR

There’s a misconception about high availability and disaster recovery: the two are not the same thing, and one does not replace the other.

Take SQL Server Enterprise 2012 or later with an AlwaysOn Availability Group enabled. The DB is available on two different SQL servers, and your application is configured to talk to the “listener” configured for the availability group. If SQL A goes down, your application continues operating, because SQL B is still online and its secondary copy of the DB takes over.

However, if an object is deleted on the primary DB, that deletion is replicated to the secondary DB on SQL B.

Therefore, you will still need to recover that object from a previous backup.
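
The difference can be shown with a toy model in Python. This is not SQL Server code and does not use any SQL Server API; the `ReplicatedDatabase` class and the order data are hypothetical stand-ins that only mimic the behaviour described above, where replication carries a deletion to every copy while a separate point-in-time backup still holds the data.

```python
import copy


class ReplicatedDatabase:
    """Toy model: every change on the primary is applied to the secondary too."""

    def __init__(self, rows: dict):
        self.primary = dict(rows)
        self.secondary = dict(rows)

    def delete(self, key: str) -> None:
        self.primary.pop(key, None)
        self.secondary.pop(key, None)   # replication carries the delete across


db = ReplicatedDatabase({"order-1": "paid", "order-2": "pending"})
nightly_backup = copy.deepcopy(db.primary)   # point-in-time copy kept separately

db.delete("order-2")                          # accidental deletion

# High availability did not help: the row is gone from both copies.
lost_on_both = "order-2" not in db.primary and "order-2" not in db.secondary

# Only the backup still has it, so recovery means restoring from the backup.
db.primary["order-2"] = nightly_backup["order-2"]
db.secondary["order-2"] = nightly_backup["order-2"]
print(lost_on_both, db.primary["order-2"])
```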

 

How cloud-based DR solutions help

Disaster Recovery as a Service (DRaaS) helps businesses recover quickly from a disaster. Cloud storage is typically redundant, so backups sent to a location or storage in your cloud of choice are replicated to multiple locations; think of it like disk mirroring.

Cloud backups have advantages over traditional backup solutions. We can either help you adopt a new backup solution or use your existing backup solution to take advantage of the cloud. Depending on the backup solution your company uses, we can help you leverage cloud backups in multiple ways.

Traditional on-premises backups and tape backups are not good enough, for many reasons. For example, if your IT server room is on fire, or your storage solution fails, the time and cost of recovery could be catastrophic for the company. With cloud solutions, by contrast, systems are available almost instantly for you to restore VMs, objects or applications.

 

Better disaster recovery: Next steps

Whether your cloud of choice is Azure or AWS, Xello can help you utilise both cloud vendors for a complete, end-to-end backup solution and a disaster recovery plan. We can also help you leverage your own backup solution, without the need for additional licenses.

To brush up on best practices and identify the best places to start, download our free Azure Data Protection White Paper for more help to guide your DR transformation.