Disaster Recovery Plan
Disaster Recovery and Business Impact Analysis
Disaster Recovery is an essential part of Business Continuity Planning. Both concepts aim to safeguard an organization's ability to maintain critical operations during and after disruptive events, such as natural disasters, hardware failures, or cyberattacks.
Disaster Recovery focuses specifically on restoring critical systems and data following a significant disruption. It involves identifying mission-critical systems, establishing recovery priorities, and implementing strategies to restore those systems and data within an acceptable timeframe. Key metrics in disaster recovery planning include the Recovery Time Objective (RTO) and the Recovery Point Objective (RPO). The RTO is the targeted amount of time within which systems should be restored, while the RPO is the maximum acceptable data loss, expressed as a period of time.
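The two metrics can be illustrated with a short calculation. The following is a minimal sketch, not part of the plan itself: the function name, the timestamps, and the 4-hour and 24-hour targets are hypothetical values chosen for illustration. It treats downtime as the time from disruption to restoration and the data-loss window as the time from the last good backup to the disruption.

```python
from datetime import datetime, timedelta


def meets_objectives(last_backup, disruption, restored, rto, rpo):
    """Compare one incident against RTO and RPO targets.

    Returns (downtime, data_loss_window, rto_met, rpo_met).
    """
    # Downtime: how long the system was unavailable.
    downtime = restored - disruption
    # Data-loss window: changes made after the last good backup are lost.
    data_loss_window = disruption - last_backup
    return downtime, data_loss_window, downtime <= rto, data_loss_window <= rpo


# Hypothetical incident and targets (assumptions for illustration only).
last_backup = datetime(2024, 1, 10, 2, 0)
disruption = datetime(2024, 1, 10, 9, 30)
restored = datetime(2024, 1, 10, 12, 30)

downtime, data_loss_window, rto_met, rpo_met = meets_objectives(
    last_backup,
    disruption,
    restored,
    rto=timedelta(hours=4),
    rpo=timedelta(hours=24),
)
print(downtime, data_loss_window, rto_met, rpo_met)
```

In this example the system is down for 3 hours and 7.5 hours of data are at risk, so both hypothetical targets are met. A tighter RPO would require more frequent backups, and a tighter RTO would require faster restoration.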
Business Continuity Planning, on the other hand, encompasses a broader set of activities designed to ensure that an organization can maintain its essential functions during a disruption and return to a normal operational state as soon as possible. A Business Continuity Plan includes disaster recovery but also covers other aspects, such as workforce management, communication plans, alternate site arrangements, and supply chain management.
In short, Disaster Recovery focuses on restoring IT systems and data, while Business Continuity Planning addresses the larger organizational processes and functions that must continue or be resumed after a disruption. Both are essential for maintaining an organization's resilience in the face of unexpected events and ensuring the continuity of essential operations.
Events that can trigger Disaster Recovery activities include but are not limited to:
Hardware failure: This could involve the failure of critical components like hard drives, servers, or network devices, which could lead to data loss or service disruptions.
Software corruption: Data or system files can become corrupted due to various reasons, such as bugs, system crashes, or power outages, rendering them unusable.
Software failure: Bugs, compatibility issues, or unanticipated interactions between applications can lead to system crashes or data corruption.
Bad patches or updates: Occasionally, software updates or patches may introduce new issues, cause incompatibilities, or render systems unstable. This may require a rollback or restoration from a previous backup to ensure system stability and functionality.
Human error: Accidental deletion of critical files or system misconfigurations by IT staff or other employees can result in the need for disaster recovery activities.
Power outages: Sudden loss of power can cause system crashes, data corruption, or hardware damage, requiring disaster recovery efforts to restore normal operations.
Natural disasters: Events such as floods, earthquakes, or fires can damage IT infrastructure, leading to the need for disaster recovery actions.
Cybersecurity incidents: Cyberattacks such as ransomware, Distributed Denial of Service (DDoS) attacks, or targeted intrusions can cause significant damage to an organization's IT infrastructure and data, warranting disaster recovery measures.
Supply chain disruptions: An interruption in the availability of critical resources or services, such as cloud storage providers or third-party software support, can require disaster recovery activities.
Regulatory or legal requirements: Organizations may be required to initiate disaster recovery processes to comply with specific regulations or legal mandates, such as preserving evidence for investigations or responding to data breaches.
Disaster Recovery Stages
Organizations depend on mission/business-critical information systems to conduct daily operations, deliver products and services to customers, and communicate with suppliers, partners, and service providers. These systems can face failures, outages, disasters, or emergencies, which may lead to downtime. Downtime can impact the confidentiality, integrity, and availability of these systems and their data.
Disaster Recovery involves four general stages:
Business as usual: The information system is operating normally.
Disaster occurs: An event causes downtime, data corruption, or similar disruption.
System recovery: Recovery efforts are carried out to restore the system or its data.
Resume production: The restored system and data are verified as reliable before being returned to production use.
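The four stages can be read as a simple cycle. The sketch below is illustrative only: the names `Stage`, `NEXT_STAGE`, and `advance` are hypothetical, and the return from resuming production to business as usual is an assumption rather than something the plan states.

```python
from enum import Enum


class Stage(Enum):
    BUSINESS_AS_USUAL = "business as usual"
    DISASTER_OCCURS = "disaster occurs"
    SYSTEM_RECOVERY = "system recovery"
    RESUME_PRODUCTION = "resume production"


# Each stage leads to exactly one next stage.
# Assumption: once production resumes, the cycle returns to business as usual.
NEXT_STAGE = {
    Stage.BUSINESS_AS_USUAL: Stage.DISASTER_OCCURS,
    Stage.DISASTER_OCCURS: Stage.SYSTEM_RECOVERY,
    Stage.SYSTEM_RECOVERY: Stage.RESUME_PRODUCTION,
    Stage.RESUME_PRODUCTION: Stage.BUSINESS_AS_USUAL,
}


def advance(stage):
    """Return the stage that follows the given one."""
    return NEXT_STAGE[stage]


# Walk through one full cycle, starting from normal operations.
stage = Stage.BUSINESS_AS_USUAL
visited = [stage]
for _ in range(4):
    stage = advance(stage)
    visited.append(stage)
print([s.value for s in visited])
```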
Policy Statement
Keuka College takes disaster recovery planning efforts very seriously and is committed to allocating appropriate resources. Keuka College Disaster Recovery efforts aim to ensure the safety of people and to reduce the impact disruptions may have on business operations.
The following activities support the policy statement:
- A formal risk assessment must be undertaken to determine the requirements for the Disaster Recovery Plan.
- The disaster recovery plan must cover all essential and critical infrastructure elements, systems, and networks, in line with key business activities.
- The disaster recovery plan must establish the emergency-level service requirements needed for students to attend school safely.
- All employees must be aware of the plan and their respective roles.
- The plan must be periodically tested via checklists, walkthroughs, simulations, etc., to ensure it remains actionable and valid to address changing risks and circumstances.
- The disaster recovery plan must be maintained.
Disaster Recovery Process
The principal objective of the Disaster Recovery Program is to develop, test, and document a well-structured and easily understood Disaster Recovery Plan. This plan aims to help Keuka College recover quickly and effectively from a disaster or emergency that interrupts information systems and business operations. NIST formally defines such plans as Information System Contingency Plans (ISCPs), which provide procedures and capabilities for recovering each affected system. An ISCP is an information system-focused plan that may be activated independently from other plans or as part of a larger recovery effort coordinated with an Incident Response Plan.
- Detect and Analyze – Confirm the event has occurred and determine the scope of the affected systems.
- Assemble the Incident Response Team – Ensure all members of the Incident Response Team understand their roles. Determine the category of the event and notify executive leadership.
- Incident Reporting – Report the known facts of the event to insurance, counsel, and law enforcement.
- Business Impact Analysis – Identify mission-critical systems and develop a restoration plan to be implemented immediately.
- Identify Resource Requirements – Determine the resources required to restore critical services and communicate needs to executive leadership.
- Communicate Required Information – The Communication Director will develop a communication plan for impacted parties.
- Financial Assessment – The Vice President overseeing the Information Technology Department and the Vice President of Finance will create a financial assessment of the impact of the event, including business interruption losses, funds needed to restore critical systems, and the impact on the College’s cash position.
- Legal Strategy – The Vice President overseeing the Information Technology Department and the President will determine the legal strategy and conduct a risk assessment to mitigate the impact of the event.