How to test a BCDR plan
You already know the importance of having a robust disaster recovery plan for your customers—and your own business. But that’s just the first part of preparing for a disruptive event. A comprehensive BCDR plan must also include testing that covers three areas: people, processes, and technology. You have to determine that:
- The technical infrastructure can handle the demands of the plan.
- Employees have the information and tools they need to carry out their responsibilities in the BCDR plan.
- The procedures and protocols in place work in practice as well as in theory.
Testing is the final part of designing and implementing BCDR plans that work the way they are supposed to. This post provides comprehensive guidance on BCDR testing to ensure those plans will function as required when needed.
Putting detailed business continuity and disaster recovery (BCDR) plans in place for your customers is one of an MSP’s most critical functions. If a client’s organization does face a disruptive event, you need to make sure that both it and you are ready. Watch our on-demand webinar, BDR + NOC: Backup Your Data Better, to learn more about the different solutions that will ensure you have all the necessary bases covered.
If you already have those plans in place—great! But have you tested them recently?
BCDR plans don’t fall under the “set it and forget it” category. Threats evolve, technologies change, and unexpected issues arise. Even the most detailed and thorough plan can look flawless in theory, but in practice, you may uncover some serious issues that could lead to potentially catastrophic data loss or downtime. The most careful planning is pointless without regular and rigorous testing.
BCDR testing involves running exercises and simulations to ensure there are no gaps, vulnerabilities, or unforeseen issues with a BCDR plan. Key aspects generally include:
- Defining and designing specific scenarios that align with possible real-world threats (such as cyberattacks or natural disasters)
- Creating a detailed testing plan that outlines objectives, scope, methodologies, personnel, timelines, logistics, and criteria for success
- Conducting post-testing assessment and analysis to identify areas of improvement and lessons learned
It’s also essential to assess communication and coordination processes (such as notifications, employee responsibilities, and escalation procedures) at every step, as these are critical to the success of the plan. You should ensure that organizational stakeholders understand their roles and responsibilities as well as where and how to share and get information during a crisis (for example, by using instant messaging if the business’s email system is inaccessible).
The consequences of not engaging in BCDR plan testing can be severe for both you and your clients, including:
- Lost data
- Loss of professional reputation and credibility
- Significant financial costs
The loss of customer trust can be catastrophic. Current clients may decide to work with another provider, and the damage to your reputation could scare off potential customers. If you are serious about ensuring your customers can survive a disaster, cyberattack, or any other incident, you must include testing as a consistent element of BCDR planning and readiness. By doing so, you also help develop your customers’ resilience against evolving threats and cultivate professional credibility.
Goals give BCDR testing a clear direction and help keep tests aligned with overall business goals. In particular, you should establish Recovery Point Objectives (RPOs), which define the maximum amount of data, usually measured as a window of time since the last usable backup, that the business can afford to lose, and Recovery Time Objectives (RTOs), which define the maximum acceptable amount of time before services are restored.
Additional goals for disaster recovery and business continuity plan tests can relate to:
- The integrity and availability of the recovered data
- The functionality and performance of recovered systems and applications
- Feedback from personnel and other users of the recovered systems
- Comparison with results and outcomes from previous BCDR tests
Again, these objectives will vary for each customer. You should help define goals and other desired results by working with key stakeholders from executive leadership, IT teams, and departmental managers; considering budget and resources; and emphasizing continuous improvement.
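The RPO and RTO targets described above can be compared against the timings recorded during a test. The following is a minimal sketch rather than a prescribed method: the class names, field names, and the `evaluate` function are hypothetical, and it assumes the RPO is expressed as a window of time and that the test records three timestamps (last usable backup, start of the simulated outage, and service restoration).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    """Targets agreed with the customer (hypothetical structure)."""
    rpo: timedelta  # maximum acceptable window of lost data
    rto: timedelta  # maximum acceptable time to restore service


@dataclass
class DrillResult:
    """Timestamps recorded during one BCDR test (hypothetical structure)."""
    last_good_backup: datetime   # backup that was actually restored
    outage_start: datetime       # when the simulated disruption began
    service_restored: datetime   # when the service was usable again


def evaluate(drill: DrillResult, objectives: RecoveryObjectives) -> dict:
    """Compare measured data loss and downtime against the RPO and RTO."""
    data_loss_window = drill.outage_start - drill.last_good_backup
    downtime = drill.service_restored - drill.outage_start
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= objectives.rpo,
        "rto_met": downtime <= objectives.rto,
    }


if __name__ == "__main__":
    objectives = RecoveryObjectives(rpo=timedelta(hours=4), rto=timedelta(hours=2))
    drill = DrillResult(
        last_good_backup=datetime(2024, 1, 10, 6, 0),
        outage_start=datetime(2024, 1, 10, 9, 0),
        service_restored=datetime(2024, 1, 10, 11, 30),
    )
    result = evaluate(drill, objectives)
    # 3 hours of data at risk (within RPO), 2.5 hours of downtime (exceeds RTO)
    print(result["rpo_met"], result["rto_met"])  # prints: True False
```

Keeping the measured values in the result, not just the pass or fail flags, makes it easier to compare outcomes with previous BCDR tests.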
There are several different types of BCDR testing, each of which offers pros and cons. The business continuity and disaster recovery test types that are appropriate for an organization will depend on a variety of factors, including its size and nature, available resources, and the stage of BCDR testing taking place.
Tabletop exercises involve real-time discussions with organizational leaders and anyone else with a critical role in the BCDR plan. The group examines the plan, explores different scenarios, and ensures that all business units are accounted for.
- Pros: This method requires limited resources, offers an opportunity to ask questions and add to knowledge, and supports cross-departmental communication and coordination.
- Cons: Since the test is “on paper,” there is no chance to validate technical aspects or see how it plays out in practice.
This type of testing is best suited for the beginning stages of the process. Tabletop exercises can also be an effective training tool.
In walk-through BCDR testing, the team is faced with a specific type of disruptive event, and each member goes through their individual roles and responsibilities to identify any gaps or inefficiencies.
- Pros: Walk-throughs provide the opportunity to do a comprehensive evaluation of an entire plan to find bottlenecks or other inefficiencies. Team members can also share expertise and gain an overview of the entire BCDR process.
- Cons: Like tabletop exercises, walk-throughs do not provide technical or practical validation.
This is another type of test that is most appropriate for the preliminary stages of the testing process.
A parallel test checks whether failover systems (backup systems that take over when a primary system fails) can handle required business operations, processes, and applications after a disruptive event.
- Pros: By providing validation of data integrity and security, this test is more realistic than purely theoretical options.
- Cons: Parallel tests can be complex, time-consuming, and resource intensive. There also may be a risk to the production environment.
To reduce the risk of wasted time and resources, parallel tests should be undertaken only when teams have successfully addressed all gaps and issues with tabletop exercises and walk-throughs.
Unlike the parallel test, in a cutover test, the failover systems are completely disconnected from the primary systems to take on the full load of business operations. It is the closest possible simulation of an actual disaster event.
- Pros: This test is highly realistic and provides comprehensive insights into the readiness and sufficiency of the BCDR plan as well as potential gaps.
- Cons: Because primary systems must be taken offline, this test can be highly complex and difficult to schedule.
Because cutover tests require critical systems to be disconnected, these tests should be conducted in the final phase of the BCDR testing process.
In addition to various types of tests, a comprehensive BCDR testing strategy checks systems at different levels of depth to ensure all aspects function as expected.
- Data verification shows whether the BCDR plan made consistent and accurate backups of original data files and that the data is recoverable. Validating the data integrity provides confidence that it can be restored in the event of an incident. However, it often requires validating data across different systems, databases, or physical locations, which can be complicated. In addition, certain validation techniques may miss subtle variations, especially in complex files.
- Database mounting confirms that a database backup can be mounted, that its data can be read, and that it can perform other basic functions. It offers realistic testing in an environment that resembles an actual recovery scenario and also enables testing of applications that rely on the database. However, mounting the backup may affect primary systems, and ensuring that data is consistent between the backup and original database can be difficult. Database mounting often goes hand-in-hand with data verification.
- Single machine boot verification tests whether a server can be rebooted after going down. It allows MSPs to test individual systems or machines, enabling them to isolate issues in the recovery process. It can also be performed relatively quickly, making it easy to incorporate into testing. However, this level only tests the server, not the applications or data on it, and focuses only on the booting process.
- Runbook testing checks the functionality and efficiency of step-by-step procedures for different recovery processes. It exposes any weaknesses or vulnerabilities and offers an opportunity for team members with BCDR responsibilities to get familiar with the process. It also serves as evidence that the organization is complying with audit rules, industry regulations, and other requirements. However, because runbook testing occurs in a controlled environment and time period, it may not incorporate the complexities and pressures of an actual event. Runbook testing is best suited for highly detailed and comprehensive BCDR plans.
- Recovery assurance is the most advanced level of BCDR testing, involving many components of hardware and applications, the assessment of service level agreements, and diagnostics to evaluate the successful recovery of critical systems, applications, and data. It provides the highest level of confidence that BCDR plans will be successful, but its complexity also requires significant time, personnel, and infrastructure to complete. However, for businesses that provide critical services, it is essential.
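As an illustration of the data verification level in the list above, one common approach is to compare checksums of the original files with those of the files restored from backup. The sketch below assumes that both sets of files are reachable as local directories; the function names `sha256_of` and `verify_restore` are hypothetical and do not correspond to any particular backup product.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> dict:
    """Compare every file under source_dir with its restored counterpart."""
    report = {"missing": [], "mismatched": [], "matched": []}
    source_files = sorted(p for p in source_dir.rglob("*") if p.is_file())
    for source_file in source_files:
        relative = source_file.relative_to(source_dir)
        restored_file = restored_dir / relative
        if not restored_file.is_file():
            # The backup did not bring this file back at all.
            report["missing"].append(relative.as_posix())
        elif sha256_of(source_file) != sha256_of(restored_file):
            # The file exists but its contents differ from the original.
            report["mismatched"].append(relative.as_posix())
        else:
            report["matched"].append(relative.as_posix())
    return report
```

A byte-level checksum only shows that the restored files are identical to the originals at the time of comparison. It does not show that an application can open or use them, which is why the deeper levels in the list exist.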
When advising customers on the levels of disaster recovery and business continuity plan testing they need, keep these factors in mind:
- The business requirements of the organization (such as the priority of critical systems and tolerance for downtime)
- The risks and potential impact of disruptive events on the business
- Budget and resource constraints
- Compliance and regulatory requirements
- The customer’s plans for growth and expansion
You should also take time to educate customers on the different levels of BCDR plan testing to help them understand which ones are most appropriate for their needs and capabilities.
Because they require less infrastructure and fewer employees, theoretical tests like tabletop exercises and walk-throughs should be undertaken several times a year. More comprehensive and advanced tests that require significant resources and time, such as parallel and cutover testing, should be done at least annually.
However, the schedule will also depend on several factors, such as:
- The size, nature, and industry of the business
- The complexity of its data and networks
- Its security and vulnerability profile
- Whether it must meet specific compliance and auditing regulations
Failing to test often enough can have consequences, ranging from annoying and expensive to disastrous. These include a lack of preparedness, compliance risks, fines, and permanent data loss.
When working with customers to design a BCDR testing strategy and schedule, you should aim to align the timing with business cycles, any business updates, and regular maintenance periods to reduce the disruption to normal operations. The agreed-upon schedule should be communicated to all employees who will be affected, particularly those who will be needed for the testing process.
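A testing schedule like the one described above can be tracked with a simple overdue check. This is a rough sketch under stated assumptions: the intervals in `CADENCE_DAYS` are hypothetical placeholders (roughly quarterly for theoretical tests, annually for parallel and cutover tests) and should be replaced with whatever cadence you agree on with each customer.

```python
from datetime import date, timedelta

# Hypothetical cadences in days; adjust per customer, industry, and compliance needs.
CADENCE_DAYS = {
    "tabletop": 90,
    "walk-through": 90,
    "parallel": 365,
    "cutover": 365,
}


def overdue_tests(last_run: dict, today: date) -> list:
    """Return the test types that have never run or are past their interval."""
    overdue = []
    for test_type, interval in CADENCE_DAYS.items():
        last = last_run.get(test_type)
        if last is None or today - last > timedelta(days=interval):
            overdue.append(test_type)
    return overdue


if __name__ == "__main__":
    history = {
        "tabletop": date(2024, 1, 1),
        "walk-through": date(2024, 5, 1),
        "parallel": date(2023, 1, 1),
    }
    # The cutover test has never been run, so it is reported as overdue too.
    print(overdue_tests(history, today=date(2024, 6, 1)))
    # prints: ['tabletop', 'parallel', 'cutover']
```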
BCDR testing is a critical but often-overlooked aspect of planning for business continuity and disaster recovery. Creating and executing a testing plan can be a time-consuming and complicated process, which is why many businesses fail to do it. MSPs can help by:
- Working with their customers to identify their testing needs, including types and levels of tests
- Defining goals for testing
- Scheduling testing on a consistent basis
- Documenting and communicating outcomes
- Incorporating findings into the BCDR plan for continuous improvement
In addition to discovering that processes do not include essential steps or employees do not know what their responsibilities are, businesses that don’t test their plans regularly may find backups have been corrupted or are otherwise unusable.
BCDR solutions from ConnectWise help MSPs provide clients with secure, automated, and reliable data recovery—a key element of BCDR planning and testing. Start your free BCDR demo today to take the next step toward improving your disaster recovery service offering. Also, ConnectWise Co-Managed Backup includes regular disaster recovery testing on behalf of ConnectWise MSP partners and their clients.