LACTLD interviewed four experts on contingency plans for ccTLDs: Sebastián Castro, Chief Scientist at InternetNZ (the registry for .NZ domains and the operator of the .NZ domain name space); Dave Baker, Chief Technology Officer (CTO) at InternetNZ; Pablo Rodríguez, Executive Vice President of Puerto Rico Top Level Domain; and Julio Cossío, Information Systems Director at NIC Mexico.
Each of them provided their viewpoints on the following questions: What are the threats that put ccTLD operation at risk? How can incidents, disasters and emergencies be averted and reduced? How should operations be reestablished after these events? Who should be in charge of such a task? Why is it important for regional ccTLDs to develop an emergency response plan?
In addition, Pablo Rodríguez (.PR) shared the experience his ccTLD had when two hurricanes hit Puerto Rico.
Natural Disasters in the Region: Irma and María
In 2017, Puerto Rico went through two category-five hurricanes in a two-week period. Pablo Rodríguez (.PR) tells us:
“On September 7, 2017, hurricane Irma went right by Puerto Rican coasts, knocking out the power for a million people and flooding the territory due to heavy rain. Thirteen days later, hurricane María hit Puerto Rico causing 9.161 million dollars in damage, completely destroying the power grid and causing the death of nearly three thousand people.”
In spite of these devastating consequences, the .PR DNS services were operational before, during and after the disaster. As Pablo Rodríguez (.PR) points out:
“Two of the measures that ensured the operation of the DNS were the use of hosting services in storm- and earthquake-proof facilities, and the use of Anycast services. The conditions and factors that cause the vulnerability of a ccTLD are many, and they change from one ccTLD to another. All ccTLDs must identify the factors that cause these risks in order to take measures to mitigate them and ensure the resilience of their operation in the event of a disaster.”
Disaster Recovery Plan and Business Continuity Plan
According to our interviewees, a Disaster Recovery Plan provides the procedures and priorities to focus on when a categorized disaster affects the organization. It is a series of scheduled, documented and proven procedures that are established in order for us to be prepared to face external conditions that could jeopardize the business.
Sebastián Castro and Dave Baker (.NZ) hold: “A disaster can be a natural hazard, like an earthquake, a tsunami, a fire in one of the facilities, a truck hitting a datacenter, etc. The plan focuses on restoring critical business functions as soon as possible when a disaster impacts a part of the business.”
Moreover, they explain that a Business Continuity Plan (BCP) specifically includes a list of priorities to focus on during the first hours of the event, but it also consists of a response plan, the definition of roles and responsibilities, a recovery checklist, etc. At the same time, a BCP includes a set of Disaster Recovery Plans.
Julio Cossío (.MX) underlines that, apart from disasters and incidents, there are other types of processes, like business transformations, changes in government, and social or market changes, which can affect the operation of the organization and thus prompt the implementation of a Business Continuity Plan. He also adds:
“The implementation of a Disaster Recovery Plan and a Business Continuity Plan is related to risk management. There are different risks that are generic. Other risks are specific to computer operations, Internet operation and critical infrastructure management. The main aim of both plans is to reduce the risks at different levels by identifying and defining a path to face the incidents. Not all risks can be managed or identified, which is why this is a continuous process. The priority to manage risks is defined by a combination of probability and impact, since an organization would want to focus on high-impact and high-probability risks.”
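The prioritization by probability and impact described above can be sketched as a simple risk matrix. This is a minimal illustration, not a method attributed to any of the interviewees: the 1-to-5 scales, the multiplication used as the scoring rule, and the sample risks and their values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """One entry in a hypothetical risk register."""
    name: str
    probability: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int       # assumed scale: 1 (minor) to 5 (critical)

    @property
    def score(self) -> int:
        # Assumed scoring rule: probability multiplied by impact.
        return self.probability * self.impact


def prioritize(risks):
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Illustrative values only; a real register would come from the
# organization's own risk assessment.
risks = [
    Risk("Hurricane damages primary site", probability=3, impact=5),
    Risk("DDoS against name servers", probability=4, impact=4),
    Risk("Operator error in zone update", probability=3, impact=3),
    Risk("Disgruntled staff tampers with data", probability=1, impact=5),
]

ranked = prioritize(risks)
for risk in ranked:
    print(f"{risk.score:>2}  {risk.name}")
```

Because risk management is a continuous process, a register like this would be re-scored whenever new risks are identified or conditions change.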
Sebastián Castro and Dave Baker (.NZ) stress that the implementation of such plans is fundamental for the staff and the organization to respond quickly to an event: “When the incident represents an immediate threat for the safety of the staff and the main facilities, we are in need of immediate critical actions.”
What are the risks and threats usually faced by ccTLDs that prompt the implementation of these plans?
Experts say that risks may stem from the place where organizations operate, mainly as a consequence of natural disasters (earthquakes, tsunamis, floods, storms, hurricanes, extreme weather conditions). However, they also identify risks related to operating computer systems (severe failures, data corruption, software bugs), to people (human errors, disgruntled or dishonest staff) and to the Internet (DDoS, phishing, malware or viruses).
Given that ccTLDs run critical infrastructure in both the registry and the DNS, and that the impact on their reputation could be high, they are also exposed to the risks of unavailability and data tampering.
How are Disaster Recovery Plans and Business Continuity Plans prepared and developed?
Our interviewees explain that each ccTLD is different and that each must adapt the content of its plans to its specific circumstances. In general terms, in order to react effectively to risks and threats, plans must set out the responsibilities, communication channels and recovery guidelines. It is also a good idea to analyze in advance the devices and/or high-priority conditions that must be dealt with.
.NZ experts describe the business continuity plan as a resource kit that is to be used for a number of situations, so it needs to contain a variety of information for each type of contingency: “When there is an incident, this kit will be used to organize and execute the plan. It is important to remember that the plan will probably have a limited scope when it comes to its implementation, so a number of its resources will no longer be relevant or its information will not be up to date.”
Once the plan is prepared, it must be tested regularly, and a regular maintenance program must be implemented to keep its contents up to date. In this regard, Julio Cossío (.MX) points out: “The purpose of the tests in the plan is for the team to try out the different agreed-upon procedures and to identify opportunities for improvement before a real incident happens. This way, when a disaster occurs, the team will know how to identify it in time, what to do, who to talk to, and what alternatives to use in order to solve it as soon as possible.” Among other tasks, it is a good idea to establish an annual calendar of preparation and prevention activities that includes tests, an analysis of necessary resources, and a review of the modifications made to guidelines and procedures.
A key aspect of the plan is the multidisciplinary nature of the recovery team. Julio Cossío (.MX) adds that, at NIC Mexico, the recovery team is made up of senior managers, human resources administrators and those in charge of infrastructure and technology. In this regard, Sebastián Castro and Dave Baker (.NZ) hold:
“It's easy to think that only the technical services are involved in the preparation, assessment and execution of the plan, but reality is far from that. The whole organization needs to be ready in case a disaster happens. Different areas must be involved to make sure the plan is updated and can be executed when needed. Everyone needs to know their responsibilities under the different circumstances.”
Sebastián Castro and Dave Baker (.NZ) recommend a series of best practices and key actions that can be added to the Disaster Recovery Plan or Emergency Response Plan:
- Assess the situation/gather the facts – who, what, when, where and how
- Notify the key staff, the Board and next of kin (if someone from the staff has been injured)
- Notify the authorities and emergency services
- Assess the potential business impact of the incident
- Activate the Emergency Response Plan
- Get the Emergency Response Plan team moving
- Prepare key messages
- Report and give instructions to the organization's spokesperson
- Get in touch with stakeholders
- Complete the initial reports
- Keep a log of the events
They also provide a list of key steps for a Business Recovery Plan:
- Notify and inform the staff and suppliers that there has been an event that jeopardizes business continuity
- Select a BCM Manager and the staff for the Business Continuity Management team
- Determine the staff status
- Identify the damages and assess the level of business impact
- Assess the damage to critical hardware and facilities, protect and safeguard equipment if possible
- Prepare an inventory
- Keep people in the loop
- Transition back to normal operations
- Conduct reviews and draw up reports on the recovery activities
Why should all ccTLDs develop plans like these ones?
The reputation of ccTLDs is one of the aspects highlighted by experts. “Incidents happen, we’d better be ready,” state Sebastián Castro and Dave Baker (.NZ).
Pablo Rodríguez (.PR), in turn, thinks that, when an emergency, a disaster or an incident arises, “the quick restoration of operations fuels other equally important dimensions, like the organization's reputation, the trust in its services and the retention of clients.”
Another fundamental aspect has to do with the risks associated with the region in which the ccTLD is based. In this regard, Pablo Rodríguez (.PR) points out that Latin America and the Caribbean is a region highly vulnerable to natural disasters and that, consequently, its ccTLDs should adopt disaster-mitigation measures and plans to restore and continue their operations.
Finally, Julio Cossío (.MX) states: “It is essential that each one of the regional ccTLDs commits to safeguarding their infrastructure in order to be able to have a universal and affordable Internet. The creation of a community committed to the continuity of the Internet is a completely mandatory task.”
*The original post was published in the LACTLD Report No. 11