For today’s small businesses, the risk and effects of unplanned system downtime increase with each network enhancement or system upgrade. Sudden, unexpected disasters affect small business IT infrastructures in the same way they affect those of large corporations.
Small business owners and managers look for cost-effective ways to eliminate or minimize the risks and effects of unplanned outages on the business. More importantly, they need assurance that vital company information remains available no matter what happens.
In order to ensure business continuity and survival, small businesses should follow three essential steps—from understanding the concepts of disaster recovery and information availability to calculating the business impact of downtime. Disaster recovery plans and data replication alone are not enough. All business owners should look for the most effective way to ensure the optimum level of business uptime for the organization.
STEP 1: Getting Started
Before reviewing the available technologies that support disaster recovery, consider the business and identify which business processes are most important to keeping the business operational. Once the most critical business processes have been identified, work with the business units to determine their availability requirements for each process. Document the requirements in an internal service level agreement (SLA) that specifies the availability goals for each process and articulates the costs of not meeting the goals. For example:
At company A, the order entry and shipping departments require that their information infrastructure processes be functional 24 hours a day, every day of the year except corporate holidays. If this requirement is not met, the company loses 80 percent of its productivity, which translates to $10,000 per hour, plus penalties of $100,000 for every hour the processes are unavailable.
At company B, the payroll department requires that its information infrastructure processes be functional from 8 a.m. to 6 p.m. Monday through Friday. Not meeting this requirement costs the company 50 percent of its productivity, which translates to $1,000 per hour of downtime.
Another organization, company C, must comply with strict information availability requirements under government regulations and has made it imperative that its applications remain available even during routine backup processes.
Documenting the cost of not meeting availability requirements helps determine the value of a software investment used to improve availability. This information also helps prioritize the processes to analyze. After documenting the service levels required, start analyzing the availability needs of each business process technology by technology.
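As a rough illustration of how such an internal SLA might be recorded and used to prioritize processes, here is a minimal Python sketch. The class and field names are hypothetical; the dollar figures come from the company A and company B examples above, and the 4-hour outage is an arbitrary assumption chosen only for comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilitySla:
    """One internal SLA entry for a business process (illustrative fields)."""
    process: str
    required_window: str               # when the process must be available
    productivity_cost_per_hour: float  # dollars lost per hour of downtime
    penalty_per_hour: float = 0.0      # additional penalties per hour, if any

    def cost_of_outage(self, hours: float) -> float:
        """Total cost of missing the availability goal for `hours` hours."""
        return hours * (self.productivity_cost_per_hour + self.penalty_per_hour)


# Figures taken from the company A and company B examples.
company_a = AvailabilitySla(
    process="order entry and shipping",
    required_window="24 hours a day, every day except corporate holidays",
    productivity_cost_per_hour=10_000,
    penalty_per_hour=100_000,
)
company_b = AvailabilitySla(
    process="payroll",
    required_window="8 a.m. to 6 p.m., Monday through Friday",
    productivity_cost_per_hour=1_000,
)

# Rank processes by what an assumed 4-hour outage would cost.
ranked = sorted(
    [company_b, company_a],
    key=lambda sla: sla.cost_of_outage(4),
    reverse=True,
)
for sla in ranked:
    print(f"{sla.process}: ${sla.cost_of_outage(4):,.0f} for a 4-hour outage")
```

Sorting by outage cost gives a simple first pass at deciding which processes to analyze first.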
Understanding downtime and availability
Most organizations define “availability” somewhere between multiple hours of downtime with significant data loss and real-time 24/7 uptime with zero data loss. Each definition depends on the business’s needs, data and application requirements and organizational structure. The goal is to prevent the inevitable system downtime from affecting business uptime.
There are two types of downtime: unplanned and planned.
Unplanned downtime
Surprisingly, unplanned downtime represents only about five to ten percent of all downtime. These events include security violations, corruption of data, power outages, human error, failed upgrades, natural disasters and the like.
Some forms of unplanned downtime, such as hardware failure, pose a lessening threat to availability, as most servers today offer exceptional reliability. For example, IBM®’s System i® servers provide more than 99.9 percent documented reliability and average 61 months between failures—more than five years of server uptime. Unplanned downtime can strike at any moment from any number of causes. Although natural disasters may appear to be the most devastating cause of IT outages, application problems are the most frequent threat to IT uptime.
According to Gartner, Inc., a leading information technology research and advisory company, people and process problems cause an estimated 80 percent of unexpected application downtime. Human errors, such as not performing a required task, performing a task incorrectly, overburdening a disk drive or deleting a critical file, play havoc with applications.
Planned downtime
While unplanned events tend to attract the most attention, planned downtime actually poses a bigger challenge to business uptime. Routine daily or weekly maintenance of databases, applications or systems usually interrupts services. Studies show that system upgrades, performance tuning and batch jobs account for roughly 70 to 90 percent of downtime.
Although small businesses should be concerned with natural disasters, the inherent daily threat posed by application problems and human error should be the major focus. This is especially true when the exposure of software applications to unplanned downtime is aggravated by a host of other business and IT issues, such as:
- The need to retain, protect and audit e-mail, financial and other records under regulatory compliance mandates.
- The acceleration of security risks from both inside and outside the business including viruses, worms, hacker attacks and industrial espionage.
- Distributed applications that are accessed, maintained and updated by different classes of users and business partners.
- Multiple platform IT environments in which applications operate interdependently to accomplish critical business tasks.
- Fewer IT personnel and labor hours available to maintain and troubleshoot increasingly complex and data-intensive IT environments.
STEP 2: Assess the Financial Impact - Calculating the Cost of Downtime
How much does downtime cost a business? Unexpected IT outages can unleash a procession of direct and indirect consequences, both short-term and far-reaching. These costs include:
Tangible/direct costs
- Lost transaction revenue
- Lost wages
- Lost inventory
- Remedial labor costs
- Marketing costs
- Bank fees
- Legal penalties
Intangible/indirect costs
- Lost business opportunities
- Loss of employees and/or employee morale
- Decrease in stock value
- Loss of customer/partner goodwill
- Brand damage
- Driving business to competitor
- Bad publicity/press
The dollar amount that can be assigned to each hour of downtime varies widely depending upon the nature of the business, the size of the company and the criticality of the IT systems to primary revenue-generating processes. For instance, a global financial services firm may lose millions of dollars for every hour of downtime, whereas a small manufacturer that uses IT primarily as an administrative tool would lose only a margin of productivity.
However, studies show that most U.S. businesses cannot function without computer support, and most businesses that suffer catastrophic data loss or an extended IT outage go out of business. On average, enterprises lose between $84,000 and $108,000 for every hour of IT system downtime, according to estimates from studies and surveys performed by IT industry analyst firms. Besides financial services, the telecommunications, manufacturing and energy industries also rank high for revenue lost during IT downtime.
Consequences of downtime
No matter what the cause, downtime impacts more than daily interactions. It can affect the integrity of databases and the applications that use them. For example, a disaster recovery strategy that relies on nightly tape backups risks a whole day’s worth of data should an unplanned event occur and crash IT systems a few hours or minutes before a backup process kicks off. Some businesses could survive that kind of data loss. Others will suffer the effects for a long time into the future.
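The nightly-backup exposure described above can be made concrete with a small calculation. This is only a sketch: the function name, the 11 p.m. backup time and the dates are hypothetical, chosen to show a crash shortly before the next backup run.

```python
from datetime import datetime, timedelta


def data_at_risk(last_backup: datetime, failure: datetime) -> timedelta:
    """Window of work that exists only on the failed system."""
    if failure < last_backup:
        raise ValueError("failure time precedes the last backup")
    return failure - last_backup


# Hypothetical schedule: backups run nightly at 11 p.m.
last_backup = datetime(2024, 3, 4, 23, 0)
# The system crashes 30 minutes before the next backup would start.
failure = datetime(2024, 3, 5, 22, 30)

lost = data_at_risk(last_backup, failure)
print(lost)  # 23:30:00
```

In the worst case the exposure approaches the full interval between backups, which is why the backup frequency effectively sets a floor on how much data can be lost.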
Don’t forget the added burden of compliance
Many regulations require businesses to support more stringent availability standards. Several new acts and regulations, directed at specific industries or a broad cross-section of companies, mandate the protection of business data and system availability. Businesses may incur financial or legal penalties for failing to comply with these data or business availability requirements.
- The Health Insurance Portability and Accountability Act (HIPAA) ensures that only properly authorized individuals have access to confidential patient health data and provides long-term guidelines to secure confidential information. HIPAA mandates a five-day maximum turnaround on requests for information.
- The Sarbanes-Oxley Act of 2002 stipulates that CEOs and CFOs attest to the truthfulness of financial reports and to the effectiveness of internal financial controls. Sarbanes-Oxley mandates a required timeframe in which to report financial results—each quarter and at year-end. Failure to make these deadlines can result in financial penalties.
- The New Basel Capital Accord (Basel II) requires financial institution capital reserves to include operational and credit risks and includes IT security risk as a principal operational risk. Basel II also requires business resiliency standards for any financial institution doing business in the EU.
- The Gramm-Leach-Bliley Financial Services Modernization Act of 1999 limits access to non-public information to those with a “need to know” and requires safeguarding of customer financial information. Loss of important data can lead to penalties for the financial institution.
- The Uniting and Strengthening America by Providing Appropriate Tools Required to Intercept and Obstruct Terrorism (USA PATRIOT) Act of 2001 defines what information can be made available to federal and local authorities for those suspected of terrorism or terrorist-related activities. This act requires contacted institutions to respond within a specific timeframe to requests for information from databases.
Cost of downtime calculator: how much does downtime cost?
To determine how much unplanned downtime costs a business per hour, ask a series of questions regarding the impact it could have on customers, partners, employees and the ability to process transactions, such as:
- How many transactions might be lost without significantly impacting the business?
- Does the business depend upon one or more mission-critical applications?
- How much revenue is lost for every hour that critical applications are unavailable?
- What are the productivity costs for the loss of available IT systems and applications?
- How are collaborative business processes with partners, suppliers and customers affected by an unexpected IT outage?
- What is the total cost of lost productivity and lost revenue during unplanned downtime?
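Answers to the questions above can be combined into a rough hourly figure. The sketch below assumes a deliberately simple model (lost revenue plus lost productivity); the function, its parameters and the sample numbers are all hypothetical, and it leaves out penalties and the intangible costs listed earlier.

```python
def downtime_cost_per_hour(
    revenue_per_hour: float,
    revenue_share_affected: float,
    employees_affected: int,
    loaded_wage_per_hour: float,
    productivity_loss: float,
) -> float:
    """Estimate the direct cost of one hour of downtime.

    revenue_share_affected and productivity_loss are fractions
    between 0 and 1.
    """
    lost_revenue = revenue_per_hour * revenue_share_affected
    lost_productivity = employees_affected * loaded_wage_per_hour * productivity_loss
    return lost_revenue + lost_productivity


# Hypothetical small business: half of its $2,000 hourly revenue depends on
# the failed application, and 20 employees at $30 per hour lose 80 percent
# of their productivity while it is down.
hourly_cost = downtime_cost_per_hour(
    revenue_per_hour=2_000,
    revenue_share_affected=0.5,
    employees_affected=20,
    loaded_wage_per_hour=30,
    productivity_loss=0.8,
)
print(f"Estimated cost per hour of downtime: ${hourly_cost:,.0f}")
```

Multiplying the result by the expected length of an outage gives a first estimate to weigh against the cost of an availability solution.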
STEP 3: Uptime and Business Resiliency - It’s All About Recovery
Determine the Business RTO and RPO Requirements
Following any unplanned outage, how quickly should the business be up and running as close to normal business operations as possible? Remember, every minute is costly—take a look at the downtime cost per hour.
Recovery depends on two objectives: recovery time and recovery point. These two measures will determine the optimum availability the business requires.
- Recovery time objective (RTO). RTO defines how quickly systems need to be restored in order to have them fully functional again. The shorter the RTO, the closer the business comes to zero interruption in uptime, and the higher the level of availability it requires.
- Recovery point objective (RPO). RPO defines the point in time to which data must be recoverable, that is, how much recent data the business can afford to lose. It points to a place in each data stream where information must be available to put the application or system back in operation. Again, the closer the business comes to zero data loss and continuous real-time access, the higher the level of availability it requires.
A small business may have different RTOs and RPOs for each critical application. For example, a supply chain application that feeds a production plant may require a recovery time of only a few minutes with minimal data loss. A payroll system that is updated weekly with only a few records may require a recovery time of only 12 hours and a recovery point of 24 hours or more before the outage affects the business.
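One way to work with per-application objectives is to record them and test a proposed recovery strategy against each one. The sketch below is illustrative only: the class and function names are hypothetical, the payroll values come from the example above, and the supply chain values (5 minutes and 1 minute) are assumed stand-ins for "a few minutes" and "minimal data loss". Worst-case data loss is assumed to equal one full backup interval.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjective:
    application: str
    rto: timedelta  # longest acceptable time to restore service
    rpo: timedelta  # longest acceptable window of lost data


def meets_objective(
    objective: RecoveryObjective,
    restore_time: timedelta,
    backup_interval: timedelta,
) -> bool:
    """True if restore time and worst-case data loss both fit the objective."""
    return restore_time <= objective.rto and backup_interval <= objective.rpo


objectives = [
    # Assumed values for "a few minutes" and "minimal data loss".
    RecoveryObjective("supply chain", rto=timedelta(minutes=5), rpo=timedelta(minutes=1)),
    # Values from the payroll example.
    RecoveryObjective("payroll", rto=timedelta(hours=12), rpo=timedelta(hours=24)),
]

# Hypothetical strategy: nightly backup with an 8-hour restore.
results = {
    o.application: meets_objective(o, timedelta(hours=8), timedelta(hours=24))
    for o in objectives
}
print(results)  # {'supply chain': False, 'payroll': True}
```

Under these assumptions a nightly backup satisfies payroll but falls well short of the supply chain application, which is the kind of gap the following sections address.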
Matching uptime requirements to availability solutions
What is the best method to meet the availability requirements of each system in an organization and achieve the optimum RTO and RPO? Some organizations, or some particularly critical applications within an organization, may require an exceptionally high level of availability.
Any availability solution must ensure that information and applications remain as accessible and available as needed to continue to drive revenue, profitability and productivity at acceptable levels no matter what planned or unplanned events occur. The availability solution should:
- Protect data, applications and systems to a level that meets business requirements and RTOs and RPOs.
- Manage business uptime as automatically as possible to streamline operations and save time.
- Assure the integrity and quality of the environment during interruptions and when it returns to full operations.
Small businesses that face the potentially devastating consequences of unplanned downtime can protect themselves against the loss of time and money with an information availability solution.
These businesses can implement information availability in several different ways, including replicating data to a secondary server to maintain continuous application availability or frequently backing up data to a server at a remote location for disaster recovery in the event of a total facility loss at the production site.
Let’s look at some of the options to protect small businesses from the consequences of downtime.
Tape backup/archiving solutions
Tape-based backup and recovery solutions are the oldest form of disaster protection. They offer relatively low cost and high portability, which makes tape a common choice for small businesses that need to archive information for the long term. There is no doubt that tape will continue to play some role in the IT infrastructure for years to come. For example, even in businesses with demanding RTO and RPO requirements, where more advanced availability solutions are also used, tape can still protect and back up non-critical applications. On its own, however, tape cannot provide RPOs or RTOs of seconds, minutes or even a few hours. Since many organizations have a substantial investment in tape storage, an information availability software solution should complement the tape strategy rather than replace it.
Disk-based backup and practical availability
Disk-based backup provides readily available access and business data protection with RTOs and RPOs in a matter of hours. By performing frequent data backups to a secondary server or partition, it gives businesses the ability to recover efficiently from an unexpected outage without losing large amounts of data or spending days or weeks of labor restoring the production environment. When the backup server is placed in a remote location, it also serves as a disaster recovery solution.
Continuous Data Protection
Continuous data protection, or CDP, is a flexible disk-based technology that enables businesses to quickly and easily recover their data to any point in time. For example, it’s not uncommon for a user to accidentally delete a critical file or for a virus to corrupt business data. CDP allows recovery of the data to a point in time just prior to the accidental deletion or virus corruption. This earlier version of the data can then be restored to the production environment.
High availability
High availability delivers continuous uptime with zero data loss so that applications and business data are available on demand for any business environment. A backup server is always available with an RTO of seconds to minutes and an RPO of zero. High availability dramatically reduces the risks and costs of business interruptions. In addition, recent updates within CDP capabilities make it an increasingly manageable strategy for ensuring business continuity.
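The options described so far can be loosely compared against an application's RTO and RPO. The following Python sketch is one possible rule of thumb, not a definitive selection method: the function name and both threshold constants are assumptions introduced here to put numbers on phrases such as "seconds to minutes" and "a few hours". Because the text gives no RTO or RPO figures for continuous data protection, it is modeled as an explicit point-in-time recovery flag.

```python
from datetime import timedelta

# Illustrative thresholds (assumptions, not figures from the article).
HA_MAX_RTO = timedelta(minutes=10)     # stands in for "seconds to minutes"
TAPE_MIN_OBJECTIVE = timedelta(hours=8)  # tape cannot meet "a few hours" or less


def suggest_option(rto: timedelta, rpo: timedelta, point_in_time: bool = False) -> str:
    """Suggest the least demanding option that plausibly fits the objectives."""
    if rpo == timedelta(0) or rto <= HA_MAX_RTO:
        return "high availability"
    if point_in_time:
        return "continuous data protection"
    if rto < TAPE_MIN_OBJECTIVE or rpo < TAPE_MIN_OBJECTIVE:
        return "disk-based backup"
    return "tape backup/archiving"


# Hypothetical applications and objectives.
print(suggest_option(timedelta(minutes=5), timedelta(minutes=1)))
print(suggest_option(timedelta(hours=4), timedelta(hours=1)))
print(suggest_option(timedelta(hours=12), timedelta(hours=24)))
```

A real decision would also weigh cost, existing tape investments and compliance requirements, so the output is best read as a starting point for discussion.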
Multi-platform protection
Because separate business-critical applications may be running simultaneously on different operating systems, some organizations require a multi-platform information availability solution.
Every platform that hosts a business-critical application needs protection from unplanned outages to keep the business functioning and reduce the risk of lost revenue and productivity.
Take the Next Step: Ensure IT and Business Survival
When the real-world costs of unplanned downtime are taken into account, an information availability solution is a cost-effective strategy for protecting businesses from serious harm.
In particular, small businesses can benefit significantly from information availability solutions because they are generally more vulnerable to severe damage from unexpected outages and have fewer resources to stage a recovery.
Information availability solutions shouldn’t be hard work or extend beyond the budget. They are affordable, easy-to-manage solutions that provide significant benefits to small businesses by minimizing the risks and consequences posed by unexpected IT outages. An information availability solution:
- Lowers the risk of significant costs to businesses, such as lost revenue, productivity, legal penalties and brand damage caused by unplanned downtime;
- Protects business relationships with customers, partners and suppliers by ensuring that applications and data will be available to satisfy their needs and unique schedules;
- Enforces service-level agreements by maintaining predictable RTOs and RPOs in the event of an IT outage;
- Enhances ROI on existing resources by assuring they will be available to generate revenue and support business processes; and
- Supports compliance with government and trade regulations by meeting e-mail and record retention requirements and protecting the availability of business data and reporting processes.