With data, you create value only by putting it to productive use. The more purposes your data serves, the more profit you will derive from it. For a variety of reasons, the diverse applications that allow you to leverage your data fully might need their own databases. Data sharing is, thus, a critical issue for many organizations.
The nature of data-sharing requirements varies widely. For example, you might keep your operational data in Microsoft SQL Server databases, but that isn’t always where it needs to be. Or maybe your various business systems use other database management systems (DBMSs), but you need to merge the data into a SQL Server database to satisfy a specific need, such as business intelligence. Whatever the case, you need a way to share data throughout your enterprise, possibly across a variety of technology platforms.
You might elect to distribute copies of your databases to different locations to minimize transmission costs, reduce response times and distribute processing loads. If so, you need a way to keep all of those copies synchronized at all times. Furthermore, business moves quickly today because of the Internet. As such, your business likely demands real-time data sharing.
Unfortunately, data isn’t always where it’s needed or in the required format. For example, consider the following scenarios:
Bridge application silos
Envision for a moment a company that uses Microsoft Dynamics Enterprise Resource Planning (ERP) software to fulfill most of its operational requirements, but has a shop-floor automation application from another vendor. Naturally, the Microsoft Dynamics software uses a SQL Server database, but the other application might use a different DBMS.
The shop-floor system maintains a record of inventories of raw materials, work-in-progress and finished goods. Dynamics might also maintain raw materials and finished goods inventories. Without a bridge between those two application silos, costly manual data entry may have to be duplicated on both systems, and the two sets of inventory records would likely fall out of sync.
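Such a bridge might begin as nothing more than a periodic reconciliation job that flags inventory records the two systems disagree on. The following is a minimal sketch, not any vendor's implementation; the table names, column names and the DB-API connection objects passed in are all hypothetical:

```python
# Minimal sketch: flag inventory records that have drifted apart between
# two application silos. Table/column names and the DB-API connections
# (erp_conn, floor_conn) are hypothetical.

def fetch_inventory(conn, table):
    """Return {item_id: qty} from one system's inventory table."""
    cur = conn.cursor()
    cur.execute(f"SELECT item_id, qty FROM {table}")
    return {item_id: qty for item_id, qty in cur.fetchall()}

def find_discrepancies(erp_conn, floor_conn):
    """List items whose quantities disagree between the two systems."""
    erp = fetch_inventory(erp_conn, "erp_inventory")
    floor = fetch_inventory(floor_conn, "floor_inventory")
    return [(item, erp.get(item), floor.get(item))
            for item in erp.keys() | floor.keys()
            if erp.get(item) != floor.get(item)]
```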
Reconcile content incongruities
Sometimes, the same data needs to be formatted differently in various locations for different purposes. For example, a global enterprise’s European operations may need to record measures in kilograms, liters and centimeters, while the American operation needs to store them in pounds, quarts and inches.
Because there is a direct, arithmetic relationship between metric and American measures, the company could choose to store all data using one measure and then translate from one to the other when necessary. However, if the American and European operations each maintain their own database, it would be more efficient to store data using local measures rather than having to perform the conversion calculation every time someone accesses the data.
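Because the relationship is purely arithmetic, the translation can be baked into the replication step itself. A minimal sketch, with hypothetical field names and standard conversion factors:

```python
# Minimal sketch: convert metric measures to U.S. units as rows are
# replicated from the European database to the American one.
# Field names are hypothetical; conversion factors are standard.

KG_TO_LB = 2.20462       # kilograms -> pounds
L_TO_QT = 1.05669        # liters -> U.S. quarts
CM_TO_IN = 0.393701      # centimeters -> inches

def to_us_units(row):
    """Translate one metric source row into U.S. measures for the target."""
    return {
        "weight_lb": row["weight_kg"] * KG_TO_LB,
        "volume_qt": row["volume_l"] * L_TO_QT,
        "length_in": row["length_cm"] * CM_TO_IN,
    }

# Example: 10 kg, 2 L, 30 cm -> ~22.05 lb, ~2.11 qt, ~11.81 in
print(to_us_units({"weight_kg": 10, "volume_l": 2, "length_cm": 30}))
```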
Facilitate business intelligence
The greatest opportunities for increasing profitability often come not from working harder, but from working smarter. For instance, spotting consumer trends allows companies to make marketable products rather than trying to market products they’ve made. It also allows them to better judge demand so that they neither waste money producing more than they can sell nor lose sales to competitors because they don’t have a large enough manufacturing capacity or inventory to meet demand.
Interestingly, some of the most valuable insights come from consolidating information from a variety of sources. For example, companies can benefit by focusing their greatest attention on their most profitable customers. But who are the most profitable customers? If you look only at the sales database, you might assume that the most profitable customers are those who spent the most within the past year. But are those really the most valuable customers?
What if your organization charges all customers a flat shipping rate or absorbs all shipping costs as a cost of sales? In that case, you might profit less from sales to customers located farther from your warehouses than from sales to nearby customers.
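Only by consolidating the sales and logistics data can you answer that question. A minimal sketch, assuming per-customer revenue and shipping-cost figures have already been extracted from the two databases; the names and numbers are invented for illustration:

```python
# Minimal sketch: rank customers by profit contribution rather than
# revenue alone. The input dictionaries stand in for extracts from
# separate sales and logistics databases.

revenue = {"acme": 120_000, "globex": 95_000, "initech": 90_000}
shipping_cost = {"acme": 40_000, "globex": 5_000, "initech": 8_000}

profit = {c: revenue[c] - shipping_cost.get(c, 0) for c in revenue}

# acme's high revenue is eroded by absorbed shipping costs:
# globex (90,000) and initech (82,000) outrank acme (80,000).
for customer, p in sorted(profit.items(), key=lambda kv: kv[1], reverse=True):
    print(customer, p)
```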
Minimize transmission costs and bottlenecks
Many companies have databases that are updated only occasionally, but accessed extremely frequently. A product catalog is usually a good example of this. When those databases serve a geographically dispersed audience, transmission costs and times can be a significant burden. Even at the speed of light, transmitting data around the world can be a relatively slow process—not so much because of the distance the data has to travel, but because of the number of networking devices it has to pass through and the bandwidth bottlenecks that sometimes occur.
In the case of rarely updated databases, the solution is obvious. Strategically place copies of the database close to the people who access it. Because updates are infrequent and there is rarely a need to transmit updates among the database replicas, long-distance data transmissions can be almost eliminated.
The problem arises when those infrequent updates are made. You need a way to ensure that all of the replicas remain continuously synchronized so that they truly are replicas.
Secure Web-accessible databases
Many organizations can derive significant value by making some data available to customers, suppliers and other stakeholders over the Web. However, you probably don’t want to provide wide-open access to all of your organization’s information. Some data must be kept private for proprietary reasons. Other data must be available only on a need-to-know basis for privacy reasons—privacy that might be enforced by strict regulations.
The more pathways you open to your databases, the more opportunities there are for people with nefarious purposes to steal or destroy vital data. One way to address this problem is to extract only that data the public is allowed to see and place it beyond the inner defenses of your data center. The full database, including all confidential data, then continues to reside deep inside the multiple layers of defenses.
If you adopt this solution, you’ll need a way to keep the publicly accessible extract of your data synchronized with the full, secure database. And this synchronization facility cannot open any holes in the firewall that might be exploited by hackers with malicious intentions.
Real-time data replication
The solution to all of these needs is real-time data replication. This technology keeps multiple copies of a database, or a segment of a database, synchronized by monitoring the source database for changes and then copying those changes to the replica database(s).
The word “replication” can be somewhat misleading. A data replicator replicates the underlying meaning of the data, but not necessarily its structure or format. It is this heterogeneous capability that allows a data replicator to share data between diverse application databases and facilitate migrations between databases—including databases that use different DBMSs and different data schema—transparently and without the need for business downtime.
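Conceptually, most replicators work as a change-data-capture loop: read newly logged changes from the source, optionally transform them, apply them to each replica, and remember how far you got. The sketch below assumes hypothetical `read_changes`, `apply` and position-tracking calls standing in for a vendor's log reader and target writer:

```python
# Minimal sketch of a real-time replication loop. `read_changes` (a
# hypothetical log/journal reader) yields change records past a known
# position; each record is optionally transformed, then applied to
# every replica. The position advances only after a successful apply.

import time

def replicate(source, replicas, transform=lambda change: change,
              poll_interval=1.0):
    position = source.last_replicated_position()   # hypothetical API
    while True:
        for change in source.read_changes(after=position):
            record = transform(change)
            for replica in replicas:
                replica.apply(record)              # hypothetical API
            position = change.position
            source.save_position(position)         # durable resume point
        time.sleep(poll_interval)                  # near-real-time poll
```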
Data replicators from different vendors don’t all provide the same feature set. The capabilities that will be important for your organization depend on the purposes that data replication will serve. Consider, for example, the following replication features.
Heterogeneous DBMS support
This feature allows you to replicate data between databases running on different DBMSs, including different versions of the same DBMS (a brief sketch follows the list below). This is essential if you want to:
- Replicate data between disparate applications running on differing platforms;
- Merge data from a variety of DBMS platforms into a single data warehouse;
- Migrate data from an old application running on one DBMS to another vendor’s equivalent application running on another DBMS; and
- Migrate databases between DBMS versions.
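As a concrete illustration of the second point, the sketch below bulk-copies rows from a SQL Server table into an equivalent PostgreSQL warehouse table using the `pyodbc` and `psycopg2` drivers. The connection strings, table and columns are hypothetical, and a real replicator would move changes incrementally rather than as a one-shot copy:

```python
# Minimal sketch: copy rows from a SQL Server table to an equivalent
# PostgreSQL table. DSN, database name, table and column list are
# hypothetical; pyodbc and psycopg2 are assumed to be installed.

import pyodbc
import psycopg2

src = pyodbc.connect("DSN=SqlServerSource")     # hypothetical source DSN
dst = psycopg2.connect("dbname=warehouse")      # hypothetical target DB

src_cur = src.cursor()
dst_cur = dst.cursor()

src_cur.execute("SELECT item_id, qty, updated_at FROM inventory")
for batch in iter(lambda: src_cur.fetchmany(500), []):   # 500-row batches
    dst_cur.executemany(
        "INSERT INTO inventory (item_id, qty, updated_at) "
        "VALUES (%s, %s, %s)",
        [tuple(row) for row in batch],
    )
dst.commit()
```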
Heterogeneous data schema support
Heterogeneous data schema support allows you to replicate data even when the source and target databases use different:
- Table and column definitions;
- Column names; and
- Data types.
This feature is required to serve almost all of the same purposes listed under heterogeneous DBMS support. The only exception is migrating databases between DBMS versions. In that case, the database schemas would likely be the same on the source and target databases.
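In practice, schema heterogeneity is handled by a mapping layer that renames columns and coerces types as each row passes through. A minimal sketch with a hypothetical mapping table:

```python
# Minimal sketch: map a source row onto a target schema with different
# column names and data types. The mapping table is hypothetical.

from datetime import datetime

# target_column: (source_column, converter)
COLUMN_MAP = {
    "customer_id": ("CustNo", int),
    "full_name":   ("CustName", str.strip),
    "signup_date": ("SignupDt",
                    lambda s: datetime.strptime(s, "%m/%d/%Y").date()),
}

def map_row(source_row):
    """Rename columns and coerce types so the row fits the target schema."""
    return {target: convert(source_row[src])
            for target, (src, convert) in COLUMN_MAP.items()}

print(map_row({"CustNo": "1042", "CustName": " Ada Lovelace ",
               "SignupDt": "03/15/2024"}))
# -> {'customer_id': 1042, 'full_name': 'Ada Lovelace',
#     'signup_date': datetime.date(2024, 3, 15)}
```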
Bidirectional replication
Bidirectional replication allows updates to be applied on any of the databases participating in the replication topology. Those updates are then automatically copied to the other databases in the topology. In other words, all nodes in the replication topology can act as both sources and targets of replication.
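Two details make bidirectional replication harder than it sounds: a change must not be echoed back to the node it originated on, and conflicting updates to the same record on different nodes must be resolved. The sketch below uses origin tagging for the first and last-writer-wins (one policy among several) for the second; the node and change structures are hypothetical:

```python
# Minimal sketch: apply a change on every node except the one it came
# from (preventing echo loops), using last-writer-wins as one possible
# conflict policy. Node and change structures are hypothetical.

def apply_everywhere(change, nodes):
    for node in nodes:
        if node.name == change.origin:
            continue                      # never echo back to the source
        current = node.read(change.key)
        # Last-writer-wins: keep whichever update is newer.
        if current is None or change.timestamp >= current.timestamp:
            node.write(change.key, change.value, change.timestamp)
```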
Filtering
Organizations often need replicas of only particular portions of databases. This can be accomplished via row and column filtering. For example, if the goal is to create a subset of a database that will be placed outside the innermost firewall and made available over the Web, filtering is essential to ensure that confidential data is not replicated to the public database.
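A column whitelist plus a row predicate is often all the filtering logic required, and whitelisting is safer than blacklisting because newly added confidential columns stay private by default. A minimal sketch with hypothetical column names:

```python
# Minimal sketch: row and column filtering before a change leaves the
# secure network. Column names are hypothetical.

PUBLIC_COLUMNS = {"sku", "name", "list_price"}   # whitelist, not blacklist

def filter_for_public(row):
    """Return a public-safe projection of the row, or None to drop it."""
    if row.get("discontinued"):       # row filter: hide retired products
        return None
    return {col: row[col] for col in PUBLIC_COLUMNS if col in row}

row = {"sku": "A-100", "name": "Widget", "list_price": 9.99,
       "unit_cost": 4.20, "discontinued": False}
print(filter_for_public(row))   # unit_cost never reaches the public copy
```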
Data transformations
Frequently, data holds the same meaning in different databases, but the formats and structures differ. A data transformation feature can reconcile these differences on the fly, without the need for operator intervention.
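For example, one database might store dates in day-first text form and statuses as single-letter codes, while the other expects ISO dates and descriptive labels. A minimal sketch of such an on-the-fly transformation; the formats and field names are invented:

```python
# Minimal sketch: reconcile format differences on the fly as records
# move between databases. The format choices here are hypothetical.

from datetime import datetime

STATUS_CODES = {"A": "active", "I": "inactive"}   # code -> label

def transform(record):
    """Translate a source record into the target's formats."""
    return {
        "order_id": int(record["order_id"]),          # text -> integer
        "status": STATUS_CODES[record["status"]],     # code -> label
        "ordered_on": datetime.strptime(              # DD.MM.YYYY ->
            record["ordered_on"], "%d.%m.%Y").date().isoformat(),  # ISO 8601
    }

print(transform({"order_id": "77", "status": "A",
                 "ordered_on": "05.03.2024"}))
# -> {'order_id': 77, 'status': 'active', 'ordered_on': '2024-03-05'}
```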
Organizations that are more successful at collecting, evaluating and sharing information are better able to deal with the forces of global competition and the ever-increasing need for access to data. With the right tools, you can extract immense value from the mountain of data you already have.