Research Paper on "Data Replication"

Research Paper 9 pages (2955 words) Sources: 1+ Style: APA

[EXCERPT] . . . .

Replication

Today, there are many different data warehouse architectures intended to meet users' requirements. Characteristically, data warehouses are built on a distributed data design, with mass transfers of data happening during off hours and widespread interactive querying going on at peak hours of the day. Correct planning for warehouse operations is therefore very significant, particularly for a company's network communications. To prevent performance disasters, system professionals should be involved in each stage of warehouse planning and expansion, as well as implementation. Network analysis should consider a number of issues, for example how frequently data updates should take place, how they ought to be scheduled, when they should happen, how many interactive queries to permit, how the front-end tools operate, and what user query behavior will be (Leonard, 2007).

A typical data warehouse architecture is illustrated in Figure 1.1. Essentially, there is data extraction of operational production data that is passed on to the warehouse database. A specialized data warehouse server is used to host the warehouse databases and decision support tools, including OLAP and knowledge-based tools. This server is used to pass on extracted data to the warehouse database and is employed by users to extract data from the data warehouse using some type of software to answer users' questions and meet their information and knowledge processing requirements (Kemme & Alonso, 2000). Although not shown in Figure 1.1, operational production databases are updated continuously via OLTP applications.

Figure 1.1: The Basic Components of a Data Warehouse

In turn, a warehouse database is "refreshed" from operational production systems on a periodic basis, usually during off hours when network and CPU utilization is low. Essentially, then, a data warehouse is a specialized database for supporting decision making. Data is taken from a variety of operational sources and then "scrubbed" to eliminate any inconsistencies or errors (Leonard, 2007). A common and simple type of data warehouse involves a two-tiered, homogeneous architecture. For example, IBM DB2 data on a computer mainframe might be periodically extracted and copied to a DB2 database on a Microsoft Windows NT server. Then a data access product, such as Information Builders Inc.'s FOCUS Reporter for Windows, can be used to read, analyze, and report on the warehouse data from a front-end graphical client on the Windows NT LAN. In contrast, more complex data warehouses are based on a three-tiered architecture that uses a separate middleware layer for data access and translation. The first tier is the host for production applications and is generally a mainframe computer or a midrange system, such as Digital Equipment Corporation's VAX or IBM's AS/400 (Angoss Software, 2006). The second tier is a departmental server, such as a Unix workstation or a Windows NT server, which resides in close proximity to warehouse users. The third tier is the desktop, where IBM PCs, Apple Macintoshes, and X terminals are connected on a local area network (LAN). In this three-tiered architecture, the host (first tier) is devoted to real-time, production-level data processing. The departmental server (second tier) is optimized for query processing, analysis, and reporting. The desktop (third tier) handles reporting, analysis, and the graphical presentation of data.
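The periodic "refresh and scrub" cycle described above can be sketched as a minimal extract-scrub-load job. This is an illustrative sketch only: the `sales` table, its columns, and the full-refresh strategy are hypothetical, not taken from any particular warehouse product.

```python
import sqlite3

def refresh_warehouse(operational_db, warehouse_db):
    """Nightly-refresh sketch: extract operational rows, 'scrub' obvious
    inconsistencies, and load the clean rows into the warehouse table."""
    src = sqlite3.connect(operational_db)
    dst = sqlite3.connect(warehouse_db)
    dst.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    rows = src.execute("SELECT id, region, amount FROM sales").fetchall()
    # Scrub step: drop rows with missing keys or amounts, normalize regions.
    clean = [(i, r.strip().upper(), a) for (i, r, a) in rows
             if i is not None and r and a is not None]
    dst.execute("DELETE FROM sales")  # full refresh, not incremental
    dst.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean)
    dst.commit()
    src.close()
    dst.close()
```

A real warehouse would usually load incrementally rather than deleting and reloading, but the full refresh keeps the extract-scrub-load sequence visible.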

A SUMMARY SURVEY: Utilization of Data Replication in Data Warehousing

In the past, numerous corporate databases were regularly synchronized and were, fundamentally, clones of each other. This chore, often called "nightly refresh," has been accomplished for years in the domain of computer mainframes. When a few PCs were added, the job grew into downloading and uploading data between the PCs and the mainframe. The situation was quite manageable (Kemme & Alonso, 2000). However, when there is a network, servers, users spread across multiple time zones, groupware applications, and dependence by users on real-time information, this chore grows into a network manager's nightmare, better known as replication. Fundamentally, replication copies information from one database to another so that the data of both databases are the same. That can mean transferring data from a location such as a central mainframe to branch servers and subsequently down to local workstations, directing news feeds to reporting and analysis applications, connecting networked servers, or operating inside just about any architecture. Data transfer can be one-way or two-way. It can be event-based (activated by data value changes) or time-based (executed at regular intervals or every night).
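The event-based versus time-based distinction above can be illustrated with a small sketch. Here a trigger captures every change on the primary (event-based), and a shipping step that could run at regular intervals drains the change log to the replica (time-based); the `accounts` and `change_log` tables are hypothetical, and SQLite stands in for a real replicated DBMS.

```python
import sqlite3

def setup_event_based_capture(primary, replica):
    """Event-based side: a trigger on the primary records each insert in a
    change log the moment the data value changes."""
    primary.executescript("""
        CREATE TABLE IF NOT EXISTS accounts (id INTEGER PRIMARY KEY, balance REAL);
        CREATE TABLE IF NOT EXISTS change_log (id INTEGER, balance REAL);
        CREATE TRIGGER IF NOT EXISTS log_insert AFTER INSERT ON accounts
        BEGIN
            INSERT INTO change_log VALUES (NEW.id, NEW.balance);
        END;
    """)
    replica.execute(
        "CREATE TABLE IF NOT EXISTS accounts (id INTEGER PRIMARY KEY, balance REAL)")

def ship_changes(primary, replica):
    """Time-based side: run at regular intervals (or nightly) to apply the
    logged changes to the replica -- a one-way, primary-to-replica transfer."""
    for row in primary.execute("SELECT id, balance FROM change_log").fetchall():
        replica.execute("INSERT OR REPLACE INTO accounts VALUES (?, ?)", row)
    primary.execute("DELETE FROM change_log")
    replica.commit()
    primary.commit()
```

Two-way replication would add a symmetric log and shipper in the opposite direction, plus some rule for reconciling conflicting updates.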

Typically, there are many benefits to replication. It is often put into practice to offload processing from a single server. Copies of corporate data are transferred to branch offices, where departmental users can access the local data more efficiently. Replication can also be a vital part of a data warehouse strategy (Pacitti & Simon, 2000), that is, merging data from numerous operational databases into a single data store for analysis. Having several copies of data also sets the stage for rapid failure recovery and cost-effective load balancing on busy networks.

Replicated DBMSs are recommended for applications such as backup as well as knowledge management, OLAP, and DSS systems that do not require up-to-the-minute information and knowledge. Most managers who use these systems do not require up-to-the-minute facts (Aubrey & Cohen, 1996). Most companies doing backup do not require the up-to-date capabilities of two-phase commit. That is, many users would probably prefer working with a backup system that was slightly out of sync to waiting for the main database management system to be restored. Replication also allows companies to divide up a database and ship information and knowledge closer to those users who work with it the most. Response time improves because the information and knowledge is stored locally rather than at a central site. Wide area network costs fall because users no longer need to access the network to work with what they need.
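The trade-off described above, a slightly out-of-sync replica in exchange for not waiting on replica acknowledgement as two-phase commit would require, can be sketched with a lazy (asynchronous) replica. The `Primary` and `LazyReplica` classes are hypothetical illustrations, not any real product's API.

```python
import time
from queue import Queue
from threading import Thread

class LazyReplica:
    """Lazy replica sketch: a background thread applies queued writes after a
    simulated propagation lag, so the replica may briefly trail the primary --
    acceptable for backup, OLAP, and DSS workloads."""
    def __init__(self, delay=0.01):
        self.data = {}
        self.queue = Queue()
        self.delay = delay
        Thread(target=self._apply_loop, daemon=True).start()

    def _apply_loop(self):
        while True:
            key, value = self.queue.get()
            time.sleep(self.delay)  # simulated propagation lag
            self.data[key] = value
            self.queue.task_done()

class Primary:
    """Primary sketch: commits locally and enqueues the change for the
    replica, acknowledging immediately instead of blocking as 2PC would."""
    def __init__(self, replica):
        self.data = {}
        self.replica = replica

    def write(self, key, value):
        self.data[key] = value                 # commit locally...
        self.replica.queue.put((key, value))   # ...propagate lazily
        return "ok"                            # no wait for the replica
```

An eager (2PC-style) variant would instead block in `write` until the replica confirmed the update, trading response time for up-to-the-minute consistency.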

Database Integration

As time goes on, more types of databases will appear. The challenge is to integrate them in a flexible way that allows their continued expansion with local autonomy in updating, yet also allows us to automate the search for answers to queries over the whole collection of databases. Two possible architectures for integrating biological databases are described here in outline: a data replication approach and a federated approach (Angoss Software, 2006).

Data Replication Approach

In this architecture, all data from the different databases and databanks of interest would be transferred to a single local data repository, under a single database management system. This approach is taken by Gray, et al. (2005), who proposed an architecture in which the contents of biological databanks, including the EMBL nucleotide sequence databank and Swissprot, are loaded into a central repository. However, we believe that a data replication approach is not suitable for this application domain for several reasons, which are as follows:

Space. The quantity of biological data in existing databanks and databases is extremely large, and new data are being produced at an increasing rate. Few sites have enough disk space to mirror all the data that may be required by those sites' clients. At present, national bioinformatics nodes provide a repository service for many databanks. However, a site wishing to integrate its private local data with the existing communal resources would be required to mirror (at least part of) these.

Updates. Scientists want access to the most recent data. They want online access to results reported in the current journals the moment these have been deposited in a databank or database. Whenever one of the contributing databases is updated, the same update would have to be made to the data warehouse. (Alterations and deletions are sometimes made to biological databases, but are less frequent than additions.) Another possibility is for the data to be updated locally and periodically copied across to a central form, but then there is a delay in obtaining current information (Applehans, et al. 2004).
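The local-update-then-periodic-copy scheme mentioned above can exploit the fact that additions dominate: the periodic copy ships only entries added since the last sync. The `entries` table and its columns are hypothetical, chosen for illustration.

```python
import sqlite3

def copy_new_entries(source, warehouse, last_seen_id):
    """Periodic-copy sketch: ship only entries added since the previous sync.
    Because the databanks are mostly append-only, copying additions is far
    cheaper than a full refresh, at the cost of a delay before the warehouse
    reflects the latest data. Returns the new high-water mark."""
    rows = source.execute(
        "SELECT id, sequence FROM entries WHERE id > ? ORDER BY id",
        (last_seen_id,)).fetchall()
    warehouse.executemany("INSERT INTO entries VALUES (?, ?)", rows)
    warehouse.commit()
    return rows[-1][0] if rows else last_seen_id
```

Note this covers additions only; the occasional alterations and deletions would need a separate change log or periodic reconciliation pass.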

Autonomy. Significantly, by adopting a data repository approach, the advantages of the individual heterogeneous systems are lost. For instance, many biological data resources have their own customized graphical interfaces and search engines that are tailored to the particular physical representation used with that data set. They also have their own update schedules, as noted earlier. The sociological significance of a degree of site autonomy should not be undervalued. People like to feel that they have control over their own information, and that they do not lose this when they start sharing data.

In a nutshell, a data replication approach would require human resources, software, and hardware beyond what is reasonably available at each site wishing to use the information.

Federated Multi-Database Approach

We favor a federated approach that makes use of existing remote data sources, with data described in terms of entities, their attributes and relationships, and their classes and subclasses. These are all described at a high level in a shared data model, which is…

Quoted Instructions for "Data Replication" Assignment:

Instructions

I want the body for the research paper. A one-page description (Abstract) and one introductory page of the research project topic "DATA REPLICATION" was completed by ***** in order no. A2025050. I have posted the references I would like to be included in the research paper work.

In the research project you are expected to provide a good understanding of the problem (resulting in a survey part), a well-defined strategy for its solution, and a robust and well-considered solution to the problem. The research report should demonstrate a thorough understanding of the problem such that the conclusion presented can withstand academic scrutiny.

A research report which will describe your attempt to either solve a problem in the chosen domain or go a long way towards its solution. You are expected to provide a good solution approach that may require you to conduct a study, instrument experiments, or produce a formal proof of concept that leads to a publishable article. This report should be 8 typed pages, including a summary survey of about 4 pages. The report should be 1.5 spaced with characters in 12-point font. Note that the report is important. It should be written carefully so that someone new to the subject can understand it easily. So include the necessary introductory material and make sure that the presentation is good.

Data Replication

Most distributed database systems are replicated. Replication has even gained importance outside the database domain as a means of improving system access performance and system reliability. There are some classical proposals, but there are also newer ones that push the envelope significantly in how replicated data are managed. The following is a combination of new and classic publications for you to consider.

M. Buretta, Data Replication, Wiley, 1997.

S.B. Davidson, H. Garcia-Molina and D. Skeen, "Consistency in partitioned networks", ACM Computing Surveys, 17(3): 341-370, 1985.

J. Gray, P. Helland, P. O'Neil, and D. Shasha, "The dangers of replication and a solution", Proc. ACM SIGMOD Conference, 1996, pages 173-182.

B. Kemme and G. Alonso, "A new approach to developing and implementing eager database replication protocols", ACM Trans. Database Systems, to appear in September 2000 issue.

B. Kemme and G. Alonso, "Don't be lazy, be consistent: Postgres-R, a new way to implement database replication", Proc. VLDB Conference, 2000, pages 134-143.

Y. Breitbart et al., "Update propagation protocols for replicated databases", Proc. ACM SIGMOD Conference, 1999, pages 97-108.

E. Pacitti and E. Simon, "Update propagation strategies to improve freshness in lazy master replicated databases", VLDB Journal, 8(3-4): 305-318, 2000.

E. Pacitti, P. Minet and E. Simon, "Fast algorithms for maintaining replica consistency in lazy master replicated databases", Proc. VLDB Conference, 1999, pages 126-137.


