Oracle GoldenGate enables us to extract and replicate data across a variety of topologies as shown the diagram below as well as the exchange and manipulation of data at the transactional level between a variety of database platforms like Oracle, DB2, SQL Server, Sybase, Teradata, c-tree, MySQL etc.
It can support a number of different business requirements like:
- Business Continuity and High Availablity
- Data migrations and upgrades
- Decision Support Systems and Data Warehousing
- Data integration and consolidation
Oracle GoldenGate configuration on Oracle Database and Oracle Linux
The Oracle GoldenGate Manager process is used to manage all of the Oracle GoldenGate processes and resources. A single Manager process runs on each server where GoldenGate is executing and processes commands from the GoldenGate Software Command Interface (GGSCI). The Manager process is the first GoldenGate process started. The Manager then starts and stops each of the other GoldenGate processes, manages the trail files, and produces log files and reports. Manager process must be running on both the source as well as target systems before the Extract or Replicat process can be started.
Capture is the process of extracting data that is inserted into, updated on, or deleted from the source database. In GoldenGate, the capture process is called the Extract. In this case, the Extract is called a Local Extract (sometimes called the Primary Extract) because it captures data changes from the local source database. There are several types of extracts. Another type of Extract that’s discussed later is the data-pump Extract, which passes the Local Extract changes to the target server. You can also have an initial-load Extract to capture database records to perform an initial load of the target database.
Extract is an operating-system process that runs on the source server and captures changes from the database transaction logs. For example, in an Oracle database, Extract captures changes from the redo logs (and in some exceptional cases, the archived redo logs) and writes the data to a file called the Trail File. For Microsoft SQL Server, Extract captures changes from the transaction log. To reduce the amount of processing, Extract only captures committed changes and filters out other activity such as rolled-back changes. Extract can also be configured to write the Trail File directly to a remote target server, but this usually isn’t the optimum configuration.
In addition to database data manipulation language (DML) data, you can also capture data definition language (DDL) changes and sequences using Extract if properly configured. You can use Extract to capture data to initially load the target tables, but this is typically done using DBMS utilities such as export/import or Data Pump for Oracle.
You can configure Extract as a single process or multiple parallel processes depending on your requirements. Each Extract process can act independently on different tables. For example, a single Extract can capture all the changes for of the tables in a schema, or you can create multiple Extracts and divide the tables among the Extracts. In some cases, you may need to create multiple parallel Extract processes to improve performance, although this usually isn’t necessary. You can stop and start each Extract process independently.
You can set up Extract to capture an entire schema using wildcarding, a single table, or a subset of rows or columns for a table. In addition, you can transform and filter captured data using the Extract to only extract data meeting certain criteria. For example, you may want to filter a Customer table to only extract customer data if the customer’s name is equal to “Jones”.
The Replicat process runs on the target system and reads transactional data changes as well as DDL changes and replicates then to the target database. Like the Extract process, the Replicat process can also be configured for Initial Load as well as Change Synchronization.
The Collector is a background process which runs on the target system and is started automatically by the Manager (Dynamic Collector) or it can be configured to stsrt manually (Static Collector). It receives extracted data changes that are sent via TCP/IP and writes then to the trail files from where they are processed by the Replicat process.
Trails are series of files that Oracle GoldenGate temporarily stores on disks and these files are written to and read from by the Extract and Replicat processes as the case may be. Depending on the configuration chosen, these trail files can exist on the source as well as on the target systems. If it exists on the local system, it will be known an Extract Trail or as an Remote Trail if it exists on the target system.
Data Pumps are secondary extract mechanisms of Oracle GoldenGate which exist in the source configuration. This is optional component and if Data Pump is not used then Extract sends data via TCP/IP to the remote trail on the target. When Data Pump is configured, the the Primary Extract process will write to the Local Trail and then this trail is read by the Data Pump and data is sent over the network to Remote Trails on the target system.
In the absence of Data Pump, the data that the Extract process extracts resides in memory alone and there is no storage of this data anywhere on the source system. In case of network of target failures, there could be cases where the primary extract process can abort or abend. Data Pump can also be useful in those cases where we are doing complex filtering and transformation of data as well as when we are consolidating data from many sources to a central target.
When processing transactional data changes, the Extract process can obtain data directly from the database transaction logs (Oracle, DB2, SQL Server, MySQL etc) or from a GoldenGate Vendor Access Module (VAM) where the database vendor (for example Teradata) will provide the required components that will be used by Extract to extract the data changes.
To differentiate between the number of different Extract and Replicat groups which can potentially co-exist on a system, we can define processing groups. For instance, if we want to replicate different sets of data in parallel, we can create two Replicat groups.
A processing group consisits of a process which could be either a Extract or Replicat process, a corresponding parameter file, checkpoint file or checkpoint table (for Replicat) and other files which could be associated with the process.