Oracle GoldenGate replication latency reporting

I setup GoldenGate on a system this year and kept getting asked the question “What is the latency?”.  It was easy enough to tell them what the “lag” is on the replicats, but I wasn’t 100% convinced that it was accurate. So, I came up with this simple solution to monitor and keep historical latency data.

So, while the GoldenGate “lag” is probably accurate and supposed to be based on the “commit” timestamp of the original insert, this provides historical information as well as giving you an actual data example.

Overview

The idea is that we’ll insert a row on the extract side and store the time of insert.
Then on the replicat side we’ll have a GoldenGate column mapping to add the time of insert on the replicat side. With this we can see, at any given time, how long it is taking for a row to make it across.

Caveats:

  • If you have multiple replicats applying a schema then this only tells you the latency for the replicat it is processed by.
  • GoldenGate’s datenow() function does not include milliseconds, so while the source_time computed by Oracle will have milliseconds, the target_time won’t. I’ve left it that way in the hope that eventually GoldenGate will provide timestamp with milliseconds. Keep this in mind when computing latency, or consider using a DATE data type. You will also see an example of this and query-based solution below.
  • The source_time is when the row is inserted. If you take a long time to commit the row then you will see latency that isn’t actually the latency. You should insert a row and commit immediately.
  • You could setup a trigger on the table to populate the columns on the target side, but you then would have to manage the trigger, and in an active-active environment it would be a little more complicated. It would also solve the problem of the timestamp issue.

Steps to setup

Create the Table

create table rep_latency_jsw
( tag         varchar(50)   not null,
  source_host varchar(100)  default sys_context('USERENV','HOST') not null,
  source_time timestamp     default systimestamp not null,
  target_host varchar(100)  null,
  target_time timestamp     null
);
alter table rep_latency_jsw add constraint rep_latency_jsw_pk primary key (source_time,source_host);
@jdesc rep_latency_jsw
  • tag – I use this for two reasons. First, it allows you to insert a row with only the tag specified, thus letting the defaults populate the correct source host and source time. Second, you can put values in it to identify the use, in case you have multiple schemas and want to easily union the data together.
  • source_host/source_time – Yep, this is to track where and when the row was inserted. I recommend that you do not specify these, and let the defaults populate it for accuracy.
  • target_host/target_time – These should not be populated on the insert. The GoldenGate replicat will populate these when it inserts the row (see below).

The _jsw is simply my signature. It reduces the possibility of the table name conflicting with an existing or future table, as well as making it more obvious that it isn’t part of the schema proper.

Stop replication

At this point you should stop your extract, pump, and replicat.

Add trandata with GGSCI

dblogin <user> password <password>
add trandata myschema.rep_latency_jsw

Add the mapping rules on the replicat

Add the following to your replicat parameter file preceeding any existing default mapping clauses (e.g. MAP myschema.*, TARGET myschema.*; ):

MAP myschema.rep_latency_jsw, target myschema.rep_latency_jsw,
colmap (usedefaults,
target_time=@DATENOW(),
target_host=@GETENV("GGENVIRONMENT","HOSTNAME")
);

Start replication

Start your extract, pump, and replicat.

Test a row insert and monitor it on source and target

Insert a test row, and then run the query on both source and target. You should see the row populate on the target with target_host and target_time populated.

insert into rep_latency_jsw (tag) values ('My First Latency Check');
commit;
select tag, source_host, source_time,
       target_host, target_time,
       target_time-source_time latency,
       (cast(target_time as date)-cast(source_time as date))*86400 lat_secs
from rep_latency_jsw
order by source_time;

Create a job to populate rows

Of course, you don’t want to insert rows manually, so just create a simple procedure and scheduler job in the database.

create procedure rep_latency_jsw_ins
is
begin
  insert into rep_latency_jsw (tag) 
   values (sys_context('USERENV','CURRENT_USER'));
  commit;
end rep_latency_jsw_ins;
/
begin
  dbms_scheduler.create_job(job_name=>'REP_LATENCY_JSW_INS_JOB',
                            job_type=>'STORED_PROCEDURE',
                            job_action=>'rep_latency_jsw_ins',
                            number_of_arguments=>0,
                            repeat_interval=>'FREQ=Hourly;BYMINUTE=00,05,10,15,20,25,30,35,40,45,50,55',
                            enabled=>false,
                            comments=>'Job to insert rep_latency_jsw records');
end;
/
-- next, only enable on extract side, keep disabled on replicat side
exec dbms_scheduler.enable('REP_LATENCY_JSW_INS_JOB');

A useful reporting query

The following is a query that I found useful for checking the status. In the example below you can see that while the maximum latency in seconds was 41, the average is still low. That would indicate a one-time issue as opposed to a chronic issue (in fact, in this case, I had stopped the replicat for a short period of time.)

select tag,
       &&over_days over_days,
       avg((cast(target_time as date)-cast(source_time as date))*86400) avg_lat_secs,
       min((cast(target_time as date)-cast(source_time as date))*86400) min_lat_secs,
       max((cast(target_time as date)-cast(source_time as date))*86400) max_lat_secs
from rep_latency_jsw
where source_time > sysdate-&&over_days
group by tag
order by tag
/

TAG               OVER_DAYS AVG_LAT_SECS MIN_LAT_SECS MAX_LAT_SECS
------------ -------------- ------------ ------------ ------------
MYSCHEMA                .50            3            1           41

and that is it!

2 responses to “Oracle GoldenGate replication latency reporting

  1. Hi Jed, nice to find the great post about OGG latency!
    Could u pls share me with the HW configuration information for ur OGG test? I observed 4-5 secs latency in a VM pair which has 2GB main memory and a single processor core (Xeon E5 2650 v2).

    • This is live production system. 8-node 11.2.0.3 RAC on HP DL385 with 128GB RAM and I think 24 processors on SAN storage using partial flash (now migrated to an all-Flash array). So many factors effect things though, especially the wires.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s