Oracle Java heap space error, audit files, gc buffer busy acquire, and load spikes

I added a couple of nodes to a RAC recently and thought I’d post a bit of my experience and my cheat sheet. This was a policy-managed RAC.

Java heap space error and audit files

While running the add-node step for the Grid home I got the following error:

Exception java.lang.OutOfMemoryError: Java heap space occurred..
java.lang.OutOfMemoryError: Java heap space

Of course, on my “test” system this didn’t happen, so it panicked me a bit, mainly because I thought this type of error might have left something really out of whack. I did some searching and found suggestions to modify the OUI configuration to increase the Java heap memory. I don’t like stuff like that, but fortunately I found MOS Note 12318325. According to the note, it is simply the result of too many audit files: apparently enough that there isn’t sufficient Java heap space to hold their names. Fortunately that was it, and clearing out <GRID_HOME>/rdbms/audit solved the problem. I ran it again and all was fine.
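Since the fix is just clearing out old audit files, here is a sketch of the cleanup pattern. It runs against a scratch directory standing in for <GRID_HOME>/rdbms/audit, and the 30-day cutoff is my assumption; pick whatever retention your auditors require.

```shell
# Demo of the cleanup pattern; the scratch dir stands in for <GRID_HOME>/rdbms/audit.
AUDIT_DIR=$(mktemp -d)
touch -d '40 days ago' "$AUDIT_DIR/old1.aud" "$AUDIT_DIR/old2.aud"   # stale audit files
touch "$AUDIT_DIR/recent.aud"                                        # recent file to keep
# Delete .aud files older than 30 days; drop -delete first to preview what would go.
find "$AUDIT_DIR" -name '*.aud' -mtime +30 -delete
ls "$AUDIT_DIR"    # only recent.aud remains
```

Running the find without `-delete` first is cheap insurance before pointing it at a real Grid home.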

Cluster Wait Spike after adding node and starting instance

Another thing I noticed was that after starting the new instance, I saw a massive spike in Other and Cluster waits, specifically “gc buffer busy acquire”. I figured this was probably re-mastering of blocks to the new node. In both cases it settled down after a bit and all was fine. I haven’t had time to confirm this, but I’m pretty sure it was re-mastering.

Where are my redo logs and /etc/oratab entries?

Finally, and this is odd: sometimes after running “emca -addNode db” it would start the new instance on the new RAC node, update /etc/oratab, and add redo logs. Other times it wouldn’t do any of these things and I’d have to do them myself. I’m sure if I did another I’d be able to figure out exactly what the reason is. In any case, this is something to pay attention to.
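For the /etc/oratab piece, when emca skips it the entry is just one line per instance. Something like the following, where the SID follows this post’s examples but the home path is hypothetical, and the trailing N leaves startup to clusterware rather than dbstart:

```
myprd_7:/oracle/product/11.2.0/dbhome_1:N
```

The SID and home must match what srvctl already knows about the instance.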

My cheat sheet

Here is my cheat sheet for adding a node in 11.2.0 with a policy-managed database.

Make sure the new node has the same OS and is patched the same
Make sure kernel parameters, limits, etc. are all configured the same
Make sure the same users and groups exist
Make sure ssh authentication is configured and /etc/hosts is updated on all nodes
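As a tiny example of the “same users and groups” check, something like this on the new node. The group names are the common Oracle defaults, which is an assumption; compare against what your existing nodes actually use.

```shell
# Check that the expected OS groups exist on this node.
# oinstall/dba are typical defaults (assumption); adjust to match your cluster.
missing=0
for g in oinstall dba; do
  if getent group "$g" >/dev/null; then
    echo "group $g: present"
  else
    echo "group $g: MISSING"
    missing=$((missing + 1))
  fi
done
echo "groups missing: $missing"
```

The same loop works for users with `getent passwd`, and diffing `sysctl -a` output between nodes covers the kernel-parameter side.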

For the new node, do the post-hardware/OS check.
On an existing node, as the grid owner (usually oracle):
oracle> $GRID_HOME/bin/cluvfy stage -post hwos -n rac-node-07 -verbose > cluvfy_post_hwos_n7.log

Then do the pre-nodeadd check for the new node.
On an existing node, as the grid owner (usually oracle):
oracle> $GRID_HOME/bin/cluvfy stage -pre nodeadd -n rac-node-07 -fixup -fixupdir /tmp > cluvfy_pre_nodeadd_n7.log

Now add the node to clusterware from an existing node, as oracle:
oracle> $GRID_HOME/oui/bin/addNode.sh "CLUSTER_NEW_NODES={rac-node-07}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={rac-vip-07}"
When done, run the root scripts on the new node as directed by the installer, as root:
root> /oracle/oraInventory/orainstRoot.sh
root> <GRID_HOME>/root.sh
Now it’s a good idea to run cluvfy to verify the installation, as oracle:
oracle> $GRID_HOME/bin/cluvfy stage -post nodeadd -n rac-node-07 -verbose > cluvfy_post_nodeadd_n7.log

Now add the RDBMS software.
First make sure ownership and permissions are good for your ORACLE_BASE directories (e.g. /oracle and its subdirectories).
On an existing node, as oracle (note: we are now running this from the ORACLE_HOME, not the GRID_HOME):
oracle> $ORACLE_HOME/oui/bin/addNode.sh "CLUSTER_NEW_NODES={rac-node-07}"
When done, run the root script on the new node:
root> <ORACLE_HOME>/root.sh

You should check the docs here; how you add the new instance depends on whether you are policy-managed or admin-managed.
In my case it was policy-managed:
increase the max servers in your server pool to the appropriate number (assuming you use just one pool)
  oracle> srvctl modify serverpool -g main -u 8
Now add the new instance (this uses the EMCA tool):
  oracle> emca -addNode db
          <follow prompts>
At this point you should be able to start the database instance on the new node.
First ensure your $ORACLE_BASE and $ORACLE_HOME have the correct ownership and permissions, then:
oracle> srvctl start instance -d myprd -i myprd_7

I found I had to wait a little while and manually bounce dbconsole (emctl stop dbconsole, then emctl start dbconsole) for the new node to be clickable in the EM performance screens.

If you are using OMF/ASM, then UNDO should get created for you, but check.
Check for the new instance's UNDO tablespace, and if it wasn’t created, create it yourself.
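A sketch of the check-and-create, run from any existing instance. The tablespace and instance names follow this post’s examples and are otherwise assumptions; with OMF no datafile clause is needed.

```sql
-- See which undo tablespaces already exist
SELECT tablespace_name FROM dba_tablespaces WHERE contents = 'UNDO';

-- If the new instance's undo tablespace is missing, create it (OMF supplies the datafile)
CREATE UNDO TABLESPACE undotbs7;
ALTER SYSTEM SET undo_tablespace = 'UNDOTBS7' SID = 'myprd_7';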

verify /etc/oratab is how you want it

add crontab, monitoring, logrotate, etc

and that’s it folks. Go to bed.

4 responses to “Oracle Java heap space error, audit files, gc buffer busy acquire, and load spikes”


  2. Thanks Jed. Very very helpful in what I’m about to do. Much appreciated.

  3. Some additional notes.

    Running cluvfy stage -pre nodeadd is nice, but cluvfy stage -pre crsinst tests even more obscure stuff.

    I also found that the installer wants the directories for the software created on the new target host. So in our case
    I duplicated the directory structure and ownership from an existing node onto our new node.
