Wikipedia

Search results

Troubleshooting Session Administration

The aim of this document is to provide a Troubleshooting Guide for Session Administration, including :
Troubleshooting Details
General Problem Analysis
Session Tracing
References


ISSUE :  Connections to the database have caused the current number of operating system processes associated with the Oracle Instance to exceed the PROCESSES database parameter

Troubleshooting Steps

The typical remedy to an ORA-20 - maximum number of processes (%s) exceeded ... is to increase the PROCESSES parameter ... This is usually only a temporary solution
If using an init.ora:
1. Edit the database init.ora
2. Locate and increase the processes parameter to a higher value.
3. Save the init.ora file
4. Stop and restart the database
5. Re-connect to the database instance

If using an spfile:
1. Modify the spfile, increasing the processes parameter to a higher value: alter system set processes= scope=spfile;
2. Stop and restart the database
3. Re-connect to the database instance

Here are the steps to diagnose the cause of the ORA-20

1.a) The first step is to gather information about the processes that exist when the ORA-20 occurs

connect ... as sysdba
set markup html on
set pagesize 30
spool processes_sessions.html
select
p.username "V$PROCESS - OS USERNAME",
p.terminal,
p.program,
s.username "V$SESSION - USERNAME",
s.command,
s.status,
to_char(s.LOGON_TIME,'DD-MON-YY HH24:MI:SS') "LOGON TIME",
s.LAST_CALL_ET/60 "LAST_CALL TIME",
s.server,
s.process,
s.machine,
s.port,
s.terminal,
s.program,
s.sid,
s.serial#,
p.spid
FROM v$session s,v$process p
WHERE p.addr=s.paddr
order by p.background desc;
spool off
exit;

NOTE .. depending on version ... S.PORT may need to be removed from the query
 1.b) If the issue happened in the past, you can try capturing the data from the AWR views. This is available only from 10g onwards and requires license for Oracle Diagnostic Pack.
select program, module, machine, count (*) from DBA_HIST_ACTIVE_SESS_HISTORY where SNAP_ID between snapid and snapid and INSTANCE_NUMBER=1 group by program, module, machine;

2) Look for patterns in the output generated in step #1 (processes_sessions.html)

SAMPLE OUTPUT
V$PROCESS - OS USERNAMETERMINALPROGRAMV$SESSION - USERNAMECOMMANDSTATUSSERVERPROCESSMACHINEPORTTERMINALPROGRAM
oracleUNKNOWNoracle@filnx10 (TNS V1-V3)SYS3ACTIVEDEDICATED27140filnx100pts/2sqlplus@filnx10 (TNS V1-V3)
oracleUNKNOWNoracle@filnx10 (TNS V1-V3)
oracleUNKNOWNoracle@filnx10 (Q001)0ACTIVEDEDICATED24739filnx100UNKNOWNoracle@filnx10 (Q001)
oracleUNKNOWNoracle@filnx10 (VKTM)0ACTIVEDEDICATED24534filnx100UNKNOWNoracle@filnx10 (VKTM)
oracleUNKNOWNoracle@filnx10 (GEN0)0ACTIVEDEDICATED24538filnx100UNKNOWNoracle@filnx10 (GEN0)
oracleUNKNOWNoracle@filnx10 (DIAG)0ACTIVEDEDICATED24540filnx100UNKNOWNoracle@filnx10 (DIAG)
oracleUNKNOWNoracle@filnx10 (DBRM)0ACTIVEDEDICATED24542filnx100UNKNOWNoracle@filnx10 (DBRM)
oracleUNKNOWNoracle@filnx10 (DIA0)0ACTIVEDEDICATED24544filnx100UNKNOWNoracle@filnx10 (DIA0)
oracleUNKNOWNoracle@filnx10 (MMAN)0ACTIVEDEDICATED24546filnx100UNKNOWNoracle@filnx10 (MMAN)


     * If there are many V$SESSION.STATUS = INACTIVE ...

            This means that there are many user sessions that have connected but are not doing anything

            POSSIBLE SOLUTION : Enable dead connection detection (DCD) and user resource limits

                    A discussion of Dead Connection Detection, Resource Limits, V$SESSION, V$PROCESS and OS processes (Doc ID 601605.1)

     * If there are many V$SESSION.PROGRAM (and possibly many V$SESSION.STATUS = INACTIVE) entries from the same program ...

            POSSIBLE SOLUTIONS :

            1) Examine the program (often a web server) to see if the program has a setting to abandon a connection after X number of seconds and reconnect (a respawn event)

                If the program has such a setting ... a slowdown in either network or database performance should to be investigated ... as this would cause a timeout in the application ..
                  and thus a reconnect ... setting a longer 'timeout' in the application often will resolve future occurrences of ORA-20

             2) Like the solution above ... setting DCD and Resource Limits ... often resolves these as well (Use Note 601605.1)

     * If there are many rows with data for V$PROCESS but no data for V$SESSION ...

            This is a problem on the operating system side and such sessions will need to be killed manually with KILL -# (UNIX / LINUX) or ORAKILL (Windows)

             This condition needs to be examined by an operating system expert (System Administrator or OS Vendor)

             WHY? - When an Oracle session is terminated ... whether by the user logging out ... a kill session by a privileged user ... or even the user reaching a resource limit set in their profile
                           Oracle SMON (or PMON) will first clean up the database resources (rollback transactions ... release locks etc)
                           Oracle then will remove the entry in V$SESSION as the session is now 'cleaned up'
                           Oracle will then request that the operating system terminate any OS processes associated with the former Oracle Session

               The Oracle Kernel cannot force the OS to terminate the processes .. this occurs at the OS level only ... all it can do is request that they be terminated
 3) For issues that happened in the past, you can check the data from dba_hist_active_sess_history. This view will have a sampled data (again not the entire sessions connected) on the sessions established. You can get the details for the required time frame and identify the application or the user which created the maximum number of sessions. You can check the session activity also.

4) For issues when after an ORA-20 "maximum number of processes (%s) exceeded" occurs , no new connections (even SYSDBA or SYSOPER) may be made to the instance..
This is related to unpublished Bug 8518197: LOGINS NOT POSSIBLE AFTER ORA-20 and unpublished Bug 8518126: LOGINS NOT POSSIBLE AFTER ORA-20
There are a couple of 'workarounds' for this issue ... but no real solution at this time (an enhancement request has been filed)

Workaround:


1. Kill one or more of the client connections at operating system level.
UNIX: kill -9 one or more of the client connections

WINDOWS : using ORAKILL ... kill one or more of the client connections

* Use an existing connection with sufficient privileges (if one is logged on) to view V$SESSION / V$PROCESS and
         alter system kill session 'SID, SERAL#';

How to Avoid ORA-20 Error

To avoid database hitting ORA-20, we can implement a logon trigger, such that all the non SYS/SYSTEM connections get an exception when database has reached a threshold for processes parameter. This way we can avoid a database crash due to ORA-20. Below document discusses the trigger creation:
How to Avoid ORA-20 and Allow SYS or System Connection Using Logon Trigger to Limit the Connections Below the Processes Parameter Value (Doc ID 1585035.1)

Known Issues

In some cases ORA-20 errors are reported while starting up the database, after a crash. This is mostly due to the stray semaphores and shared memory segments. For details refer to Doc ID 1671383.1

REFERENCES

NOTE:169706.1 - Oracle Database (RDBMS) on Unix AIX,HP-UX,Linux,Mac OS X,Solaris,Tru64 Unix Operating Systems Installation and Configuration Requirements Quick Reference (8.0.5 to 11.2)
NOTE:601605.1 - A discussion of Dead Connection Detection, Resource Limits, V$SESSION, V$PROCESS and OS processes
NOTE:805586.1 - Troubleshooting Session Administration
NOTE:825045.1 - How to Connect After ORA-20 has Occured
NOTE:1585035.1 - How to Avoid ORA-20 and Allow SYS or System Connection Using Logon Trigger to Limit the Connections Below the Processes Parameter Value
NOTE:1050281.1 - Getting ORA-00020/ ORA-00018 With A High Number Of CPUs Regardless Of How High The Related DB Parameters Are Set.
NOTE:458527.1 - ORA-00020 and ASM
NOTE:1460440.1 - Script PXHCDR.SQL: Parallel Execution Health-Checks and Diagnostics Reports

No comments:

Post a Comment