Wikipedia

Search results

Tidying up SYSAUX – removing old snapshots which you didn’t know existed

How to reduce space taken up in the  SYSAUX tablespace by because you have many more AWR snapshots being retained than you think.
Firstly lets take an example database and we can see that we are using 92Gb  of space

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
set linesize 120
set pagesize 100
col ash form a30
col retention form a30
col snap form a30
COLUMN "Item" FORMAT A25
COLUMN "Space Used (GB)" FORMAT 999.99
COLUMN "Schema" FORMAT A25
COLUMN "Move Procedure" FORMAT A40
SELECT  occupant_name "Item",
space_usage_kbytes/1048576 "Space Used (GB)",
schema_name "Schema",
move_procedure "Move Procedure"
FROM v$sysaux_occupants
WHERE occupant_name = 'SM/AWR'
ORDER BY 1
/

Item                     Space Used (GB) Schema                   Move Procedure
------------------------- --------------- ------------------------- ----------------------------------------
SM/AWR                             91.88                                      SYS
Now we are looking to see how long we should be retaining AWR snapshots for and what we actually have
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
select sysdate - a.sample_time ash,
sysdate - s.begin_interval_time snap,
c.RETENTION
from sys.wrm$_wr_control c,
(
select db.dbid,
min(w.sample_time) sample_time
from sys.v_$database db,
sys.Wrh$_active_session_history w
where w.dbid = db.dbid group by db.dbid
) a,
(
select db.dbid,
min(r.begin_interval_time) begin_interval_time
from sys.v_$database db,
sys.wrm$_snapshot r
where r.dbid = db.dbid
group by db.dbid
) s
where a.dbid = s.dbid
and c.dbid = a.dbid;

ASH                           SNAP                           RETENTION
—————————— —————————— ——————————
+000000875 22:22:41.045       +000000030 12:22:28.323       +00030 00:00:00.0
We want to keep 30 days worth of snapshots – we have set the retention period to that  – but we have 875 days worth of active session history.
I rather smugly said let’s show an example and then I produce  a schema with nearly 3 years worth of snapshots being maintained but there is no trickery involved. Looking around at other systems I can find a few similar examples and it is always down to the same issue. Let’s look further.
1
2
3
4
5
6
select table_name, count(*)
from dba_tab_partitions
where table_name like 'WRH$%'
and table_owner = 'SYS'
group by table_name
order by 1;

TABLE_NAME                       COUNT(*)
------------------------------ ----------
WRH$_ACTIVE_SESSION_HISTORY             2
WRH$_DB_CACHE_ADVICE                    2
WRH$_DLM_MISC                           2
WRH$_EVENT_HISTOGRAM                    2
WRH$_FILESTATXS                         2
WRH$_INST_CACHE_TRANSFER                2
WRH$_INTERCONNECT_PINGS                 2
WRH$_LATCH                              2
WRH$_LATCH_CHILDREN                     2
WRH$_LATCH_MISSES_SUMMARY               2
WRH$_LATCH_PARENT                       2
WRH$_OSSTAT                             2
WRH$_PARAMETER                          2
WRH$_ROWCACHE_SUMMARY                   2
WRH$_SEG_STAT                           2
WRH$_SERVICE_STAT                       2
WRH$_SERVICE_WAIT_CLASS                 2
WRH$_SGASTAT                            2
WRH$_SQLSTAT                            2
WRH$_SYSSTAT                            2
WRH$_SYSTEM_EVENT                       2
WRH$_SYS_TIME_MODEL                     2
WRH$_TABLESPACE_STAT                    2
WRH$_WAITSTAT                           2

24 rows selected.
There is the problem, all the WRH$ data is held in 2 partitions and the overnight house-keeping job does not have time to finish it’s tidy-up before it’s 15 minutes of fame is over again for another 24 hours.
Two components of Server Manageability (SM) components that reside in the SYSAUX tablespaces cause the problem. The components involved are the Automatic Workload Repository (AWR) and Optimizer Statistics History (OPTSTAT). Both of these components have retention periods associated with their data, and the MMON process should run nightly, as part of the scheduled maintenance tasks, to purge data that exceeds these retention periods. From version 11G onwards, the mmon purging process has been constrained to a time-limited window for each of the purges, if this window is exceeded the purging stops and an ORA-12751 error is written to an m000 trace file.
For the AWR data, held in tables with names commencing with WRH$, the probable cause is due to fact that a number of the tables are partitioned. New partitions are created for these tables as part of the mmon process. Unfortunately, it seems that the partition splitting process is the final task in the purge process. As the later partitions are not split they end up containing more data. This results in partition pruning within the purge process becoming less effective.
For the OPTSTAT data, held in tables with names commencing with WRI$, the cause is more likely to be related to the volume of data held in the tables. WRI$ tables hold historical statistical data for all segments in the database for as long as specified by the stats history retention period. Thus, if there database contains a large number of tables with a long retention period – say 30 days, then the purge process will have an issue trying to purge all of the old statistics within the specified window.
I also think that two scenarios lead to this problem
1) SYSAUX running out of space, which I know has happened on at least one occasion and the volume of data left in a partition is too much to handle in a constrained time-window and so continues to grow in the same partition.
2) If the database is shut down over the period of the maintenance task and again the volume in the partition becomes too large to handle in a short-time.

The resolution to this problem is split into two stages.
Firstly a temporary manual fix is required to rectify the partition splitting failure. A job needs to scheduled on the database, to be run as the sys user which executes the following code:This job will force the splitting of partitions for the WRH$ tables, and should be scheduled to run on a daily basis until the number of partitions per table exceeds the AWR retention period.
1
2
3
begin
 execute immediate 'alter session set "_swrf_test_action" = 72';
 end;
What I do is drop the retention period to 8 days ( or maybe less) and then run this procedure once a day. It can take up to an hour but afterwards you can see the additional partitions have been created.
TABLE_NAME                       COUNT(*)
------------------------------ ----------
WRH$_ACTIVE_SESSION_HISTORY             3
WRH$_DB_CACHE_ADVICE                    3
WRH$_DLM_MISC                           3
WRH$_EVENT_HISTOGRAM                    3
WRH$_FILESTATXS                         3
WRH$_INST_CACHE_TRANSFER                3
WRH$_INTERCONNECT_PINGS                 3
WRH$_LATCH                              3
WRH$_LATCH_CHILDREN                     2
WRH$_LATCH_MISSES_SUMMARY               3
WRH$_LATCH_PARENT                       2
WRH$_OSSTAT                             3
WRH$_PARAMETER                          3
WRH$_ROWCACHE_SUMMARY                   3
WRH$_SEG_STAT                           3
WRH$_SERVICE_STAT                       3
WRH$_SERVICE_WAIT_CLASS                 3
WRH$_SGASTAT                            3
WRH$_SQLSTAT                            3
WRH$_SYSSTAT                            3
WRH$_SYSTEM_EVENT                       3
WRH$_SYS_TIME_MODEL                     3
WRH$_TABLESPACE_STAT                    3


Once the correct number of partitions have been created the old data may be automatically purged by the mmon process. If this does not occur, it will be necessary to manually purge the data from the AWR tables.
Firstly turn off the production of snapshots for the duration of this activity:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
get the dbid for this database
select dbid from v$database;
<strong>-- get current snap interval (in minutes)</strong>
select extract( day from snap_interval) *24 *60
       + extract( hour from snap_interval) *60
       + extract( minute from snap_interval) snap_interval
       from wrm$_wr_control where dbid = ;
-- disable snaps by setting interval to 0
exec dbms_workload_repository.modify_snapshot_settings(interval=> 0, dbid => <dbid>)
Then determine the maximum snapshot to be purged from the database by querying the maximum snap-id outside of the retention period:
1
2
3
4
5
6
7
8
9
10
11
select max(snap_id) max_snap
from wrm$_snapshot
where begin_interval_time
< (sysdate - (select retention
from wrm$_wr_control
where dbid = ));
This will return the maximum snapshot to be purged
The AWR tables will not purge if baselines exist, so check and remove as required. The following should return only 1 row called SYSTEM_MOVING_WINDOW:
1
2
3
select baseline_name,creation_time
from dba_hist_baseline;
If other baselines are found they will need to be removed:
1
exec dbms_workload_repository.drop_baseline(baseline_name => <baseline>);
If we look at the current retention period in this code it is set to 15 days, so we set it to 8 days
1
2
3
4
select d.dbid,w.snap_interval,w.retention from DBA_HIST_WR_CONTROL w, v$database d where w.dbid = d.dbid;
Begin DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(11520,0); end;
begin</strong> DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(129600,0); <strong>end</strong>;
Then we purge the snapshots ( – in ranges if there a are a lot as in this example). This can take several hours
1
2
3
EXEC dbms_workload_repository.drop_snapshot_range(low_snap_id=>100000, high_snap_id=>114159);
EXEC dbms_workload_repository.drop_snapshot_range(low_snap_id=>116261, high_snap_id=>116265);
Once all that is complete it is just a  matter of tidying up the indexes
ASH    SNAP    RETENTION —————————————————————————
+000000008 23:47:01.421    +000000008 23:46:55.865    +00008 00:00:00.0

Item                     Space Used (GB) Schema                   Move Procedure
------------------------- --------------- ------------------------- ----------------------------------------
SM/AWR                             93.70 SYS
1
select 'alter index '||segment_name||' rebuild online parallel (degree 4);' from dba_segments where tablespace_name= 'SYSAUX' and segment_name like 'WRH$_%' and segment_type='INDEX' order by segment_name;
double check for unusable indexes

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
select 'alter index '||s.segment_name||' rebuild online parallel (degree 4);'
from dba_segments s,  dba_indexes i where s.tablespace_name= 'SYSAUX'
and s.segment_name like 'WRH$_%'
and s.segment_type='INDEX'
and i.status = 'UNUSABLE'
and i.index_name = s.segment_name
order by s.segment_name
&nbsp;
select 'alter table '||segment_name||' move tablespace sysaux ;' from dba_segments where tablespace_name= 'SYSAUX' and segment_name like 'WRH$_%' and segment_type = 'TABLE'   order by segment_name
Item                     Space Used (GB) Schema                  Move Procedure
------------------------- --------------- ------------------------- ----------------------------------------
SM/AWR                             10.05 SYS

No comments:

Post a Comment