Durability – Oracle vs Cassandra

In this post I will draw an analogy between the way Cassandra provides durability for data and the well-known (at least to me) Oracle “method” of data persistence.

Cassandra uses something called the commitlog, to which every insert, update or delete is appended.
That is how Cassandra provides the “D” of Durability. Since a write is always an “append”, there is no seek time (or it is minimal) and the sequential write is much more efficient.

There is also a parameter called commitlog_sync that defaults to “periodic”, which syncs the commitlog to disk every commitlog_sync_period_in_ms (1000 ms by default in 2.x). There are other commitlog_sync modes, mainly “batch”, which tells Cassandra to group the “mutations” into batches and then sync them.
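
For reference, these knobs live in cassandra.yaml. A minimal sketch of the two modes (the values are illustrative, check the defaults of your version):

# cassandra.yaml – periodic mode (the default): writes are acked immediately,
# the commitlog is fsynced to disk every commitlog_sync_period_in_ms
commitlog_sync: periodic
commitlog_sync_period_in_ms: 1000

# batch mode: writes are acked only after the commitlog is fsynced, waiting
# up to commitlog_batch_window_in_ms to coalesce concurrent writes
# commitlog_sync: batch
# commitlog_batch_window_in_ms: 2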

Does this commitlog job sound familiar to you? Oracle redo log data is first written to a memory buffer (like the commitlog) before it is flushed to disk. LGWR is responsible for writing the commit records to the log file, and it also groups the “commits” (when needed) and flushes them to the redo log files in “batch”.
LGWR doesn't have a single parameter like “commitlog_sync_period_in_ms” or “commitlog_batch_window_in_ms” to control when the flush from the redo buffer to disk happens; instead it has a set of conditions that trigger the flush to the redo files (every 3 seconds, when the redo buffer is 1/3 full, before DBWR writes out a dirty block, etc.).
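
You can watch that grouping at work through the system statistics; a quick sketch (these are standard v$sysstat statistic names; roughly, when 'redo synch writes' outnumbers 'redo writes', LGWR is servicing several commits per physical write):

SQL> select name, value from v$sysstat
  2  where name in ('redo writes', 'redo blocks written', 'redo synch writes');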

Cassandra's commitlog and Oracle's redo log files serve the same purpose: in case of a disaster or a node restart, both need to be replayed to ensure that recovery is complete.

The difference here is subtle: if you are really unlucky and all your Cassandra replicas crash during the “period_in_ms” or “window_in_ms” (depending on your commitlog_sync parameter), your data is lost, unless you set it to 0.

That doesn't happen (by design) in Oracle with LGWR. No matter what you do in the database, when it crashes you recover everything that was reported to you as “committed”.

But for Cassandra this is only part of the game. The other part is something called the memtable, which is basically an in-memory structure holding the content as key-value pairs sorted by key. The memtable stores every single write until it reaches a specific threshold, and is then flushed to an immutable, ordered on-disk structure called an SSTable.
So the question is: how does Cassandra know which data is actually persisted as SSTables, and what to “replay” in case of a node disaster? The answer is simple: as soon as the memtable is flushed to an SSTable, the corresponding data in the commitlog is purged!
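
You can trigger that flush yourself with the standard nodetool commands; a quick sketch (the keyspace name is hypothetical):

# flush the memtables of one keyspace to SSTables on disk
nodetool flush my_keyspace

# or flush everything and stop accepting writes, e.g. before a planned restart
nodetool drain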

Here there are no archivelogs – data is purged as soon as it is made persistent on the nodes.

Update: as a few readers pointed out, since Cassandra 2.0 (my fault for not verifying newer versions) it is possible to archive the commit log when the node starts, when a commit log segment is written to disk, or even at a specific point in time.
Compared to Oracle the concept is almost the same (except that Oracle and LGWR don't necessarily archive at instance startup), and it is likewise used for “backup” and “recovery”, supporting point-in-time restores.
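
That archiving is driven by commitlog_archiving.properties. A minimal sketch (the %path/%name and %from/%to placeholders are part of the file's syntax; the backup directory and the date are hypothetical):

# archive each commitlog segment as soon as it is written/rotated
archive_command=/bin/cp %path /backup/commitlog/%name

# on restore, copy segments back and replay them up to a point in time
restore_command=/bin/cp -f %from %to
restore_directories=/backup/commitlog
restore_point_in_time=2015:11:14 00:00:00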

I hope Cassandra is now a bit clearer to you as an Oracle DBA.

PS: Thanks isinghora and Ishinder Singh for pointing out my error.

OLT – An initial overview of Oracle Linux Test – Part 1

The OLT Toolkit (a.k.a. Oracle Linux Test) is an open-source tool provided by Oracle here. It contains a full set of tools, ranging from OLTP/DSS workload tools to kernel test modules like mempressure, as well as virtualization tests. It is a fully loaded, “on-steroids” set of tests to make sure that your Oracle infrastructure and configuration are perfect.

Please keep in mind while reading this that I'm looking for whatever seems useful to DBAs, not a complete QA and verification framework to run on every machine.

Installation
I'm not going to describe the installation process; you will figure out how to install it (README). Just make sure that your system is set up with an FQDN (fully qualified domain name), otherwise the configuration will fail.
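
A quick way to check (and fix) this before running the installer; a sketch for OL7:

[oracle@phoenix ~]$ hostname -f          # must print a fully qualified name
phoenix.localdomain
[root@phoenix ~]# hostnamectl set-hostname phoenix.localdomain   # if it doesn't
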
Please also review the olt-setup.env file before starting the installation, as it describes the kind of setup you want to submit to the tests (Single Instance, RAC or Xen).
The installation process will install a few RPMs for you. Installation went fine on Oracle Linux 7.1; the only two issues were missing package dependencies (easily spotted in the log files) and not using an FQDN.

There is an almost-complete user guide here that describes most of what you need to run the different configurations. For now, with a single instance, all you need is this:

[oracle@phoenix OLT]$ wget https://oss.oracle.com/projects/olt/dist/files/packages/OLT-bundle/OLT-2.3.0-7.tar.gz
[oracle@phoenix OLT-2.3.0-7]$ ls -ltr
total 36
-rwxr-xr-x. 1 oracle oinstall 12010 May  5  2015 oltInstaller --> Installer itself
-rw-r--r--. 1 oracle oinstall  7108 May  5  2015 olt-setup.env --> Review before install
-rw-r--r--. 1 oracle oinstall  1842 May  5  2015 README
drwxr-xr-x. 3 oracle oinstall  4096 May  5  2015 RPMS
-rw-r--r--. 1 oracle oinstall  5106 Nov 13 22:07 olt-install-log --> Installation log

[root@phoenix OLT-2.3.0-7]# ./oltInstaller

Initial test

Searching for something useful, I decided to modify the provided file (validate.cmd, present in /opt/oracle/oltest/olt-schedule) and enable only one particular test, called sv-asm.

[oracle@phoenix olt_tests]$ pwd
/home/oracle/olt_tests

# Default command file containing all the testcases

##### Silent Install tests #####
#phoenix.localdomain=install-silent-oracle-10201
#phoenix.localdomain=install-silent-oracle-10202
#phoenix.localdomain=install-silent-oracle-10203
#phoenix.localdomain=install-silent-oracle-10204
#phoenix.localdomain=install-silent-oracle-11106
#phoenix.localdomain=install-silent-oracle-11107
#phoenix.localdomain=install-silent-oracle-11201
#phoenix.localdomain=install-silent-oracle-11202
#phoenix.localdomain=install-silent-oracle-11203
#phoenix.localdomain=install-silent-oracle-11204
#phoenix.localdomain=install-silent-oracle-121010
#phoenix.localdomain=install-silent-oracle-121020
##### End of Silent Install tests #####


##### Housekeeping tests #####
phoenix.localdomain=sv-asm
#phoenix.localdomain=sv-hangcheck
#phoenix.localdomain=sv-ocfs2
#phoenix.localdomain=sv-ethtool
#phoenix.localdomain=sv-miitool
#phoenix.localdomain=sv-hyperthreading
#phoenix.localdomain=sv-numa

...

As you can see, only sv-asm (which verifies that the oracleasm kernel module is present) runs.

Run/schedule it now using the olt-schedule command:

[oracle@phoenix olt_tests]$ olt-schedule -f ./validate_noinstall.cmd -r

Validation is the next step:

[oracle@phoenix olt_tests]$ olt-schedule -f ./validate_noinstall.cmd -s
>>> Node : phoenix.localdomain
    -    testsuite "sv-asm" has PASSED

It looks good, but how do we really know what happened behind the scenes? Let's check the logs and the results in the work directory.

[oracle@phoenix work]$ ls -ltr
total 8
drwxr-xr-x. 4 oracle oinstall   75 Nov 14 00:46 sv-asm_Nov14_15-00-46-24

[oracle@phoenix sv-asm_Nov-14-00-46-24-504885828-10576]$ cd results/
[oracle@phoenix results]$ tail -100f testrun.log
+ '[' x == x1 ']'
++ uname -r
+ KVER=3.8.13-55.1.6.el7uek.x86_64
++ find /lib/modules/3.8.13-55.1.6.el7uek.x86_64/ -name oracleasm.ko
+ asm_module1=/lib/modules/3.8.13-55.1.6.el7uek.x86_64/kernel/drivers/block/oracleasm/oracleasm.ko
++ find /lib/modules/3.8.13-55.1.6.el7uek.x86_64/ -name oracleasm.o
+ asm_module2=
+ '[' x/lib/modules/3.8.13-55.1.6.el7uek.x86_64/kernel/drivers/block/oracleasm/oracleasm.ko '!=' x -o x '!=' x ']'
+ echo 'oracleasm module is present in the running kernel'
oracleasm module is present in the running kernel
+ exit 0

[oracle@phoenix work]$

Now I've picked two tests: ft-aio and ft-dio (a rough manual version of both checks is sketched below).
* ft-aio – verifies that the Oracle IO layer uses aio system calls – io_submit, io_getevents – via strace
* ft-dio – verifies that Oracle opens the database files with the O_DIRECT flag – via strace
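
If you just want the idea, strace against a database process gives you a rough manual equivalent; a sketch (the PIDs are placeholders):

# ft-aio style: watch for the async IO syscalls
[oracle@phoenix ~]$ strace -f -e trace=io_submit,io_getevents -p <dbwriter_pid>

# ft-dio style: catch the flags used when datafiles are opened
[oracle@phoenix ~]$ strace -f -e trace=open,openat -p <server_pid> 2>&1 | grep O_DIRECT

Back to OLT, checking the status of the two tests: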

[oracle@phoenix olt_tests]$ olt-schedule -f validate_noinstall.cmd -s
>>> Node : phoenix.localdomain
    -    testsuite/testcase "ft-aio" is IN PROGRESS
>>> Node : phoenix.localdomain
    -    testsuite "ft-dio" is NOT YET SCHEDULED

As I quickly discovered, most of the tests (Oracle says all) require the “silent-installation” test – an actual installation of an Oracle Home on the server – before any other kind of test can be performed (?!), and the ft-aio and ft-dio tests failed (at least both stayed in progress for almost an hour). The FAQ says it is possible to run the tests against an existing Oracle Home, but the docs on how to do it are really outdated and not applicable to this particular version.

I'm particularly interested in this, as OLT allows a lot of tests (the list of tests is available here), including stress tests.
I've emailed someone behind the project so he can help me run the tests against an already installed Oracle Home.

To be continued…

Index metadata bug and ORA-22808

Yesterday a co-worker complained that our import/export scripts using Datapump were not working due to the introduction of user-defined object types. These object types are particularly useful to model real-world entities as objects in the database; it just happens that the datapump scripts were throwing a lot of ORA-22808 errors when creating the FK:

ORA-39083: Object type INDEX failed to create with error:
ORA-22808: REF dereferencing not allowed

I found no really good articles on this issue (even on MOS), so I decided to write one. First let's create what we need to reproduce the issue: two types and two different tables.
The first type is a basic one; the second type contains a column called FooRef of type REF, which means that the target object must have an object identifier.
The first table (T1_MY_TABLE) is an ordinary heap table created with a PRIMARY KEY on column Id. The second table is more complex, since it contains a CONSTRAINT that references the first table. After this, an INDEX is created on the second table using FooRef.

CREATE TYPE MY_FOO_TYPE AS OBJECT
(
 Id INTEGER, 
 State INTEGER, 
 Name varchar2(255)
) NOT FINAL;
/

CREATE TYPE MY_FOO_TYPE_2 AS OBJECT
(
 FooId INTEGER,
 FooStateId INTEGER,
 FooNativeName VARCHAR2(255),
 FooRef REF MY_FOO_TYPE
) NOT FINAL;
/

CREATE TABLE T1_MY_TABLE OF MY_FOO_TYPE
(
 CONSTRAINT CK_Id_PK PRIMARY KEY (Id) USING INDEX
 (
 CREATE UNIQUE INDEX IDX_T1_MY_TABLE ON T1_MY_TABLE(Id) INITRANS 10 TABLESPACE GEN_IDX
 )
) OBJECT IDENTIFIER IS PRIMARY KEY TABLESPACE GEN_DATA;

CREATE TABLE T2_MY_TABLE OF MY_FOO_TYPE_2
(
 CONSTRAINT FooState_ID UNIQUE (FooStateId) USING INDEX
 (
 CREATE UNIQUE INDEX IDX_My_FOO_TYPE_2 ON T2_MY_TABLE(FooStateId) INITRANS 10 TABLESPACE GEN_IDX
 ),
 CONSTRAINT CK_R0 FooRef REFERENCES T1_MY_TABLE ON DELETE CASCADE,
 CONSTRAINT CK_R1 CHECK(FooRef IS NOT NULL),
 CONSTRAINT FooId_PK PRIMARY KEY (FooId) USING INDEX
 (
 CREATE UNIQUE INDEX IDX_FooId_PK ON T2_MY_TABLE(FooId) INITRANS 10 TABLESPACE GEN_IDX
 )
) OBJECT IDENTIFIER IS PRIMARY KEY TABLESPACE GEN_DATA;
CREATE INDEX IDX_FooId_FK ON T2_MY_TABLE(FooRef) INITRANS 10 TABLESPACE GEN_IDX;

Everything here runs without any issue if you try it. Now let's try an export and import using Datapump:

-bash-3.2$ expdp pmdb_luis/PMDB_LUIS@DB schemas=PMDB_LUIS 
directory=DATA_PUMP_DIR 
dumpfile=luis.dmp logfile=luis.log
...
Dump file set for PMDB_LUIS.SYS_EXPORT_SCHEMA_01 is:
 /opt/oracle/app/oracle/admin/TNMSAM/dpdump/luis.dmp
Job "PMDB_LUIS"."SYS_EXPORT_SCHEMA_01" successfully completed at 05:48:04

Now, let’s delete the objects and try the import.

DROP TABLE T2_MY_TABLE;
DROP TABLE T1_MY_TABLE;
DROP TYPE MY_FOO_TYPE_2;
DROP TYPE MY_FOO_TYPE;

-bash-3.2$ impdp pmdb_luis/PMDB_LUIS@TNMSAM schemas=PMDB_LUIS 
directory=DATA_PUMP_DIR dumpfile=luis.dmp logfile=luis.log
...
Processing object type SCHEMA_EXPORT/TABLE/TABLE
Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA
. . imported "PMDB_LUIS"."T1_MY_TABLE" 0 KB 0 rows
. . imported "PMDB_LUIS"."T2_MY_TABLE" 0 KB 0 rows
Processing object type SCHEMA_EXPORT/TABLE/INDEX/INDEX
ORA-39083: Object type INDEX failed to create with error:
ORA-22808: REF dereferencing not allowed
Failing sql is:
CREATE INDEX "PMDB_LUIS"."IDX_FOOID_FK" ON "PMDB_LUIS"."T2_MY_TABLE" ("FOOREF"."ID") PCTFREE 10 INITRANS 10 MAXTRANS 255 STORAGE(INITIAL 163840 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "GEN_IDX" PARALLEL 1 
Processing object type SCHEMA_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
Processing object type SCHEMA_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS
ORA-39112: Dependent object type INDEX_STATISTICS skipped, base object type INDEX:"PMDB_LUIS"."IDX_FOOID_FK" creation failed
Processing object type SCHEMA_EXPORT/TABLE/CONSTRAINT/REF_CONSTRAINT
Job "PMDB_LUIS"."SYS_IMPORT_SCHEMA_01" completed with 3 error(s) at 05:50:06

So we are stuck now. The DDL ran without any issue, the export ran without any issue, and the import has problems creating the last index on the second table. A little more investigation is needed to figure out what “the problem” is. First let's take Datapump out of the equation and investigate the index drop and creation. Drop all the objects and recreate the DDL presented above.

SQL> select index_name from user_indexes where INDEX_NAME = 'IDX_FOOID_FK';
INDEX_NAME
------------------------------
IDX_FOOID_FK

SQL> set long 100000
SQL> select dbms_metadata.get_ddl('INDEX','IDX_FOOID_FK','PMDB_LUIS') from dual;
DBMS_METADATA.GET_DDL('INDEX','IDX_FOOID_FK','PMDB_LUIS')
--------------------------------------------------------------------------------
CREATE INDEX "PMDB_LUIS"."IDX_FOOID_FK" ON "PMDB_LUIS"."T2_MY_TABLE" ("FOOREF"
."ID")
 PCTFREE 10 INITRANS 10 MAXTRANS 255 COMPUTE STATISTICS
 STORAGE(INITIAL 163840 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
 TABLESPACE "GEN_IDX"

SQL> drop index IDX_FOOID_FK;
Index dropped.

SQL> CREATE INDEX "PMDB_LUIS"."IDX_FOOID_FK" ON "PMDB_LUIS"."T2_MY_TABLE" ("FOOREF"
 2 ."ID")
 3 PCTFREE 10 INITRANS 10 MAXTRANS 255 COMPUTE STATISTICS
 4 STORAGE(INITIAL 163840 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
 5 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
 6 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
 7 TABLESPACE "GEN_IDX"; 
CREATE INDEX "PMDB_LUIS"."IDX_FOOID_FK" ON "PMDB_LUIS"."T2_MY_TABLE" ("FOOREF"
 *
ERROR at line 1:
ORA-22808: REF dereferencing not allowed

As you can see, the index metadata stored in the data dictionary is _wrong_: it refers to FOOREF.ID instead of FOOREF only. That _might_ be related to the table being created with OBJECT IDENTIFIER IS PRIMARY KEY.

It is not a Datapump issue itself: Datapump uses the metadata (DDL) from the data dictionary during the import (impdp), and since it is wrong, it can't create the index. Of course, if you manually create the index using FOOREF instead of the _wrong_ FOOREF.ID, you will not hit any issues.
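
In other words, the workaround after a failed import is simply to create the scalar REF index by hand, the way it was originally declared (storage clauses trimmed here for brevity):

SQL> CREATE INDEX "PMDB_LUIS"."IDX_FOOID_FK" ON "PMDB_LUIS"."T2_MY_TABLE" ("FOOREF")
  2  INITRANS 10 TABLESPACE "GEN_IDX";

Index created.
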
I contacted MOS and they opened a bug for this issue: Bug 16369430 : IMPDP REPORTS ORA-39083/ORA-22808 WHEN CREATING AN INDEX
You will also find this exact test case there 🙂

FLASHBACK TABLE – ROWID Preservation

Yesterday I was having a discussion with a colleague about FLASHBACK in Oracle, more specifically about Flashback Table, which, as we all know, lets you restore a table to a previous state after an application error, for example. This FLASHBACK depends on the amount of UNDO in the system (its retention and its size).

In that conversation a question came up: does FLASHBACK TABLE preserve the ROWIDs when restoring or not? I decided to set everything up and test it. Before that, I should mention that to restore a table to an old SCN, row movement must be enabled on the affected tables. The row_movement_clause allows the database to move a row; in other words, it starts from the premise that FLASHBACK TABLE needs to move rows around, and the consequence of that, in a heap table, is _obviously_ a changed rowid.


SQL> alter database flashback on;
Database altered.

SQL> create table t1_rowmove as select dbms_random.value(0,100) N1 from dual connect by level <=100000;

Table created.

SQL> !date
Thu Jan 26 15:21:39 WET 2012

So that we can compare the rowids later, we will create an auxiliary table that stores all the rowids of table t1_rowmove:


SQL> create table t1_rowids as select ROWID row_id from t1_rowmove;

Table created.

SQL> select * from t1_rowids where rownum < 5;

ROW_ID
------------------
AAAS8zAAEAAAACrAAA
AAAS8zAAEAAAACrAAB
AAAS8zAAEAAAACrAAC
AAAS8zAAEAAAACrAAD

Let's now delete all rows from t1_rowmove matching N1 < 50, and then use FLASHBACK TABLE to restore the table to the state before the DELETE:


SQL> delete from t1_rowmove where n1 < 50;

49898 rows deleted.

SQL> commit;

Commit complete.

SQL> alter table t1_rowmove enable row movement;

Table altered.

SQL> flashback table t1_rowmove to timestamp TO_TIMESTAMP('2012-01-26 15:21:39', 'YYYY-MM-DD hh24:mi:ss');

Flashback complete.

There we go: the table is exactly as we created it, with the same number of rows. Now let's look into the ROWID question. For that, we create another auxiliary table that stores the ROWIDs of the “freshly restored” table:


SQL> create table t2_rowids
2 as select ROWID row_id from t1_rowmove;

Table created.

Comparing the ROWIDs in t1_rowids vs t2_rowids:


SQL> select count(1) from t1_rowids where row_id not in (select row_id from t2_rowids);

COUNT(1)
----------
50102

SQL> select count(1) from t2_rowids where row_id not in (select row_id from t1_rowids);

COUNT(1)
----------
50102


The conclusion is simple: there is row movement, and the result is rowids that change even though the data stays intact through the FLASHBACK. So the ROWIDs are not preserved. It also makes sense that this only happens in heap tables, while in IOTs the rowids remain intact (they are logical rowids derived from the primary key).

10053 Parser

Last time I wrote about the SYS_DL_CURSOR hint, trying to find out if I could make use of it, and I realized that I rely on trace files a lot, mainly the 10053 trace file, so I decided to write a simple parser (in Python) to help me. It's very simple, and for now I will not share the source code with you until I have a “good”, readable version of it 🙂


10053Parser has 2 features for now:

1 – Hints

2 – Explain plans 


Feature #1 – Hints

10053Parser will (eventually) parse the contents to find out how many DML/DDL statements were executed and which hints were used in the statements, showing the output in a nice, readable way:

[oracle@localhost trace]$ ./trace_10053.py --hints testSID_ora_14131_MESSI.trc
Report Hints for [testSID_ora_14131_MESSI.trc] ...
Hint | Used | Error | Level | SQL FULL TEXT
----------------------------------------------------------------------------------------------

N/A | N/A | N/A | N/A | create table t1_abc as select 1 as N1 from dual
SYS_DL_CURSOR () | 0 | 0 | 1 | select /*+ full(a) SYS_DL_CURSOR */ N1 from t1_abc a
FULL ("A") | 1 | 0 | 3 | select /*+ full(a) SYS_DL_CURSOR */ N1 from t1_abc a
INDEX ("A") | 1 | 0 | 3 | select /*+ index(a) */ N1 from t1_abc a

As you can see, the output is easy to read and you get all the information regarding hints, even if a statement uses multiple hints.

Feature #2 – Explain Plans

Instead of trying to remember the explain plan for every statement, 10053Parser lets you output the list of explain plans and the corresponding statements:

[oracle@localhost trace]$ ./trace_10053.py --explain testSID_ora_14131_MESSI.trc
Report Explain for [testSID_ora_14131_MESSI.trc] ...
SQL: create table t1_abc as select 1 as N1 from dual

------------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
------------------------------------------+-----------------------------------+
| 0 | CREATE TABLE STATEMENT | | | | 3 | |
| 1 | LOAD AS SELECT | | | | | |
| 2 | FAST DUAL | | 1 | | 2 | 00:00:01 |
------------------------------------------+-----------------------------------+

SQL: select /*+ full(a) SYS_DL_CURSOR */ N1 from t1_abc a

-------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-------------------------------------+-----------------------------------+
| 0 | SELECT STATEMENT | | | | 2 | |
| 1 | TABLE ACCESS FULL | T1_ABC | 1 | 13 | 2 | 00:00:01 |
-------------------------------------+-----------------------------------+

SQL: select /*+ index(a) */ N1 from t1_abc a

-------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-------------------------------------+-----------------------------------+
| 0 | SELECT STATEMENT | | | | 2 | |
| 1 | TABLE ACCESS FULL | T1_ABC | 1 | 13 | 2 | 00:00:01 |
-------------------------------------+-----------------------------------+

As soon as I have the courage, I will release the source code 🙂

SYS_DL_CURSOR – I can't make it

Some days ago I came across some PL/SQL code that used (for some reason) an Oracle hint called SYS_DL_CURSOR. I spotted it while looking into V$SQL and was not familiar with it; maybe the only thing I (“think” I) know about it is that under some conditions it performs a direct path insert, just like the append hint. The hint itself does not seem to be officially documented by Oracle, and I figured out that some applications (hi, Informatica PowerCenter) use it for some direct path insert ETL.
This is not by any means a “standard” way to do direct path inserts, but if it works well, we need to get into it. So I decided to take a deeper look inside this hint to make sure that I understand what it really does.

A good way to verify that your direct path insert worked is to try to query the table segment before your COMMIT or ROLLBACK operation. Let's create a table and try a simple direct path insert:


SQL> create table t1_dpi(n1 number, n2 number, CONSTRAINT pk_n UNIQUE(n1));


SQL> insert /*+ append_values */ into t1_dpi VALUES(1,1);
SQL> insert /*+ append_values */ into t1_dpi VALUES(2,2);


SQL> select * from t1_dpi where rownum < 1;
select * from t1_dpi where rownum < 1
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel

As you can easily see, we got an ORA-12838 when querying the segment after a direct path load. This happens because the transaction attempted to read (or modify) a table it has just direct-loaded, and that is not allowed; it prevents data inconsistency [see this].
This is one way to make sure you are getting a direct path insert with the common hint /*+ append_values */ (or /*+ append */); alternatively, you can use a 10046 trace:


SQL> alter session set events='10046 trace name context forever, level 12';
Session altered.

insert /*+ append_values */ into lcmarques.t1_dpi VALUES(2,2)
END OF STMT
PARSE #3:c=1000,e=926,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=3581094869,tim=1324125386379660
WAIT #3: nam='direct path write' ela= 37 file number=4 first dba=10908 block cnt=1 obj#=-1 tim=1324125386380206

So, as everybody expected, the append_values hint works as advertised :). Now let's try SYS_DL_CURSOR:

SQL> insert /*+ SYS_DL_CURSOR */ into t1_dpi values (8,8);

1 row created.

SQL> select * from t1_dpi where rownum < 1;

no rows selected

It seems that we can query the table (no ORA-12838), so the hint is probably not doing a direct path insert and Oracle is ignoring it. Let's check the explain plan and a 10046 trace to make sure:


--------------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
--------------------------------------------+-----------------------------------+
| 0 | INSERT STATEMENT | | | | 1 | |
| 1 | LOAD TABLE CONVENTIONAL | | | | | |
--------------------------------------------+-----------------------------------+


insert /*+ SYS_DL_CURSOR */ into t1_dpi values (8,8)
END OF STMT
PARSE #3:c=1000,e=931,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=1324126029360833
EXEC #3:c=0,e=169,p=0,cr=1,cu=7,mis=0,r=1,dep=0,og=1,plh=0,tim=1324126029361070
STAT #3 id=1 cnt=0 pid=0 pos=1 obj=0 op='LOAD TABLE CONVENTIONAL (cr=1 pr=0 pw=0 time=0 us)'

No luck here! The next option is a 10053 trace, to ensure that the CBO is not silently ignoring the SYS_DL_CURSOR hint. The “Dumping Hints” section even shows invalid/malformed hints, and for some invalid hints it shows err=1 or used=0. In our case, used=0 shows that for some reason this hint is not being used:

SQL>  alter session set events='10053 trace name context forever, level 1';

Dumping Hints
=============
atom_hint=(@=0x8cd666f0 err=0 resol=0 used=0 token=914 org=1 lvl=1 txt=SYS_DL_CURSOR ())
====================== END SQL Statement Dump ======================

Another idea might be to use sqlldr (I googled some cases) to generate the SYS_DL_CURSOR hint for direct path loading, but no luck here either:

[oracle@localhost scripts]$ sqlldr lcmarques/lcmarques@testSID control=loader.ctl log=logfile.log direct=true;
Load completed - logical record count 9.


SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB
) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB)
NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_0"), NVL(SUM(C2),:"SYS_B_1") FROM (SELECT /*+
NO_PARALLEL("T1_DPI") FULL("T1_DPI") NO_PARALLEL_INDEX("T1_DPI") */ :"SYS_B_2"
AS C1, :"SYS_B_3" AS C2 FROM "T1_DPI" "T1_DPI") SAMPLESUB

The last thing was to check v$sql_hint for this hint:


SQL> select class from v$sql_hint where class like 'SYS_DL%';

CLASS
----------------------------------------------------------------
SYS_DL_CURSOR

The conclusion is simple: I can't make it work. For some reason the Oracle CBO is ignoring SYS_DL_CURSOR. Maybe it is only valid under certain circumstances that I really don't know, or it is already deprecated for direct path inserts, or 11g just doesn't like it.

Oracle – Concurrency Simplified

Today's subject is very simple; it is more of a “Back to Basics” clarification than an elaborate research post.
It will serve as base material for when more questions of this kind come up at my workplace 😉

The idea is simple: run a select _during_ an update and confirm that there is full read consistency and that, given the architecture, no session is “blocked” at all. After the practical example, I will explain how it works.

SQL> create table t2_locks as select rownum N1, dbms_random.string('A', 10) S1 from dual connect by level <= 1000000;

Table created.


SQL> select sid from v$session where audsid=userenv('SESSIONID');

SID
----------
42


SQL> select sid from v$session where audsid=userenv('SESSIONID');

SID
----------
32

Validations on SID 42:

SQL> select MIN(N1) from t2_locks;

MIN(N1)
----------
1


SQL> select MAX(N1) from t2_locks;

MAX(N1)
----------
1000000


SQL> select avg(N1) from t2_locks;

AVG(N1)
----------
500000.5

These three validations will let us verify the consistency of the data during the update and after the update (without COMMIT).

We will then run the slow update on SID 42, while on SID 32 we run the validations as the update executes.
To make it easier to follow, I made a short video (this time without sound ;)):

The update is the following:

update t2_locks set N1 = (select MIN(dbms_random.value(0,40000)) from dual connect by level <= 2000000);


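Since the demo itself is in the video, here is a rough sketch of what the two sessions do (the expected results are the validation values from above; exact timing aside):

-- SID 42: the slow update above is running (or finished, but not yet committed)

-- SID 32: meanwhile these still return 1, 1000000 and 500000.5, served from
-- the pre-update snapshot (table segment plus rollback segments)
select MIN(N1) from t2_locks;
select MAX(N1) from t2_locks;
select AVG(N1) from t2_locks;
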
As you could see in the video, and although I didn't show the locks taken in v$lock during the whole process, there is no interference whatsoever between the sessions, because Oracle has a read consistency mechanism that guarantees that the result of a query comes from a single point in time (the moment the query starts), for one or more transactions. When the update (or any DML) happens, Oracle uses the rollback segments to provide consistent data; that is, the rollback segments contain the old values of the data being changed by the update (or any DML). While the update is running, or while it is not committed, the select may read part of the data from the table segment (the table itself) and another part from the rollback segments. It is basically a snapshot of the data blocks before their modification.

As soon as the commit is issued and a select is executed, the data blocks kept in the rollback segments are discarded and the select's data comes exclusively from the table segment (or index segment, if that is the case).

It is this rollback segment that makes it possible, when a rollback is issued, to revert the data affected by the DML. It also explains why a rollback takes, on average, the same time as the DML operation that originated it: Oracle has to go to the rollback segment, read it and put everything back the way it was, i.e. the inverse operation of the DML at the segment level.

PS: I ended up doing half in text, half in video 🙂

Oracle Locks – Part I – "I already knew that"

From the very beginning, Oracle has implemented locking mechanisms to deal with concurrency and consistency problems in the database. Accessing data simultaneously while being able to give other sessions consistent data, keeping reads and writes coherent, remains the most important function of a database. Without trying to enter the theoretical world of the ACID model, I will just demonstrate some relevant points of Oracle's locking mechanisms.

I will use 2 sessions and analyze the respective locks as I perform each operation. To make it easier to follow, I will explain as I go:

SQL> create table t1_locks as select rownum as N1, dbms_random.string('A',2) S1 from dual connect by level <= 5;

Table created.

SQL> select sid from v$session where audsid=userenv('SESSIONID');

SID
----------
40

SQL> select sid from v$session where audsid=userenv('SESSIONID');

SID
----------
29

SQL> column s1 format a3;
SQL> select * from t1_locks;

N1 S1
---------- ---
1 PN
2 fP
3 Yo
4 Ur
5 KX

On SID 40 an insert is executed:

SQL> insert into t1_locks (N1,S1) values(6,'SL');

1 row created.

Without a commit, let's see what session 29 sees:

SQL> select * from t1_locks;

N1 S1
---------- ---
1 PN
2 fP
3 Yo
4 Ur
5 KX

Naturally, SID 29 will not see the changes made by SID 40, since there is no commit. Let's see what happened
in Oracle at the locking level:


SQL> select sid,type,id1,lmode,request from v$lock where sid in (40,29);

SID TY ID1 LMODE REQUEST
---------- -- ---------- ---------- ----------
40 AE 100 4 0
29 AE 100 4 0
40 TM 74204 3 0
40 TX 458771 6 0

The “AE” locks can be ignored (they basically mean Application Edition lock, 11g only), but the others cannot. “TX” stands for Transaction Lock, or Transaction Enqueue, and “TM” stands for DML or Table Lock (DML Enqueue). The TX represents the transaction: it is mainly used to prevent another transaction from modifying the same row, so every time a transaction needs to modify a row it acquires a TX. The TM is mainly used to manage concurrency against DDL operations, for example an attempt to “drop” a table during an insert (or any other DML).
The LMODE column represents the lock mode.
For the “TM” lock the mode is row-exclusive (row-X, mode 3), covering only the inserted rows, while from the “TX” point of view the lock mode is exclusive (mode 6), since the TX refers to the transaction itself.
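
A small sketch to make the v$lock output easier to read, decoding the mode numbers into names:

SQL> select sid, type, id1,
  2         decode(lmode, 0,'none', 1,'null', 2,'row-S (SS)', 3,'row-X (SX)',
  3                       4,'share (S)', 5,'S/row-X (SSX)', 6,'exclusive (X)') lock_mode
  4    from v$lock
  5   where sid in (40,29);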

We can easily check that the “TM” lock is held on our table:


SQL> select name from sys.obj$ where obj#=74204;

NAME
------------------------------
T1_LOCKS

This confirms that there is a lock on object T1_LOCKS.
We have already seen that SID 29 cannot see the changes made by SID 40, since no commit took place.
The next step is to insert a row from SID 29 and update an existing one (N1=1). Let's see:

SQL> insert into t1_locks (N1,S1) values(7,'XX');

1 row created.


SQL> update t1_locks set S1='BB' where N1=1;

1 row updated.

No commit will be issued. Let's now analyze the locks held by the two sessions:

SQL>  select sid,type,id1,lmode,request from v$lock where sid in (40,29);

SID TY ID1 LMODE REQUEST
---------- -- ---------- ---------- ----------

40 TM 74204 3 0
29 TM 74204 3 0
29 TX 393239 6 0
40 TX 458771 6 0

Everything makes sense: one lock of each type for each SID. SIDs 29 and 40 both hold a “Table Lock” in row-X mode (mode 3, as shown above) and both hold a “Transaction Lock” for the rows they inserted or modified. So far, each session can only see what it inserted and/or modified. Let's see:

SID 40:


SQL> select * from t1_locks; [includes the INSERT of N1=6]

N1 S1
---------- ---
1 PN
2 fP
3 Yo
4 Ur
5 KX
6 SL

SID 29:


SQL> select * from t1_locks; [includes the INSERT of N1=7 and the UPDATE of N1=1]

N1 S1
---------- ---
1 BB
2 fP
3 Yo
4 Ur
5 KX
7 XX

Let's make things worse. On SID 40 we will try to update the row with N1=1, exactly as we did on SID 29:


SQL> update t1_locks set S1='CC' where N1=1;

The session is blocked; the update does not happen until SID 29 commits or rolls back its changes. We can watch this “live”:

SQL>  select event, seconds_in_wait, sid from v$session_wait where sid in (40,29);

EVENT SECONDS_IN_WAIT SID
-------------------- --------------- ----------
SQL*Net message from 246 29
client

enq: TX - row lock c 105 40
ontention

The SECONDS_IN_WAIT column tells us how long the session has been waiting on the lock; the “row lock contention” event is nothing more than a session waiting for a row lock held by another session. In our case, mode 6 (the LMODE column) indicates that to resolve this, the data has to be rolled back or committed. On SID 29 we commit:


SQL> commit;

Commit complete.

Meanwhile, on SID 40:


SQL> update t1_locks set S1='CC' where N1=1;

1 row updated.

Let's now run a select * from both sessions:

SID 40:

SQL> select * from t1_locks;

N1 S1
---------- ---
1 CC
2 fP
3 Yo
4 Ur
5 KX
7 XX
6 SL

7 rows selected.

SID 29:


SQL> select * from t1_locks;

N1 S1
---------- ---
1 BB
2 fP
3 Yo
4 Ur
5 KX
7 XX

The result is simple to interpret. While SID 40 sees the changes made by SID 29, the opposite does not happen, since SID 40 has not committed yet. Note that the value for N1=1 is different in the two sessions: SID 40 updated it to CC but without a commit, so that change is visible only to SID 40 and not to SID 29. Also note that SID 40 sees a row [N1=7] that was inserted (and committed) by SID 29. To finish this fairly simple example, let's look at the lock state again:


SQL> select sid,type,id1,lmode,request from v$lock where sid in (40,29);

SID TY ID1 LMODE REQUEST
---------- -- ---------- ---------- ----------
40 TM 74204 3 0
40 TX 458771 6 0

So only SID 40 still holds locks, since it has not issued any rollback or commit.

Reading B-Tree Indexes: Changing the Clustering Factor – Part 2 (Reverse Key Index)

I showed previously that the CF value could only be changed by reorganizing the table, and that reorganizing the index could do nothing, since the CF depends directly on how disordered the table blocks are compared to the order in the index. However, there is a relatively simple way to change the CF value: using a reverse key index.

There are some cases where this type of index is useful; they were designed essentially to solve the index block contention problem. It basically occurs in highly concurrent scenarios (insert, update or delete DML) where several concurrent sessions need to access the same block (a “hot block”), generating contention on the index block and causing countless wait events such as “buffer busy waits”.
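
A quick way to check whether you are actually suffering from this kind of contention; a sketch (for per-object detail you would drill down into v$segment_statistics):

SQL> select event, total_waits, time_waited
  2    from v$system_event
  3   where event like 'buffer busy%';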

But this post is not about reverse key indexes; it is about the CF and how its value can be changed by using a reversed index. Changing the order inside the index changes how the table's disorder compares to it. As a simple example, an ID inserted into the table as 112233 is stored as 332211 in the index. This kind of “reverse” spreads the inserts across the whole index structure, avoiding contention on a single leaf block (the right-most one). As a consequence, the index entries are no longer ordered in the natural way we know, i.e. 112234 right after 112233.
Although it seems to solve some problems, notably in very busy RAC environments, it creates a whole other set of problems that we will not discuss in this post.

The following code shows a table, a normal index that is then rebuilt as “reverse”, and the corresponding CF values after each step:

SQL> create table t_cf2 as select ceil(dbms_random.value(0,100000)) N1 from dual connect by level <= 100000;

Table created.


SQL> create index i_cf2 on t_cf2(N1);

Index created.


SQL> exec dbms_stats.gather_table_stats(null,'T_CF2', cascade=>TRUE);

PL/SQL procedure successfully completed.


SQL> select blevel, leaf_blocks, clustering_factor from user_indexes where index_name = 'I_CF2';

BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
---------- ----------- -----------------
1 222 99270

SQL> alter index i_cf2 rebuild reverse;

Index altered.


SQL> select blevel, leaf_blocks, clustering_factor from user_indexes where index_name = 'I_CF2';

BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
---------- ----------- -----------------
1 222 99265

We get a different CF value, although the difference is small, which means my example here was not the best one. As a final note, however, you should not use RKIs carelessly: the key order stops being the natural one, “range scans” are no longer possible (predicates like BETWEEN, LIKE, >, <) and the CBO will completely ignore this type of index for them. These two posts about the CF were only meant to show how the CF varies with the different orderings, whether in the table or in the index.
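
If you want to see that last limitation for yourself, a minimal sketch (the plan is what I would expect; verify on your own system):

SQL> select count(*) from t_cf2 where N1 between 100 and 200;

SQL> select * from table(dbms_xplan.display_cursor());
-- expect TABLE ACCESS FULL on T_CF2: the reverse key index cannot serve a
-- range predicate, while an equality predicate (N1 = 100) still can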