1 – Introduction
We all know the possible benefits of using Huge Pages (or Large Pages, depending on the platform). It is simply a method of using a larger page size, which is useful when working with very large memory allocations. As you already know how things flow in the Linux kernel, I will just give you a brief overview of how things go in another operating system: AIX.
The idea behind it is exactly the same as in any other operating system that supports multiple page sizes, but AIX (AIX 6.1 and POWER6+) added one feature called Dynamic Variable Page Size Support (VPSS). As you know, pages are simply fixed-length blocks of virtual memory, so the AIX VMM can dynamically use larger page sizes based on the application memory workload. The idea is very good: the feature is transparent to applications, it reduces the number of hardware address translations, and the effort to implement it is close to nothing.
However, POWER6+ processors only support mixing 4 KB and 64 KB page sizes within a segment; 16M (and 16G) pages are available, but they are not handled dynamically by VPSS.
To list the page sizes available on your AIX system:
$ pagesize -a
4096         (4K)
65536        (64K)
16777216     (16M)
17179869184  (16G)
2 – The basics
To illustrate this, let's pick a POWER6 CPU running AIX 6.1 and Oracle 11.2.0.2 (RAC):
$ lsattr -El proc0
frequency   4400000000     Processor Speed       False
smt_enabled true           Processor SMT enabled False
smt_threads 2              Processor SMT threads False
state       enable         Processor state       False
type        PowerPC_POWER6 Processor type        False
The first thing to check is which page sizes are currently in use. vmstat shows two page sizes in use: 4K and 64K.
$ vmstat -P all

System configuration: mem=49152MB

pgsz            memory                           page
----- -------------------------- ------------------------------------
           siz      avm      fre    re    pi    po    fr     sr    cy
   4K  3203552  2286288   324060     0     0     0    55    227     0
  64K   586210   605917     1180     0     0     0     0      0     0
Let’s now see which page size Oracle is using. We will use svmon, which captures and analyses virtual memory allocation. The “-P” option lets us query only the PID associated with the SMON background process.
$ svmon -P $(ps -elf | egrep ora_smon_${ORACLE_SID} | grep -v egrep | awk '{print $4}') | grep shmat
 14c8d31 7000000d work default shmat/mmap           m   4096     0     4  4096
 147cf12 70000011 work default shmat/mmap           m   4096     0   655  4096
 15a1164 70000042 work default shmat/mmap           m   4096     0  1732  4096
 13648df 70000051 work default shmat/mmap           m   4096     0  1452  4096
 145cd1b 70000018 work default shmat/mmap           m   4096     0   880  4096
 1468717 70000006 work default shmat/mmap           m   4096     0   739  4096
 168ffae 70000038 work default shmat/mmap           m   4096     0  1131  4096
 17323cf 70000048 work default shmat/mmap           m   4096     0   601  4096
 15a1b64 70000039 work default shmat/mmap           m   4096     0  1015  4096
 13862ec 7000004f work default shmat/mmap           m   4096     0   836  4096
 1664798 7000001d work default shmat/mmap           m   4096     0  1713  4096
 115ec5a 7000002d work default shmat/mmap           m   4096     0  1474  4096
 13dbafb 70000058 work default shmat/mmap           m   4096     0  1271  4096
 1221a85 70000052 work default shmat/mmap           m   4096     0  1341  4096
 1753fd9 7000003f work default shmat/mmap           m   4096     0   728  4096
The “m” value in the PSize column shows that Oracle is asking for “medium size” pages (64K) only.
A better way of checking this is to use the following svmon options.
(1306770 is the PID of smon)
$ svmon -P 1306770 -O sortentity=pgsp,unit=auto,pgsz=on
Unit: auto
-------------------------------------------------------------------------------
     Pid Command          Inuse      Pin     Pgsp  Virtual
 1306770 oracle           27,2G    99,4M    7,81G    28,1G

     PageSize       Inuse      Pin     Pgsp  Virtual
     s    4 KB    275,38M       0K    48,7M    60,2M
     m   64 KB      26,9G    99,4M    7,76G    28,0G
-------------------------------------------------------------------------------
SQL> select sum(BYTES)/1024/1024/1024 as SGA_SIZE from V$SGASTAT;

  SGA_SIZE
----------
28,3113727
As you can see, the virtual memory allocated in 64K pages (column Virtual) is 28,0G, which closely matches the SGA allocated to Oracle.
The 64K page size also reduces the TLB miss rate, which clearly benefits performance compared to the typical 4KB pages. Besides this, no configuration is required at all, as the AIX kernel automatically allocates 64KB pages for shared memory (SGA) regions (and for process data and instruction text, if requested).
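For completeness, if you also want 64KB pages for process data, text and stack, that has to be requested when the process starts. A commonly documented way on AIX is the LDR_CNTRL environment variable; the sketch below is just illustrative and must be set in the environment that spawns the Oracle processes:

$ # Request 64K pages for data, text and stack of processes started from this shell
$ export LDR_CNTRL=DATAPSIZE=64K@TEXTPSIZE=64K@STACKPSIZE=64K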
Another thing you should know, visible in the “Pin” column of the svmon output, is that the 64K (and 4K) pages are not pinned. This is because:
– Your Oracle LOCK_SGA parameter is set to FALSE, so that your SGA is not locked into physical memory.
– It is recommended not to pin pages of regular size (4K or 64K), as pinning is complex and can cause serious problems. Pin the SGA into physical memory only when using the 16M (or 16G) page size.
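If you want to confirm the first point, a quick check from SQL*Plus is enough (the output shown is simply the expected default):

SQL> show parameter lock_sga

NAME      TYPE     VALUE
--------- -------- -----
lock_sga  boolean  FALSE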
3 – The 16M pages question:
When you use large pages to map virtual memory, the TLB covers more virtual memory with the same number of entries, giving a much lower TLB miss rate. This is why Large Pages (or Huge Pages on Linux) are a common topic and a best practice in both OLTP and DW environments.
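To put rough numbers on this: TLB reach is simply the number of TLB entries times the page size, so assuming a hypothetical 1024-entry TLB (real sizes vary by processor):

1024 entries x  4 KB =  4 MB of reach
1024 entries x 64 KB = 64 MB of reach
1024 entries x 16 MB = 16 GB of reach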
Several documents say to forget 16M, just use the regular 64K and let VPSS take care of promoting the page size the application (Oracle, in this case) asks for. So, as you can imagine, the question is: does the 16M page size benefit Logical I/O performance? Let's SLOB it!!
4 – SLOB for 64K vs 16M page size
To test this, I decided to use the SLOB workload and measure the impact on CPU and LIO of the 64k page size first. SLOB is a very useful utility written by Kevin Closson (http://kevinclosson.net/slob/) that can be used in a wide variety of scenarios, including PIO and LIO testing as well as CPU analysis (in the end, everything is a CPU problem ;).
The hardware lab uses POWER7 processors and AIX 6.1 (6100-08-02-1316), with CPU_COUNT=24.
4.1 – The setup
1 – Create a new database called SLOB (or use SLOB create_database_kit).
This step includes a few interesting details about the database, mainly an SGA size of 30G and a db_cache_size left to be managed automatically by Oracle. That will be very handy for testing the impact of allocating different page sizes.
*.sga_max_size=32212254720
*.sga_target=32212254720
*.pga_aggregate_target=4294967296
*.processes=1500
2 – Set up SQL*Net connectivity. Remember that we are running SLOB on a Linux host that connects to an AIX database server.
# Settings for SQL*Net connectivity:
ADMIN_SQLNET_SERVICE=SLOB
SQLNET_SERVICE_BASE=SLOB
#SQLNET_SERVICE_MAX=2
SYSDBA_PASSWD=oracle
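For the SLOB service above to resolve from the Linux host, a matching client-side entry is needed. A minimal tnsnames.ora sketch (the host name and port are placeholders for my environment):

SLOB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = aix-db-host)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = SLOB))
  )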
3 – Confirm that Oracle is “asking” the AIX kernel for 64k pages. Check the PSize column: “m” stands for medium page size (64k).
$ svmon -P $(ps -elf | egrep ora_smon_SLOB | grep -v egrep | awk '{print $4}') | grep shmat
  ac40ac 7000007c work default shmat/mmap           m   4096     0     0  4096
  8adf8a 7000006a work default shmat/mmap           m   4096     0     0  4096
  8fe14f 70000063 work default shmat/mmap           m   4096     0     0  4096
  b152b1 70000047 work default shmat/mmap           m   4096     0     0  4096
Test case rules:
– Oracle 11.2.0.2 was used (the version that I really need to test)
– Buffer Hit should always be 100% in the Instance Efficiency section of the AWR report
– A 10-minute wait was observed between each page size run
– The following order was used for each testcase:
For 64k page size:
0 – Reboot server
1 – Run SLOB: ./runit.sh 20 to populate Oracle buffer cache
2 – Run SLOB: ./runit.sh 20 and save AWR data.
3 – Run SLOB: ./runit.sh 20 and save AWR data.
4 – Run SLOB: ./runit.sh 20 and save AWR data.
For 16M page size:
5 – Reboot server to free contiguous 16M memory chunks
6 – Set up Large Pages (16M) for Oracle in AIX
7 – Run SLOB: ./runit.sh 20 to populate Oracle buffer cache
8 – Run SLOB: ./runit.sh 20 and save AWR data.
9 – Run SLOB: ./runit.sh 20 and save AWR data.
10 – Run SLOB: ./runit.sh 20 and save AWR data.
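As a reference, here is a minimal sketch of how the 64k sequence (steps 1 to 4) could be scripted. It assumes runit.sh leaves its AWR report in awr.txt in the SLOB directory, as the SLOB kit normally does; the awr_64k_run*.txt names are just illustrative:

#!/bin/sh
# Warm up the buffer cache, then do three measured runs, keeping the AWR report of each.
./runit.sh 20                          # step 1: populate the Oracle buffer cache
for run in 2 3 4; do
    sleep 600                          # 10-minute wait between runs (see rules above)
    ./runit.sh 20                      # steps 2-4: measured runs
    cp awr.txt awr_64k_run${run}.txt   # save this run's AWR report
done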
Testcase #1 – LIO in 64k vs 16M page size with small dataset
The first test intends to compare the use of medium 64K pages and large 16M pages and their impact on Logical I/O, using a small dataset (approx. 7GB). Please note that the test does not try to find the “maximum value” of your Logical I/O; it simply compares 3 “equal” SLOB runs under different page sizes. As this test runs with only 20 active sessions, the CPUs are not totally busy and some idle time remains (CPU_COUNT=24).
Testcase #1 – slob.conf
UPDATE_PCT=0
RUN_TIME=300
WORK_LOOP=0
SCALE=50000
WORK_UNIT=256
REDO_STRESS=LITE
LOAD_PARALLEL_DEGREE=4
SHARED_DATA_MODULUS=0
Testcase #1 – SLOB Dataset (20 schemas)
$ ./setup.sh IOPS 20
NOTIFY : 2015.03.26-16:04:43 :
NOTIFY : 2015.03.26-16:04:43 : Begin SLOB setup. Checking configuration.
...
NOTIFY : 2015.03.26-16:07:32 : SLOB setup complete (169 seconds).
Testcase #1 – Run it (64k vs 16M)
As a rule for the runs, I decided to populate the buffer cache first in order to get a Buffer Hit of 100%, avoiding interference from Physical I/O. This only happens from the 2nd run of SLOB onwards, as the first run partly serves as buffer cache warm-up.
$ ./runit.sh 20
NOTIFY : 2015.04.01-14:04:39 :
NOTIFY : 2015.04.01-14:04:39 : Conducting SLOB pre-test checks.
NOTIFY : 2015.04.01-14:04:39 : All SLOB sessions will connect to SLOB via SQL*Net
...
NOTIFY : 2015.04.01-14:10:02 : Terminating background data collectors.
./runit.sh: line 589: 24771 Killed      ( iostat -xm 3 > iostat.out 2>&1 )
./runit.sh: line 590: 24772 Killed      ( vmstat 3 > vmstat.out 2>&1 )
./runit.sh: line 590: 24773 Killed      ( mpstat -P ALL 3 > mpstat.out 2>&1 )
NOTIFY : 2015.04.01-14:10:12 : SLOB test is complete.
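After the warm-up run, a rough check that the working set is indeed cached can be done from SQL*Plus. The query below is just an estimate, assuming the 8K block size used here:

SQL> select round(count(*)*8192/1024/1024/1024,1) as cached_gb from v$bh where status <> 'free';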
# vmo -p -o lgpg_regions=1921 -o lgpg_size=16777216
...
Setting lgpg_size to 16777216
Setting lgpg_regions to 1921

$ export ORACLE_SGA_PGSZ=16m

$ svmon -P $(ps -elf | egrep "ora_smon_SLOB" | grep -v egrep | awk '{print $4}') | grep shmat
  8f0a4f 7000005c work default shmat/mmap           L     16    16     0    16
  8e0a4e 70000061 work default shmat/mmap           L     16    16     0    16
  bb077b 7000002e work default shmat/mmap           L     16    16     0    16
  ad072d 7000002b work default shmat/mmap           L     16    16     0    16
  b60836 7000002d work default shmat/mmap           L     16    16     0    16
  8d0a4d 7000005e work default shmat/mmap           L     16    16     0    16
  bf097f 7000000a work default shmat/mmap           L     16    16     0    16
  9d065d 70000029 work default shmat/mmap           L     16    16     0    16
  be097e 70000002 work default shmat/mmap           L     16    16     0    16
  bd097d 70000010 work default shmat/mmap           L     16    16     0    16
The “L” value in the PSize column shows that 16M Large Pages are actually being used by Oracle.
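The lgpg_regions value comes straight from the SGA size: sga_max_size divided by the 16M page size, plus a small margin (and keep in mind this pool is pinned memory, so don't oversize it). Roughly:

# 32212254720 bytes (sga_max_size) / 16777216 bytes (16M) = 1920 regions, so 1921 leaves one spare
$ echo $(( 32212254720 / 16777216 ))
1920

Also note that the Oracle OS user typically needs the CAP_BYPASS_RAC_VMM and CAP_PROPAGATE capabilities to use pinned large pages; check the AIX/Oracle documentation for your versions before changing this.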
Testcase #1 – The Results
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:  100.00       Redo NoWait %:  100.00
            Buffer  Hit   %:  100.00    In-memory Sort %:  100.00
            Library Hit   %:  101.20        Soft Parse %:   94.83
         Execute to Parse %:   99.98         Latch Hit %:  100.00
Parse CPU to Parse Elapsd %:    0.00     % Non-Parse CPU:  100.00
Averaged over 3 runs for each page size, the 16M page size shows a very small improvement of less than 2% (1,9%).
Testcase #2 – LIO in 64k vs 16M page size with larger SLOB dataset
The same rules apply to this test case; the only difference is a bigger SLOB SCALE, still small enough to fit in the Oracle buffer cache. As every block still comes from the Oracle buffer cache, the OS needs to track the status of every page allocated to the Oracle SGA and perform the so-called address translation. In this test case SLOB ran with 20 sessions, just like in the first one.
Testcase #2 – The bigger SLOB SCALE run
This will result in about 23G of data across 20 different schemas (a rough size estimate follows the configuration below).
UPDATE_PCT=0
RUN_TIME=300
WORK_LOOP=0
SCALE=150000
WORK_UNIT=256
REDO_STRESS=LITE
LOAD_PARALLEL_DEGREE=4
SHARED_DATA_MODULUS=0
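The ~23G figure follows from the SCALE, assuming the default 8K block size and SLOB's roughly one-row-per-block loading:

# 150000 blocks/schema x 8192 bytes x 20 schemas, in GB
$ echo "scale=1; 150000 * 8192 * 20 / 1024 / 1024 / 1024" | bc
22.8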
$ ./runit.sh 20
NOTIFY : 2015.04.01-14:04:39 :
NOTIFY : 2015.04.01-14:04:39 : Conducting SLOB pre-test checks.
NOTIFY : 2015.04.01-14:04:39 : All SLOB sessions will connect to SLOB via SQL*Net
...
NOTIFY : 2015.04.01-14:10:02 : Terminating background data collectors.
./runit.sh: line 589: 24771 Killed      ( iostat -xm 3 > iostat.out 2>&1 )
./runit.sh: line 590: 24772 Killed      ( vmstat 3 > vmstat.out 2>&1 )
./runit.sh: line 590: 24773 Killed      ( mpstat -P ALL 3 > mpstat.out 2>&1 )
NOTIFY : 2015.04.01-14:10:12 : SLOB test is complete.
Testcase #2 – The Results
Averaged over 3 runs, there is no improvement from the 16M page size, with a difference of less than 1% between the two page sizes. For this SLOB workload, we can conclude that the 64k and 16M page sizes showed the same results.
Testcase #3 – LIO in 64k vs 16M page size with CPU pressure
The same rules apply to this test case as to the other two, including a Buffer Hit of 100%. But this time we run under CPU starvation, with 40 concurrent sessions on CPU_COUNT=24. To make sure that all blocks still come from the Oracle buffer cache, the SLOB SCALE was reduced to 80000.
Also, the overhead of managing page tables shows up (mostly) as CPU usage. This increase in kernel-mode CPU usage will eventually hurt your Logical I/O numbers: CPU is wasted managing large page table translations instead of being given to Oracle to process your workload. The CPU overhead, plus the occasional page fault, leads to less-than-good Oracle performance. The bottom line is simple: in theory, the 16M page size should provide better results under CPU pressure.
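A simple way to see this overhead while SLOB is running is to watch kernel-mode CPU time, for example:

$ vmstat 5
# Under the cpu columns, "us" is user time and "sy" is kernel (system) time;
# a high "sy" share during the run points at this kind of memory-management overhead.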
Testcase #3 – CPU pressure and starvation
$ ./runit.sh 40
NOTIFY : 2015.04.02-16:44:34 :
NOTIFY : 2015.04.02-16:44:34 : Conducting SLOB pre-test checks.
NOTIFY : 2015.04.02-16:44:34 : All SLOB sessions will connect to SLOB via SQL*Net
NOTIFY:
UPDATE_PCT == 0
RUN_TIME == 300
WORK_LOOP == 0
SCALE == 80000
WORK_UNIT == 256
ADMIN_SQLNET_SERVICE == "SLOB"
SQLNET_SERVICE_MAX == "0"
...
Testcase #3 – The Results
Looks good! Averaged over 3 runs, the difference is more than 11% in favor of the 16M page size. This shows that under CPU pressure and possible starvation, the benefits outweigh the work required to set up the 16M page size on AIX 6/7.
To make sure these results were fit to publish, I did numerous SLOB runs with the 64k and 16M page sizes and the results were consistent: benefits between 9% and 12% with the CPUs under heavy pressure.
Conclusion
– The 16M page size on AIX (and probably on other operating systems) will give you better Logical I/O performance when CPUs are under pressure. In these tests the benefit ranged between 9% and 12%.
– These results may differ from your own conclusions or tests; your workload is different, so the results will inevitably be different.