Monday 22 May 2017

VMM concepts

Virtual-memory segments are partitioned in units called pages; each page is either located in real physical memory (RAM) or stored on disk until it is needed. AIX uses virtual memory to address more memory than is physically available in the system. The management of memory pages in RAM or on disk is handled by the VMM.
A page is a fixed-size block of data (usually 4096 byte). A page might be resident in memory (that is, mapped into a location in physical memory), or a page might be resident on a disk (that is, paged out of physical memory into paging space or a file system).
The VMM maintains a free list of available page frames. The VMM also uses a page-replacement algorithm to determine which virtual-memory pages currently in RAM will have their page frames reassigned to the free list.
AIX tries to use all of RAM all of the time, except for a small amount which it maintains on the free list. To maintain this small amount of unallocated pages the VMM uses page outs and page steals to free up space and reassign those page frames to the free list.
overhead             -- The load that AIX incurs while sharing resources between user processes and performing its internal accounting.
page                 -- A fixed-size (4KB) block of memory.
page fault           -- It occurs when a process tries to access an address in virt mem. that does not have a location in physical memory.
                        In response, the system tries to load the appropriate data from the hard disk
page stealing daemon -- The daemon responsible for releasing pages of memory for use by other processes
                        (It makes room for incoming pages, by swapping out mem. pages that are not the part of the working set of a process.)
paging in            -- Reading pages from swap.
paging out           -- Releasing pages of physical memory for use.
Kernel continuously checks to see if the number of pages on the free list is below a threshold. If so the page stealing daemon, becomes active and begins copying pages to the swap area, starting with least recently used pages. Each page placed on the free list then becomes available for use by other processes. Pages written out to swap must be read back into physical memory when the process needs them again.
The AIX VMM integrates cached file data with the management of other types of virtual memory (for example, process data, process stack, and so forth). It caches the file data as pages, just like virtual memory for processes.
(In most modern computer systems, each thread has a reserved region of memory referred to as its stack.)
------------------

Working Storage

Working storage pages are pages that contain volatile data (in other words, data that is not preserved across a reboot).
Examples of virtual memory regions that consist of working storage pages are:
    - Process data
    - Stack
    - Shared memory
    - Kernel data
When modified working storage pages need to be paged out (moved from memory to the disk), they are written to paging space. Working storage pages are never written to a file system.
When a process exits, the system releases all of its private working storage pages. Thus, the system releases the working storage pages for the data of a process and stack when the process exits.

Permanent Storage

Permanent storage pages are pages that contain permanent data (that is, data that is preserved across a reboot). This permanent data is just file data. So, permanent storage pages are basically just pieces of files cached in memory.
When a modified permanent storage page needs to be paged out (moved from memory to disk), it is written to a file system.
You can divide permanent storage pages into two sub-types:
    - Non-client pages (aka persistent pages): these are pages containing cached Journaled File System (JFS) file data
    - Client pages: These are pages containing cached data for all other file systems (for example, JFS2 and Network File System (NFS)
------------------
In order to help optimize which pages are selected for replacement by the page replacement daemons, AIX classifies pages into one of two types:
    - Computational pages: pages used for the text, data, stack, and shared memory of a process
    - Non-computational pages: pages containing file data for files that are being read and written.
All working storage pages are computational. A working storage page is never marked as non-computational.
Depending on how you use the permanent storage pages, the pages can be computational or non-computational. If a file contains executable text for a process, the system treats the file as computational and marks all of the permanent storage pages in the file as computational. If the file does not contain executable text, the system treats the file as non-computational file and marks all of the pages in the file as non-computational.
Once a file has been marked as computational, it remains marked as a computational file until the file is deleted (or the system is rebooted). Thus, a file remains marked as computational even after it is moved or renamed.
------------------

Page replacement

The AIX page replacement daemons scan memory a page at a time to find pages to evict in order to free up memory. The page replacement daemons must choose pages carefully to minimize the performance impact of paging on the system, and the page replacement daemons target pages of different classes based on tunable parameter settings and system conditions.
There are a number of tunable parameters that you can use to control how AIX selects pages to replace.
------------------

minperm and maxperm

The two most basic page replacement tunable parameters are minperm and maxperm. These tunable parameters are used to indicate how much memory the AIX kernel should use to cache non-computational pages. The maxperm tunable parameter indicates the maximum amount of memory that should be used to cache non-computational pages. The minperm limit indicates the target minimum amount of memory that should be used for non-computational pages.
By default, maxperm is an "un-strict" limit, so it allows more non-computational files to be cached in memory when there is available free memory. The maxperm limit can be made a "strict" limit by setting the strict_maxperm tunable parameter to 1.
(The disadvantage of this is, that the number of non-computational pages cannot grow beyond maxperm and consume more memory when there is free memory on the system.)

numperm (lru_file_repage)

The number of non-computational pages is referred to as numperm: The vmstat -v command displays the numperm value for a system as a percentage of a system’s real memory.
When the number of non-computational pages (numperm) is greater than or equal to maxperm, the AIX page replacement daemons strictly target non-computational pages (for example, cached files that are not executables).
When the number of non-computational pages (numperm) is less than or equal to minperm, the AIX page replacement daemons target both computational and non-computational pages. In this case, AIX scans both classes of pages and evicts the least recently used pages.
When the number of non-computational pages (numperm) is between minperm and maxperm, the lru_file_repage (least recently used) tunable parameter controls what kind of pages the AIX page replacement daemons should steal.
Thus, the lru_file_repage tunable parameter can be set to 0. In this case, the AIX kernel always targets non-computational pages when numperm is between minperm and maxperm.
In most customer environments, it is most optimal to just have the kernel always target non-computational pages, because paging computational pages (for example, a process’s stack, data, and so forth) usually has a much higher performance cost on a process than paging non-computational pages (that is, data file cache). Thus, the lru_file_repage tunable parameter can be set to 0. In this case, the AIX kernel always targets non-computational pages when numperm is between minperm and maxperm
------

maxclient

The maxclient tunable parameter specifies a limit on the maximum amount of memory that should be used to cache non-computational client pages. Because all non-computational client pages are a subset of the total number of non-computational permanent storage pages, the maxclient limit must always be less than or equal to the maxperm limit.

numclient

The number of non-computational client pages is referred to as numclient. The vmstat -v command displays the numclient value for a system as a percentage of a system’s real memory.
By default, the maxclient limit is a strict limit. This means that the AIX kernel does not allow the non-computational client file cache to exceed the maxclient limit (that is, the AIX kernel does not allow numclient to exceed maxclient). When numclient reaches the maxclient limit, the AIX page replacement daemons strictly target client pages.
------

minfree, maxfree

Two other important parameters are minfree and maxfree. If the number of pages on your free list (vmstat -v: free pages) falls below the minfree parameter, VMM starts to steal pages (just to add to the free list), which is not good. It continues to do this until the free list has at least the number of pages in the maxfree parameter.
# vmstat -v                <-- for non-computational file-cache
       4980736 memory pages
        739175 lruable pages
        432957 free pages
             1 memory pools
         84650 pinned pages
          80.0 maxpin percentage
          20.0 minperm percentage      <<- system’s minperm% setting
          80.0 maxperm percentage      <<- system’s maxperm% setting
           2.2 numperm percentage      <<- % of memory containing non-comp. pages
         16529 file pages              <<- # of non-comp. pages
           0.0 compressed percentage
             0 compressed pages
           2.2 numclient percentage    <<- % of memory containing non-comp. client pages
          80.0 maxclient percentage    <<- system’s maxclient% setting
         16503 client pages            <<- # of client pages
So, in the above example, there are 16529 non-computational file pages mapped into memory. These non-computational pages consume 2.2 percent of memory. Of these 16529 non-computational file pages, 16503 of them are client pages.
The vmstat output does not provide information about computational file pages. Information about computational file pages can be gathered from the svmon command
# svmon -G                <--in memory pages of each type (work, pers., client)
               size      inuse       free        pin    virtual
memory       786432     209710     576722     133537     188426
pg space     131072       1121
               work       pers       clnt
pin          133537          0          0
in use       188426          0      21284
    - work: working storage
    - pers: persistent storage (persistent storage pages are non-client pages - that is, JFS pages.)
    - clnt: client storage
For each page type, svmon displays two rows:
    - in use: number of 4K pages mapped into memory
    - pin: number of 4K pages mapped into memory and pinned (pin is a subset of inuse)
So, in the above example, there are 188426 working storage pages mapped into memory. Of those 188426 working storage pages, 133537 of them are pinned (that is, can’t be paged out).
There are no persistent storage pages (because there are no JFS filesystems in use on the system). There are 21284 client storage pages, and none of them are pinned.
The svmon command does not display the number of permanent storage pages, but it can be calculated from the svmon output. As mentioned earlier, the number of permanent storage pages is the sum of the number of persistent storage pages and the number of client storage pages. So, in the above example, there are a total of 21284 permanent storage pages on the system:
0 persistent storage pages + 21284 client storage pages = 21284 permanent storage pages
The type of information reported by svmon is slightly different than vmstat. svmon  reports information about the number of in-memory pages of different types: working, persistent (that is, non-client), and client. svmon does not report information about computational versus non-computational. svmon just reports the total number of in-memory pages of each page type.
In contrast, vmstat reports information about non-computational versus computational pages.
To illustrate this difference, consider the above example of svmon output. Some of the 21284 client pages will be computational, and the rest of the 21284 client pages will be non-computational. To determine the breakdown of these client pages between computational and non-computational, use the vmstat command to determine how many of the 21284 client pages are non-computational.
-----------
suggested:
lru_file_repage = 0
maxperm = 90%
maxclient = 90%
minperm = 3%
strict_maxclient = 1 (default)
strict_maxperm = 0 (default)
# vmo -p -o lru_file_repage=0 -o maxclient%=90 -o maxperm%=90 -o minperm%=3
# vmo -p -o strict_maxclient=1 -o strict_maxperm=0
The above tunable parameters settings are the default settings for AIX Version 6.1.
-----------------------------
minfree: Minimum acceptable number of real-memory page frames in the free list. When the size of the free list falls below this number, the VMM begins stealing pages. It continues stealing pages until the size of the free list reaches maxfree.
-----------------------------
An example:
topas:
 MEMORY
 Real,MB   26623
 % Comp     57          <--this is used for processes (OS+appl.), if you add nmon Process+System, for me it was the same (46+11)
 % Noncomp  22          <--fs cache
 % Client   22          <--fs cache (for jfs2)
nmon:
 FileSystemCache
 (numperm) 22.5%        <--this is for fs cache
 Process   46.0%        <--this is for appl. processes
 System    11.3%        <--this is for the OS
 Free      20.2%        <--free
           -----
 Total    100.0%
-----------------------------
Excerpts from a tuning docs:
Set vmo:lru_file_repage=0; default=1  # Mandatory critical change
This change directs lrud to steal only JFS/JFS2 file-buffer pages unless/until numperm/numclient is less-than/equal-to vmo:minperm%, at which point lrud begins stealing both JFS/JFS2 file-buffer pages and computational memory pages.
Essentially stealing computational memory invokes pagingspace-pageouts.
I have found this change already made by most AIX 5.3 customers.
Set vmo:page_steal_method=1; default=0  # helpful, not critical
This change switches the lrud page-stealing algorithm from a physical memory address page-scanning method (=0) to a List-based page-scanning method (=1).
Set ioo:sync_release_ilock=1; default=0  # helpful, not critical
Default value =0 means that the i-node lock is held while all dirty pages of a file are flushed; thus, I/O to a file is blocked when the syncd daemon is running. Setting =1 will cause a sync() to flush all I/O to a file without holding the i-node lock, and then use the  i-node lock to do the commit.
Execute vmstat -v and compare the following values/settings:
minperm    should be 10, 5 or 3; default=20  
maxperm    should be 80 or higher; default=80 or 90
maxclient    should be 80 or higher; default=80 or 90
numperm    real-time percent of non-computational memory (includes client below)
numclient    real-time percent of JFS2/NFS/vxfs filesystem buffer-cache
paging space page outs are triggered when numperm or numclient is less-than-or-equal-to minperm.  
Typically numperm and numclient is greater than minperm, and as such, no paging space page outs can be triggered.
paging space page outs are triggered when numperm or numclient is less-than-or-equal-to minperm.  Typically numperm and numclient is greater than minperm, and as such, no paging space page outs can be triggered.

No comments:

Post a Comment