Optimizing AIX 7 performance: Part 3, Tune with ioo, filemon, fileplace, JFS and JFS2
Summary: Part 3 of the AIX 7 performance series covers how to improve overall file system performance, how to tune your systems with the ioo
command, and how to use the filemon and fileplace utilities. You will also learn about JFS and JFS2, which are available in AIX 7.
About this series
This three-part series (see Resources) on the AIX® disk and I/O
subsystem focuses on the challenges of optimizing disk I/O performance.
While disk tuning is arguably less exciting than CPU or memory tuning,
it is a crucial component in optimizing server performance. In fact,
partly because disk I/O is your weakest subsystem link, there is more
you can do to improve disk I/O performance than on any other subsystem.
Introduction
The first and second installments of this series discussed the
importance of architecting your systems, the impact it can have on
overall system performance, and a new I/O tuning tool, lvmo, which you
can use to tune logical volumes. In this installment, you will examine
how to tune your systems using the
ioo
command,
which configures the majority of all I/O tuning parameters and displays
the current or next boot values for all I/O tuning parameters. You will
also learn how and when to use the filemon and fileplace tools. With
the enhanced journaled file system (JFS2) as the default file system in AIX,
improving your overall file system performance, tuning your file
systems, and getting the best out of JFS2 are all important parts of
your tuning toolkit. You'll even examine some file system attributes,
such as sequential and random access, which can affect performance.
File system overview
This section discusses JFS2, file system performance, and specific
performance improvements over JFS. As you know, there are two types of
kernels in AIX. There is a 32-bit kernel and a 64-bit kernel. While they
both share some common libraries and most commands and utilities, it is
important to understand their differences and how the kernels relate to
overall performance tuning. JFS2 has been optimized for the 64-bit
kernel, while JFS is optimized for the 32-bit kernel. Journaling file
systems, while much more secure, historically have been associated with
performance overheads. In a shop where performance rules (at the expense of
availability), you would disable metadata logging to increase
performance with JFS. With JFS2, you can also disable logging (in AIX
6.1 and higher) to help increase performance. You can disable logging at
the point of mounting the filesystem, which means that you don't need
to change or reconfigure the filesystem itself; you simply modify your
mount options. For example, to disable logging on a filesystem you would
use the following:
mount -o log=NULL /database
Although JFS2 was optimized to improve the performance of metadata
operations (that is, those normally handled by the logging framework),
switching logging off can still provide a significant performance benefit for
filesystems with a high proportion of file changes and newly
created or deleted files. For example, filesystems used for development
may see an increase in performance. For databases where the
files used are static, the performance improvement may be less
significant.
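If you take this route, remember that the option only takes effect when the filesystem is mounted. A minimal sketch, assuming /database is a JFS2 filesystem on AIX 6.1 or later that is already mounted:
umount /database
mount -o log=NULL /database     # remount with metadata logging disabled
mount | grep database           # confirm that log=NULL appears in the mount options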
However, you should be careful when making use of compression. Although
compression can save disk space (and disk reads and writes, since less
data is physically read from or written to the disk), the overhead on
systems with a heavy CPU load can actually slow performance down.
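If you still want compression, remember from Table 1 that it is a JFS-only feature and must be chosen when the filesystem is created. The following is only a sketch; the volume group datavg and mount point /archive are hypothetical, and the size is given in 512-byte blocks:
# create a compressed JFS filesystem; compress=LZ selects JFS compression,
# and a 2048-byte fragment size is used because compression works on fragments
crfs -v jfs -g datavg -m /archive -a size=1048576 -a frag=2048 -a compress=LZ
mount /archive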
JFS2 uses a binary tree representation when performing inode
searches, a much more efficient method than the linear search used by
JFS. Furthermore, you no longer need to assign a fixed number of inodes when
creating file systems, because JFS2 allocates them dynamically (meaning
you won't run out of them).
While concurrent I/O was covered in the first installment of the series,
it's worth another mention here. Implementation of concurrent I/O
allows multiple threads to read and write data concurrently to the same
file. This is due to the way in which JFS2 is implemented with
write-exclusive inode locks. This allows multiple users to read the same
file simultaneously, which increases performance dramatically when
multiple users read from the same data file. To turn concurrent I/O on,
you just need to mount the f/s with the appropriate flags (see Listing
1). We recommend that you look at using concurrent I/O when using
databases such as Oracle.
Listing 1. Turning on concurrent I/O
root@lpar29p682e_pub[/] mount -o cio /test
root@lpar29p682e_pub[/] > df -k /test
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/fslv00        131072    130724    1%        4     1% /test
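The -o cio flag in Listing 1 affects only that mount. If you want the option applied automatically on every mount, one approach (a sketch, reusing the /test filesystem from Listing 1) is to record it in /etc/filesystems with chfs:
chfs -a options=rw,cio /test    # store cio as a default mount option for /test
umount /test; mount /test       # remount so the new options take effect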
Table 1 illustrates the various enhancements of JFS2 and how they relate
to systems performance. It's also important to understand that when
tuning your I/O systems, many of the tunables themselves (you'll get
into that later) differ, depending on whether you are using JFS or JFS2.
Table 1. Enhancements of JFS2
Function | JFS | JFS2 |
---|---|---|
Compression | Yes | No |
Quotas | Yes | Yes |
Deferred update | Yes | No |
Direct I/O support | Yes | Yes |
Optimization | 32-bit | 64-bit |
Max file system size | 1 terabyte | 4 petabytes |
Max file size | 64 gigabytes | 4 petabytes |
Number of inodes | Fixed when creating f/s | Dynamic |
Large file support | As mount option | Default |
On-line defragmentation | Yes | Yes |
Namefs | Yes | Yes |
DMAPI | No | Yes |
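To see a couple of these defaults in action, here is a minimal, hypothetical example of creating a JFS2 filesystem; datavg is an assumed volume group name. Notice that no inode count is specified (JFS2 allocates inodes dynamically) and that large file support needs no special mount option:
crfs -v jfs2 -g datavg -m /test -a size=1G -A yes   # create a 1GB JFS2 filesystem
mount /test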
Filemon and fileplace
This section introduces two important I/O tools, filemon and fileplace,
and discusses how you can use them during systems administration each
day.
Filemon uses a trace facility to report on the I/O activity of physical
and logical storage, including your actual files. The I/O activity
monitored is based on the time interval that is specified when running
the trace. It reports on all layers of file system utilization,
including the Logical Volume Manager (LVM), virtual memory, and physical
disk layers. Without any flags, it runs in the background while
application programs or system commands are run and monitored. The
trace starts automatically and continues until you stop it with the trcstop
command; at that point, filemon generates an I/O activity report and exits. It can also process a
trace file that has been recorded by the trace facility. Reports can
then be generated from this file. Because reports generated to standard
output usually scroll past your screen, it's recommended that you use
the
-o
option to write the output to a file (see Listing 2).
Listing 2. Using filemon with the -o option
l488pp065_pub[/] > filemon -o dbmon.out -O all
Run trcstop command to signal end of trace.
Thu Aug 12 09:07:06 2010
System: AIX 7.1 Node: l488pp065_pub Machine: 00F604884C00
l488pp065_pub[/] > trcstop
l488pp065_pub[/] > cat dbmon.out
Thu Aug 12 09:10:09 2010
System: AIX 7.1 Node: l488pp065_pub Machine: 00F604884C00

Cpu utilization:  72.8%
Cpu allocation:  100.0%
21947755 events were lost.  Reported data may have inconsistencies or errors.

Most Active Files
------------------------------------------------------------------------
  #MBs  #opns   #rds   #wrs  file                     volume:inode
------------------------------------------------------------------------
   0.4      1    101      0  unix                     /dev/hd2:82241
   0.0      9     10      0  vfs                      /dev/hd4:9641
   0.0      4      6      1  db.sql
   0.0      3      6      2  ksh.cat                  /dev/hd2:111192
   0.0      1      2      0  cmdtrace.cat             /dev/hd2:110757
   0.0     45      1      0  null
   0.0      1      1      0  dd.cat                   /dev/hd2:110827
   0.0      9      2      0  SWservAt                 /dev/hd4:9156
   0.0      1      0      3  db2.sql
   0.0      9      2      0  SWservAt.vc              /dev/hd4:9157

Most Active Segments
------------------------------------------------------------------------
  #MBs  #rpgs  #wpgs  segid   segtype                 volume:inode
------------------------------------------------------------------------
   0.1      2     13  8359ba  client

Most Active Logical Volumes
------------------------------------------------------------------------
  util  #rblk  #wblk   KB/s  volume                   description
------------------------------------------------------------------------
  0.04      0     32    0.3  /dev/hd9var              /var
  0.00      0     48    0.5  /dev/hd8                 jfs2log
  0.00      0      8    0.1  /dev/hd4                 /

Most Active Physical Volumes
------------------------------------------------------------------------
  util  #rblk  #wblk   KB/s  volume                   description
------------------------------------------------------------------------
  0.00      0     72    0.7  /dev/hdisk0              N/A

Most Active Files Process-Wise
------------------------------------------------------------------------
  #MBs  #opns   #rds   #wrs  file             PID(Process:TID)
------------------------------------------------------------------------
   0.0      3      6      0  db.sql           7667828(ksh:9437345)
   0.0      1      2      0  ksh.cat          7667828(ksh:9437345)
   0.0      1      0      3  db2.sql          7667828(ksh:9437345)
   0.0      1      0      1  db.sql           7733344(ksh:7405633)
   0.4      1    101      0  unix             7667830(ksh:9437347)
   0.0      1      2      0  cmdtrace.cat     7667830(ksh:9437347)
   0.0      1      2      0  ksh.cat          7667830(ksh:9437347)
   0.0      9      2      0  SWservAt         7667830(ksh:9437347)
   0.0      9      2      0  SWservAt.vc      7667830(ksh:9437347)
   0.0      1      0      0  systrctl         7667830(ksh:9437347)
   0.0     44      0     44  null             4325546(slp_srvreg:8585241)
   0.0      1      2      2  ksh.cat          7667826(ksh:23527615)
   0.0      1      1      0  dd.cat           7667826(ksh:23527615)
   0.0      1      1      0  null             7667826(ksh:23527615)
   0.0      1      0      0  test             7667826(ksh:23527615)
   0.0      8      8      0  vfs              3473482(topasrec:13566119)
   0.0      1      0      0  CuAt.vc          3473482(topasrec:13566119)
   0.0      1      0      0  CuAt             3473482(topasrec:13566119)
   0.0      1      2      0  vfs              2097252(syncd:2490503)
   0.0      1      0      0  installable      4260046(java:15073489)

Most Active Files Thread-Wise
------------------------------------------------------------------------
  #MBs  #opns   #rds   #wrs  file             TID(Process:PID)
------------------------------------------------------------------------
   0.0      3      6      0  db.sql           9437345(ksh:7667828)
   0.0      1      2      0  ksh.cat          9437345(ksh:7667828)
   0.0      1      0      3  db2.sql          9437345(ksh:7667828)
   0.0      1      0      1  db.sql           7405633(ksh:7733344)
   0.4      1    101      0  unix             9437347(ksh:7667830)
   0.0      1      2      0  cmdtrace.cat     9437347(ksh:7667830)
   0.0      1      2      0  ksh.cat          9437347(ksh:7667830)
   0.0      9      2      0  SWservAt         9437347(ksh:7667830)
   0.0      9      2      0  SWservAt.vc      9437347(ksh:7667830)
   0.0      1      0      0  systrctl         9437347(ksh:7667830)
   0.0     44      0     44  null             8585241(slp_srvreg:4325546)
   0.0      1      2      2  ksh.cat          23527615(ksh:7667826)
   0.0      1      1      0  dd.cat           23527615(ksh:7667826)
   0.0      1      1      0  null             23527615(ksh:7667826)
   0.0      1      0      0  test             23527615(ksh:7667826)
   0.0      8      8      0  vfs              13566119(topasrec:3473482)
   0.0      1      0      0  CuAt.vc          13566119(topasrec:3473482)
   0.0      1      0      0  CuAt             13566119(topasrec:3473482)
   0.0      1      2      0  vfs              2490503(syncd:2097252)
   0.0      1      0      0  installable      15073489(java:4260046)

dbmon.out: END
Look for long seek times, as they can result in decreased application
performance. By looking at the read and write sequence counts in detail,
you can further determine if the access is sequential or random. This
helps you when it is time to do your I/O tuning. This output clearly
illustrates that there is no I/O bottleneck visible. Filemon provides a
tremendous amount of information and, truthfully, we've found there is
too much information at times. Further, there can be a performance hit
using filemon, depending on how much general file activity there is
while filemon is running. Let's look at the topas results while running
filemon (see Figure 1).
Figure 1. topas results while running filemon
In the figure above, filemon is taking up almost 60 percent of the CPU!
This is actually less than in previous AIX versions but still a
significant impact on your overall system performance. We don't
typically like to recommend performance tools that have such a
substantial overhead, so we'll reiterate that while filemon certainly
has a purpose, you need to be very careful when using it.
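One way to limit that overhead is to keep the trace window short and restrict the report to the layers you care about. A sketch, assuming a roughly one-minute sample is enough to capture your workload:
filemon -o fmon.out -O lf,pv    # trace only logical file and physical volume activity
sleep 60                        # let the workload run for the sampling interval
trcstop                         # stop the trace; filemon writes its report to fmon.out and exits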
What about fileplace? Fileplace reports the placement of a file's blocks
within a file system. It is commonly used to examine and assess the
efficiency of a file's placement on disk. For what purposes do you use
it? One reason would be to help determine if some of the heavily
utilized files are substantially fragmented. It can also help you
determine the physical volume with the highest utilization and whether
or not the drive or I/O adapter is causing the bottleneck.
Let's look at an example of a frequently accessed file in Listing 3.
Listing 3. Frequently accessed file
fileplace -pv /tmp/logfile

File: /tmp/logfile  Size: 63801540 bytes  Vol: /dev/hd3
Blk Size: 4096  Frag Size: 4096  Nfrags: 15604
Inode: 7  Mode: -rw-rw-rw-  Owner: root  Group: system

  Physical Addresses (mirror copy 1)                                 Logical Extent
  ----------------------------------                                 ----------------
  02884352-02884511  hdisk0      160 frags     655360 Bytes,   1.0%  00000224-00000383
  02884544-02899987  hdisk0    15444 frags   63258624 Bytes,  99.0%  00000416-00015859

  unallocated                    -27 frags    -110592 Bytes    0.0%

  15604 frags over space of 15636 frags:   space efficiency = 99.8%
  2 extents out of 15604 possible:   sequentiality = 100.0%
You should be interested in space efficiency and sequentiality here.
Higher space efficiency means files are less fragmented and provide
better sequential file access. A higher sequentiality tells you that the
files are more contiguously allocated, which will also be better for
sequential file access. In the case here, space efficiency could be
better while sequentiality is quite high. If the space and sequentiality
are too low, you might want to consider file system reorganization. You
would do this with the
reorgvg
command, which can improve logical volume utilization and efficiency. You may also want to consider using thedegrafs
command
which can help ensure that the free space on your filesystem is
contiguous, which will help with future writes and file creates.
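A minimal sketch, assuming a JFS2 filesystem mounted at /database: query the current state first, and defragment the free space only if the report suggests it is worthwhile:
defragfs -q /database           # report the current fragmentation state without changing anything
defragfs /database              # coalesce the free space in the filesystem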
Defragmentation can occur in the background while you are using your
system.
Tuning with ioo
This section discusses the use of the
ioo
command, which is used for virtually all I/O-related tuning parameters.
Like
vmo
, you need to be extremely careful when changing ioo parameters,
as changing parameters on the fly can cause severe performance
degradation. Table 2 details specific tuning parameters that you will
use often for JFS and enhanced JFS (JFS2) file systems. As you can see,
the majority of the I/O tuning commands utilize the ioo utility.
Table 2. Specific tuning parameters
Function | JFS tuning parameter | Enhanced JFS tuning parameter |
---|---|---|
Sets max amount of memory for caching files | vmo -o maxperm=value | vmo -o maxclient=value (< or = maxperm) |
Sets min amount of memory for caching | vmo -o minperm=value | n/a |
Sets a limit (hard) on memory for caching | vmo -o strict_maxperm | vmo -o maxclient (hard limit) |
Sets max pages used for sequential read ahead | ioo -o maxpgahead=value | ioo -o j2_maxPageReadAhead=value |
Sets min pages used for sequential read ahead | ioo -o minpgahead=value | ioo -o j2_minPageReadAhead=value |
Sets max number of pending write I/Os to a file | chdev -l sys0 -a maxpout=value | chdev -l sys0 -a maxpout=value |
Sets min number of pending write I/Os to a file at which programs blocked by maxpout might proceed | chdev -l sys0 -a minpout=value | chdev -l sys0 -a minpout=value |
Sets the amount of modified data cache for a file with random writes | ioo -o maxrandwrt=value | ioo -o j2_maxRandomWrite ioo -o j2_nRandomCluster |
Controls gathering of I/Os for sequential write behind | ioo -o numclust=value | ioo -o j2_nPagesPerWriteBehindCluster=value |
Sets the number of f/s bufstructs | ioo -o numfsbufs=value | ioo -o j2_nBufferPerPagerDevice=value |
Let's further discuss some of the more important parameters below, as we've already discussed all the
vmo
tuning parameters in the memory tuning series (see Resources).
There are several ways you can determine the existing
ioo
values on your system. The long display listing for ioo clearly
gives you the most information (see Listing 4). It lists the current
value, reboot value, range, unit, type, and dependencies of all
tunable parameters managed by ioo.
Listing 4. Display for ioo
root@lpar29p682e_pub[/] > ioo -L
NAME                      CUR    DEF    BOOT   MIN    MAX     UNIT            TYPE  DEPENDENCIES
j2_atimeUpdateSymlink     0      0      0      0      1       boolean         D
j2_dynamicBufferPreallo   16     16     16     0      256     16K slabs       D
j2_inodeCacheSize         400    400    400    1      1000                    D
j2_maxPageReadAhead       128    128    128    0      64K     4KB pages       D
j2_maxRandomWrite         0      0      0      0      64K     4KB pages       D
j2_maxUsableMaxTransfer   512    512    512    1      4K      pages           M
j2_metadataCacheSize      400    400    400    1      1000                    D
j2_minPageReadAhead       2      2      2      0      64K     4KB pages       D
j2_nBufferPerPagerDevice  512    512    512    512    256K                    M
j2_nPagesPerWriteBehindC  32     32     32     0      64K                     D
j2_nRandomCluster         0      0      0      0      64K     16KB clusters   D
j2_nonFatalCrashesSystem  0      0      0      0      1       boolean         D
j2_syncModifiedMapped     1      1      1      0      1       boolean         D
j2_syncdLogSyncInterval   1      1      1      0      4K      iterations      D
jfs_clread_enabled        0      0      0      0      1       boolean         D
jfs_use_read_lock         1      1      1      0      1       boolean         D
lvm_bufcnt                9      9      9      1      64      128KB/buffer    D
maxpgahead                8      8      8      0      4K      4KB pages       D     minpgahead
maxrandwrt                0      0      0      0      512K    4KB pages       D
memory_frames             512K          512K                  4KB pages       S
minpgahead                2      2      2      0      4K      4KB pages       D     maxpgahead
numclust                  1      1      1      0      2G-1    16KB/cluster    D
numfsbufs                 196    196    196    1      2G-1                    M
pd_npages                 64K    64K    64K    1      512K    4KB pages       D
pgahd_scale_thresh        0      0      0      0      419430  4KB pages       D
pv_min_pbuf               512    512    512    512    2G-1                    D
sync_release_ilock        0      0      0      0      1       boolean         D

n/a means parameter not supported by the current platform or kernel

Parameter types:
    S = Static: cannot be changed
    D = Dynamic: can be freely changed
    B = Bosboot: can only be changed using bosboot and reboot
    R = Reboot: can only be changed during reboot
    C = Connect: changes are only effective for future socket connections
    M = Mount: changes are only effective for future mountings
    I = Incremental: can only be incremented
    d = deprecated: deprecated and cannot be changed
Listing 5 below shows you how to change a tunable.
Listing 5. Changing a tunable
root@lpar29p682e_pub[/] > ioo -o maxpgahead=32
Setting maxpgahead to 32
root@lpar29p682e_pub[/] >
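By default, a change like the one in Listing 5 applies only to the running system. If you also want it preserved across a reboot, ioo can record the value in the nextboot tunables file with the -p flag; the value 32 below is simply the illustrative value from Listing 5:
ioo -o maxpgahead               # display the current value of a single tunable
ioo -p -o maxpgahead=32         # change it now and record it for the next reboot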
This parameter applies to JFS only. For JFS2, there are additional file
system performance enhancements, including sequential page read ahead
and sequential and random write behind. The Virtual Memory Manager (VMM)
of AIX anticipates future page requirements by observing the patterns in
which files are accessed. When a program accesses two successive pages of
a file, VMM assumes that the program will continue to access the file
sequentially. The number of pages to be read ahead is configured using
VMM thresholds. With JFS2, make note of these two important parameters
(an example of tuning them follows the list):
j2_minPageReadAhead: Determines the number of pages read ahead when VMM first detects a sequential access pattern.
j2_maxPageReadAhead: Determines the maximum number of pages that VMM reads ahead in a sequential file.
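For example, on a filesystem that is read mostly sequentially (large scans or backup jobs), you might raise both thresholds. The values below are purely illustrative and should be validated against your own workload:
ioo -o j2_minPageReadAhead=8 -o j2_maxPageReadAhead=256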
Sequential and random write behind relates to writing modified pages in
memory to disk after a certain threshold is reached. In this way, it
does not wait for
syncd
to
flush out pages to disk. The reason for this is to limit the number of
dirty pages in memory, which further reduces I/O overhead and disk
fragmentation. The two types of write behind are sequential and random.
With sequential write behind, dirty pages are written out before the syncd
daemon runs; letting them accumulate until syncd runs can cause real
bottlenecks. With random write behind, once the number of dirty pages for
a file exceeds a specified threshold, all subsequent pages are written to
disk.
For sequential write behind, you specify the number of pages to be
scheduled for writing with the j2_nPagesPerWriteBehindCluster parameter.
By default the value is 32 pages (that is, 128KB). For modern disks and
write-heavy environments, such as databases, you may want to increase
this value so that more data is written in a single operation when it is
synced to disk.
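A hedged example, doubling the default cluster from 32 pages (128KB) to 64 pages (256KB); the value is illustrative only:
ioo -o j2_nPagesPerWriteBehindCluster=64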
Random write behind can be configured by changing the values of
j2_nRandomCluster and j2_maxRandomWrite. The j2_maxRandomWrite parameter
specifies the number of modified pages of a file that are allowed to stay
in memory. The default is 0 (meaning that data is written out as quickly
as possible), which helps ensure data integrity. If you are willing to
sacrifice some integrity in the event of a system failure, you can
increase these values for better write performance; more pages stay in
cache, so after a system failure some data may not have made it to disk.
The j2_nRandomCluster parameter defines how many clusters apart two
writes must be to be considered random. Increasing this value can lower
the write frequency if you have a high number of files being modified at
the same time.
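A sketch with illustrative values only: allow up to 128 modified pages per file to remain in memory, and treat writes as random only when they land more than four clusters apart:
ioo -o j2_maxRandomWrite=128 -o j2_nRandomCluster=4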
Another important area worth mentioning is large sequential I/O
processing. When there is too much simultaneous I/O to your file
systems, the I/O can bottleneck at the f/s level. In this case, you
should increase the
j2_nBufferPerPagerDevice
parameter
(numfsbufs with JFS). If you use raw I/O as opposed to file systems,
this same type of bottleneck can occur at the LVM level. This is where
you might want to tune the lvm_bufcnt parameter.
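Both of the changes below use illustrative values. Note from Listing 4 that j2_nBufferPerPagerDevice (like numfsbufs) is a mount-type tunable, so the new value applies only to filesystems mounted after the change:
ioo -o j2_nBufferPerPagerDevice=1024   # takes effect for filesystems mounted from now on
ioo -o lvm_bufcnt=16                   # for raw LVM I/O rather than filesystem I/O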
Summary
This article focused on file system performance. You examined the
enhancements in JFS2 and why it would be the preferred file system.
Further, you used tools, such as filemon and fileplace, to gather more
detailed information about the actual file structures and how they
relate to I/O performance. Finally, you tuned your I/O subsystem by
using the
ioo
command. You learned about the j2_minPageReadAhead and j2_maxPageReadAhead
parameters, which can help increase performance when encountering sequential I/O.
During this three-part series on I/O you learned that, perhaps more so
than any other subsystem, your tuning must start prior to stress testing
your systems. Architecting the systems properly can do more to increase
performance than anything you can do with tuning I/O parameters. This
includes strategic disk placement and making sure you have enough
adapters to handle the throughput of your disks. Further, while this
series focused on I/O, understand that the VMM is also very tightly
linked with I/O performance and must also be tuned to achieve optimum
I/O performance.