Part 1, Disk I/O overview and long-term monitoring tools (sar, nmon, and topas)
Introduction
A critical component of disk I/O tuning involves implementing best
practices prior to building your system. Because it is much more
difficult to move things around when you are already up and running, it
is extremely important that you do things right the first time when
planning your disk and I/O subsystem environment. This includes the
physical architecture, logical disk geometry, and logical volume and
file system configuration.
When a system administrator hears that there might be a disk
contention issue, the first thing he or she turns to is iostat. iostat,
the equivalent of using vmstat for your memory reports, is a quick and
dirty way of getting an overview of what is currently happening on your
I/O subsystem. While running iostat is not an inappropriate reaction at
all, the time to start thinking about disk I/O is long before tuning
becomes necessary. All the tuning in the world will not help if your
disks are not configured appropriately for your environment from the
beginning. Furthermore, it is extremely important to understand the
specifics of disk I/O and how it relates to AIX® and your System p™
hardware.
When it comes to disk I/O tuning, generic UNIX® commands and tools
help you much less than specific AIX tools and utilities that have been
developed to help you optimize your native AIX disk I/O subsystem. In
this article, we will define and discuss the AIX I/O stack and correlate
it to both the physical and logical aspects of disk performance. We
will discuss direct, concurrent, and asynchronous I/O: what they are,
how to turn them on, and how to monitor and tune them. We will also
introduce some of the long-term monitoring tools that you should use to
help tune your system. You might be surprised to hear that iostat is not
one of the tools recommended to help you with long-term gathering of
statistical data.
This article looks at the support and changes present in a beta
release of AIX 7, including the ways in which the configuration of the
different subsystems has changed. The main changes in AIX 7 further
simplify the operation and configuration of many of the I/O subsystems,
work that had originally been started in AIX 6. The result is that many
of the different I/O subsystems no longer need to be enabled and
configured. Instead, they are supplied in a pre-configured state and are
automatically enabled and started when an application requests that
functionality.
The article also concentrates on changes that will help identify
and improve the subsystem you are looking to tune. The best time to
start monitoring your systems is when you first put your system in
production, and it is running well (rather than waiting until your users
are screaming about slow performance). You really need to have a
baseline of what the system looked like when it was behaving normally to
analyze data when it is presumably not performing adequately. When
making changes to your I/O subsystem, make these changes one at a time
so that you will be able to assess fully the impact of your change. To
assess that impact, you'll be capturing data using one of the long-term
monitoring tools recommended in this article.
Disk I/O overview
It shouldn't surprise you that the slowest operation for running
any program is the time actually spent on retrieving the data from disk.
This all comes back to the physical component of I/O. The actual disk
arms must find the correct cylinder, the controller needs to access the
correct blocks, and the disk heads have to wait while the blocks rotate
to them. The physical architecture of your I/O system should be
understood prior to any work on tuning activities for systems, since all
the tuning in the world won't help a poorly architected I/O subsystem
that consists of a slow disk or inefficient use of adapters.
Figure 1 illustrates how tightly the physical I/O components are
integrated with the logical disk and its application I/O. This is
what is commonly referred to as the AIX I/O stack.
Figure 1. The AIX I/O stack
You need to be cognizant of all the layers when tuning, as each
impacts performance in a different way. When first setting up your
systems, start from the bottom (the physical layer) as you configure
your disk, the device layer, its logical volumes, file systems, and the
files and application. We can't emphasize enough the importance of
planning your physical storage environment. This involves determining
the amount of disk, type (speed), size, and throughput. One important
challenge with storage technology to note is that while storage
capabilities of disk are increasing dramatically, the rotational speed
of the disk increases at a much slower pace. You must never lose sight
of the fact that while RAM access takes about 540 CPU cycles, disk
access can take 20 million CPU cycles. Clearly, the weakest link on a
system is the disk I/O storage system, and it's your job as the system
administrator to make sure it doesn't become even more of a bottleneck.
As alluded to earlier, poor layout of data affects I/O performance much
more than any tunable I/O parameter. Looking at the I/O stack helps you
to understand this, as Logical Volume Manager (LVM) and disk placement
are closer to the bottom than the tuning parameters (ioo and vmo).
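To put the two cycle counts quoted above side by side, here is a minimal sketch that simply divides one by the other. The variable names are illustrative, and the figures are the approximate ones from the text, not measurements.

```shell
# Approximate CPU cycles per access, as quoted in the text.
ram_cycles=540
disk_cycles=20000000

# Integer division is precise enough for an order-of-magnitude comparison.
ratio=$((disk_cycles / ram_cycles))
echo "One disk access costs about as much as $ratio RAM accesses"
```

The result is in the tens of thousands, which is why data layout decisions made at the bottom of the stack outweigh any later parameter tuning.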
Now let's discuss some best practices of data layout. One
important concept is making sure that your data is evenly spread across
all of your physical disks. If your data resides on only a few spindles,
what is the purpose of having multiple logical unit numbers (LUNs) or
physical disks? If you have a SAN or another type of storage array, you
should try to create your arrays of equal size and type. You should also
create them with one LUN for each array and then spread all your
logical volumes across all the physical volumes in your Volume Group.
As stated previously, the time to do this is when you first
configure your system, as it is much more cumbersome to fix I/O problems
than memory or CPU problems, particularly if it involves moving data
around in a production environment. You also want to make certain that
your mirrors are on separate disks and adapters. Databases pose
separate, unique challenges; so, if possible, your indexes and redo logs
should also reside on separate physical disks. The same is true for
temporary tablespaces often used for performing sort operations.
Using high-speed adapters to connect the disk drives is extremely
important, but you must make certain that the bus itself does not
become a bottleneck. To prevent this from happening, make sure to spread
the adapters across multiple buses. At the same time, do not attach too
many physical disks or LUNs to any one adapter, as this also
significantly impacts performance. The more adapters that you configure,
the better, particularly if there are large amounts of heavily utilized
disk. You should also make sure that the device drivers support
multi-path I/O (MPIO), which allows for load balancing and availability
of your I/O subsystem.
Direct I/O
Let's return to some of the concepts mentioned earlier, such as
direct I/O. What is direct I/O? First introduced in AIX Version 4.3,
this method of I/O bypasses the Virtual Memory Manager (VMM) and
transfers data directly to disk from the user's buffer. Depending on
your type of application, it is possible to have improved performance
when implementing this technique. For example, files that have poor
cache utilization are great candidates for using direct I/O. Direct I/O
also benefits applications that use synchronous writes, as these writes
have to go to disk. CPU usage is reduced because the dual data copy
is eliminated. Without direct I/O, data is copied from disk into the
buffer cache and then copied again from the cache into the
application's buffer. The major performance cost of direct I/O is that
while it can reduce CPU usage, it can also result in processes taking
longer to complete, particularly for smaller requests. Note that this
applies to persistent segments, that is, files that have a permanent
location on disk. When a file is not accessed through direct I/O with
the IBM Enhanced Journaled File System for AIX 5L™ (JFS2), the file is
cached as local pages and the data is copied into RAM. Direct I/O, in
many ways, gives you performance similar to that of raw logical volumes,
while still keeping the benefits of having a JFS file system (for
example, ease of administration). When mounting a file system using
direct I/O, you should avoid large file-enabled JFS file systems.
Concurrent I/O
First introduced in AIX Version 5.2, this feature invokes direct
I/O, so it has all the other performance considerations associated with
direct I/O. With standard direct I/O, inodes (data structures associated
with a file) are locked to prevent a condition where multiple threads
might try to change the contents of a file simultaneously. Concurrent
I/O bypasses the inode lock, which allows multiple threads to read and
write data concurrently to the same file. This is due to the way JFS2 is
implemented with a write-exclusive inode lock, allowing multiple users
to read the same file simultaneously. As you can imagine, direct I/O can
cause major problems with databases that continuously read from the
same file. Concurrent I/O solves this problem, which is why it's known
as a feature that is used primarily for relational databases. Similar to
direct I/O, you can implement this either through an open system call
or by mounting the file system, as follows:
# mount -o cio /u
When you mount the file system with this command, all its files
use concurrent I/O. Even more so than using direct I/O, concurrent I/O
provides almost all the advantages of using raw logical volumes, while
still keeping the ease of administration available with file systems.
Note that you cannot use concurrent I/O with JFS (only JFS2). Further,
applications that might benefit from having a file system read ahead or
high buffer cache hit rates might actually see performance degradation.
Asynchronous I/O
What about asynchronous I/O? Synchronous and asynchronous I/O
refer to whether or not an application waits for the I/O to
complete before it continues processing. Appropriate usage of asynchronous I/O can
significantly improve the performance of writes on the I/O subsystem.
The way it works is that it essentially allows an application to
continue processing while its I/O completes in the background. This
improves performance because it allows I/O and application processing to
run at the same time. Turning on asynchronous I/O really helps in
database environments. How can you monitor asynchronous I/O server
utilization? Both iostat and nmon can report it. Monitoring
asynchronous I/O and changing its parameters is only possible once you
have run an application that requires asynchronous I/O, because the AIX
kernel enables the asynchronous I/O components on demand. This can lead
to confusion when trying to alter parameters, as the ability to change
them is unavailable until the module has been loaded.
To determine whether asynchronous I/O has been enabled, you can check the output of the ioo command, as shown in Listing 1.
Listing 1. Checking the output of the ioo command
# ioo -a
    aio_active = 0
    aio_maxreqs = 65536
    aio_maxservers = 30
    aio_minservers = 3
    aio_server_inactivity = 300
    j2_atimeUpdateSymlink = 0
    j2_dynamicBufferPreallocation = 16
    j2_inodeCacheSize = 200
    j2_maxPageReadAhead = 128
    j2_maxRandomWrite = 0
    j2_metadataCacheSize = 200
    j2_minPageReadAhead = 2
    j2_nPagesPerWriteBehindCluster = 32
    j2_nRandomCluster = 0
    j2_syncPageCount = 0
    j2_syncPageLimit = 16
    lvm_bufcnt = 9
    maxpgahead = 8
    maxrandwrt = 0
    numclust = 1
    numfsbufs = 196
    pd_npages = 65536
    posix_aio_active = 0
    posix_aio_maxreqs = 65536
    posix_aio_maxservers = 30
    posix_aio_minservers = 3
    posix_aio_server_inactivity = 300
You can see from this listing that both the aio_active and
posix_aio_active values are set to zero. The other parameters are
configurable and will become enabled when the corresponding subsystem
has been used.
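As a rough sketch of how that check could be scripted, the helper below reads "name = value" lines in the layout of Listing 1 and reports the two activation flags. The function name aio_state is hypothetical, and treating a value of 1 as "active" is an assumption based on the description above. The sample input is copied from Listing 1 so that the snippet runs anywhere.

```shell
# aio_state: read "name = value" lines (the layout shown in Listing 1)
# on stdin and report whether each AIO subsystem has been activated.
aio_state() {
    awk -F' *= *' '
        $1 ~ /^ *(posix_)?aio_active$/ {
            gsub(/ /, "", $1)           # drop any leading indentation
            printf "%s: %s\n", $1, ($2 + 0 == 1 ? "active" : "inactive")
        }'
}

# Sample lines copied from Listing 1.
aio_state <<'EOF'
    aio_active = 0
    aio_maxreqs = 65536
    posix_aio_active = 0
    posix_aio_maxreqs = 65536
EOF
```

On a live AIX system, the same function could be fed directly with `ioo -a | aio_state`.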
The aio kernel processes are now available as aioLpool and aioPpool (see Listing 2).
Listing 2. aio kernel processes are available as aioPpool and aioLpool
l488pp065_pub[/] > pstat -a|grep aio
 37 a   250068      1  250068     0     0     1  aioPpool
 38 a   260052      1  260052     0     0     1  aioLpool
The result is that the aio system takes up less memory and process
space. The tunable parameters, for example aio_maxservers, are now
per-CPU tunables; aio_maxservers specifies the maximum number of
servers that can be created. Note that changing these values will not
change the number of servers immediately available, only the maximum
created by the kernel when there is existing outstanding I/O.
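Because the limit is expressed per CPU, the system-wide ceiling scales with the processor count. The sketch below is illustrative only: it assumes the per-CPU value is multiplied by the number of logical CPUs (the lcpu figure that iostat and sar print), and the function name aio_server_ceiling is hypothetical.

```shell
# aio_server_ceiling PER_CPU_MAX CPUS
# Multiply the per-CPU aio_maxservers value by the CPU count.
aio_server_ceiling() {
    echo $(( $1 * $2 ))
}

# aio_maxservers = 30 (as in Listing 1) on a system with two logical CPUs.
aio_server_ceiling 30 2   # prints 60
```

Doubling the number of logical CPUs therefore doubles the ceiling without any change to the tunable itself.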
Additional parameters you may want to change are the maximum number of
asynchronous I/O requests (aio_maxreqs), which alters the request queue
size, and aio_server_inactivity, which controls when asynchronous I/O
servers are killed once no more requests exist.
To change the parameters, you can use either ioo or smit. You can
find the asynchronous parameters within Performance & Resource
Scheduling, Tuning Kernel & Network Parameters, and Tuning IO
Parameters. Within smit, you can get a good idea of both the current and
the maximum possible values.
The iostat -A command reports back asynchronous I/O statistics if the kernel modules are loaded (see Listing 3).
Listing 3. iostat -A command
# iostat -A

System configuration: lcpu=2 drives=3 ent=0.60 paths=4 vdisks=4

aio: avgc avfc maxgc maxfc maxreqs avg-cpu: % user % sys % idle % iowait physc % entc
        0    0    32     0    4096            6.4   8.0   85.4      0.2   0.1   16.0

Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk0           0.5       2.0       0.5          0         4
hdisk1           1.0       5.9       1.5          8         4
hdisk2           0.0       0.0       0.0          0         0
What does this all mean?
- avgc: The average global asynchronous I/O request count per second for the interval you specified.
- avfc: The average fastpath request count per second for your interval.
- maxgc: The maximum global asynchronous I/O request count since the last time this value was fetched.
- maxfc: The maximum fastpath request count since the last time this value was fetched.
- maxreqs: The maximum number of asynchronous I/O requests allowed.
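One way to use these fields is to compare the peak request count (maxgc) with the configured limit (maxreqs). The sketch below assumes the two-line header and values layout shown in Listing 3. The function name aio_queue_usage is hypothetical, and the sample input is copied from that listing.

```shell
# aio_queue_usage: read iostat -A style output on stdin and report how
# close the peak global request count came to the maxreqs limit.
aio_queue_usage() {
    awk '
        /^aio:/ {
            rc = getline                # advance to the row of values
            if (rc > 0 && $5 > 0)
                printf "maxgc=%d maxreqs=%d usage=%.1f%%\n", $3, $5, 100 * $3 / $5
        }'
}

# Sample lines copied from Listing 3.
aio_queue_usage <<'EOF'
aio: avgc avfc maxgc maxfc maxreqs avg-cpu: % user % sys % idle % iowait physc % entc
        0    0    32     0    4096            6.4   8.0   85.4      0.2   0.1   16.0
EOF
```

For the sample data, the peak is well under one percent of the limit, so the request queue is nowhere near full.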
The major difference between aio and posixaio is that the two involve
different parameter passing, so you really need to configure both.
In AIX 7, as in AIX 6, the fsfastpath and fastpath tunables are no
longer modifiable. They are now classed as restricted tunables and are
set to 1 (enabled) by default. As such, they both enable asynchronous
I/O requests to be sent directly to underlying disk (instead of through
the corresponding subsystem and filesystem support), thus producing
better performance.
One last concept is I/O pacing. This is an AIX feature that
prevents disk I/O-intensive applications from flooding the CPU and
disks. Appropriate usage of disk I/O pacing helps prevent programs that
generate very large amounts of output from saturating the system's I/O
and causing system degradation. Tuning the maxpout and minpout helps
prevent threads performing sequential writes to files from dominating
system resources.
You can also limit the effect of setting global parameters by mounting file systems using an explicit 0 for minpout and maxpout:
# mount -o minpout=0,maxpout=0 /u
Since AIX 6, I/O pacing is enabled by default on the sys0 device, but you can also control the pacing on your other drives.
Note that you can also remount existing filesystems and set the
I/O pacing, which can be helpful if you want to alter the performance of
a disk that is already actively providing service.
Monitoring
AIX-specific tools (sar, topas, and nmon) are available to monitor
disk I/O activity. These tools allow you to quickly troubleshoot a
performance problem and capture data for historical trending and
analysis.
Don't expect to see iostat in this section, as iostat is a UNIX
utility that allows you to determine quickly if there is an imbalanced
I/O load between your physical disks and adapters. Unless you decide to
write your own scripting tools using iostat, it will not help you with
long-term trending and capturing data.
sar is one of those older generic UNIX tools that have been
improved over the years. While I generally prefer the use of more
specific AIX tools, such as topas or nmon, sar provides strong
information with respect to disk I/O. Let's run a typical sar command
to examine I/O activity (see Listing 4).
Listing 4. Using sar
# sar -d 1 2

AIX l488pp065_pub 1 7 00F604884C00    08/11/10

System configuration: lcpu=4 drives=1 ent=0.25 mode=Uncapped

11:38:44     device    %busy    avque    r+w/s    Kbs/s   avwait   avserv

11:38:45     hdisk0      1      0.0        6       24      0.0      1.9
11:38:46     hdisk0      0      0.0        3       15      0.0      2.3

Average      hdisk0      0      0.0        4       19      0.0      2.1
Let's break down the column headings from Listing 4.
- %busy: The portion of time that the device was busy servicing transfer requests.
- avque: As of AIX Version 5.3, the average number of requests waiting to be sent to disk.
- r+w/s: The number of read and write transfers to or from the device per second.
- Kbs/s: The amount of data transferred, in kilobytes per second.
- avwait: The average wait time per request (milliseconds).
- avserv: The average service time per request (milliseconds).
You want to be wary of any disk that approaches 100 percent
utilization or that has a large number of requests queued waiting for
disk. While there is some activity in the sar output, there really are
no I/O problems because there is no waiting for I/O. You need to
continue to monitor the system to make sure that other disks are also
being used besides hdisk0. Where sar differs from iostat is that it has
the ability to capture data for long-term analysis and trending through
its system activity data collector (sadc) utility. Usually commented
out in cron, this utility allows you to capture data for historic
trending and analysis.
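The two warning signs described above, high utilization and requests that had to wait, can be checked with a small filter. This is a sketch under stated assumptions: the column order is the one shown in Listing 4, device names start with "hdisk", and the 80 percent default and the function name busy_disks are illustrative choices rather than AIX recommendations. The sample input is copied from Listing 4.

```shell
# busy_disks [LIMIT]: read sar -d style output on stdin and print every
# sample where %busy reaches LIMIT (default 80) or avwait is above zero.
busy_disks() {
    threshold=${1:-80}
    awk -v limit="$threshold" '
        $1 == "Average" { summary = 1 }    # summary rows close the report
        summary { next }
        {
            # Locate the device column: field 2 when a timestamp leads the
            # row, field 1 on continuation rows for additional disks.
            d = ($2 ~ /^hdisk/) ? 2 : (($1 ~ /^hdisk/) ? 1 : 0)
            if (d == 0) next
            busy = $(d + 1); avwait = $(d + 5)
            if (busy + 0 >= limit + 0 || avwait + 0 > 0) {
                printf "%s busy=%s%% avwait=%sms\n", $d, busy, avwait
                n++
            }
        }
        END { if (!n) print "no disk exceeded the limits" }'
}

# Sample lines copied from Listing 4.
busy_disks 80 <<'EOF'
11:38:44     device    %busy    avque    r+w/s    Kbs/s   avwait   avserv
11:38:45     hdisk0      1      0.0        6       24      0.0      1.9
11:38:46     hdisk0      0      0.0        3       15      0.0      2.3
Average      hdisk0      0      0.0        4       19      0.0      2.1
EOF
```

For the sample data the filter reports that no disk exceeded the limits, which matches the reading of Listing 4 given above.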
Here's how this works. As delivered on AIX systems by default,
there are two shell scripts that are normally commented out
(/usr/lib/sa/sa1 and /usr/lib/sa/sa2) that provide daily reports on the
activity of the system. The sar command actually calls the sadc routine
to access system data (see Listing 5).
Listing 5. Example cronjob
# crontab -l | grep sa1
0 8-17 * * 1-5 /usr/lib/sa/sa1 1200 3 &
0 * * * 0,6 /usr/lib/sa/sa1 &
0 18-7 * * 1-5 /usr/lib/sa/sa1 &
What about something a little more user-friendly? Did you say
topas? topas is a nice performance monitoring tool that you can use for a
number of purposes, including, but not limited to, your disk I/O
subsystem (see Figure 2).
Figure 2. topas
Take a look at the topas output from a disk perspective. There is
no I/O activity going on here at all. Besides the physical disk, pay
close attention to "Wait" (in the CPU section up top), which also helps
determine if the system is I/O bound. If you see high numbers here, you
can then use other tools (such as filemon, fileplace, lsof, or lslv) to
help you figure out which processes, adapters, or file systems are
causing your bottlenecks. topas is good for quickly troubleshooting an
issue when you want a little more than iostat. In a sense, topas is a
graphical mix of iostat and vmstat, though with recent improvements, it
can now capture data for historical analysis.
Also useful is the topas physical hard disk output (-D). It shows
disk statistics and can show you if a single hardware disk is being
hammered and would benefit from having filesystems or information
spread and moved over other disks. You can see a sample of the output
in Figure 3.
Figure 3. Sample output for disk statistics
In particular, you should check the ART/AWT and MRT/MWT columns, which
show the average and maximum wait times for reads and writes to the
disk. High values indicate a very busy disk. The AQW column shows the
average time a request spends waiting in the queue for the I/O device.
Again, high values may indicate a disk that is unable to keep up with
the demands being placed on it.
Then there is nmon (my favorite AIX performance tool). While nmon
provides a front-end similar to topas, it is much more useful in terms
of long-term trending and analyses. Further, it gives the system
administrator the ability to output data to an Excel spreadsheet that
comes back in charts (tailor-made for senior management and functional
teams) that clearly illustrate your bottlenecks. This is done through a
tool called nmon analyzer, which provides the hooks into nmon. With
respect to disk I/O, nmon reports back the following data: disk I/O
rates, data transfers, read/write ratios, and disk adapter statistics.
Here is one small example of where nmon really shines. Say you
want to know which processes are taking most of the disk I/O and you
want to be able to correlate it with the actual disk to clearly
illustrate I/O per process. nmon usage helps you more than any other
tool. To do this with nmon, use the -t option; set your timing and then
sort by I/O channel.
How do you use nmon to capture data and import it into the analyzer? Use the sudo command and run nmon for 90 minutes, taking a snapshot every 30 seconds (180 snapshots):
# sudo nmon -f -t -r test1 -s 30 -c 180
Then sort the output file that gets created:
# sort -A testsystem_yymmdd.nmon > testsystem_yymmdd.csv
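The length of an nmon capture is simply the snapshot interval (-s) multiplied by the snapshot count (-c). The helper functions below are a hypothetical sketch for working that out in either direction before starting a long run.

```shell
# nmon_count HOURS INTERVAL_SECONDS
# Snapshots (-c) needed to cover HOURS at one snapshot per interval (-s).
nmon_count() {
    echo $(( $1 * 3600 / $2 ))
}

# nmon_minutes INTERVAL_SECONDS COUNT
# Minutes covered by COUNT snapshots taken INTERVAL_SECONDS apart.
nmon_minutes() {
    echo $(( $1 * $2 / 60 ))
}

nmon_count 3 30       # prints 360: three hours at 30-second intervals
nmon_minutes 30 180   # prints 90: what -s 30 -c 180 actually covers
```

Checking the arithmetic first avoids ending up with a capture that stops before the workload you wanted to observe.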
When this is completed, ftp the .csv file to your PC, start the nmon analyzer spreadsheet (enable macros), and click on analyze nmon data. The nmon analyzer is available as a separate download.
Figure 4 provides a disk summary for each disk in kilobytes per second for reads and writes.
Figure 4. Disk summary for each disk in kilobytes per second for reads and writes
Conclusion
This article addressed the relative importance of the disk I/O
subsystem. It defined and discussed the AIX I/O stack and how it relates
to both physical and logical disk I/O. It also covered some best
practices for disk configuration in a database environment, looked at
the differences between direct and concurrent I/O, and also discussed
asynchronous I/O and I/O pacing. You tuned your asynchronous I/O servers
and configured I/O pacing. You started up file systems in concurrent
I/O mode and studied when to best implement concurrent I/O. Further, you
learned all about iostat and captured data using sar, topas, and nmon.
You also examined different types of output and defined many of the
flags used in sar and iostat. Part 2 of this series will drill down to
the logical volume manager layer of the AIX I/O stack and look at some
of the snapshot-type tools, which help you quickly assess the state of
your disk I/O subsystem. Part 3 will focus primarily on tracing I/O
usage using tools, such as filemon and fileplace, and how to improve
file system performance overall.