Question
Using AIX Tools to Debug Network Problems
Answer
This document discusses some standard AIX commands that can check for network connectivity or performance problems.
From time to time users may be unable to access servers via their client
applications or they may experience performance problems. When
application and system checks do not indicate the problem, the system
administrator may need to check the network or the system's network
settings to find the problem. Using standard AIX tools, you can quickly
determine if a server is experiencing a network problem due to
configuration or network issues. These tools include the netstat and tcpdump commands, which can help you isolate problems, from loss of connectivity to more complex network performance problems.
- Basic tools and the OSI-RM
- Using the netstat command
- Using the tcpdump command
Basic tools and the OSI-RM
The AIX commands you can use for a quick checkup include the lsdev, errpt, netstat and tcpdump commands.
With these tools, you can assess the lower layers of your system's
network configuration within the model known as the Open Systems
Interconnection (OSI) Reference Model (RM) (see Table 1). Using the
OSI-RM allows you to check common points of failure, without spending
too much time looking at elusive errors that might be caused by loss of
network access within an application.
Table 1. Open Systems Interconnection Reference Model
Model Layer            Function                          Assessment Tools
7. Application Layer   Consists of application
                       programs that use the network.
6. Presentation Layer  Standardizes data presentation
                       to the applications.
5. Session Layer       Manages sessions between
                       applications.
4. Transport Layer     Organizes datagrams into          netstat -s
                       segments and reliably delivers    iptrace
                       them to upper layers.             tcpdump
3. Network Layer       Manages connections across the    netstat -in, -rn, -s, -D
                       network for the upper layers.     topas
                                                         iptrace
                                                         tcpdump
2. Data Link Layer     Provides reliable data delivery   netstat -v, -D
                       across the physical link.         iptrace
                                                         tcpdump
1. Physical Layer      Defines the physical              netstat -v, -D
                       characteristics of the            lsdev -C
                       network media.                    errpt
                                                         iptrace
                                                         tcpdump
Using the netstat command
One of the netstat tools, the netstat -v command, can help you
decide if corrective action needs to be taken on the server or elsewhere
in the network. Output from this command is the same as the entstat, tokstat, fddistat, and atmstat commands combined. The netstat -v command
assesses the physical and data link layers of the OSI-RM. Thus, it is
one of the first commands you should use, after determining that there
is no hardware availability problem. (The errpt and lsdev -C commands can help determine availability.) The netstat -v output
can indicate whether you need to adjust configuration of a network
adapter (to reestablish or improve communications) or tune an adapter
for better data throughput.
Sample scenario
A simple scenario illustrates how the netstat -v command helps determine why a system is not communicating on its network.
The scenario assumes a system with the following characteristics:
- An IBM 4-Port 10/100 Mbps Ethernet PCI Adapter (ent0 - ent3)
- An onboard IBM 10/100 Mbps Ethernet PCI Adapter (ent4)
- A single cable connected to one of the ports on the 4-Port adapter
- A single IP address configured, on en0, which also maps to one of the logical devices (ent0) on the 4-Port card
The problem: Since TCP/IP was configured on en0, the system has been unable to ping any system on the network.
Example 1
- The lsdev -C and errpt commands were used to verify the availability of the adapter and interface.
- The netstat -in command (interface configuration) and the netstat -rn command (route configuration) were used to check the IP configuration.
- After the first two preliminary steps, the next step is to use the netstat -v command to review specific statistics for adapter operations. Without a filter, the netstat -v command produces at least 10 screens of data, so this example uses the netstat -v ent0 command to limit the output as follows:
netstat -v ent0 | grep -p "Specific Statistics"
The RJ45 Port Link Status line in the sample output indicates whether or not the adapter has a link to the network. In this example, the RJ45 Port Link Status is down.
IBM 4-Port 10/100 Base-TX Ethernet PCI Adapter Specific Statistics:
------------------------------------------------
Chip Version: 26
RJ45 Port Link Status : down
Media Speed Selected: Auto negotiation
Media Speed Running: 100 Mbps Full Duplex
Receive Pool Buffer Size: 384
Free Receive Pool Buffers: 128
No Receive Pool Buffer Errors: 0
Inter Packet Gap: 96
Adapter Restarts due to IOCTL commands: 1
- Running netstat -v a second time without a filter allows you to check the port link status for every adapter. For example, enter:
netstat -v | more
and then use /Specific as the search string for the more command. In this example, such a search shows that ent3, not ent0, shows a port link status of up. This information indicates that the cable is in the wrong port on the 4-Port Adapter, and that moving the cable to the correct (that is, configured) port fixes the problem.
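The per-adapter check in Example 1 can be condensed into a small filter. The following is a minimal sketch, not part of the original procedure: link_report is a hypothetical helper name, and it assumes that each adapter's section of the netstat -v output starts with a header line of the form ETHERNET STATISTICS (entN) : and contains one line with the words Link Status, as in the sample output above. Adapters whose driver reports the link differently will simply not appear in the result.

```shell
# link_report (hypothetical helper): reads `netstat -v` output on stdin and
# prints one "<adapter> <link status>" line per adapter.
# Assumption: each section begins with "ETHERNET STATISTICS (entN) :".
link_report() {
  awk '
    /^ETHERNET STATISTICS/ {
      dev = $3
      gsub(/[()]/, "", dev)                  # "(ent0)" becomes "ent0"
    }
    /Link Status/ {
      n = split($0, parts, ":")
      status = parts[n]
      gsub(/^[ \t]+|[ \t]+$/, "", status)    # trim surrounding blanks
      print dev, status
    }
  '
}

# Usage on an AIX host:
#   netstat -v | link_report
```

In the scenario above, such a report would show ent0 as down and ent3 as up, pointing to the same misplaced cable.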
Example 2
Interpreting the portion of the netstat -v output that indicates
adapter resource configuration can help isolate a system configuration
problem. When setting up servers that provide for network backup (such
as TSM or SysBack), administrators commonly do some preliminary testing
and achieve good results. Then, as more remote servers are added to the
backup schedule, performance can decrease. Where network throughput was
once good, but then has decreased, netstat -v can uncover potential problems with adapter resources.
Many modern adapters have tunable buffers that allow you to adjust the
resources a device can obtain. When a backup server requires extensive
resources to handle data reception, looking at the output of netstat -v
for Receive Statistics and for Adapter Specific Statistics can help
isolate potential network performance bottlenecks. It is not uncommon to
see a nonzero No Receive Pool Buffer Errors counter in the Adapter
Specific section of the 10/100 Mbps adapter. In Example 2 the netstat -v command
is run twice, 30 seconds apart, while the server is handling several
backup jobs. Because the error count keeps climbing between the two runs,
the output shows that the default receive pool buffer size of 384 needs
to be raised. As long as no other errors
suggesting additional problems show up in the output, you can safely
assume that performance will improve when the receive pool buffer on
ent4 is adjusted.
- Run the following command to see specific statistics for ent4:
netstat -v ent4 | grep -p "Specific Statistics"
Command output is similar to the following:
IBM 4-Port 10/100 Base-TX Ethernet PCI Adapter Specific Statistics:
------------------------------------------------
Chip Version: 26
RJ45 Port Link Status : up
Media Speed Selected: Auto negotiation
Media Speed Running: 100 Mbps Full Duplex
Receive Pool Buffer Size: 384
Free Receive Pool Buffers: 128
No Receive Pool Buffer Errors: 999875
Inter Packet Gap: 96
Adapter Restarts due to IOCTL commands: 1
- Run the following commands to check the No Receive Pool Buffer Errors after 30 seconds:
sleep 30 ; netstat -v ent4 | grep "Receive Pool Buffer Errors"
Output is similar to the following:
No Receive Pool Buffer Errors: 1005761
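The two readings can be turned into an error rate with a little arithmetic. The sketch below is illustrative only: buffer_errors and error_rate are hypothetical helper names, and the parsing assumes that the counter line looks exactly like the sample output above.

```shell
# buffer_errors (hypothetical helper): extracts the counter value from
# `netstat -v <adapter>` output read on stdin.
buffer_errors() {
  awk -F: '/No Receive Pool Buffer Errors/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# error_rate (hypothetical helper): whole errors per second between readings.
# $1 = first count, $2 = second count, $3 = seconds between the readings
error_rate() {
  echo $(( ($2 - $1) / $3 ))
}

# Usage on an AIX host:
#   first=$(netstat -v ent4 | buffer_errors)
#   sleep 30
#   second=$(netstat -v ent4 | buffer_errors)
#   error_rate "$first" "$second" 30
```

With the sample values, the counter rose by 5886 in 30 seconds, or roughly 196 errors per second.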
Using the tcpdump command
The netstat tools (netstat -in, netstat -rn and netstat -v) cannot always determine the nature of a connection problem.
Example 3
Suppose your server has four separate network adapters configured and
attached to separate network segments. Two are working fine (VLAN A and
B) while no connections can be established to your server on the other
two segments (VLAN C and D). The output of netstat -v shows that
data is coming in on all four adapters and no errors are being logged,
indicating that the configuration at the physical and data link layers
is working. In such a case, you need to examine the inbound data itself.
You can use the tcpdump tool to examine the data online to help you determine the connection problem.
The tcpdump command provides much data, but for quick analysis only a few basic pieces of its output (the IP addresses) are needed.
You also want to consider the logical configuration you have set up for your interfaces (netstat -in). In this example, en2 was configured with address 9.3.6.225 and is in VLAN C (IP network 9.3.6.224, netmask 255.255.255.240); en3 was configured with address 9.3.6.243 and is in VLAN D (IP network 9.3.6.240, netmask 255.255.255.240).
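To see which segment an address belongs to, AND each octet of the address with the corresponding octet of the netmask. A minimal sketch of that arithmetic follows; network_of is a hypothetical helper name, and it expects plain dotted-decimal IPv4 input and performs no validation.

```shell
# network_of (hypothetical helper): prints the network address for an
# IPv4 address and netmask, both given in dotted-decimal form.
# The function body runs in a subshell, so the IFS change stays local.
network_of() (
  IFS=.
  set -- $1 $2        # split both arguments into eight octets
  echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
)

# Usage:
#   network_of 9.3.6.225 255.255.255.240
#   network_of 9.3.6.243 255.255.255.240
```

For the addresses in this example, 9.3.6.225 with netmask 255.255.255.240 yields 9.3.6.224 (VLAN C), while 9.3.6.243 with the same netmask yields 9.3.6.240 (VLAN D).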
- Run the following command to check traffic on en2:
tcpdump -i en2 -I -n
Output similar to the following is displayed:
-TIME STAMP- -SOURCE IP- -DESTINATION IP- -FLAG -ADDITION INFO-
09:04:27.313527323 9.3.6.244.23 > 9.3.6.241.38160: P 7:9(2) ack 8 win 65535
09:04:27.402377282 9.3.6.245.45017 > 9.53.168.52.23: . ack 24 win 17520 (DF) [tos 0x10]
09:04:27.418818536 9.3.6.241.38160 > 9.3.6.244.23: . ack 9 win 65535 [tos 0x10
09:04:27.419054751 9.3.6.244.23 > 9.3.6.241.38160: P 9:49(40) ack 8 win 65535
09:04:27.524512144 9.3.6.245.45017 > 9.53.168.52.23: P 4:5(1) ack 24 win 17520 (DF) [tos 0x10]
09:04:27.526159054 9.53.168.52.23 > 9.3.6.245.45017: P 24:25(1) ack 5 win 2482 (DF)
09:04:27.602600775 9.3.6.245.45017 > 9.53.168.52.23: . ack 25 win 17520 (DF) [tos 0x10]
09:04:27.628488745 9.3.6.241.38160 > 9.3.6.244.23: . ack 49 win 65535 [tos 0x1
- Press Ctrl-C to stop the output display:
^C
38 packets received by filter
0 packets dropped by kernel
Useful data can be gained from the tcpdump output simply by recognizing the source IP addresses in the traffic (the address to the left of the > on each line of the sample output). Thus, the sample output shows that ent2 is
physically attached to the wrong network segment. The source IP
addresses should be in the 9.3.6.22x range, not the 9.3.6.24x range. It
is possible that swapping the cables for ent2 and ent3 may
solve the problem. If not, you may need to ask your network
administrator to reconfigure switch ports to pass the correct traffic.
With the information you gain from using the netstat -v and tcpdump tools, you can better decide which action is most appropriate.
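For longer captures, the source addresses can be tallied rather than read by eye. This is a rough sketch under stated assumptions: source_ips is a hypothetical helper name, and it assumes the line layout of the sample tcpdump output above (time stamp, source address with port, >, destination). Lines without a port number, such as ICMP traffic, would lose their last octet here, so treat the result as a quick hint only.

```shell
# source_ips (hypothetical helper): reads tcpdump text output on stdin and
# prints "<packet count> <source IP>" for each distinct source address.
# Assumption: field 2 is "a.b.c.d.port" and field 3 is ">".
source_ips() {
  awk '
    $3 == ">" {
      src = $2
      sub(/\.[0-9]+$/, "", src)     # drop the trailing ".port"
      count[src]++
    }
    END { for (ip in count) print count[ip], ip }
  '
}

# Usage on an AIX host (the -c flag stops the capture after 100 packets):
#   tcpdump -i en2 -I -n -c 100 | source_ips
```

If most of the local source addresses fall outside the network configured on the interface, as in Example 3, the cabling or switch port assignment is the first thing to check.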
AIX provides many tools for querying TCP/IP status on AIX servers. Of these, the netstat and tcpdump commands
provide some methods for quick problem determination. For example,
these tools can help determine if you own the problem or if it needs to
be addressed by a network administrator.
For additional information, please refer to AIX Online Documents at the following URL:
Link