Question
Using AIX Tools to Debug Network Problems
Answer
This document discusses some standard AIX commands that can check for network connectivity or performance problems.
From time to time users may be unable to access servers via their client
 applications or they may experience performance problems. When 
application and system checks do not indicate the problem, the system 
administrator may need to check the network or the system's network 
settings to find the problem. Using standard AIX tools, you can quickly 
determine if a server is experiencing a network problem due to 
configuration or network issues. These tools include the netstat and tcpdump commands, which can help you isolate problems ranging from loss of connectivity to more complex network performance problems.
- Basic tools and the OSI-RM
 - Using the netstat command
 - Using the tcpdump command
 
Basic tools and the OSI-RM
The AIX commands you can use for a quick checkup include the lsdev, errpt, netstat and tcpdump commands.
 With these tools, you can assess the lower layers of your system's 
network configuration within the model known as the Open Systems 
Interconnection (OSI) Reference Model (RM) (see Table 1). Using the 
OSI-RM allows you to check common points of failure, without spending 
too much time looking at elusive errors that might be caused by loss of 
network access within an application.
Table 1. Open Systems Interconnection Reference Model
 Model Layer           Function                         Assessment Tools
             
7. Application Layer  Consists of application          . 
                      programs that use the network.
6. Presentation Layer Standardizes data presentation 
                      to the applications.
5. Session Layer      Manages sessions between 
                      applications.
4. Transport Layer    Organizes datagrams into         netstat -s 
                      segments and reliably delivers   iptrace 
                      them to upper layers.            tcpdump
3. Network Layer      Manages connections across the   netstat -in, -rn, -s, -D
                      network for the upper layers.    topas
                                                       iptrace
                                                       tcpdump
2. Data Link Layer    Provides reliable data delivery  netstat -v, -D
                      across the physical link.        iptrace
                                                       tcpdump
1.  Physical Layer    Defines the physical             netstat -v, -D 
                      characteristics of the           lsdev -C
                      network media.                   errpt
                                                       iptrace
                                                       tcpdump
Using the netstat command
One of the netstat tools, the netstat -v command, can help you 
decide if corrective action needs to be taken on the server or elsewhere
 in the network. Output from this command is the same as the entstat, tokstat, fddistat, and atmstat commands combined. The netstat -v command
 assesses the physical and data link layers of the OSI-RM. Thus, it is 
one of the first commands you should use, after determining that there 
is no hardware availability problem. (The errpt and lsdev -C commands can help determine availability.) The netstat -v output
 can indicate whether you need to adjust configuration of a network 
adapter (to reestablish or improve communications) or tune an adapter 
for better data throughput.
Sample scenario
A simple scenario illustrates how the netstat -v command helps determine why a system is not communicating on its network.
The scenario assumes a system with the following characteristics:
- An IBM 4-Port 10/100 Mbps Ethernet PCI Adapter (ent0 - ent3)
 - An onboard IBM 10/100 Mbps Ethernet PCI Adapter (ent4)
 - A single cable connected to one of the ports on the 4-Port adapter
 - A single IP address configured, on en0, which also maps to one of the logical devices (ent0) on the 4-Port card
 
The problem: Since TCP/IP was configured on en0, the system has been unable to ping any system on the network.
Example 1
- The lsdev -C and errpt commands were used to verify the availability of the adapter and interface.
 - The netstat -in command (interface configuration) and the netstat -rn (route configuration) command were used to check the IP configuration.
 - After the first two preliminary steps, the next step is to use the netstat -v command to review specific statistics for adapter operations. Without a filter, the netstat -v command produces at least 10 screens of data, so this example uses the netstat -v ent0 command to limit the output as follows:
netstat -v ent0 | grep -p "Specific Statistics"
The RJ45 Port Link Status line in the sample output indicates whether or not the adapter has a link to the network. In this example, the RJ45 Port Link Status is down.
IBM 4-Port 10/100 Base-TX Ethernet PCI Adapter Specific Statistics:
------------------------------------------------
Chip Version: 26
RJ45 Port Link Status : down
Media Speed Selected: Auto negotiation
Media Speed Running: 100 Mbps Full Duplex
Receive Pool Buffer Size: 384
Free Receive Pool Buffers: 128
No Receive Pool Buffer Errors: 0
Inter Packet Gap: 96
Adapter Restarts due to IOCTL commands: 1
 - Running netstat -v a second time without a filter allows you to check the port link status for every adapter. For example, enter:
netstat -v | more
and then use /Specific as the search string for the more command. In this example, such a search shows that ent3, not ent0, has a port link status of up. This information indicates that the cable is in the wrong port on the 4-Port Adapter, and that moving the cable to the correct (that is, configured) port fixes the problem.
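Paging through the unfiltered output can be replaced by a small filter. The following is a minimal sketch, not an AIX-supplied tool: link_status is a hypothetical helper name, and the sketch assumes that each adapter section of the netstat -v output begins with a header line such as "ETHERNET STATISTICS (ent0) :" and contains a "Port Link Status" line like the one in the sample above.

```shell
# Sketch: print "<adapter>: <link status>" for each adapter section found in
# netstat -v output read from stdin.
# Assumption: a section header looks like "ETHERNET STATISTICS (ent0) :".
link_status() {
    awk '
        /STATISTICS \(/ {
            # Remember the device name found between the parentheses.
            dev = $0
            sub(/^.*\(/, "", dev)
            sub(/\).*$/, "", dev)
        }
        /Port Link Status/ {
            # The status is the text after the last colon on the line.
            status = $0
            sub(/^.*:[ \t]*/, "", status)
            sub(/[ \t]+$/, "", status)
            print dev ": " status
        }
    '
}

# On an AIX system (not run here):
#   netstat -v | link_status
```

With output like that in this scenario, the filter would print one line for ent0 (down) and one for ent3 (up), which points directly at the port that holds the cable.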
Example 2
Interpreting the portion of the netstat -v output that indicates 
adapter resource configuration can help isolate a system configuration 
problem. When setting up servers that provide network backup (such 
as TSM or SysBack), administrators commonly do some preliminary testing
 and achieve good results. Then, as more remote servers are added to the
 backup schedule, performance can decrease. Where network throughput was
 once good, but then has decreased, netstat -v can uncover potential problems with adapter resources.
Many modern adapters have tunable buffers that allow you to adjust the 
resources a device can obtain. When a backup server requires extensive 
resources to handle data reception, looking at the output of netstat -v 
for Receive Statistics and for Adapter Specific Statistics can help 
isolate potential network performance bottlenecks. It is not uncommon to
 see errors in the Adapter Specific section of the 10/100 Mbps adapter 
that indicate "No Receive Pool Buffer Errors". In Example 2 the netstat -v command
 is run twice, 30 seconds apart, while the server is handling several 
backup jobs. Because the error count keeps climbing between the two runs, the output shows that the default receive 
pool buffer setting of 384 needs to be raised. As long as no other errors 
suggesting additional problems show up in the output, you can safely 
assume that performance will improve when the receive pool buffer on 
ent4 is adjusted.
- Run the following command to see specific statistics for ent4:
netstat -v ent4 | grep -p "Specific Statistics"
Command output is similar to the following:
IBM 4-Port 10/100 Base-TX Ethernet PCI Adapter Specific Statistics:
------------------------------------------------
Chip Version: 26
RJ45 Port Link Status : up
Media Speed Selected: Auto negotiation
Media Speed Running: 100 Mbps Full Duplex
Receive Pool Buffer Size: 384
Free Receive Pool Buffers: 128
No Receive Pool Buffer Errors: 999875
Inter Packet Gap: 96
Adapter Restarts due to IOCTL commands: 1
 - Run the following commands to check the No Receive Pool Buffer Errors after 30 seconds:
sleep 30 ; netstat -v ent4 | grep "Receive Pool Buffer Errors"
Output is similar to the following:
No Receive Pool Buffer Errors: 1005761
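The difference between the two samples is what matters, not the absolute count. The following sketch works through that arithmetic with the values shown above; rx_pool_errors and error_rate are hypothetical helper names, and the 30-second interval is the one used in this example.

```shell
# Extract the "No Receive Pool Buffer Errors" counter from netstat -v text
# read from stdin.
rx_pool_errors() {
    awk -F: '/No Receive Pool Buffer Errors/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Report how much the counter grew between two samples taken $3 seconds apart.
error_rate() {
    first=$1
    second=$2
    seconds=$3
    delta=$((second - first))
    echo "$delta new errors in ${seconds}s (about $((delta / seconds)) per second)"
}

# Values from the two runs above:
error_rate 999875 1005761 30
# prints: 5886 new errors in 30s (about 196 per second)

# On an AIX system (not run here):
#   before=$(netstat -v ent4 | rx_pool_errors); sleep 30
#   after=$(netstat -v ent4 | rx_pool_errors)
#   error_rate "$before" "$after" 30
```

A counter that grows by thousands in half a minute under load suggests that the adapter is still running out of receive buffers, rather than reporting errors left over from an earlier event.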
 
Using the tcpdump command
The netstat tools (netstat -in, netstat -rn and netstat -v) cannot always determine the nature of a connection problem.
Example 3
Suppose your server has four separate network adapters configured and 
attached to separate network segments. Two are working fine (VLAN A and 
B) while no connections can be established to your server on the other 
two segments (VLAN C and D). The output of netstat -v shows that 
data is coming in on all four adapters and no errors are being logged, 
indicating that the configuration at the physical and data link layers 
is working. In such a case, you need to examine the inbound data itself.
 You can use the tcpdump tool to examine the data online to help you determine the connection problem.
The tcpdump command provides much data, but for quick analysis only some basic pieces of its output (the IP addresses) are needed.
You also want to consider the logical configuration you have set up for your interfaces (netstat -in). In this example, en2 was configured with address 9.3.6.225 and is in VLAN C (IP network 9.3.6.224, netmask 255.255.255.240); en3 was configured with address 9.3.6.243 and is in VLAN D (IP network 9.3.6.240, netmask 255.255.255.240).
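The network addresses quoted above follow from a bitwise AND of each address octet with the matching netmask octet. A minimal sketch of that calculation, using a hypothetical helper named network_of:

```shell
# Print the network address for a dotted-quad IP address ($1) and netmask ($2).
# The body runs in a subshell so that the IFS change stays local.
network_of() (
    IFS=.
    # Unquoted on purpose: split both dotted quads into eight octets.
    set -- $1 $2
    echo "$(($1 & $5)).$(($2 & $6)).$(($3 & $7)).$(($4 & $8))"
)

network_of 9.3.6.225 255.255.255.240   # en2, VLAN C: prints 9.3.6.224
network_of 9.3.6.243 255.255.255.240   # en3, VLAN D: prints 9.3.6.240
```

A host address belongs on an interface's segment only if it yields the same network address as that interface.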
- Run the following command to check traffic on en2:
tcpdump -i en2 -I -n
Output similar to the following is displayed:
-TIME STAMP- -SOURCE IP- -DESTINATION IP- -FLAG -ADDITION INFO-
09:04:27.313527323 9.3.6.244.23 > 9.3.6.241.38160: P 7:9(2) ack 8 win 65535
09:04:27.402377282 9.3.6.245.45017 > 9.53.168.52.23: . ack 24 win 17520 (DF) [tos 0x10]
09:04:27.418818536 9.3.6.241.38160 > 9.3.6.244.23: . ack 9 win 65535 [tos 0x10
09:04:27.419054751 9.3.6.244.23 > 9.3.6.241.38160: P 9:49(40) ack 8 win 65535
09:04:27.524512144 9.3.6.245.45017 > 9.53.168.52.23: P 4:5(1) ack 24 win 17520 (DF) [tos 0x10]
09:04:27.526159054 9.53.168.52.23 > 9.3.6.245.45017: P 24:25(1) ack 5 win 2482 (DF)
09:04:27.602600775 9.3.6.245.45017 > 9.53.168.52.23: . ack 25 win 17520 (DF) [tos 0x10]
09:04:27.628488745 9.3.6.241.38160 > 9.3.6.244.23: . ack 49 win 65535 [tos 0x1
 - Press Ctrl-C to stop the output display:
^C
38 packets received by filter
0 packets dropped by kernel
 
Useful data can be gained from the tcpdump output simply by recognizing the source IP addresses in the traffic. Here, the sample output shows that ent2 is physically attached to the wrong network segment. The source IP addresses should be in the 9.3.6.22x range (VLAN C), not the 9.3.6.24x range (VLAN D). Swapping the cables for ent2 and ent3 may solve the problem. If not, you may need to ask your network administrator to reconfigure switch ports to pass the correct traffic.
With the information you gain from using the netstat -v and tcpdump tools, you can better decide which action is most appropriate.
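Picking the source addresses out of a long capture by eye is error-prone, so a short filter can summarize them. This is a sketch only: source_ips is a hypothetical helper name, and it assumes TCP or UDP lines in the tcpdump -n text format shown above, where the second field is "address.port" and the third field is ">".

```shell
# Count packets per source IP address in tcpdump -n text output read from stdin.
# Assumption: field 2 is "address.port" and field 3 is ">", as in the sample.
# Lines without a port (ICMP, for example) are not handled by this sketch.
source_ips() {
    awk '$3 == ">" {
        src = $2
        sub(/\.[0-9]+$/, "", src)   # drop the trailing port number
        count[src]++
    }
    END { for (ip in count) print count[ip], ip }' | sort -rn
}

# On an AIX system (not run here), limiting the capture with -c:
#   tcpdump -i en2 -I -n -c 100 | source_ips
```

If most of the addresses listed fall outside the network configured on the interface, the adapter is probably cabled to, or switched onto, the wrong segment.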
AIX provides many tools for querying TCP/IP status on AIX servers. Of these, the netstat and tcpdump commands offer quick methods for problem determination. For example, these tools can help determine whether you own the problem or whether it needs to be addressed by a network administrator.
For additional information, please refer to AIX Online Documents at the following URL:
Link