Troubleshooting Network Related Issues

This post discusses how to troubleshoot network-related issues. Once you have read and understood it, the steps are easy to apply whenever you are troubleshooting a network problem.

Eight steps to successful troubleshooting of networking issues:

1. Identify the exact issue or problem.

2. Recreate the problem if possible.

3. Localize and isolate the cause.

4. Formulate a plan for solving the problem.

5. Implement the plan.

6. Test to verify that the problem has been resolved.

7. Document the problem and solution.

8. Provide feedback to the user.

Step 1. Identify the exact issue.

Defining the scope of the problem and deciding on the exact issue is important. Have the person who reported the problem explain how normal operation appears, and then demonstrate the perceived problem. If the reported issue is described as intermittent, instruct the user to contact you immediately if it ever happens again. It is very difficult to fix something that is clearly working just fine right now.

Do not discount what the user reports simply because it sounds implausible. The user does not have your knowledge of networking, and is probably describing the problem poorly. Something annoyed the user enough to contact you.

Note: Has it ever worked? If the reported failure has never worked properly, then treat the situation as a new installation and not a troubleshooting event. The process and assumptions are completely different.

Step 2. Recreate the problem.

Ask yourself if you understand the symptoms, and verify the reported problem yourself if possible. Problems are much easier to solve if they can be recreated on demand. Seeing the problem will allow you to observe error messages and various symptoms the user may not think important to relate, and may even provide the opportunity for you to collect network statistics during the event.

If the problem is intermittent, tell the user which symptoms are likely and provide a written list of the questions you want answered, so the user can gather some of the information if you cannot respond quickly enough to see the problem yourself. When possible, leave a diagnostic tool in place to gather information continuously. A protocol analyzer may be left capturing all traffic from the network, overwriting the buffer as it fills. Have the user halt the capture and/or store the current test results from other testers as soon as the intermittent problem recurs.
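The overwrite-as-it-fills behavior described above can be illustrated with a small ring buffer. This is only a sketch of the idea, not how any particular protocol analyzer works; the class name `RingLog` and the sample observations are made up for illustration.

```python
from collections import deque
from datetime import datetime, timezone


class RingLog:
    """Keep only the most recent `capacity` observations (hypothetical helper)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry when full.
        self._entries = deque(maxlen=capacity)

    def record(self, observation):
        # Timestamp each entry so it can be lined up with the user's report.
        stamp = datetime.now(timezone.utc).isoformat()
        self._entries.append((stamp, observation))

    def snapshot(self):
        # Copy the buffer so the saved evidence is not overwritten later.
        return list(self._entries)


log = RingLog(capacity=3)
for sample in ["ok", "ok", "timeout", "ok"]:
    log.record(sample)

# "Halting" the capture here means taking a copy before more data arrives.
saved = log.snapshot()
```

Because the buffer keeps overwriting, the snapshot has to be taken promptly after the event, which is why the user is asked to stop the capture immediately.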

Step 3. Localize and isolate the cause.

Once you have defined the problem, and recreated it if necessary, you should attempt to isolate that problem to a single device, connection, or software application. Reducing the scope of the problem in this way is where divide-and-conquer begins; the goal is to isolate the problem to the smallest element that could cause the problem. Test for and eliminate as many variables as possible. You may need to scan for a virus at this point.

Is there any normal function missing, or is there an abnormal response? Use the data gathered by your network monitoring tools to aid you in this process.

Determine whether anything was altered at that station or on the network just before the problem started. Often the user does not realize that changing something seemingly unrelated can cause problems on the network, such as rearranging the location of a portable heater or photocopier, or installing a new software application or adapter card. Do not discount the local environment when you are looking for change: consider temperature changes (heat is often a problem), electrical use in adjacent spaces (including nearby businesses), time of day, and interference from electronic sources. Even the passage of an elevator, or use of a cordless phone, should be noted.

Can the problem be duplicated from another station, or using other software applications at the same station? Identify whether the problem is limited to one station, or one network resource such as a printer. Move one segment closer to the network resource and try again. If the problem goes away when you move closer to the network resource, test or replace the intervening infrastructure equipment.
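As a rough sketch of the "move one segment closer" approach, the function below walks an ordered list of test locations from the affected station toward the resource and reports the pair of locations between which the behavior changes. The function name, the location names, and the `works_from` callback are all hypothetical; in practice the "test" is you repeating the failing task from each location.

```python
def locate_suspect_link(test_points, works_from):
    """Find where a failing task starts to work while moving toward the resource.

    test_points: locations ordered from the affected station to the resource.
    works_from:  callable returning True if the task succeeds from a location.
    Returns (last_failing, first_working), or None if no boundary is found.
    """
    previous = None
    for point in test_points:
        if works_from(point):
            if previous is None:
                return None  # works from the first point; nothing to isolate here
            return (previous, point)
        previous = point
    return None  # fails everywhere, so look at the resource itself


# Illustrative results only; these names are not from any real network.
points = ["user desk", "floor switch", "core switch", "server room"]
results = {
    "user desk": False,
    "floor switch": False,
    "core switch": True,
    "server room": True,
}
suspect = locate_suspect_link(points, results.get)
```

Here the equipment between the floor switch and the core switch would be the first thing to test or replace.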

If the problem affects an entire shared media segment, isolate the problem by reducing the variables to the fewest possible number. Try shortening the cable segment on a bus topology, or temporarily re-cabling a ring or star topology to create the smallest possible network for troubleshooting purposes. Try a different switch or hub. If the problem is on the same shared media segment as the network resource, try turning off or disconnecting all but two stations. Once those two are communicating, add more stations. If they are not communicating, check the physical layer possibilities such as the termination of the cable, the cable itself, or the specific ports used on the infrastructure equipment (hubs and switches).
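The "start with two stations, then add more" procedure can be written down as a simple loop. This sketch assumes a single station is responsible and that you have some repeatable way to decide whether the segment is working; both `find_disruptive_station` and `segment_works` are hypothetical names, and the station labels are invented.

```python
def find_disruptive_station(stations, segment_works):
    """Reconnect stations one at a time and report where the segment breaks.

    stations:      station names, in the order they will be reconnected.
    segment_works: callable taking the list of connected stations and
                   returning True if they can communicate.
    Returns ("physical", None) if even the first two stations fail,
    ("station", name) for the station whose reconnection broke the segment,
    or ("clean", None) if every station was added without a failure.
    """
    active = list(stations[:2])
    if not segment_works(active):
        # Two stations cannot talk: check termination, cable, and ports.
        return ("physical", None)
    for station in stations[2:]:
        active.append(station)
        if not segment_works(active):
            return ("station", station)
    return ("clean", None)


# Illustrative run: pretend "pc-5" disrupts the segment whenever connected.
stations = ["pc-1", "pc-2", "pc-3", "pc-4", "pc-5", "pc-6"]
outcome = find_disruptive_station(stations, lambda active: "pc-5" not in active)
```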

If the problem can be isolated to a single station, try a different network adapter or a fresh copy of the network driver software (without using any of the network software or configuration files presently found on that station; delete them if necessary). Try accessing the network using a diagnostic tool from the existing network cable connection for that station. If the network connection seems intact, determine if only one application exhibits the problem. Try other applications from the same drive or file system. Compare configurations with a nearby but operational workstation. Try a fresh copy of the application software (again using none of the existing software or configuration files).
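Comparing a problem station's configuration with a working one is easier when only the differing lines are shown. The sketch below uses Python's standard `difflib` module for that; the function name `config_differences` and the sample settings are invented for illustration and do not correspond to any real configuration format.

```python
import difflib


def config_differences(working_text, problem_text):
    """Return only the lines that differ between two configuration texts.

    Lines prefixed with "-" exist only in the working configuration;
    lines prefixed with "+" exist only in the problem configuration.
    """
    diff = list(
        difflib.unified_diff(
            working_text.splitlines(),
            problem_text.splitlines(),
            fromfile="working",
            tofile="problem",
            lineterm="",
            n=0,  # no surrounding context, just the changes
        )
    )
    # Skip the two file-header lines, then keep added and removed lines only.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Made-up settings for a working and a problem workstation.
working = "ip=dhcp\ndns=10.0.0.53\nmtu=1500"
problem = "ip=dhcp\ndns=10.0.0.35\nmtu=1500"
changes = config_differences(working, problem)
```

In this made-up case the only difference is the DNS address, which would be the first setting to check.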

If only one user experiences the problem, check the network security and permissions for that user. Find out if any changes have been made to the network security that might affect this user. Has another user account been deleted that this user was made security equivalent to? Has this user been deleted from a security grouping within the network? Has an application been moved to a new location on the network? Have there been any changes to the system login script, or the user’s login script? Compare this user’s account with another user’s account that is able to perform the desired task. Have the affected user log in and attempt the same task from a nearby station that is not experiencing the problem. Have the other user log in to the problem station and try the same task.
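When comparing the affected user's account with one that works, a set difference quickly shows which memberships are missing. This is a minimal sketch; the function name and the group names are hypothetical, and how you obtain the group lists depends on your network operating system.

```python
def missing_memberships(affected_groups, working_groups):
    """List groups the working account has that the affected account lacks."""
    return sorted(set(working_groups) - set(affected_groups))


# Made-up group lists for two accounts that should behave the same.
affected = ["staff", "finance"]
working = ["staff", "finance", "print-operators"]
missing = missing_memberships(affected, working)
```

A non-empty result does not prove the cause, but it gives a short list of memberships to check against the failing task.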

Step 4. Formulate a plan for solving the problem.

Once a single operation, application or connection is localized as the source of the problem, research and/or consider the possible solutions to the problem. Consider the possibility that some solutions to the problem at hand may introduce other problems.

Note 1: To avoid unwanted repetition, and to make it possible to “back-out” any changes made should things get worse, be sure to carefully and completely document all actions taken during the problem resolution process. Copy all configuration files to a safe place before modifying them – especially on switches, routers, firewalls, and other key network infrastructure devices.
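For configuration files that live on an ordinary file system, the "copy to a safe place first" habit can be sketched as a small helper that makes a timestamped copy before any edit. The function name, the `.bak` naming scheme, and the directory layout are assumptions for illustration; infrastructure devices usually have their own export or backup commands instead.

```python
import shutil
import time
from pathlib import Path


def backup_config(path, backup_dir):
    """Copy a configuration file to backup_dir with a timestamp in its name."""
    source = Path(path)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # The timestamp lets several backups of the same file coexist.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = target_dir / f"{source.name}.{stamp}.bak"
    # copy2 also preserves the modification time where the platform allows it.
    shutil.copy2(source, target)
    return target
```

Calling `backup_config("router.cfg", "backups")` before editing would leave an untouched copy to restore from; the file name here is only an example.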

Note 2: It is advantageous to open a second terminal session into the switch or router where the commands required to reverse a configuration change are typed in and ready to execute prior to actually implementing the change in the first window. This is likely the fastest way to recover from changes that adversely affect your network.
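The same idea of having the reversal ready before making the change can be sketched in code by pairing every change with its undo. This is an illustration of the principle only, not a replacement for the second terminal session; `apply_with_rollback`, the `send` callback, and the command strings are all hypothetical.

```python
def apply_with_rollback(steps, send):
    """Apply changes in order; on failure, undo the completed ones in reverse.

    steps: list of (apply_command, undo_command) pairs prepared in advance.
    send:  callable that executes one command and returns True on success.
    Returns True if every change was applied, False if a rollback happened.
    """
    undo_stack = []
    for apply_command, undo_command in steps:
        if send(apply_command):
            undo_stack.append(undo_command)
        else:
            # Back out only the changes that actually completed, newest first.
            for undo in reversed(undo_stack):
                send(undo)
            return False
    return True
```

The point of the design is that no change is attempted until its undo command has already been written down.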

Step 5. Implement the plan.

Your actual solution to the problem may be replacing a network device, NIC, cable, or other physical component. If the problem is software, you may have to implement a software patch, reinstall the application or component, or clean a virus-infected file. If the problem is the user account, the user’s security settings or logon scripts may need to be adjusted.

For network hardware, it is most expedient to simply replace a part, and attempt to repair the part later. Another option is to change the connection to a spare port and cover or otherwise mark the suspect port. Remember that the goal is to restore full operation of the network as soon as possible.

Two avenues exist for solving software problems. The first option is to reinstall the problem software, eliminating possibly corrupted files and ensuring that all required files are present. This is an excellent way to ensure that the second option (reconfiguring the software) works on the first try. Many applications allow for a software switch that tells the installation program to disregard any existing configuration files, which is a good way to avoid being misled by the error and duplicating it yet again. If this option is not evident, then it is often better to remove the application before reinstalling it.

If the problem is isolated to a single user account, it is often faster to repeat the steps necessary to grant the user access to the problem application or operation as if the user had never been authorized before. By going through each of these steps in a logical order, you will probably locate the missing or incorrect element faster than by spot-checking. In some situations, it may be expedient to simply delete the whole account and start over.

Step 6. Test to verify that the problem has been resolved.

After you have implemented the solution, ensure that the entire problem has been resolved by having the user test for the problem again. Also, have the user quickly try several other normal operations with the equipment. It is not unheard of for a solution to one problem to cause other problems, and sometimes whatever was repaired turns out to be a symptom of another underlying problem.
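Alongside the user's own test, a quick scripted check that several services still accept connections can help catch side effects of the fix. The sketch below only tests whether a TCP connection can be opened, which is a much weaker check than the user actually performing the task; the function names and any host or port you pass in are your own choices, not part of any standard tool.

```python
import socket


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def verify_services(targets, timeout=2.0):
    """Check a list of (host, port) pairs and report each result by name."""
    return {
        f"{host}:{port}": tcp_reachable(host, port, timeout)
        for host, port in targets
    }
```

For example, `verify_services([("fileserver.example", 445), ("printer.example", 9100)])` would return a dictionary of results keyed by target; those host names are placeholders.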

Step 7. Document the problem and solution.

Documentation is useful for several reasons. First, documentation can be used for future reference to help you troubleshoot the same or similar problem. You can also use the documentation to prepare reports on common network problems for management and/or users, or to train new network users or members of the network support team.
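One way to keep documentation consistent enough to search and report on later is to record each incident in the same structure. The fields below are only a suggestion drawn from the steps in this post; the class name `IncidentRecord` and the sample values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class IncidentRecord:
    """A minimal, hypothetical record of one troubleshooting event."""

    reported_by: str
    symptom: str
    cause: str
    solution: str
    actions_taken: list = field(default_factory=list)

    def to_json(self):
        # JSON keeps the record readable by people and by reporting scripts.
        return json.dumps(asdict(self), indent=2)


# Made-up example values.
record = IncidentRecord(
    reported_by="front desk",
    symptom="cannot print to the shared printer",
    cause="user removed from the printer's security group",
    solution="restored group membership",
    actions_taken=["compared account with a working user", "re-added group"],
)
text = record.to_json()
```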

Step 8. Provide feedback to the user.

There is often the temptation to fix the problem and leave. However, if a network user reported the problem, they will appreciate knowing what happened. This will encourage them to report similar situations in the future, which will improve the performance of your network. Another reason for feedback is that if the user could have done something to correct or avoid the issue, telling them may reduce the number of future network problems.

A good working relationship between network support staff and the user community can significantly enhance your ability to keep the network running smoothly. Failure to take users seriously, or making unprofessional and condescending remarks can cause adversarial relations to develop, and can undermine your ability to do your job.

There is also a saying that 75% of fixing a problem is “fixing the user.” If the user does not agree that the problem has been taken to its conclusion (whether the problem has been corrected, or you have explained to the user’s satisfaction that a fix is impossible for the following technical, financial, or political reasons…), then you have not ended this support issue.
