Know your network... before you ask someone
Wednesday, May 28, 2008
- Start with your routers/firewalls
- Understand your subnet(s).
- Understand your VPNs.
- Take stock of your IP addresses for each server and every other box
- Install cfg2html (www.cfg2html.com)
- Run cfg2html - it will extract all that information about your servers
- Spend time reading through its output in detail. Now you'll have more relevant questions to ask.
- Know the resource limitations of each of your servers
- Know what services & applications are running on which servers
- Find the init/startup and configuration files for the running services/applications
- Draw the dependency tree between servers, LANs and VPNs
- Decipher and visualize the gateways and routing tables on each of the servers
- Get all the possible access credentials to the routers/switches/servers/boxes
- Understand the physical network (if you have direct access) and take a firsthand physical inventory, in spite of whatever documentation you may be given
- Study the connectivity between servers, switches and other devices
- Study the power supply/UPS and any data connectivity to other devices
- Study the physical cabling between boxes and switches (data ports)
- Have a set of basic troubleshooting tools handy - system & network tools
- Make sure syslog is enabled and all system-related information is logged appropriately
- Know the log locations of all the services/apps running on your servers - this is the first place you'd look for any anomalies
- Use log analyzers - and make it a daily routine to check your logs
- Perform weekly health checkups across all the boxes
- Most importantly, study the backup/restore policies and know exactly what is being backed up and what is not
- Make it a part of your daily routine to check the storage spaces across boxes
- Know your crons - what's running where
- Make sure your servers use uniform versions of packages/software
- Keep your servers clean - perform a sweep at least once a fortnight to remove unwanted files
- Know your filesystem - what is where for quick access - keep the structure simple and less cluttered
- Maintain daily activity logs
- Configure your servers to save shell history to a file - it will be of mighty help in times of disaster
- Secure your boxes - with proper user management
- Make a practice of using sudo
- Know all the alternate remote access methods for getting onto a server in case SSH fails or there are network issues (e.g. IP KVM, iLO 2)
- Golden rule: never panic when put in a high-pressure situation. It will only make things worse.
- Learn to use common sense. :) This is the most difficult part! Unfortunately, it isn't taught in any school, only at the university of life.
- Research and put in place monitoring tools of your choice and comfort. You should have a dashboard/master console that gives you a periodic feel for what is happening within your tech ecosystem.
- Employ watchdog monitors/tools like monit to automate failover recovery for critical services
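For the "understand your subnets" and "take stock of your IP addresses" items, here is a minimal sketch of keeping the inventory as data and checking which subnet each box falls into. It assumes Python 3 and uses only the standard `ipaddress` module; the hostnames, addresses and subnets are hypothetical placeholders, not from any real network.

```python
import ipaddress

# Hypothetical inventory: hostnames and IPs are placeholders.
INVENTORY = {
    "web01": "192.168.10.11",
    "db01": "192.168.20.5",
    "backup01": "10.0.5.7",
}

# Hypothetical subnets; replace with the ones your routers/firewalls define.
SUBNETS = [
    ipaddress.ip_network("192.168.10.0/24"),
    ipaddress.ip_network("192.168.20.0/24"),
]


def map_hosts_to_subnets(inventory, subnets):
    """Return {hostname: subnet or None}; None flags a host outside every known subnet."""
    result = {}
    for host, addr in inventory.items():
        ip = ipaddress.ip_address(addr)
        # First subnet containing the address, or None if it is unaccounted for.
        result[host] = next((net for net in subnets if ip in net), None)
    return result


if __name__ == "__main__":
    for host, net in map_hosts_to_subnets(INVENTORY, SUBNETS).items():
        label = str(net) if net is not None else "UNKNOWN SUBNET"
        print(f"{host:10} {INVENTORY[host]:15} {label}")
```

Anything that prints "UNKNOWN SUBNET" is a box you should ask about.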
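For the log items, a tiny keyword counter can stand in for a log analyzer until you settle on a proper one. This is a sketch in Python 3; the keywords (`error`, `failed`, `denied`) and the `scan_log` helper are hypothetical, so tune the patterns to the messages your services actually write.

```python
import re
import sys
from collections import Counter

# Hypothetical patterns; adjust them to your own services' log messages.
PATTERNS = {
    "error": re.compile(r"\berror\b", re.IGNORECASE),
    "failed": re.compile(r"\bfail(ed|ure)?\b", re.IGNORECASE),
    "denied": re.compile(r"\bdenied\b", re.IGNORECASE),
}


def scan_log(lines, patterns=PATTERNS):
    """Count lines matching each pattern and collect the matching lines for review."""
    counts = Counter()
    hits = []
    for line in lines:
        matched = [name for name, rx in patterns.items() if rx.search(line)]
        counts.update(matched)  # one count per pattern that matched this line
        if matched:
            hits.append(line.rstrip("\n"))
    return counts, hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python scan_log.py /var/log/syslog (the path varies by distro)
    with open(sys.argv[1], errors="replace") as fh:
        counts, hits = scan_log(fh)
    for name, n in counts.most_common():
        print(f"{name}: {n}")
```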
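For the daily storage check, a sketch built on the standard `shutil.disk_usage` call. The 85% threshold and the mount list are assumptions. Note that the percentage here is used/total, so it can read slightly lower than `df`, which leaves out the blocks reserved for root.

```python
import shutil


def disks_over_threshold(mounts, threshold_pct=85.0):
    """Return (mount, percent_used) for each mount at or above threshold_pct."""
    alerts = []
    for mount in mounts:
        usage = shutil.disk_usage(mount)  # named tuple: total, used, free (bytes)
        pct = 100.0 * usage.used / usage.total if usage.total else 0.0
        if pct >= threshold_pct:
            alerts.append((mount, round(pct, 1)))
    return alerts


if __name__ == "__main__":
    # Hypothetical mount list; include the filesystems that matter on each box.
    for mount, pct in disks_over_threshold(["/"], threshold_pct=85.0):
        print(f"WARNING: {mount} is {pct}% full")
```

Dropped into cron on each box, this gives you the daily check without having to remember it.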
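For "know your crons", a sketch that turns user crontab text (what `crontab -l` prints) into schedule/command pairs, so you can line up what's running where across boxes. `parse_crontab` is a hypothetical helper; it assumes the per-user format, so system files like `/etc/crontab`, which carry an extra user field, would need one more column.

```python
import subprocess


def parse_crontab(text):
    """Return (schedule, command) pairs from per-user crontab text."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("@"):
            # Special strings such as @reboot or @daily replace the five time fields.
            parts = line.split(None, 1)
            entries.append((parts[0], parts[1] if len(parts) > 1 else ""))
            continue
        fields = line.split(None, 5)
        if "=" in fields[0] or len(fields) < 6:
            continue  # environment assignment (e.g. MAILTO=...) or malformed line
        entries.append((" ".join(fields[:5]), fields[5]))
    return entries


if __name__ == "__main__":
    try:
        out = subprocess.run(["crontab", "-l"], capture_output=True, text=True).stdout
    except FileNotFoundError:
        out = ""  # crontab is not installed on this machine
    for schedule, command in parse_crontab(out):
        print(f"{schedule:20} {command}")
```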
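For the fortnightly sweep, a report-only sketch that lists files older than a cutoff under a directory you choose. The 14-day default mirrors "once a fortnight"; `find_stale_files` is a hypothetical helper, and it deliberately deletes nothing so you can review the list before removing anything.

```python
import os
import time


def find_stale_files(root, days=14):
    """Yield paths under root whose modification time is older than `days` days."""
    cutoff = time.time() - days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    yield path
            except OSError:
                continue  # file vanished or is unreadable; skip it


if __name__ == "__main__":
    # Hypothetical target directory; point it at scratch/temp areas you actually sweep.
    for path in find_stale_files("/tmp", days=14):
        print(path)
```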
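For the watchdog item, a very rough stand-in for the kind of check monit performs: probe a TCP port and call a restart hook for each service that fails. The service map, `watchdog_once` and the `restart` callback are hypothetical; in practice you would let monit (or a similar tool) do this rather than roll your own.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watchdog_once(checks, restart):
    """checks maps service -> (host, port); restart(service) is called for each failure.

    Returns the list of services whose check failed.
    """
    failed = [svc for svc, (host, port) in checks.items() if not port_is_open(host, port)]
    for svc in failed:
        restart(svc)  # hypothetical hook, e.g. a function that runs your init script
    return failed


if __name__ == "__main__":
    # Hypothetical service map; list the critical services on this box.
    checks = {"ssh": ("127.0.0.1", 22)}
    print(watchdog_once(checks, lambda svc: print(f"would restart {svc}")))
```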