C# Web Page Link Audit Tool

HP laid off a bunch of us on August 25, 2008. C#/.NET was in demand here in the Boise area, and elsewhere. While I didn't have enough C# experience to get a job where I live, I was able to get a job in central Oregon (until August 14, 2009). After the HP layoff, I spent time learning C#, and I was able to use that knowledge to get the central Oregon job. I thought you might find what I have learned of interest.

Now that I am out of work again, I'm putting some of what I have learned on the job to work improving this tool.

Unlike the Plotting Toy, this is actually useful. You can use it to examine web pages for broken links. You enter a web page from the File menu, and the tool goes through that page, testing every link to see whether it is broken or working. If a link goes to a web page in the same directory as the first page you entered (or in a directory below it), the tool recursively descends, checking all of the links on those pages too. Links to pages that aren't HTML (or at least, are obviously not HTML) are only checked to make sure that we can open them.
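
The rule for when to descend into a linked page can be sketched with System.Uri. This is only a minimal illustration of the idea, not the code the tool actually uses; LinkScopeSketch, Resolve, and IsInScope are made-up names, and the example.com addresses are placeholders.

```csharp
using System;

public static class LinkScopeSketch
{
    // Turn an HREF value (absolute or relative) into an absolute Uri,
    // or return null when it cannot be parsed.
    public static Uri Resolve(Uri page, string href)
    {
        Uri result;
        return Uri.TryCreate(page, href, out result) ? result : null;
    }

    // True when 'link' sits in the directory that holds 'startPage',
    // or in a directory beneath it.
    public static bool IsInScope(Uri startPage, Uri link)
    {
        // "." resolves to the directory part of the start page URL.
        Uri baseDirectory = new Uri(startPage, ".");
        return baseDirectory.IsBaseOf(link);
    }
}
```

With a start page of http://example.com/docs/index.html, a link to sub/a.html would count as in scope and be audited in turn, while a link to /other/a.html would only be checked for a response.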

You can print out the results. I'm going to make this smarter with time.

I have now added a check box on the Open command that lets you tell it not to recurse. It will check the specified web page for broken links, but it won't audit the pages that it links to in the current directory or its subdirectories.
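
Here is a rough sketch of how a recurse flag can gate the traversal. It is an assumption about the general shape, not the tool's real control flow: AuditLoopSketch and PagesToAudit are hypothetical names, and the page fetching and scope test are passed in as delegates so the loop itself stays independent of the network.

```csharp
using System;
using System.Collections.Generic;

public static class AuditLoopSketch
{
    // Returns the pages whose links would be audited, in visiting order.
    public static List<string> PagesToAudit(
        string startPage,
        bool recurse,
        Func<string, IEnumerable<string>> linksOnPage,
        Func<string, bool> inScope)
    {
        var visited = new HashSet<string>();   // pages already queued
        var audited = new List<string>();
        var pending = new Queue<string>();

        pending.Enqueue(startPage);
        visited.Add(startPage);
        while (pending.Count > 0)
        {
            string page = pending.Dequeue();
            audited.Add(page);

            // With recursion switched off, only the start page is audited.
            if (!recurse)
                break;

            foreach (string link in linksOnPage(page))
            {
                // HashSet.Add returns false for a page we have queued before,
                // which keeps circular links from looping forever.
                if (inScope(link) && visited.Add(link))
                    pending.Enqueue(link);
            }
        }
        return audited;
    }
}
```

A queue with a visited set is used here instead of literal recursion so that pages linking back to each other are audited only once.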

Install it from here.

Pretty obviously, there are tools out there that do this (and a lot more). But it was nice to build something that I actually needed, while learning C#. It took me about three days.

There's a lot that still needs improving and polishing, of course.

2008-09-10: Today's improvements: there is now a Page Setup command, as well as the Print Links command.

2008-09-12: Pretty busy today. The latest changes are to use the TableLayoutPanel to set up the dialog windows, and generally make them a bit prettier. The code for figuring out the proper client size after all the objects have been added could be factored into a single function call that you pass the control to--and I will probably work on that for tomorrow.

2008-09-13: I replaced the individual lines to figure out maxX and maxY with calls to a common function.

2009-09-09: It now handles the weird HTML comments, the named type feature used for inserting characters outside the x'21' to x'7E' range, and a few other situations a bit better, especially with respect to reporting broken links.
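
For comparison, here is a short sketch of the same two chores (dropping HTML comments, including the empty `<!>` form, and decoding character entities) done with framework classes. It assumes .NET Framework 4.0 or later for System.Net.WebUtility, which the tool itself does not rely on; the source below does both jobs by hand. HtmlCleanupSketch and its method names are made up for illustration.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlCleanupSketch
{
    // Matches <!-- ... --> (possibly spanning several lines) and the empty comment <!>.
    private static readonly Regex Comment =
        new Regex("<!--.*?-->|<!>", RegexOptions.Singleline);

    // Remove every HTML comment from the text.
    public static string StripComments(string html)
    {
        return Comment.Replace(html, "");
    }

    // Replace named and numeric character entities with the characters they stand for.
    public static string DecodeEntities(string text)
    {
        return WebUtility.HtmlDecode(text);
    }
}
```

A regular expression is a blunt instrument for HTML in general, so treat this as a convenience for simple pages and not as a replacement for a real parser.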

Here's the source code.

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.IO;
using System.Net;
using System.Drawing.Printing;
using System.Diagnostics;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public const int maxCustomers = 50;
        public PrintDocument printDocument1 = new PrintDocument();
        // The page settings used by the Page Setup command.
        private PageSettings pgSettings = new PageSettings();

        public Form1()
        {
            InitializeComponent();
            this.printDocument1.PrintPage += new PrintPageEventHandler(this.printDocument1_PrintPage);
        }

        private void Form1_Load(object sender, EventArgs e)
        {

        }

        List<string> webPageResponds = new List<string>();

        /// <summary>
        /// Verify that we can open a connection to the requested web page, and read a few characters.
        /// Since it takes a long time to open a web connection, we will keep track of which URLs we 
        /// have already validated, and not reopen them.
        /// </summary>
        /// <param name="url">the URL of the web page to verify that it exists</param>
        /// <returns>true: this web page exists; false: no, it doesn't</returns>
        private bool validWebPage(string url)
        {
            // Have we already checked to see if this page responds?  No need to do it again.
            if (webPageResponds.Contains(url))
                return (true);
            // Keep track of which pages we have already checked.
            webPageResponds.Add(url);
            // used to build entire input
            StringBuilder sb = new StringBuilder();
            string tempString;
            // used on each read operation
            byte[] buf = new byte[50];
            try
            {
                urlName.Text = "opening " + url;
                // Force processing of events by the application; this causes a refresh of the urlName field.
                Application.DoEvents();
                Debug.WriteLine("validating " + url);
                HttpWebRequest webpage = (HttpWebRequest)WebRequest.Create(url);
                HttpWebResponse resp = (HttpWebResponse)webpage.GetResponse();
                Stream resStream = resp.GetResponseStream();
                int count = 0;
                do
                {
                    // fill the buffer with data
                    count = resStream.Read(buf, 0, buf.Length);
                    // make sure we read some data
                    if (count != 0)
                    {
                        Application.DoEvents();
                        // translate from bytes to ASCII text
                        tempString = Encoding.ASCII.GetString(buf, 0, count);
                        // continue building the string
                        sb.Append(tempString);
                    }
                }
                while (count > 0); // any more data to read?
                urlName.Text = "";
                Application.DoEvents();
            }
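            catch (System.Net.WebException)
            {
                // Didn't respond, fail.  Forget this URL, so that a later check of the same
                // link fails again instead of finding it in the already-validated list.
                webPageResponds.Remove(url);
                return false;
            }
            catch (System.IO.IOException ex)
            {
                // Something unknown failed.
                webPageResponds.Remove(url);
                MessageBox.Show("connection lost " + url + " because of " + ex.ToString(), "Web Problem");
                return false;
            }
            catch (System.UriFormatException)
            {
                // Unlawful URL format.
                //MessageBox.Show("invalid URL \"" + url + "\"", "Web Problem");
                webPageResponds.Remove(url);
                return false;
            }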
            return (true);
        }

        /// <summary>
        /// Reads the requested web page and returns it as a string.
        /// </summary>
        /// <param name="url">the web page to load</param>
        /// <returns>the web page as a string</returns>
        private string ReadWebPage(string url)
        {
            // used to build entire input
            StringBuilder sb = new StringBuilder();
            // used on each read operation
            byte[] buf = new byte[8192];
            try
            {
                urlName.Text = "reading " + url;
                Application.DoEvents();
                Debug.WriteLine("reading " + url);
                HttpWebRequest webpage = (HttpWebRequest)WebRequest.Create(url);
                HttpWebResponse resp = (HttpWebResponse)webpage.GetResponse();
                Stream resStream = resp.GetResponseStream();
                string tempString = null;
                int count = 0;
                do
                {
                    // fill the buffer with data
                    count = resStream.Read(buf, 0, buf.Length);
                    // make sure we read some data
                    if (count != 0)
                    {
                        Application.DoEvents();     // force update of the application
                        // translate from bytes to ASCII text
                        tempString = Encoding.ASCII.GetString(buf, 0, count);
                        // continue building the string
                        sb.Append(tempString);
                    }
                }
                while (count > 0); // any more data to read?
                urlName.Text = "";
                Application.DoEvents();
            }
            catch (System.Net.WebException)
            {
                //MessageBox.Show("Couldn't open " + url + ex.ToString(), "Web Problem");
                return null;
            }
            catch (System.UriFormatException)
            {
                // Unlawful URL format.
                //MessageBox.Show("invalid URL \"" + url + "\"", "Web Problem");
                return null;
            }
            return (sb.ToString());
        }

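        /// <summary>
        /// See if this key (case insensitive) exists in this string, starting at the specified location
        /// </summary>
        /// <param name="stringToSearch">string to search</param>
        /// <param name="index">where to start searching in stringToSearch</param>
        /// <param name="key">the string to search for</param>
        /// <returns>true: found it; false: not there</returns>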
        private bool matchString(string stringToSearch, int index, string key)
        {
            bool result = false;

            if (key.Length <= stringToSearch.Length - index)
                result = stringToSearch.Substring(index, key.Length).Equals(key, StringComparison.OrdinalIgnoreCase);
            return (result);
        }

        /// <summary>
        /// Insert a new HtmlNode ahead of the node at indexOfOldNode, or overwrite that node
        /// if it has no data left.
        /// </summary>
        /// <param name="htmlArray">the List of HtmlNodes to which we are adding (maybe)</param>
        /// <param name="newNode">the HtmlNode to add</param>
        /// <param name="indexOfOldNode">where to insert the new node</param>
        private void insertNewHtmlNode(List<HtmlNode> htmlArray, HtmlNode newNode,
                                        int indexOfOldNode)
        {
            if (htmlArray[indexOfOldNode].data.Length > 0)
            {
                // The old node still holds unparsed text, so the new node goes in front of it.
                // List<T>.Insert moves everything after it up by one.
                htmlArray.Insert(indexOfOldNode, newNode);
            }
            else
                // Nothing is left of the old node, so the new node simply replaces it.
                htmlArray[indexOfOldNode] = newNode;
        }
 
        /// <summary>
        /// Skip white space in a string, starting at the specified index.
        /// </summary>
        /// <param name="foodSource">string to search for non-white space</param>
        /// <param name="index">where to start searching</param>
        /// <returns>the index of the first non-white space character in foodSource</returns>
        private int EatWhiteSpace (string foodSource, int index)
        {
            // Make sure that we are still within the string.
            if (index > foodSource.Length || index < 0)
                return (-1);
            while ((index < foodSource.Length) && (foodSource[index] == ' ' || foodSource[index] == '\r' ||
                   (foodSource[index] == '\n' || foodSource[index] == '\t')))
                index++;
            return (index);
        }

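        /// <summary>
        /// Eats through one occurrence of the desired string, consuming any white space before or after.
        /// </summary>
        /// <param name="foodSource">the string we are searching through</param>
        /// <param name="index">the starting index in foodSource</param>
        /// <param name="food">the string to consume</param>
        /// <returns>index of first character after food</returns>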
        private int Eat1(string foodSource, int index, string food)
        {
            // Make sure that we are inside a valid string!
            if (index > foodSource.Length || index < 0)
                return (-1);
            index = EatWhiteSpace(foodSource, index);
            if (index < foodSource.Length)
            {
                // ">=" so that a match ending exactly at the end of the string still counts.
                if (((foodSource.Length - index) >= food.Length) &&
                    foodSource.Substring(index, food.Length).Equals(food, StringComparison.OrdinalIgnoreCase))
                    index += food.Length;
                index = EatWhiteSpace(foodSource, index);
            }
            return (index);
        }

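        /// <summary>
        /// Eats through one quote (double or single), consuming any white space before or after.
        /// </summary>
        /// <param name="foodSource">input string</param>
        /// <param name="index">where to start eating</param>
        /// <returns>new index</returns>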
        private int EatQuote(string foodSource, int index)
        {
            // Make sure that we are inside a valid string!
            if (index > foodSource.Length || index < 0)
                return (-1);
            index = EatWhiteSpace(foodSource, index);
            if (index < foodSource.Length)
            {
                char next = foodSource[index];
                if (next == '"' || next == '\'')
                    index++;
                index = EatWhiteSpace(foodSource, index);
            }
            return (index);
        }

        /// <summary>
        /// Skip any leading white space, grab the string contained in quotes (double or single), and give it to the 
        /// caller.  
        /// </summary>
        /// <param name="foodSource">string to grab quoted string from</param>
        /// <param name="index">starting search index in foodSource</param>
        /// <param name="outputString">where to put the output string</param>
        /// <returns>index of first character after quoted string</returns>
        private int EatQuotedString(string foodSource, int index, ref string outputString)
        {
            if (index < 0 || index > foodSource.Length)
                return (-1);
            index = EatWhiteSpace(foodSource, index);
            index = EatQuote(foodSource, index);
            StringBuilder tempString = new StringBuilder(100);
            while ((index < foodSource.Length) && (foodSource[index] != '"' && foodSource[index] != '\''))
                tempString.Append(foodSource[index++]);
            index = EatQuote(foodSource, index);
            outputString = tempString.ToString();
            return (index);
        }

        /// <summary>
        /// Extract a string from a HtmlNode, putting it into a new HtmlNode
        /// </summary>
        /// <param name="insHtmlNode">HtmlNode to receive the string</param>
        /// <param name="htmlNode">HtmlNode that is the source</param>
        /// <param name="start">starting index of the source string</param>
        /// <param name="end">ending index of the source string</param>
        public void extractHtml(HtmlNode insHtmlNode, HtmlNode htmlNode, int start, int end)
        {
            // Substring takes a start and a length, not a start and an end.
            if (end - start > 0)
                insHtmlNode.data = htmlNode.data.Substring(start, end - start);
            htmlNode.data = htmlNode.data.Substring(end);
        }

        /// <summary>
        /// Maps named entities to the corresponding character code
        /// </summary>
        private static Dictionary<string, short> namedEntities = new Dictionary<string, short>
        {
            {"amp", (short)'&'},
            {"nbsp", 160},
            {"iexcl", 161},
            {"cent", 162},
            {"pound", 163},
            {"curren", 164},
            {"yen", 165},
            {"brvbar", 166},
            {"sect", 167},
            {"uml", 168},
            {"copy", 169},
            {"ordf", 170},
            {"laquo", 171},
            {"not", 172},
            {"shy", 173},
            {"reg", 174},
            {"macr", 175},
            {"deg", 176},
            {"plusmn", 177},
            {"sup2", 178},
            {"sup3", 179},
            {"acute", 180},
            {"micro", 181},
            {"para", 182},
            {"middot", 183},
            {"cedil", 184},
            {"sup1", 185},
            {"ordm", 186},
            {"raquo", 187},
            {"frac14", 188},
            {"frac12", 189},
            {"frac34", 190},
            {"iquest", 191},
            {"Agrave", 192},
            {"Aacute", 193},
            {"Acirc", 194},
            {"Atilde", 195},
            {"Auml", 196},
            {"Aring", 197},
            {"AElig", 198},
            {"Ccedil", 199},
            {"Egrave", 200},
            {"Eacute", 201},
            {"Ecirc", 202},
            {"Euml", 203},
            {"Igrave", 204},
            {"Iacute", 205},
            {"Icirc", 206},
            {"Iuml", 207},
            {"ETH", 208},
            {"Ntilde", 209},
            {"Ograve", 210},
            {"Oacute", 211},
            {"Ocirc", 212},
            {"Otilde", 213},
            {"Ouml", 214},
            {"times", 215},
            {"Oslash", 216},
            {"Ugrave", 217},
            {"Uacute", 218},
            {"Ucirc", 219},
            {"Uuml", 220},
            {"Yacute", 221},
            {"THORN", 222},
            {"szlig", 223},
            {"agrave", 224},
            {"aacute", 225},
            {"acirc", 226},
            {"atilde", 227},
            {"auml", 228},
            {"aring", 229},
            {"aelig", 230},
            {"ccedil", 231},
            {"egrave", 232},
            {"eacute", 233},
            {"ecirc", 234},
            {"euml", 235},
            {"igrave", 236},
            {"iacute", 237},
            {"icirc", 238},
            {"iuml", 239},
            {"eth", 240},
            {"ntilde", 241},
            {"ograve", 242},
            {"oacute", 243},
            {"ocirc", 244},
            {"otilde", 245},
            {"ouml", 246},
            {"divide", 247},
            {"oslash", 248},
            {"ugrave", 249},
            {"uacute", 250},
            {"ucirc", 251},
            {"uuml", 252},
            {"yacute", 253},
            {"thorn", 254},
            {"yuml", 255}
        };

        /// <summary>
        /// Extract all the HTML out of a web page, returning it as a series of HtmlNodes.  Any 
        /// HTML comments and white space are consumed and ignored.
        /// </summary>
        /// <param name="webPage">the web page as a string</param>
        /// <returns>a List of html objects</returns>
        private List<HtmlNode> parseHtml(string webPage)
        {
            // We're going to return this as a list of HtmlNodes.
            List<HtmlNode> htmlArray = new List<HtmlNode>();
            HtmlNode htmlNode = new HtmlNode();

            // First step: find all the HTML operations (contained in < >).
            StringBuilder dataString = new StringBuilder(600);
            int htmlDepth = 0;
            for (int i = 0; i < webPage.Length; i++)
            {
                Application.DoEvents();
                switch(webPage[i])
                {
                    case '<':
                        htmlDepth++;
                        i++;    // skip the < 
                        // Skip leading blanks.
                        // Check the bound first, so we never index past the end of the page.
                        while (i < webPage.Length && webPage[i] == ' ')
                            i++;
#if DEBUG
                        if (htmlDepth > 1)
                            Debug.WriteLine("< inside HTML");
#endif
                        // A special case: the empty HTML comment definition
                        if (i + 1 < webPage.Length && webPage[i] == '!' && webPage[i + 1] == '>')
                            i++;
                        else if (i + 2 < webPage.Length && webPage[i] == '!' && webPage[i + 1] == '-' && webPage[i + 2] == '-')
                        {
                            // Start of an HTML comment; ignore characters until we hit -->
                            i += 3;
                            while (i + 3 < webPage.Length)
                            {
                                if (webPage[i] == '-' && webPage[i + 1] == '-' && webPage[i + 2] == '>')
                                {
                                    i += 2;
                                    break;
                                }
                                else
                                    i++;
                            }
                        }

                        if (dataString.Length > 0)
                        {
                            // Complete the last node.
                            htmlNode.data = dataString.ToString();
                            htmlArray.Add(htmlNode);
                            dataString = new StringBuilder(600);
                            dataString.Append(webPage[i]);
                            // Start the next node, which we know is HTML.
                            htmlNode = new HtmlNode(true, null);
                        }
                        else
                        {
                            htmlNode.html = true;
                            dataString.Append(webPage[i]);
                        }
                        break;

                    case '>':
#if DEBUG
                        if (htmlDepth == 0)
                            Debug.WriteLine("> outside HTML");
#endif
                        htmlDepth--;
                        if (htmlDepth < 0)
                        {
#if DEBUG
                            Debug.WriteLine("Mismatched <>");
#endif
                            htmlDepth = 0;
                        }
                        if(dataString.Length > 0)
                        {
                            // Complete the last node.
                            htmlNode.data = dataString.ToString();
                            htmlArray.Add(htmlNode);
                            dataString = new StringBuilder(600);
                            // Start the next node, which we know is not HTML.
                            htmlNode = new HtmlNode(false, null);
                        }
                        else
                            dataString.Append(webPage[i]);
                        break;
                
                    default:
                        if (webPage[i] == '&')
                        {
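                            // Okay, time to extract a named entry string and see if it matches.
                            // Search from i in place, rather than copying the rest of the page
                            // just to find the next semicolon.
                            int semiColonPosition = webPage.IndexOf(';', i);
                            int semiColonIndex = (semiColonPosition < 0) ? -1 : semiColonPosition - i;
                            short namedEntityValue;
                            string namedEntityString;
                            // With no closing semicolon this can't be an entity; the else branch
                            // further down then treats the & as an ordinary character.
                            if (semiColonIndex != -1)
                            {
                                if (webPage[i + 1] == '#')
                                {
                                    namedEntityString = webPage.Substring(i + 2, semiColonIndex - 2);
                                    if (Int16.TryParse(namedEntityString, out namedEntityValue))
                                    {
                                        // Must be an &#ddd; string.
                                        dataString.Append((char)namedEntityValue);
                                        i += semiColonIndex;
                                    }
                                    else
                                        // Not a decimal number (hex, for instance); keep the & as text.
                                        dataString.Append(webPage[i]);
                                }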
                                else
                                {
                                    namedEntityString = webPage.Substring(i + 1, semiColonIndex - 1);
                                    // Must be a named string.  Try and look it up.
                                    if (namedEntities.TryGetValue(namedEntityString, out namedEntityValue))
                                    {
                                        // Skip past this string.
                                        i += semiColonIndex;
                                        dataString.Append((char)namedEntityValue);
                                    }
                                    else
                                        // have no idea what this is; ignore it
                                        dataString.Append(webPage[i]);
                                }

                            }
                            else
                                dataString.Append(webPage[i]);
                        }
                        else
                            dataString.Append(webPage[i]);
                        break;
                }
            }
            // Second step: go through all the HTML, and try to convert these into individual
            // HTML operations.
            string key;
            for (int i = 0; i < htmlArray.Count; i++)
            {
                Application.DoEvents();
                if (htmlArray[i].html)
                {
                    htmlNode = htmlArray[i];
                    int j = 0, startOfKey = 0;
                    // We might get some white space (typical \r\n) after one HTML component, but 
                    // before another.  Dispose of it.
                    for (startOfKey = j; (j < htmlNode.data.Length) &&
                         (htmlNode.data[j] == ' ' || htmlNode.data[j] == '\r' ||
                         htmlNode.data[j] == '\n' || htmlNode.data[j] == '\t'); j++)
                        ;
                    if (j > startOfKey)
                    {
                        // Must have found some white space.  Remove it, and restart at the
                        // front of what is left.
                        htmlNode.data = htmlNode.data.Substring(j);
                        j = 0;
                    }
                    if (matchString(htmlNode.data, j, key = "A "))
                    {
                         // This is the anchor element.  The tag itself means nothing for us 
                         // with respect to link validation; the HREF that follows carries the link.
                         HtmlNode insHtmlNode = new HtmlNode(true, key);
                         // At this point, there are two choices: we have completely inhaled the 
                         // old HTML data, in which case this node replaces the existing node, 
                         // or there is still stuff here, in which case we insert the new node before 
                         // the old one.  This should probably be done with a linked list later, but 
                         // one thing at a time!
                         htmlNode.data = htmlNode.data.Substring(j + key.Length);
                         insertNewHtmlNode(htmlArray, insHtmlNode, i);
                    }
                    else if (matchString(htmlNode.data, j, key = "NAME"))
                    {
                        // Defines a location within a web page, usually referenced by #name in an HREF
                        // Figure out where the name starts and ends
                        HtmlNode insHtmlNode = new HtmlNode(true, key);
                        startOfKey = j;
                        j += key.Length;
                        j = Eat1(htmlNode.data, j, "=");
                        string nameString = null;
                        j = EatQuotedString(htmlNode.data, j, ref nameString);
                        insHtmlNode.parameter = nameString;
                        // Extract the HTML from htmlNode and stuff it in insHtmlNode.
                        extractHtml(insHtmlNode, htmlNode, startOfKey, j);
                        insertNewHtmlNode(htmlArray, insHtmlNode, i);
                    }
                    else if (matchString(htmlNode.data, j, key = "HREF"))
                    {
                         // Defines where to go to next if someone clicks a link.
                         // This can be one of three references: a fully qualified URL
                         // (starting with http:// or https://), a relative reference (a URL relative 
                         // to the current URL directory level), or a reference to a named location 
                         // within a page, which always starts with a #.
                         HtmlNode insHtmlNode = new HtmlNode(true, key);
                         startOfKey = j;
                         j += key.Length;
                         j = Eat1(htmlNode.data, j, "=");
                         string hrefString = null;
                         j = EatQuotedString(htmlNode.data, j, ref hrefString);
                         insHtmlNode.parameter = hrefString;
                         // Extract the HTML from htmlNode and stuff it in insHtmlNode.
                         extractHtml(insHtmlNode, htmlNode, startOfKey, j);
                         insertNewHtmlNode(htmlArray, insHtmlNode, i);
                     }
                    else if (matchString(htmlNode.data, j, key = "LINK REL"))
                    {
                        // Defines how the linked resource relates to this page (a style sheet, for example).
                        HtmlNode insHtmlNode = new HtmlNode(true, key);
                        startOfKey = j;
                        j += key.Length;
                        j = Eat1(htmlNode.data, j, "=");
                        string linkRelString = null;
                        j = EatQuotedString(htmlNode.data, j, ref linkRelString);
                        insHtmlNode.parameter = linkRelString;
                        // Extract the HTML from htmlNode and stuff it in insHtmlNode.
                        extractHtml(insHtmlNode, htmlNode, startOfKey, j);
                        insertNewHtmlNode(htmlArray, insHtmlNode, i);
                    }
                    else if (matchString(htmlNode.data, j, key = "LINK"))
                    {
                        // Defines what type of link this is.
                        HtmlNode insHtmlNode = new HtmlNode(true, key);
                        startOfKey = j;
                        j += key.Length;
                        j = Eat1(htmlNode.data, j, "=");
                        string linkRelString = null;
                        j = EatQuotedString(htmlNode.data, j, ref linkRelString);
                        insHtmlNode.parameter = linkRelString;
                        // Extract the HTML from htmlNode and stuff it in insHtmlNode.
                        extractHtml(insHtmlNode, htmlNode, startOfKey, j);
                        insertNewHtmlNode(htmlArray, insHtmlNode, i);
                    }
                    else if (matchString(htmlNode.data, j, key = "SCRIPT TYPE"))
                    {
                        // Defines what kind of script this is.
                        HtmlNode insHtmlNode = new HtmlNode(true, key);
                        startOfKey = j;
                        j += key.Length;
                        j = Eat1(htmlNode.data, j, "=");
                        string linkRelString = null;
                        j = EatQuotedString(htmlNode.data, j, ref linkRelString);
                        insHtmlNode.parameter = linkRelString;
                        // Extract the HTML from htmlNode and stuff it in insHtmlNode.
                        extractHtml(insHtmlNode, htmlNode, startOfKey, j);
                        insertNewHtmlNode(htmlArray, insHtmlNode, i);
                    }
                    else if (matchString(htmlNode.data, j, key = "TYPE"))
                    {
                        // Defines what type of file this is.
                        HtmlNode insHtmlNode = new HtmlNode(true, key);
                        startOfKey = j;
                        j += key.Length;
                        j = Eat1(htmlNode.data, j, "=");
                        string typeString = null;
                        j = EatQuotedString(htmlNode.data, j, ref typeString);
                        insHtmlNode.parameter = typeString;
                        // Extract the HTML from htmlNode and stuff it in insHtmlNode.
                        extractHtml(insHtmlNode, htmlNode, startOfKey, j);
                        insertNewHtmlNode(htmlArray, insHtmlNode, i);
                    }
                    else if (matchString(htmlNode.data, j, key = "BODY LANG"))
                    {
                        // Defines what language the body of the page is written in.
                        HtmlNode insHtmlNode = new HtmlNode(true, key);
                        startOfKey = j;
                        j += key.Length;
                        j = Eat1(htmlNode.data, j, "=");
                        string langString = null;
                        j = EatQuotedString(htmlNode.data, j, ref langString);
                        insHtmlNode.parameter = langString;
                        // Extract the HTML from htmlNode and stuff it in insHtmlNode.
                        extractHtml(insHtmlNode, htmlNode, startOfKey, j);
                        insertNewHtmlNode(htmlArray, insHtmlNode, i);
                    }
                    else if (matchString(htmlNode.data, j, key = "CENTER"))
                    {
                        startOfKey = j;
                        j += key.Length;
                        HtmlNode insHtmlNode = new HtmlNode(true, key);
                        // Extract the HTML from htmlNode and stuff it in insHtmlNode.
                        extractHtml(insHtmlNode, htmlNode, startOfKey, j);
                        insertNewHtmlNode(htmlArray, insHtmlNode, i);
                    }
                    else if (matchString(htmlNode.data, j, key = "H1"))
                    {
                        startOfKey = j;
                        j += key.Length;
                        HtmlNode insHtmlNode = new HtmlNode(true, key);
                        // Extract the HTML from htmlNode and stuff it in insHtmlNode.
                        extractHtml(insHtmlNode, htmlNode, startOfKey, j);
                        insertNewHtmlNode(htmlArray, insHtmlNode, i);
                    }
                    else if (matchString(htmlNode.data, j, key = "/H1"))
                    {
                        startOfKey = j;
                        j += key.Length;
                        HtmlNode insHtmlNode = new HtmlNode(true, key);
                        // Extract the HTML from htmlNode and stuff it in insHtmlNode.
                        extractHtml(insHtmlNode, htmlNode, startOfKey, j);
                        insertNewHtmlNode(htmlArray, insHtmlNode, i);
                    }
                    else if (matchString(htmlNode.data, j, key = "/A"))
                    {
                        startOfKey = j;
                        j += key.Length;
                        HtmlNode insHtmlNode = new HtmlNode(true, key);
                        // Extract the HTML from htmlNode and stuff it in insHtmlNode.
                        extractHtml(insHtmlNode, htmlNode, startOfKey, j);
                        insertNewHtmlNode(htmlArray, insHtmlNode, i);
                    }
                    else if (matchString(htmlNode.data, j, key = "BLOCKQUOTE"))
                    {
                        startOfKey = j;
                        j += key.Length;
                        HtmlNode insHtmlNode = new HtmlNode(true, key);
                        // Extract the HTML from htmlNode and stuff it in insHtmlNode.
                        extractHtml(insHtmlNode, htmlNode, startOfKey, j);
                        insertNewHtmlNode(htmlArray, insHtmlNode, i);
                    }
                    else if (matchString(htmlNode.data, j, key = "/BLOCKQUOTE"))
                    {
                        startOfKey = j;
                        j += key.Length;
                        HtmlNode insHtmlNode = new HtmlNode(true, key);
                        // Extract the HTML from htmlNode and stuff it in insHtmlNode.
                        extractHtml(insHtmlNode, htmlNode, startOfKey, j);
                        insertNewHtmlNode(htmlArray, insHtmlNode, i);
                    }
                    else if (matchString(htmlNode.data, j, key = "META HTTP-EQUIV=\"REFRESH\""))
                    {
                        startOfKey = j;
                        // The refresh target follows the key, so take the rest of the tag.
                        j = htmlNode.data.Length;
                        HtmlNode insHtmlNode = new HtmlNode(true, key);
                        // Extract the HTML from htmlNode and stuff it in insHtmlNode.
                        extractHtml(insHtmlNode, htmlNode, startOfKey, j);
                        insertNewHtmlNode(htmlArray, insHtmlNode, i);
                    }
                    else if (matchString(htmlNode.data, j, key = "BODY"))
                    {
                        startOfKey = j;
                        j += key.Length;
                        HtmlNode insHtmlNode = new HtmlNode(true, key);
                        // Extract the HTML from htmlNode and stuff it in insHtmlNode.
                        extractHtml(insHtmlNode, htmlNode, startOfKey, j);
                        insertNewHtmlNode(htmlArray, insHtmlNode, i);
                    }
                    else if (matchString(htmlNode.data, j, key = "/BODY"))
                    {
                        startOfKey = j;
                        j += key.Length;
                        HtmlNode insHtmlNode = new HtmlNode(true, key);
                        // Extract the HTML from htmlNode and stuff it in insHtmlNode.
                        extractHtml(insHtmlNode, htmlNode, startOfKey, j);
                        insertNewHtmlNode(htmlArray, insHtmlNode, i);
                    }
                    else if (matchString(htmlNode.data, j, key = "B"))
                    {
                        startOfKey = j;
                        j += key.Length;
                        HtmlNode insHtmlNode = new HtmlNode(true, key);
                        // Extract the HTML from htmlNode and stuff it in insHtmlNode.
                        extractHtml(insHtmlNode, htmlNode, startOfKey, j);
                        insertNewHtmlNode(htmlArray, insHtmlNode, i);
                    }
                    else if (matchString(htmlNode.data, j, key = "/B"))
                    {
                        startOfKey = j;
                        j += key.Length;
                        HtmlNode insHtmlNode = new HtmlNode(true, key);
                        // Extract the HTML from htmlNode and stuff it in insHtmlNode.
                        extractHtml(insHtmlNode, htmlNode, startOfKey, j);
                        insertNewHtmlNode(htmlArray, insHtmlNode, i);
                    }
                    else if (matchString(htmlNode.data, j, key = "I"))
                    {
                        startOfKey = j;
                        j += key.Length;
                        HtmlNode insHtmlNode = new HtmlNode(true, key);
                        // Extract the HTML from htmlNode and stuff it in insHtmlNode.
                        extractHtml(insHtmlNode, htmlNode, startOfKey, j);
                        insertNewHtmlNode(htmlArray, insHtmlNode, i);
                    }
                    else if (matchString(htmlNode.data, j, key = "/I"))
                    {
                        startOfKey = j;
                        j += key.Length;
                        HtmlNode insHtmlNode = new HtmlNode(true, key);
                        // Extract the HTML from htmlNode and stuff it in insHtmlNode.
                        extractHtml(insHtmlNode, htmlNode, startOfKey, j);
                        insertNewHtmlNode(htmlArray, insHtmlNode, i);
                    }
                    else if (matchString(htmlNode.data, j, key = "P"))
                    {
                        startOfKey = j;
                        j += key.Length;
                        HtmlNode insHtmlNode = new HtmlNode(true, key);
                        // Extract the HTML from htmlNode and stuff it in insHtmlNode.
                        extractHtml(insHtmlNode, htmlNode, startOfKey, j);
                        insertNewHtmlNode(htmlArray, insHtmlNode, i);
                    }
                    else if (matchString(htmlNode.data, j, key = "/P"))
                    {
                        startOfKey = j;
                        j += key.Length;
                        HtmlNode insHtmlNode = new HtmlNode(true, key);
                        // Extract the HTML from htmlNode and stuff it in insHtmlNode.
                        extractHtml(insHtmlNode, htmlNode, startOfKey, j);
                        insertNewHtmlNode(htmlArray, insHtmlNode, i);
                    }
                }
            }
            return (htmlArray);
        }

        /// <summary>
        /// Validate an absolute URL.  We have this broken out because we have a couple of places below
        /// where we perform the same steps, and it would be best to get a uniform way of doing it.  In
        /// addition, there are some classes of link that we probably don't want to read in full, because
        /// they aren't HTML, and can't be verified, such as PDF and JPG files.
        /// </summary>
        /// <param name="link">the fully qualified link to validate</param>
        /// <param name="unadornedLink">the link as it appeared in the web page</param>
        /// <param name="srcPage">the URL of the source page for this link</param>
        /// <param name="webPageDir">the directory from which this web page came</param>
        private void validateAbsoluteLink(string link, string unadornedLink, string srcPage, string webPageDir)
        {
            if (!pageChecked.Contains(link))
            {
                // Figure out if the link is working or broken by trying to read it.
                if (validWebPage(link))
                {
                    AddWorkingLink(link, unadornedLink, srcPage);
                    // Don't bother to recurse on JPG, GIF, or PDF files.
                    bool notHtml = link.EndsWith(".JPG", StringComparison.OrdinalIgnoreCase) ||
                                   link.EndsWith(".GIF", StringComparison.OrdinalIgnoreCase) ||
                                   link.EndsWith(".PDF", StringComparison.OrdinalIgnoreCase);
                    // Okay, it worked, let's try and work our way downward, if this
                    // is within this domain.  To avoid loops, we won't do this to self.
                    if (!notHtml &&
                        recursePages &&
                        link.Length > webPageDir.Length &&
                        link.StartsWith(webPageDir, StringComparison.Ordinal) &&
                        0 != String.Compare(link, srcPage))
                        validateWebPage(link, srcPage);
                }
                else
                    AddBrokenLink(link, unadornedLink, srcPage);
            }
        }

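        /// <summary>
        /// Add a broken link to the appropriate list.
        /// </summary>
        /// <param name="link">the link we couldn't find</param>
        /// <param name="unadornedLink">the link as it appeared in the original page</param>
        /// <param name="srcPage">the page where this link appears</param>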
        private void AddBrokenLink(string link, string unadornedLink, string srcPage)
        {
            ListViewItem item = new ListViewItem();
            item.Text = unadornedLink;
            item.SubItems.Add(srcPage);
            this.brokenLinksView.Items.Add(item);
        }

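        /// <summary>
        /// Add a working link to the appropriate list.
        /// </summary>
        /// <param name="link">the link we could find</param>
        /// <param name="unadornedLink">the link as it appeared in the original page</param>
        /// <param name="srcPage">the page where this link appears</param>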
        private void AddWorkingLink(string link, string unadornedLink, string srcPage)
        {
            ListViewItem item = new ListViewItem();
            item.Text = unadornedLink;
            item.SubItems.Add(srcPage);
            this.workingLinksView.Items.Add(item);
        }

        List<string> nameChecked = new List<string>();

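        /// <summary>
        /// Search a specified web page for a specified label.  We can't just search through the
        /// unparsed text, because of the range of ways the name parameter could be set up.
        /// </summary>
        /// <param name="srcWebPage">the page from which the link came</param>
        /// <param name="link">the page containing the label</param>
        /// <param name="name">the label</param>
        /// <param name="unadornedLink">the link as it appeared in the original page</param>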
        private void validateWebPageForName(string srcWebPage, string link, string name, string unadornedLink)
        {
            bool found = false;

            // If we have already checked for the existence of this name, don't bother doing it again.
            if (nameChecked.Contains(link + "#" + name))
                return;
            nameChecked.Add(link + "#" + name);

            // See if the name reference is contained in the specified web page.
            // Read in the page
            string pageToSearch = ReadWebPage(link);
            if (pageToSearch == null)
            {
                // Probably means that we received an invalid URL of some sort.  Fail as 
                // though this is a bad link.
                AddBrokenLink(link, link, srcWebPage);
                return;
            }
            urlName.Text = "parsing " + link;
            Application.DoEvents();
            List<HtmlNode> htmlToSearch = parseHtml(pageToSearch);
            urlName.Text = "searching " + link + " for name " + name;
            Application.DoEvents();
            for (int i = 0; !found && i < htmlToSearch.Count; i++)
            {
                if ((htmlToSearch[i].html) && 
                    (htmlToSearch[i].data.StartsWith("NAME", StringComparison.OrdinalIgnoreCase)) &&
                    (htmlToSearch[i].parameter.Equals(name)))
                            found = true;
            }
            if (found)
                AddWorkingLink(link, unadornedLink, srcWebPage);
            else
                AddBrokenLink(link, unadornedLink, srcWebPage);
        }

        static int recursionLevel = 0;
        bool recursePages;
        List<string> pageChecked = new List<string>();

        /// <summary>
        /// Does this link contain a label reference?  If so, chop it up into the two parts.
        /// </summary>
        /// <param name="link">the link that may contain a label reference</param>
        /// <param name="linkAndName">a string[2] containing the file name and the label</param>
        /// <returns>true: the link contained a label reference; false: no, it didn't</returns>
        private static bool ContainsLabelRef(string link, ref string[] linkAndName)
        {
            // This might be an absolute URL, or a URL with a name attached to it. 
            // Find out by searching for # in the link.
            int poundSignIndex = link.IndexOf("html#");
            if (poundSignIndex != -1)
                poundSignIndex += "html".Length;
            else
            {
                poundSignIndex = link.IndexOf("htm#");
                if (poundSignIndex != -1)
                    poundSignIndex += "htm".Length;
            }
            if (poundSignIndex != -1)
            {
                // Then this has a name reference.  Verify the absolute link 
                // first, then see if the name reference is there.
                linkAndName[0] = link.Substring(0, poundSignIndex);
                linkAndName[1] = link.Substring(poundSignIndex + 1);
            }
            else
            {
                linkAndName[0] = link;
                linkAndName[1] = String.Empty;
            }
            // If poundSignIndex is -1, then we didn't find a label reference, return false
            return (poundSignIndex != -1);
        }

        private void MalformedRefresh(string refreshCmd, string msg)
        {
            MessageBox.Show(msg + " in " + refreshCmd);
        }

        private void validateWebPage (string url, string srcPage)
        {
            // Make sure that the URL starts with an http:// or https:// prefix, and
            // remember where the domain name begins.
            int afterPrefix;
            if (url.StartsWith("http://", StringComparison.OrdinalIgnoreCase))
                afterPrefix = "http://".Length;
            else if (url.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
                afterPrefix = "https://".Length;
            else
            {
                MessageBox.Show("URL must start with http:// or https://");
                return;
            }
            // Determine the root of the web site (prefix plus domain name, with a trailing
            // slash).  We only recurse into links that are under this root.
            string webPageDir;
            int firstSlashInDomainName = url.IndexOf('/', afterPrefix);
            if (firstSlashInDomainName == -1)
            {
                // No closing slash; add one in
                url += "/";
                // Set the webPageDir to the full domain name
                webPageDir = url;
            }
            else
                webPageDir = url.Substring(0, firstSlashInDomainName+1);

            // If this url ends with a slash, try and figure out whether it is index.html, index.htm that is intended.
            if (url.EndsWith("/"))
            {
                if (null != ReadWebPage(url + "index.html"))
                    url = url + "index.html";
                else
                    url = url + "index.htm";
            }
            // Record every page that we have validated, so that we don't waste time going 
            // back through them.
            pageChecked.Add(url);
            // Read the web page in.
            string webPage = ReadWebPage(url);
            urlName.Text = "validating " + url;
            Application.DoEvents();
            if (webPage == null)
            {
                // Probably means that we received an invalid URL of some sort.  Fail as 
                // though this is a bad link.
                AddBrokenLink(url, url, srcPage);
                return;
            }
            // Unfortunately, HTML is a free form language, so I have to parse the 
            // HTML so that I can search for the parts that I care about.
            List<HtmlNode> htmlArray = parseHtml(webPage);
            string[] linkAndName = new string[2];   // used for parsing label references
            // Links in this page may be relative to the directory that contains it.  Extract
            // that directory by cutting the URL at its last slash.
            string basePage;
            int endOfBasePage = url.LastIndexOf('/');
            if (endOfBasePage == -1)
                basePage = url;
            else
                basePage = url.Substring(0, endOfBasePage);
            for (int i = 0; i < htmlArray.Count; i++)
            {
                if (htmlArray[i].html)
                {
                    if (matchString(htmlArray[i].data, 0, "HREF"))
                    {
                        // Determine what type of link this is.  If it starts with #, then it is a
                        // name within the current file.  Search for A NAME="name".
                        string link = htmlArray[i].parameter;
                        if (link.StartsWith("#"))
                        {
                            bool found = false;
                            for (int j = 0; !found && j < htmlArray.Count; j++)
                            {
                                if (htmlArray[j].html && matchString(htmlArray[j].data, 0, "NAME") &&
                                    htmlArray[j].data.Contains(link.Substring(1)))
                                {
                                    found = true;
                                    // The link appears in the page we are validating, so report that page.
                                    AddWorkingLink(url + link, link, url);
                                }
                            }
                            if (!found)
                                AddBrokenLink(url + link, link, url);
                        }
                        else if (link.StartsWith("HTTP://", StringComparison.OrdinalIgnoreCase) ||
                                 link.StartsWith("HTTPS://", StringComparison.OrdinalIgnoreCase))
                        {
                            if (ContainsLabelRef(link, ref linkAndName))
                            {
                                validateAbsoluteLink(linkAndName[0], link, url, webPageDir);
                                validateWebPageForName(url, linkAndName[0], linkAndName[1], link);
                            }
                            else
                                validateAbsoluteLink(link, link, url, webPageDir);
                        }
                        else if (link.StartsWith("/"))
                        {
                            // This link is relative to the root of the web site, and may have a
                            // name attached to it.  Resolve it against the site root before checking.
                            string siteRoot = webPageDir.TrimEnd('/');
                            if (ContainsLabelRef(link, ref linkAndName))
                            {
                                validateAbsoluteLink(siteRoot + linkAndName[0], link, url, webPageDir);
                                validateWebPageForName(url, siteRoot + linkAndName[0], linkAndName[1], link);
                            }
                            else
                                validateAbsoluteLink(siteRoot + link, link, url, webPageDir);
                        }
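                        else if (link.StartsWith("javascript:", StringComparison.OrdinalIgnoreCase) ||
                                 link.StartsWith("mailto:", StringComparison.OrdinalIgnoreCase))
                        {
                            // There is no page to fetch for script or mail links, so skip them.
                        }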
                        else
                        {
                            if (ContainsLabelRef(link, ref linkAndName))
                            {
                                validateAbsoluteLink(basePage + "/" + linkAndName[0], link, url, webPageDir);
                                validateWebPageForName(url, basePage + "/" + linkAndName[0], linkAndName[1], link);
                            }
                            else
                                validateAbsoluteLink(basePage + "/" + link, link, url, webPageDir);
                        }
                        this.Invalidate();
                    }
                    else if (matchString(htmlArray[i].data, 0, "META HTTP-EQUIV=\"REFRESH\""))
                    {
                        // This is a very common but non-HTML method of forcing a refresh.
                        int refreshUrlIndex = htmlArray[i].data.IndexOf("URL");
                        if (refreshUrlIndex == -1)
                            MessageBox.Show("Unrecognized refresh command: " + htmlArray[i].data);
                        else
                        {
                            // Find the URL
                            refreshUrlIndex = htmlArray[i].data.IndexOf('=', refreshUrlIndex);
                            if (refreshUrlIndex == -1)
                                MalformedRefresh(htmlArray[i].data, "missing =");
                            else
                            {
                                int urlEndIndex = htmlArray[i].data.IndexOf("\"", ++refreshUrlIndex);
                                if (urlEndIndex == -1)
                                    MalformedRefresh(htmlArray[i].data, "missing closing quote");
                                else
                                {
                                    string refreshUrl = htmlArray[i].data.Substring(refreshUrlIndex, urlEndIndex-refreshUrlIndex);
                                    validateAbsoluteLink(refreshUrl, refreshUrl, url, webPageDir);
                                }
                            }
                        }
                    }
                }
            }
            recursionLevel--;
        }

        // Which web page did we audit last?
        string lastUrl = null;

        private void webPageToAuditToolStripMenuItem_Click(object sender, EventArgs e)
        {
            // Record farthest down and right point in the panel we build.
            Point max = new Point(0, 0);
            // Bring up a web page to audit the status of the links.
            Form dlgWebPage = new Form();
            TableLayoutPanel panel = new TableLayoutPanel();
            panel.RowCount = 3;
            panel.ColumnCount = 3;
            panel.CellBorderStyle = TableLayoutPanelCellBorderStyle.Inset;
            Label title = new Label();
            title.AutoSize = true;
            title.Text = "What Web Page To Audit?";
            title.Font = new Font("Arial", 12, FontStyle.Bold, System.Drawing.GraphicsUnit.Point);
            panel.Controls.Add(title, 0, 0);
            maxCoord(ref max, title);
            TextBox box = new TextBox();
            box.Text = "http://www.claytoncramer.com/index.html";
            box.Width = 400;
            panel.Controls.Add(box, 1, 0);
            maxCoord(ref max, box);
            // Give the user the choice of not recursing.
            Label recurseTitle = new Label();
            recurseTitle.Text = "Search Recursively?";
            recurseTitle.AutoSize = true;
            panel.Controls.Add(recurseTitle, 0, 1);
            maxCoord(ref max, recurseTitle);
            CheckBox recurse = new CheckBox();
            recurse.Checked = true;
            panel.Controls.Add(recurse, 1, 1);
            maxCoord(ref max, recurse);
            // Add a button.
            Button bOk = new System.Windows.Forms.Button();
            bOk.Text = "OK";
            bOk.DialogResult = DialogResult.OK;
            bOk.Dock = DockStyle.Fill;
            panel.Controls.Add(bOk, 0, 2);
            maxCoord(ref max, bOk);
            Button bCancel = new System.Windows.Forms.Button();
            bCancel.Text = "Cancel";
            bCancel.Dock = DockStyle.Fill;
            bCancel.DialogResult = DialogResult.Cancel;
            panel.Controls.Add(bCancel, 1, 2);
            maxCoord(ref max, bCancel);
            panel.ClientSize = new System.Drawing.Size(max.X, max.Y);
            dlgWebPage.ClientSize = new System.Drawing.Size(panel.Size.Width, panel.Size.Height);
            dlgWebPage.AcceptButton = bOk;
            dlgWebPage.Controls.Add(panel);
            if (DialogResult.OK == dlgWebPage.ShowDialog())
            {
                recursePages = recurse.Checked;
                string url = box.Text;
                // Store in the lastUrl field (not a new local) so the printed title can show it.
                lastUrl = url;
                if (url.Length > 0)
                {
                    this.workingLinksView.Clear();
                    this.brokenLinksView.Clear();
                    this.workingLinksView.Columns.Add("link", 400, HorizontalAlignment.Left);
                    this.workingLinksView.Columns.Add("in page", 400, HorizontalAlignment.Left);
                    this.brokenLinksView.Columns.Add("link", 400, HorizontalAlignment.Left);
                    this.brokenLinksView.Columns.Add("in page", 400, HorizontalAlignment.Left);
                    this.Invalidate();
                    validateWebPage(url, url);
                }
            }
            urlName.Text = "";
        }

        private void exitToolStripMenuItem_Click(object sender, EventArgs e)
        {
            Application.Exit();
        }

        bool titlePrinted = false;
        bool brokenHeaderPrinted = false;
        bool workingHeaderPrinted = false;
        int brokenLinksPrinted = 0;
        int workingLinksPrinted = 0;
        int leftMargin, topMargin, rightMargin, bottomMargin, linksMargin, lineHeight, nextLine;
        RectangleF pageSize;
        Font messageFont, titleFont, headerFont;
        bool brokenLinks, workingLinks;

        /// <summary>
        /// What are the bounds of the printed page?  Also sets the lineHeight and nextLine fields for the
        /// initial page printing.
        /// </summary>
        /// <param name="g">the graphics context for the printed page</param>
        /// <param name="marginBounds">the margin bounds for the page</param>
        private void getPageLayout(Graphics g, Rectangle marginBounds)
        {
            // What is the size of the page?
            pageSize = g.VisibleClipBounds;
            leftMargin = (int)marginBounds.Left + 72;
            linksMargin = leftMargin + 72;
            topMargin = (int)marginBounds.Top + 72;
            bottomMargin = (int)marginBounds.Bottom - 72;
            rightMargin = (int)marginBounds.Right - 72;
            messageFont = new Font("Arial", 9, System.Drawing.GraphicsUnit.Point);
            titleFont = new Font("Arial", 12, FontStyle.Bold, System.Drawing.GraphicsUnit.Point);
            headerFont = new Font("Arial", 10, FontStyle.Bold, System.Drawing.GraphicsUnit.Point);
            lineHeight = messageFont.Height;
            nextLine = topMargin;
        }

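        /// <summary>
        /// Print on the next line.
        /// </summary>
        /// <param name="g">printed page's graphics context</param>
        /// <param name="msg">the string to print</param>
        /// <param name="font">what font to use</param>
        /// <param name="x">horizontal position at which to print</param>
        /// <param name="y">vertical position at which to print</param>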
        private void printNextLine(Graphics g, string msg, Font font, int x, int y)
        {
            g.DrawString(msg, font, Brushes.Black, x, y);
        }

        private void printDocument1_PrintPage(object sender, System.Drawing.Printing.PrintPageEventArgs e)
        {
            Graphics g = e.Graphics;
            getPageLayout(g, e.MarginBounds);
            if (!titlePrinted)
            {
                string title = "link audit for web page " + lastUrl;
                SizeF titleSize = g.MeasureString(title, titleFont);
                printNextLine(g, title, titleFont,
                              leftMargin + ((rightMargin - leftMargin) - (int)titleSize.Width)/2, nextLine);
                nextLine += titleFont.Height;
                titlePrinted = true;
            }
            if (nextLine < bottomMargin)
            {
                if (brokenLinks)
                {
                    if (!brokenHeaderPrinted)
                    {
                        printNextLine(g, "broken links", headerFont, leftMargin, nextLine);
                        nextLine += headerFont.Height;
                        brokenHeaderPrinted = true;
                    }
                    if (nextLine < bottomMargin)
                    {
                        // Each link takes two lines, so make sure both fit on this page.
                        for (; nextLine + 2 * messageFont.Height < bottomMargin &&
                               brokenLinksPrinted < this.brokenLinksView.Items.Count;
                             brokenLinksPrinted++)
                        {
                            printNextLine(g, "link: ", messageFont, leftMargin, nextLine);
                            printNextLine(g, this.brokenLinksView.Items[brokenLinksPrinted].Text,
                                          messageFont, linksMargin, nextLine);
                            nextLine += messageFont.Height;
                            printNextLine(g, "in: ", messageFont, leftMargin, nextLine);
                            printNextLine(g, this.brokenLinksView.Items[brokenLinksPrinted].SubItems[1].Text,
                                          messageFont, linksMargin, nextLine);
                            nextLine += messageFont.Height;
                        }
                    }
                }
                if (nextLine < bottomMargin)
                {
                    if(workingLinks)
                    {
                        if (!workingHeaderPrinted)
                        {
                            printNextLine(g, "working links", headerFont, leftMargin, nextLine);
                            nextLine += headerFont.Height;
                            workingHeaderPrinted = true;
                        }
                        if (nextLine < bottomMargin)
                        {
                            // Each link takes two lines, so make sure both fit on this page.
                            for (; nextLine + 2 * messageFont.Height < bottomMargin &&
                                   workingLinksPrinted < this.workingLinksView.Items.Count;
                                 workingLinksPrinted++)
                            {
                                printNextLine(g, "link: ", messageFont, leftMargin, nextLine);
                                printNextLine(g, this.workingLinksView.Items[workingLinksPrinted].Text,
                                              messageFont, linksMargin, nextLine);
                                nextLine += messageFont.Height;
                                printNextLine(g, "in: ", messageFont, leftMargin, nextLine);
                                printNextLine(g, this.workingLinksView.Items[workingLinksPrinted].SubItems[1].Text,
                                              messageFont, linksMargin, nextLine);
                                nextLine += messageFont.Height;
                            }
                        }
                    }
                }
            }
            // We need another page as long as any requested list still has unprinted links,
            // even when this page filled up before we reached the working links.
            e.HasMorePages =
                (brokenLinks && brokenLinksPrinted < this.brokenLinksView.Items.Count) ||
                (workingLinks && workingLinksPrinted < this.workingLinksView.Items.Count);
            // If we are done producing output, we need to reset all the counts so that when
            // someone hits the Print button from inside Print Preview, we get another
            // shot at displaying the data.
            if (!e.HasMorePages)
                initForPrinting(brokenLinks, workingLinks);
        }
        
        private void initForPrinting(bool brokenLinks, bool workingLinks)
        {
            this.brokenLinks = brokenLinks;
            this.workingLinks = workingLinks;
            titlePrinted = false;
            brokenHeaderPrinted = false;
            workingHeaderPrinted = false;
            brokenLinksPrinted = workingLinksPrinted = 0;
        }

        private void printPreviewToolStripMenuItem_Click(object sender, EventArgs e)
        {
            printPreviewDialog1.Document = printDocument1;
            printDocument1.DefaultPageSettings = pgSettings;
            initForPrinting(true, true);
            printPreviewDialog1.ShowDialog();
        }

        private void maxCoord(ref Point max, Control control)
        {
            max.X = Math.Max(max.X, control.Location.X + control.Size.Width);
            max.Y = Math.Max(max.Y, control.Location.Y + control.Size.Height);
        }
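
        // A possible follow-up, sketched under assumptions rather than taken from the
        // working tool: the per-control maxCoord calls used when building a dialog could
        // be folded into one helper that is handed the container. The name fitToContents
        // is hypothetical. Like the hand-written version, it relies on each child's
        // Location and Size being meaningful at the time it is called.
        private void fitToContents(Control container)
        {
            // Find the farthest point down and right across all child controls.
            Point max = new Point(0, 0);
            foreach (Control child in container.Controls)
                maxCoord(ref max, child);
            // Size the container's client area to just enclose its children.
            container.ClientSize = new System.Drawing.Size(max.X, max.Y);
        }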

        private void printToolStripMenuItem_Click(object sender, EventArgs e)
        {
            // Keep track of the farthest point down and right within the panel.
            Point max = new Point(0, 0);
            // Bring up a dialog box to ask which links to print.
            Form dlgWebPage = new Form();
            TableLayoutPanel panel = new TableLayoutPanel();
            panel.RowCount = 4;
            panel.ColumnCount = 6;
            Label printChoices = new Label();
            printChoices.Font = new Font("Arial", 12, FontStyle.Bold, System.Drawing.GraphicsUnit.Point);
            printChoices.Text = "Which Links To Print?";
            printChoices.AutoSize = true;
            panel.Controls.Add(printChoices, 1, 0);
            maxCoord(ref max, printChoices);
            Label brokenOnlyLabel = new Label();
            brokenOnlyLabel.Text = "broken links only";
            brokenOnlyLabel.Font = new Font("Arial", 10, FontStyle.Regular, System.Drawing.GraphicsUnit.Point);
            brokenOnlyLabel.AutoSize = true;
            panel.Controls.Add(brokenOnlyLabel, 0, 1);
            CheckBox brokenOnly = new CheckBox();
            panel.Controls.Add(brokenOnly, 1, 1);
            Label workingOnlyLabel = new Label();
            workingOnlyLabel.Text = "working links only";
            workingOnlyLabel.Font = new Font("Arial", 10, FontStyle.Regular, System.Drawing.GraphicsUnit.Point);
            workingOnlyLabel.AutoSize = true;
            panel.Controls.Add(workingOnlyLabel, 2, 1);
            CheckBox workingOnly = new CheckBox();
            panel.Controls.Add(workingOnly, 3, 1);
            maxCoord(ref max, workingOnly);
            Button bOk = new System.Windows.Forms.Button();
            bOk.Text = "OK";
            bOk.Dock = DockStyle.Fill;
            bOk.DialogResult = DialogResult.OK;
            panel.Controls.Add(bOk, 0, 2);
            Button bCancel = new System.Windows.Forms.Button();
            bCancel.DialogResult = DialogResult.Cancel;
            bCancel.Text = "Cancel";
            bCancel.Dock = DockStyle.Fill;
            panel.Controls.Add(bCancel, 2, 2);
            maxCoord(ref max, bCancel);
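            // Enter and Escape should act on the dialog itself, not on the main form.
            dlgWebPage.AcceptButton = bOk;
            dlgWebPage.CancelButton = bCancel;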
            brokenOnly.Checked = true;
            workingOnly.Checked = true;
            panel.ClientSize = new System.Drawing.Size(max.X, max.Y);
            dlgWebPage.Controls.Add(panel);
            dlgWebPage.AutoSize = true;
            dlgWebPage.ClientSize = panel.Size;
            if (DialogResult.OK == dlgWebPage.ShowDialog())
            {
                initForPrinting(brokenOnly.Checked, workingOnly.Checked);
                PrintDialog dlg = new PrintDialog();
                printDocument1.DefaultPageSettings = pgSettings;
                dlg.Document = printDocument1;
                if (dlg.ShowDialog() == DialogResult.OK)
                    printDocument1.Print();
            }
        }

        private void printPreviewDialog1_Load(object sender, EventArgs e)
        {

        }

        private void pageSetupToolStripMenuItem_Click(object sender, EventArgs e)
        {
            PageSetupDialog pageSetupDialog = new PageSetupDialog();
            pageSetupDialog.PageSettings = pgSettings;
            pageSetupDialog.AllowOrientation = true;
            pageSetupDialog.AllowMargins = true;
            pageSetupDialog.ShowDialog();
        }

#if DEBUG
        private void unitTestToolStripMenuItem_Click(object sender, EventArgs e)
        {
            // These unit tests are only compiled in when the DEBUG symbol is defined.
            // Test the EatWhiteSpace and Eat1 functions.
            Eat1Test();
            // Test the EatQuotedString function.
            EatQuotedStringTest();
            // Verify that we can open web pages that are all but guaranteed to be present.
            validWebPageTest();
            // See if we can read in a web page--perhaps a simple one, and verify the results.
            ReadWebPageTest();
        }

        public void Eat1Test()
        {
            // EatWhiteSpace and Eat1 tests.
            Debug.WriteLine("starting EatWhiteSpace and Eat1 unit tests");
            String whiteSpaceBefore = " death and taxes";
            int index;
            index = Eat1(whiteSpaceBefore, 0, "death");
            if(index != " death ".Length)
                Debug.WriteLine("Eat1(whiteSpaceBefore, 0, \"death\") returned " + index);
            index = EatWhiteSpace(whiteSpaceBefore, 0);
            if (index != 1)
                Debug.WriteLine("EatWhiteSpace(whiteSpaceBefore, 0) returned " + index);
            String noWhiteSpaceBefore = "death and taxes";
            index = Eat1(noWhiteSpaceBefore, 0, "death");
            if (index != "death ".Length)
                Debug.WriteLine("Eat1(noWhiteSpaceBefore, 0, \"death\") returned " + index);
            String multipleWhiteSpacesAfter = "death    and   more taxes";
            index = Eat1(multipleWhiteSpacesAfter, 0, "death");
            if (index != "death    ".Length)
                Debug.WriteLine("Eat1(multipleWhiteSpacesAfter, 0, \"death\") returned " + index);
            String nothingThere = "";
            index = Eat1(nothingThere, 0, "death");
            if (index != 0)
                Debug.WriteLine("Eat1(nothingThere, 0, \"death\") returned " + index);
            index = EatWhiteSpace(nothingThere, 0);
            if (index != 0)
                Debug.WriteLine("EatWhiteSpace(nothingThere, 0) returned " + index);
            String onlyNewLineThere = "\n";
            index = Eat1(onlyNewLineThere, 0, "death");
            if (index != 1)
                Debug.WriteLine("Eat1(onlyNewLineThere, 0, \"death\") returned " + index);
            index = EatWhiteSpace(onlyNewLineThere, 0);
            if (index != 1)
                Debug.WriteLine("EatWhiteSpace(onlyNewLineThere, 0) returned " + index);
            index = 5;
            index = Eat1(onlyNewLineThere, index, "death");
            if (index != -1)
                Debug.WriteLine("Eat1(onlyNewLineThere, 5, \"death\") returned " + index);
            index = EatWhiteSpace(onlyNewLineThere, 5);
            if (index != -1)
                Debug.WriteLine("EatWhiteSpace(onlyNewLineThere, 5) returned " + index);
            Debug.WriteLine("ending EatWhiteSpace and Eat1 unit tests");
        }

        public void EatQuotedStringTest()
        {
            // EatQuotedString tests.
            Debug.WriteLine("starting EatQuotedString unit test");
            String whiteSpaceBefore = " death and taxes";
            String output = "";
            int index;
            // See if it works with something that doesn't have a quoted string.
            index = EatQuotedString(whiteSpaceBefore, 0, ref output);
            if (index != whiteSpaceBefore.Length)
                Debug.WriteLine("EatQuotedString(whiteSpaceBefore, 0, ref output) returned " + 
                                index + "; expected " + whiteSpaceBefore.Length);
            String quotedString = "\"The easy case\"";
            index = EatQuotedString(quotedString, 0, ref output);
            if (index != quotedString.Length - 1)
                Debug.WriteLine("EatQuotedString(quotedString, 0, ref output) returned " +
                                index + "; expected " + (quotedString.Length - 1));
            else
            {
                if (!output.Equals("The easy case"))
                    Debug.WriteLine("EatQuotedString(quotedString, 0, ref output) produced " +
                                    output + "; expected The easy case");
            }
            String whiteSpaceBeforeQuotedString = "   \"The almost as easy case\"";
            index = EatQuotedString(whiteSpaceBeforeQuotedString, 0, ref output);
            if (index != whiteSpaceBeforeQuotedString.Length - 1)
                Debug.WriteLine("EatQuotedString(whiteSpaceBeforeQuotedString, 0, ref output) returned " +
                                index + " " + output + "; expected " +
                                (whiteSpaceBeforeQuotedString.Length - 1) + " The almost as easy case");
            String halfQuotedString = "  \"no closing quote";
            index = EatQuotedString(halfQuotedString, 0, ref output);
            if (index != halfQuotedString.Length)
                Debug.WriteLine("EatQuotedString(halfQuotedString, 0, ref output) returned " +
                                index + "; expected " + halfQuotedString.Length);
            else
            {
                if (!output.Equals("no closing quote"))
                    Debug.WriteLine("EatQuotedString(halfQuotedString, 0, ref output) returned " +
                                    output + "; expected no closing quote");
            }
            Debug.WriteLine("ending EatQuotedString unit test");
        }

        public void validWebPageTest()
        {
            Debug.WriteLine("starting validWebPageTest");
            String url = "http://www.google.com";
            if (!validWebPage(url))
                Debug.WriteLine("validWebPage(\"" + url + "\") returned false");
            else
            {
                // Make sure that the url was added to the webPageResponds collection.
                if (!webPageResponds.Contains(url))
                    Debug.WriteLine("validWebPage(\"" + url + "\") did not add " + url + " to webPageResponds member");
            }
            url = "http:";
            // We aren't really checking anything here but the exception processing for an invalid URL.
            validWebPage(url);
            // Make sure that we added this url to webPageResponds--there's no point in checking a url twice, 
            // even an invalid one.
            if (!webPageResponds.Contains(url))
                Debug.WriteLine("validWebPage(\"" + url + "\") did not add " + url + " to webPageResponds member");
            Debug.WriteLine("ending validWebPageTest");
        }

        // Much of this is shared with validWebPageTest, but that's because both of them open URLs.
        public void ReadWebPageTest()
        {
            Debug.WriteLine("starting ReadWebPageTest");
            String url = "http://www.google.com";
            String webPage;
            if (null == (webPage = ReadWebPage(url)))
                Debug.WriteLine("ReadWebPage(\"" + url + "\") returned null");
            url = "http://www.claytoncramer.com/dummytest.html";
            if (null == (webPage = ReadWebPage(url)))
                Debug.WriteLine("ReadWebPage(\"" + url + "\") returned null");
            else
            {
                // Make sure that we got back what we expected. 
                String expected = "\r\n";
                if (!webPage.Equals(expected))
                    Debug.WriteLine("ReadWebPage(\"" + url + "\") returned " + webPage + "; expected " + expected);
            }
            url = "http:";
            if (null != (webPage = ReadWebPage(url)))
                Debug.WriteLine("ReadWebPage(\"" + url + "\") returned a web page; expected null");
            Debug.WriteLine("ending ReadWebPageTest");
        }
#endif

    }
 
    public class HtmlNode
    {
        public bool html;
        public string data;
        public string parameter;

        public HtmlNode()
        {
        }

        public HtmlNode(bool html, string data)
        {
            this.html = html;
            this.data = data;
        }
    }

}