Internet Error Prevention and Detection: How Dysfunctional Are the Fortune 100 Web Sites?
November 08, 1999
Today, companies of every stripe are going dot-com.
Because the Internet is an imperative in the new digitally connected economy, companies are scrambling to Web-enable their businesses, or at least build some sort of Web presence. No viable company can afford to do otherwise. Consider:
- The U.S. Internet economy totaled $301 billion in 1998, and created jobs for 1.2 million workers (Source: University of Texas and Cisco Systems, reported in NUA Internet Surveys, 1999)
- Over 90 percent of top managers cite the Internet as a major force affecting the future global marketplace (Source: Business Week Online, July 12, 1999)
- Some 10,000 new Web sites are launched daily (Source: Business Week, June 21, 1999)
- Business-to-business e-commerce, worth $43 billion in 1998, is likely to exceed $1.3 trillion by 2003 (Source: The Economist, June 26, 1999)
To be sure, the Internet is redefining corporations worldwide, but the degree to which it reshapes global business will be determined, in large part, by how successfully corporations pull off the transition to the Web.
To date, however, corporate Web sites haven't fared all that well.
Despite spending anywhere from $5 million to $20 million to establish a truly differentiated Web presence (Source: Gartner Group Report, 1999), many corporate sites are riddled with errors.
Exactly How Dysfunctional Are Fortune 100 Web Sites?
Fortune 100 Web sites are surprisingly dysfunctional, according to a recent Web Audit conducted by Parasoft Corporation, a leading provider of advanced error prevention and detection solutions.
In June 1999, Parasoft used its proprietary error detection technology, SiteRuler, to scan 95 of the Fortune 100 Web sites for coding errors and other functional and aesthetic glitches. Parasoft's Web Audit examined the sites against the following criteria:
- link errors
- HTML coding errors, as defined by World Wide Web Consortium (W3C) standards
- technology usage
General Audit Statistics
Across the 95 Fortune 100 corporate Web sites audited, Parasoft found:
- 717,653 total files
- 57,114 total directories
- 292,357 total HTML pages
- 307,254 total images
- 95 sites using HTML
- 79 sites using PDF files
- 36 sites using audio or video files
- 34 sites using Java
- 28 sites using CGI
- 24 sites requiring a password
- 8 sites using Shockwave
Link Error Statistics
The Parasoft analysis shows that the Fortune 100 Web sites reviewed contain 292,357 pages and 84,302 link errors. That is an average of about 0.29 link errors per page, or one link error for every three and a half pages.
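An average of errors per page is not the same as the share of pages that contain an error, because a single page can hold several broken links. The Python sketch below computes the audit's link-error ratio from the totals above. It then contrasts the two measures on a hypothetical four-page site. The function names are illustrative only and are not part of SiteRuler.

```python
# Totals reported in the audit; the helper names are illustrative only.
TOTAL_PAGES = 292_357
TOTAL_LINK_ERRORS = 84_302


def errors_per_page(errors: int, pages: int) -> float:
    """Average number of errors per page."""
    return errors / pages


def share_of_pages_with_errors(errors_by_page: list[int]) -> float:
    """Fraction of pages containing at least one error."""
    return sum(1 for e in errors_by_page if e > 0) / len(errors_by_page)


rate = errors_per_page(TOTAL_LINK_ERRORS, TOTAL_PAGES)
print(f"{rate:.2f} link errors per page")
print(f"one link error every {1 / rate:.1f} pages")

# Hypothetical four-page site (not audit data): one page holds all three errors.
sample = [0, 3, 0, 0]
print(errors_per_page(sum(sample), len(sample)))  # average errors per page
print(share_of_pages_with_errors(sample))         # share of pages affected
```

In the sample, the average is 0.75 errors per page, yet only a quarter of the pages are affected. This is why a per-page error rate cannot be read as a percentage of pages.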
Link errors (navigational errors such as a missing page, bad anchor, or malformed URL) are the most damaging errors a Web site can have. They lead to visitor frustration, failed communication, and, ultimately, customers fleeing to other Internet sources.
While the average number of link errors raised eyebrows among researchers, some corporate Web sites deserve recognition for having no link errors at all.
HTML Error Statistics
HTML coding errors also figured prominently in Parasoft's corporate Web Audit. These are flaws in a page's HTML, ranging from incorrect layout and broken form functionality to browser incompatibilities that cause a page to display incorrectly. These errors are not always critical; their severity ranges from a subheading that is not italicized to a browser that crashes completely. Even so, they can communicate carelessness or a lack of technical ability to visitors such as customers. And there are lots of them: the 292,357 Web pages contained 3,683,974 HTML coding errors. That is an average of over a dozen (about 12.6) HTML coding errors per page on corporate Web sites.
Furthermore, unlike the link error research, which found some sites with no link errors, every corporate Web site reviewed had HTML coding errors.
Technology usage refers to the types of software technology used on these corporate Web sites. The technology usage analysis did not involve identifying errors; it was included to gauge the technical complexity of the sites analyzed. Surprisingly, that complexity was generally low, with many of the sites examined relying mainly on HTML and PDF (Acrobat) files.
Only 57 of the 95 sites reviewed contained dynamic content, such as ASP, CGI, JSP, PHP, PL, CSS, and SHTML files.
Why Do Web Site Errors and Their Prevention Matter?
Web site mistakes matter because they can cost companies productivity, customers, and profits. For example, $58 million per month in e-commerce sales is lost due to Web page loading failures (Source: Zona Research, "The Need for Speed" report). This raises the question: how many of those loading failures stem from the link errors on today's sites?
Consider the possible consequences. IBM customers, for example, will reportedly visit the corporate giant's Web site 28 million times in 1999 for e-service and support. How many customers does IBM stand to lose if they repeatedly encounter missing or broken links, or similar errors?
Put another way: How many clicks does it take to lose a customer?
Typical Web users won't put up with faulty links or other time-consuming and frustrating Web site errors. With 10,000 new Web sites sprouting up daily, it's too easy to click somewhere else. Besides, no global company today can afford to lose customers because of Web site errors, especially when it costs up to $200 to cultivate each new on-line customer (Source: Wall Street Journal, July 12, 1999). Beyond customers and revenue, corporations also risk losing competitive advantage, brand equity, and trust.
What's more, Web site errors reflect on the company itself. Corporate agendas and values are directly conveyed via corporate Web sites. Problems, glitches, and faults are transparent: they are broadcast to the world 24x7 over the global communication network. Corporations can no longer hide operational errors behind concrete walls and mirrored windows, telephone lines, and glossy brochures.
If a Web site isn't fully operational, companies can't connect and communicate with their customers, partners, shareholders, or employees, the primary audiences of corporate Web sites. Fully one-quarter of corporate Web sites fail to provide more than half the content that key external audiences want and need (Source: Shelley Taylor and Associates).
Analysis by Fortune 100 Industry/Category
So which industries are doing things "right"?
The Parasoft researchers also analyzed their Web site findings by industry. The industry analysis covers 83 of the 95 sites audited. The remaining 12 sites each belonged to a company that was the only one in its industry category, so they were excluded from the industry averages.
The industries reviewed and the average number of Web pages per industry are as follows:
Link Errors Per Page by Industry
Parasoft found that Motor Vehicles and Parts (0.65 link errors per page), Pharmaceuticals (0.57), and Diversified Financials (0.57) were among the worst industries in terms of average link errors per page. At the other end of the scale, Wholesalers (0.07), Beverages (0.04), and Mail/Package and Freight Delivery (0) fared best.
HTML Coding Errors Per Page by Industry
For HTML coding errors per page by industry, Mail/Package & Freight Delivery, Petroleum Refining, and Aerospace fared the best, while General Merchandisers, Computers and Office Equipment, and Utilities ranked the worst.
Interestingly, some of the biggest Web site error offenders are the industries racking up the largest Internet spending bills, and those that rely on accurate reporting data. These include Diversified Financials, which average over half a link error per page and spend up to $16.6 billion on the Internet.
Best Industries: Mail/Package & Freight Delivery
Mail/Package & Freight Delivery, Beverages, and Wholesalers fared the best in the link errors per page category. Similarly, Mail/Package & Freight Delivery, Aerospace, and Petroleum Refining did best in HTML error rankings.
Internet Challenged Industries: A Difficult Call
At the other end of the spectrum, the Web sites showing the largest numbers of link errors per page were Motor Vehicle & Parts, Pharmaceuticals, and Diversified Financials, while General Merchandisers, Computers and Office Equipment, and Utilities showed the most HTML coding errors.
Among the link-error leaders (Motor Vehicle & Parts, Pharmaceuticals, and Diversified Financials), the errors included over 600 unrecognizable URLs containing problems such as multiple slashes, incorrect slashes, and new line characters. More importantly, over 5,000 link errors were due to bad anchors and missing pages, the most damaging of Web site errors.
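The malformed-URL categories named above can be screened for mechanically. Below is a minimal Python sketch. Its three checks (newline characters, backslashes, and doubled slashes after the scheme) are assumptions modeled on the categories in the text, not SiteRuler's actual rules. The function name `url_problems` is hypothetical.

```python
import re

# Matches a leading scheme such as "http://" so its legitimate "//" is ignored.
SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def url_problems(url: str) -> list[str]:
    """Return reasons why an HREF value looks malformed (hypothetical rules)."""
    problems = []
    if "\n" in url or "\r" in url:
        problems.append("newline character")
    if "\\" in url:
        problems.append("backslash instead of slash")
    rest = SCHEME.sub("", url, count=1)
    if "//" in rest:
        problems.append("multiple slashes")
    return problems


for href in ["http://www.example.com/products/index.html",
             "http://www.example.com//products/index.html",
             "products\\index.html",
             "products/\nindex.html"]:
    print(repr(href), url_problems(href))
```

A production checker would need more care. For example, this sketch would also flag a protocol-relative URL such as `//host/page.html`, and it does not look for spaces or other characters that are illegal in URLs.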
HTML coding errors for General Merchandisers, Computers and Office Equipment, and Utilities, meanwhile, totaled 2,085,889 in the 106,423 HTML pages of the 13 sites in these three categories, an average of about 19.6 errors per page.
Are Technology Companies Any Better?
Of the Fortune 100, five corporations fall into the Computers and Office Equipment category. Combined, these companies have over 102,000 HTML pages, over 35,000 link errors, and 2,008,843 HTML coding errors. That's an industry average of one link error every 2.9 pages and over 19 HTML errors per page, and these are high-tech companies!
By comparison, the top five General Merchandise companies have far less on-line material and a much lower link error ratio. Their sites include over 2,600 HTML pages and only 399 link errors, an industry average of one link error every 6.5 pages.
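The per-industry ratios quoted in this section follow directly from the reported totals. The short Python check below recomputes them. It assumes the rounded figures in the text ("over 102,000" pages, "over 35,000" link errors, "over 2,600" pages) as exact inputs, so the Computers and Office Equipment and General Merchandise results are approximate.

```python
def errors_per_page(errors: int, pages: int) -> float:
    """Average number of errors per page."""
    return errors / pages


def pages_per_error(pages: int, errors: int) -> float:
    """Average number of pages per error ("one error every N pages")."""
    return pages / errors


# General Merchandisers, Computers and Office Equipment, and Utilities combined.
print(f"{errors_per_page(2_085_889, 106_423):.2f} HTML errors per page")

# Computers and Office Equipment (rounded totals from the text).
print(f"one link error every {pages_per_error(102_000, 35_000):.1f} pages")
print(f"{errors_per_page(2_008_843, 102_000):.1f} HTML errors per page")

# General Merchandisers (rounded page total from the text).
print(f"one link error every {pages_per_error(2_600, 399):.1f} pages")
```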
The Error Paradox
In preparing this report, Parasoft discovered an interesting paradox that deserves mention. Initial review showed a correlation between the number of pages on a Web site and its number of link errors. That correlation disappears, however, when examining link errors per page. As the chart below vividly shows, the number of link errors per page does not correlate with a Web site's size (i.e., its number of pages).
The relationship between total pages and total errors is not surprising, given the correlation between lines of code and errors. DeMarco and Lister, authors of the groundbreaking work Peopleware: Productive Projects and Teams, estimate the incidence of error in software development today at 1.2 errors for every 200 lines of code. So, yes: more pages, more opportunity for errors. But it is the errors-per-page figure that reveals the major differences in Web site functionality across industries.
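The paradox is easy to reproduce with made-up numbers. In the Python sketch below, the five sites are purely hypothetical (not audit data). Link errors grow with site size, but the per-page rate varies independently. As a result, the Pearson correlation is strong for totals and close to zero for rates. The `pearson` helper is written out by hand rather than relying on any particular statistics library.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical sites (NOT audit data): pages and link errors per site.
pages = [100, 200, 400, 800, 1600]
link_errors = [40, 20, 200, 160, 480]
errors_per_page = [e / p for e, p in zip(link_errors, pages)]

print(f"pages vs. total errors:    r = {pearson(pages, link_errors):.2f}")
print(f"pages vs. errors per page: r = {pearson(pages, errors_per_page):.2f}")
```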
Fortune 100 Web sites are surprisingly flawed. The 95 Fortune 100 sites audited average one link error every 3.5 pages and over 12 HTML coding errors per page. With error rates like these, these global corporations are unnecessarily threatening the return on their Internet investment. Global enterprises are spending up to $20 million to build an e-commerce-ready site (Source: Gartner Group). Given that investment and the sheer number of Web site errors, top corporations are thwarting their own on-line objectives: to provide product and service information, increase brand awareness, and improve corporate image.
Link and HTML coding errors are avoidable. In most cases, they are simple or sloppy coding errors that are easy and inexpensive to fix. In fact, the cost of neglecting errors is much higher than the cost of repairing them. Remember: it costs up to $200 to recruit each on-line customer, and faulty Web site links can swiftly sever any pre-existing loyalty. Competitors know potential customers are only a click away.
In addition, some 40 to 50 percent of corporate America's IT budget is spent on fixing software defects (Source: Ernst & Young's Center for Information Technology and Strategy). Corporations need to know what they spend on detecting, repairing, and preventing Web site flaws. Production delays significantly increase development costs. Further costs, in the form of operational disruptions, service outages, and product failures, are then passed on to the customers using the software.
Web site error detection is a key business issue that has been largely ignored by the mainstream media and corporations alike. This is partly because they don't realize how pervasive and damaging Web site errors are. This report aims to highlight the importance of error detection. Identifying and preventing errors is crucial not only to effective Web-based e-commerce, but also to survival and sustained success in the new Internet economy.
Parasoft's fast and easy solutions like SiteRuler help customers automatically detect and prevent bugs without requiring any complex programming skills. By streamlining the software development process, Parasoft's error detection tools help ensure that quality and cost are controlled as a project moves from design to release.
Between June 8 and June 25, 1999, Parasoft used its Web site management tool SiteRuler 1.1 to test 95 of the Fortune 100 companies' Web sites. Parasoft attempted to test all 100 sites, but tested only 95 because of:
- security issues such as passwords, encryption, and permission to test.
- sites that changed so often and dramatically that Parasoft was unable to confirm the validity of the test results for these sites.
- one or more of these companies not having a Web site.
Each Web site was loaded into SiteRuler from the Web server of the site under test. This means that only files reachable by links from that site's "home page" were loaded and tested.
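Loading a site through its Web server amounts to following links outward from the home page. Any file that no page links to is therefore never loaded. Below is a minimal breadth-first sketch over a hypothetical in-memory site. It does no HTTP fetching, and `SITE` and `reachable_pages` are illustrative names, not SiteRuler's.

```python
from collections import deque

# Hypothetical site: each page maps to the internal pages it links to.
SITE = {
    "index.html": ["products.html", "about.html"],
    "products.html": ["index.html", "specs.html"],
    "about.html": ["index.html"],
    "specs.html": [],
    "old-press-release.html": ["index.html"],  # no page links here
}


def reachable_pages(site: dict[str, list[str]], home: str) -> set[str]:
    """Breadth-first traversal of internal links starting at the home page."""
    seen = {home}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in site.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


loaded = reachable_pages(SITE, "index.html")
print(sorted(loaded))
print(sorted(set(SITE) - loaded))  # files a link-driven load never sees
```

The orphaned `old-press-release.html` is never visited. This mirrors the caveat at the end of this section that a Web-server load may miss some of a site's files.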
After each site was loaded into SiteRuler, three types of analyses were performed:
Link Test:
The link test was performed by SiteRuler's Link Check feature. By default, SiteRuler is configured to test only internal links (links within the current site); this configuration was maintained when these link tests were conducted. Types of link errors found include missing files, missing anchors, duplicate anchors, and malformed URLs. A link was considered erroneous if that link would not lead to the location specified by the link statement in the HTML code. Thus, link errors include valid link statements that lead to invalid pages (for example, a link to a file that does not exist) as well as invalid link statements that do not lead to the specified location because browsers do not recognize the invalid statements as links (for example, <A HREF="index.htm">).
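Three of the link error types above (missing files, missing anchors, and duplicate anchors) can be detected with an ordinary HTML parser. Below is a minimal sketch using Python's standard `html.parser`. It is not SiteRuler's implementation, and it makes simplifying assumptions. The site is assumed flat, so relative paths are not resolved against directories. Any HREF with a scheme (http:, mailto:, and so on) is treated as external and skipped, matching the internal-links-only configuration. `LinkCollector` and `check_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urlsplit


class LinkCollector(HTMLParser):
    """Collect HREF targets and NAME/ID anchors from a single HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []
        self.anchors: list[str] = []

    def handle_starttag(self, tag, attrs):
        values = dict(attrs)  # HTMLParser lowercases tag and attribute names
        if tag == "a" and values.get("href"):
            self.hrefs.append(values["href"])
        for key in ("name", "id"):  # both can serve as fragment targets
            if values.get(key):
                self.anchors.append(values[key])


def check_links(pages: dict[str, str]) -> list[str]:
    """Report missing files, missing anchors, and duplicate anchors."""
    parsed = {}
    for path, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        collector.close()
        parsed[path] = collector

    errors = []
    for path, page in parsed.items():
        for name in sorted({a for a in page.anchors if page.anchors.count(a) > 1}):
            errors.append(f"{path}: duplicate anchor '{name}'")
        for href in page.hrefs:
            if urlsplit(href).scheme:
                continue  # external link: not tested in this sketch
            target, fragment = urldefrag(href)
            target = target or path  # "#top" points into the current page
            if target not in parsed:
                errors.append(f"{path}: missing file '{target}'")
            elif fragment and fragment not in parsed[target].anchors:
                errors.append(f"{path}: missing anchor '{href}'")
    return errors


site = {
    "index.html": '<a name="top"></a><a href="specs.html#power">Specs</a>'
                  '<a href="catalog.html">Catalog</a><a href="#top">Top</a>',
    "specs.html": '<h2 id="size">Size</h2><a name="size"></a>',
}
for problem in check_links(site):
    print(problem)
```

The `id` on the heading and the `name` on the anchor count as a duplicate because, in HTML 4, the two attributes share one name space for fragment identifiers.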
HTML Test:
The HTML test was performed by SiteRuler's CodeWizard for HTML feature. This feature parses the code under test and reports violations when the code does not comply with one of the enabled CodeWizard rules. CodeWizard rules are based on HTML coding standards promoted by the World Wide Web Consortium (W3C). When performing these tests, CodeWizard for HTML was configured to enforce its default set of rules.
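Rule-based HTML checking of this kind means parsing each page and reporting every construct that breaks an enabled rule. The Python sketch below enforces two illustrative rules:

- an IMG element must carry an ALT attribute;
- certain elements must be explicitly closed.

These rules are assumptions loosely based on HTML 4.01 requirements and are not CodeWizard's actual rule set. `RuleChecker`, `check_html`, and `REQUIRED_END_TAGS` are hypothetical names.

```python
from html.parser import HTMLParser

# Hypothetical rules for illustration only; NOT CodeWizard's rule set.
# HTML 4.01 requires an ALT attribute on IMG and an end tag on these elements.
REQUIRED_END_TAGS = {"a", "b", "i", "em", "strong", "font", "table",
                     "form", "div", "span", "ul", "ol", "title"}


class RuleChecker(HTMLParser):
    """Record rule violations with the line number where each was found."""

    def __init__(self) -> None:
        super().__init__()
        self.violations: list[str] = []
        self.open_tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        line, _ = self.getpos()
        if tag == "img" and "alt" not in dict(attrs):
            self.violations.append(f"line {line}: <img> without ALT attribute")
        if tag in REQUIRED_END_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in REQUIRED_END_TAGS:
            return
        line, _ = self.getpos()
        if tag not in self.open_tags:
            self.violations.append(f"line {line}: </{tag}> without matching start tag")
            return
        # Pop back to the matching start tag; anything popped on the way was left open.
        while (top := self.open_tags.pop()) != tag:
            self.violations.append(f"line {line}: <{top}> not closed before </{tag}>")


def check_html(html: str) -> list[str]:
    """Parse one page and return all violations, including tags left open."""
    checker = RuleChecker()
    checker.feed(html)
    checker.close()
    unclosed = [f"end of page: <{tag}> never closed" for tag in checker.open_tags]
    return checker.violations + unclosed


page = ('<html><title>Home</title>\n'
        '<b>Welcome <i>visitor</b>\n'
        '<img src="logo.gif">\n'
        '</font></html>')
for violation in check_html(page):
    print(violation)
```

Elements whose end tags are optional in HTML 4 (such as P, LI, and TD) are deliberately left out of the set, so valid pages that omit those end tags are not flagged.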
Technology Usage Survey:
The technology usage survey was performed by SiteRuler's Statistics feature. This feature determines how many files on the Web site under test use specified technologies. A file is declared to use a specific technology if its file extension matches one of the accepted file extensions for that type of technology. For example, SiteRuler determines the number of PDF files on a Web site by looking at the number of files with a ".pdf" extension.
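Classifying files by extension alone amounts to a lookup table. Below is a minimal Python sketch. The extension-to-technology mapping is an assumption, since the report does not list SiteRuler's accepted extensions. `TECHNOLOGY_BY_EXTENSION` and `technology_counts` are hypothetical names.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical extension table; SiteRuler's actual list is not given in the report.
TECHNOLOGY_BY_EXTENSION = {
    ".htm": "HTML", ".html": "HTML",
    ".pdf": "PDF",
    ".class": "Java", ".jar": "Java",
    ".cgi": "CGI", ".pl": "PL",
    ".swf": "Shockwave", ".dcr": "Shockwave",
    ".wav": "audio/video", ".au": "audio/video", ".mov": "audio/video",
    ".asp": "ASP", ".jsp": "JSP", ".php": "PHP",
    ".shtml": "SHTML", ".css": "CSS",
}


def technology_counts(paths: list[str]) -> Counter:
    """Count files per technology, judging each file only by its extension."""
    counts = Counter()
    for path in paths:
        suffix = PurePosixPath(path).suffix.lower()  # ".PDF" and ".pdf" match alike
        counts[TECHNOLOGY_BY_EXTENSION.get(suffix, "other")] += 1
    return counts


files = ["index.html", "about.HTM", "annual/report99.pdf",
         "cgi-bin/search.cgi", "intro.swf", "images/logo.gif"]
print(technology_counts(files))
```

Because the rule looks only at file names, a script served without a recognized suffix falls into "other". This is one reason an extension-based survey can undercount the technologies a site actually uses.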
After each test was completed, Parasoft confirmed that all reported errors were indeed valid errors.
SiteRuler technology is not intended for use on Java- and ASP-intensive sites. Nor is it intended to load and test sites via a Web server. When sites are loaded via a Web server (as opposed to via an FTP connection or from a local directory), not all publicly accessible files may be loaded, and those files are then not tested. Because Parasoft loaded the sites under test via a Web server, it may not have loaded all of each site's files. The number of errors reported may therefore be lower than the actual number of errors on each site at the time of the test. Similarly, the technology usage survey may not have accounted for every technology used on each site.
# # #
Parasoft is a registered trademark of Parasoft. All other brands are trademarks or registered trademarks of their respective holders.