Netcraft has developed a technique for identifying the number of computers (rather than IP addresses) acting as web servers on the internet, and attributes these computers to hosting locations through reverse DNS lookups.
This provides an independent view, based on a consistent methodology worldwide, of the number of web servers, their rate of growth over time, and the operating systems and web server technology used at each hosting company.
The dataset is useful to hosting companies for competitive analysis, mergers and acquisitions, identifying international markets for organic expansion, and also to any organisation selling to the hosting industry.
It is presented as an Excel spreadsheet where the user may pivot by country, operating system, hosting model [dedicated, shared, bulk/domain registry as calculated from the ratio of sites to servers], and drill down to inspect absolute numbers, rate of growth and technology deployed at individual companies.
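As a rough illustration of how a hosting model could be derived from the ratio of sites to servers, here is a minimal sketch. The function name `hosting_model` and both thresholds are assumptions made for this example; the actual cut-offs used in the dataset are not stated here.

```python
def hosting_model(sites: int, servers: int,
                  shared_min: float = 10.0,
                  bulk_min: float = 1000.0) -> str:
    """Classify a hosting location by its sites-per-server ratio.

    shared_min and bulk_min are hypothetical thresholds chosen only
    to illustrate the idea; they are not published values.
    """
    if servers <= 0:
        raise ValueError("servers must be a positive count")
    ratio = sites / servers
    if ratio >= bulk_min:
        # Very many sites per machine: parked domains or registry holding pages.
        return "bulk/domain registry"
    if ratio >= shared_min:
        # Many sites per machine: typical of shared hosting.
        return "shared"
    # Close to one site per machine: typical of dedicated hosting.
    return "dedicated"


if __name__ == "__main__":
    print(hosting_model(sites=120, servers=100))
    print(hosting_model(sites=5_000, servers=100))
    print(hosting_model(sites=2_000_000, servers=500))
```

Keeping the thresholds as parameters makes it easy to see how sensitive the classification is to where the boundaries are drawn.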
One of the common observations made about the Web Server Survey is that it counts hostnames rather than physical computers, and so is not a suitable metric to indicate the number of servers a hosting provider has. Technically sophisticated hosting companies can run thousands of sites on a single computer, and the great majority of the world’s web sites are located in hosting datacenters rather than on peripheral networks.
By arranging for a number of IP addresses to send packets to us near simultaneously, low level TCP/IP characteristics can be used to work out, within an error margin, if those packets originate from the same computer, by checking for similarities in a number of TCP/IP protocol header fields. To build up sufficient certainty that IP addresses on the same computer have been identified, many visits to the sites in the Web Server Survey are necessary, and these take place over a period of more than a month.
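The grouping step can be pictured with a small sketch. This is not Netcraft's implementation: the header fields used and the similarity test are not disclosed here, so each observation is reduced to an opaque, hypothetical `fingerprint` value, and "similar" is simplified to exact equality. The names `group_ips`, `min_visits` and `min_agreement` are likewise assumptions for the example.

```python
from collections import defaultdict
from itertools import combinations


def group_ips(visits, min_visits=3, min_agreement=0.9):
    """Group IP addresses that repeatedly look like the same computer.

    visits: list of dicts, one per visit, mapping ip -> fingerprint.
    The fingerprint stands in for values derived from TCP/IP header
    fields of near-simultaneous packets (hypothetical).
    """
    seen = defaultdict(int)    # visits on which both IPs of a pair responded
    agreed = defaultdict(int)  # visits on which their fingerprints matched
    for visit in visits:
        for a, b in combinations(sorted(visit), 2):
            seen[(a, b)] += 1
            if visit[a] == visit[b]:
                agreed[(a, b)] += 1

    # Union-find over every IP that responded at least once.
    parent = {ip: ip for visit in visits for ip in visit}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in seen.items():
        # Merge only with enough joint visits and consistent agreement.
        if n >= min_visits and agreed[(a, b)] / n >= min_agreement:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

    groups = defaultdict(list)
    for ip in list(parent):
        groups[find(ip)].append(ip)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    visits = [
        {"10.0.0.1": (64, 100), "10.0.0.2": (64, 100), "10.0.0.3": (128, 7)},
        {"10.0.0.1": (64, 250), "10.0.0.2": (64, 250), "10.0.0.3": (128, 9)},
        {"10.0.0.1": (64, 400), "10.0.0.2": (64, 400), "10.0.0.3": (128, 12)},
    ]
    print(group_ips(visits))
```

Requiring several joint visits before merging a pair reflects the point made above: a single match could be a coincidence, so certainty is built up over repeated observations.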
- Only sites found by the Web Server Survey will be included. The number of hosts found worldwide running internet web sites by the Web Server Survey is large [over 200 million in February 2009], but is not exhaustive.
- Attributing a site to a hosting company requires that the hoster provides a reverse DNS server for the network. In some cases where no reverse DNS server is configured, the hoster for a site will be shown as unknown.
- Backend machines such as database servers not running web sites will not be counted, as they are unseen from the internet.
- At most one computer will be counted for each site. Round robin DNS, reverse web proxies, load balancing products like Cisco LocalDirector and BIG-IP, and some connection level firewalls hide multiple web servers behind a single hostname. Additionally, with some of these products the operating system detected is that of the “front” device rather than the web server behind it.
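The reverse DNS attribution mentioned in the list above, including the fallback to an unknown hoster, might look roughly like the following sketch. `socket.gethostbyaddr` is the standard library call for a reverse lookup; the helper names `hoster_from_ptr` and `attribute_ip`, and the rule of taking the last two labels of the returned name as the hoster, are assumptions for illustration only. That rule is naive and would mishandle suffixes such as co.uk.

```python
import socket


def hoster_from_ptr(ptr_name):
    """Reduce a reverse DNS name to a hoster domain (naive last-two-labels rule)."""
    if not ptr_name:
        return "unknown"
    labels = ptr_name.rstrip(".").lower().split(".")
    if len(labels) < 2:
        return "unknown"
    return ".".join(labels[-2:])


def attribute_ip(ip, resolve=socket.gethostbyaddr):
    """Attribute an IP address to a hoster via reverse DNS.

    resolve is injectable so the logic can be exercised without network access.
    """
    try:
        hostname, _aliases, _addresses = resolve(ip)
    except (socket.herror, socket.gaierror):
        # No reverse DNS configured for this network.
        return "unknown"
    return hoster_from_ptr(hostname)


if __name__ == "__main__":
    # 192.0.2.1 is a documentation address, so this will normally print "unknown".
    print(attribute_ip("192.0.2.1"))
```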
There are some other factors that create errors in our technique, the main ones being:
- Despite making multiple visits, there is still a low probability that two computers will be considered the same by chance similarities in low-level TCP/IP protocol header fields.
- Some IP addresses do not respond on enough visits for the technique to be applied. This is mainly due to computers or networks being down or badly overloaded on several of the visits, in which case those computers go uncounted.
- If a system changes or upgrades its operating system during the course of the survey, which takes over a month to run, a computer may be counted more than once. This leads to over-counting, but the effect should generally be small.
- Virtualisation technologies will yield different results in the survey depending on their implementation. For example:
  - Solutions that use the same IP address for multiple virtual servers (e.g. VMware NAT networking) will count as one computer.
  - Solutions that use a single operating system kernel and TCP/IP stack for multiple virtual user-lands (e.g. user-mode Linux, FreeBSD jails) will count as one computer.
  - Solutions that have different operating system kernels handling different IP addresses (e.g. VMware in bridged mode) will count as separate computers.
How does it pan out in practice?
The survey does not attempt to count back-end servers (application or database servers) or servers other than web (HTTP) servers. If one wants to use the survey to measure the total number of servers that a company has supporting its web presence, the survey will produce a lower number, because it counts only the web servers. Informal checking with two of the largest dedicated server companies suggests that the survey counts about two-thirds to three-quarters of the machines in their datacenters.
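To make the arithmetic concrete, the sketch below turns a survey count into an implied range for the total number of machines, assuming the two-thirds to three-quarters coverage applies. That figure comes from informal checks with only two companies, so treating it as a general rule is an assumption of this example, as is the function name `estimate_total_machines`.

```python
from fractions import Fraction


def estimate_total_machines(counted_web_servers,
                            low=Fraction(2, 3),
                            high=Fraction(3, 4)):
    """Return (min_total, max_total) implied by a coverage range.

    low and high are the fractions of all machines that the survey is
    assumed to count; higher coverage implies fewer uncounted machines.
    """
    if not 0 < low <= high <= 1:
        raise ValueError("expected 0 < low <= high <= 1")
    return (counted_web_servers / high, counted_web_servers / low)


if __name__ == "__main__":
    lo, hi = estimate_total_machines(6000)
    print(f"{int(lo)} to {int(hi)} machines")
```

For a location where the survey counts 6,000 web servers, this gives a range of 8,000 to 9,000 machines in total.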
The technique is likely to be at its most useful in comparisons of companies operating shared or mixed shared & dedicated business models, where comparisons of numbers of IP addresses may lead to misleading results. Some US companies [including The Planet, Go Daddy, & NTT/Verio] have over 100,000 IP addresses each supporting shared hosting. Our technique brings the number of servers at those locations into the right order of magnitude.
Worldwide, IP address allocations are very uneven, with the US having a much more generous allocation of IP addresses than other countries. The server counting technique reduces the geographical skew of an IP address based view of the internet: 15 of the top 20 hosting locations are US companies when counting by IP address, but only 11 of the top 20 when using the server counting technique. This makes comparison of companies operating in different geographical locations possible, and is a useful precursor to international expansion or acquisitions.
Netcraft has been performing this survey since February 1999. The trends since then have been very smooth, indicating that there is only a small amount of “random error” in this analysis.
The dataset is updated on a monthly basis and is available on a company license basis.
Please contact us (email@example.com) for further information on costs.