Search Engines | Transalpine Internet Services

Transalpine has a department specialised in the highly technical field of website content optimization. We explain here how Content Engineering can improve the search engine performance of your website.


Search Engines

Search engines have been provided as a service by private companies since the early years of the World Wide Web in the early 1990s. Their purpose is to give access to websites via a central system which collates and organises the vast amounts of data involved, categorising and evaluating sites according to their relevance to a searcher's query.

The earliest search tools predate the web crawler. Archie (Alan Emtage, 1990) relied on directory listings of computer files on a specific computer network, and Gopher (Mark McCahill, 1991) offered a menu-based system that could be searched for plain text references in files. By 1993, as the World Wide Web grew to many web servers, search engines such as Wandex (Matthew Gray, 1993) began to crawl the web and catalogue the pages they found, and natural language keyword searching improved. Gradually, the content that was indexed extended to the full text of a web page.

With the need for a more effective search engine, many start-ups appeared between 1993 and 1998. These included: Excite (1993), Yahoo! (1994), WebCrawler (1994), Lycos (1994), Infoseek (1994), AltaVista (1995), Inktomi (1996), and AskJeeves (1997 - now Ask).

Google

Today, the world's most popular search engine is Google. Google was a game-changer in the way search engines functioned. Previous search engines could not reliably match the way sites were indexed to a searcher's intent. Words were not enough. The classic example is 'Jaguar'. The engines could not distinguish between the car make and the animal, or any other meaning a word or group of words might have. So both were presented, and it was up to the searcher to determine the relevance of each site offered.

In 1998, Larry Page and Sergey Brin launched their search engine, Google, which worked by a novel system of relevancy ranking. The initial system was unsophisticated: it merely counted the occurrence of search terms within a page. Soon they added a rough assessment of a website's ranking among the internet community based on the number of links from other sites (backlinks) to that site. This is the PageRank™ link analysis algorithm, which Google uses to assign numerical weighting to linked documents.
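The link-counting idea can be illustrated with a small sketch. The Python below is a simplified, textbook-style iteration in the spirit of PageRank, not Google's actual implementation. The four-page link graph, the `.example` domain names, the function name `pagerank` and the damping factor of 0.85 are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate a PageRank-style score for each page.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its score equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to c.example.
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}
ranks = pagerank(links)
best = max(ranks, key=ranks.get)
print(best, round(ranks[best], 3))
```

In this toy graph the page with the most backlinks, `c.example`, ends up with the highest score, while `d.example`, which nobody links to, keeps only the baseline.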

One of the limiting factors in the operation of search engines is energy consumption. As the major search engines are predominantly American, their foremost consideration in this regard is cost rather than environmental impact. The total global electrical power consumption of search engine servers is of the order of several gigawatts (GW), providing potentially thousands of petaflops of processing power.

Google alone would require more than a quarter of the output of a nuclear power station to operate its servers (ref: MIT article: What it takes to power Google). Together, the data centres of the world consume 1.3% of the world's electricity production.

Search Engine Usage

Search engine popularity varies from region to region. Here is an overview:

Engine   USA     UK      Germany  Italy   Russia  China
Google   63%     90%     89.7%    95%     34.5%   32.9%
Yahoo!   20.6%   2.8%    2.4%     1.9%    -       -
Bing     8.2%    2.1%    2.3%     0.6%    -       -
Ask      -       1.6%    -        0.66%   -       -
Yandex   -       -       -        -       62%     -
Baidu    -       -       -        -       -       51.5%
Others   8.2%    3.5%    4.3%     1.84%   3.5%    15.6%

Global distribution puts Google way ahead (August 2014):

Google   Baidu    Yahoo!  Bing    AOL     Ask     Other
67.63%   18.26%   6.07%   5.35%   0.19%   0.11%   2.34%

Crawling and Indexing

Search engines can return many hundreds of thousands of results for a query in a fraction of a second. How do they do it? Are these real results, or simply a statistical estimate? And what is the point, if users typically look only at the first page of search results (SERP 1), and often only at the first 3 or 4 results on it? (When did anyone last tell you '... I was about to become disheartened, but then, at result number 512,325, I found just what I was looking for...'?)

The huge number of results is not the intention of the search engine, but rather a consequence of, and an insight into, how the search engine works when it crawls and indexes the World Wide Web.

Search engines use a program called a 'spider', which makes its way through the web via a series of strands and nodes. The strands are pages on a domain, and the nodes are hyperlinks to other domains. If there are many return hyperlinks (also known as backlinks) to the first domain, the spider will find itself visiting that domain more often than a site stuck out on its own.
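A spider's journey can be sketched as a breadth-first walk over a link graph. The example below is only an illustration under stated assumptions: the web is a small in-memory dictionary rather than real pages fetched over HTTP, and the domain names and the `crawl` function are hypothetical.

```python
from collections import Counter, deque


def crawl(web, start):
    """Breadth-first crawl of an in-memory link graph.

    web maps a page to the pages it links to. Returns the order in which
    pages were first fetched and how often each page was reached by a link.
    """
    fetched = []
    arrivals = Counter()
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        fetched.append(page)
        for target in web.get(page, []):
            # Every hyperlink brings the spider back to its target once more.
            arrivals[target] += 1
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return fetched, arrivals


# Hypothetical web: island.example links out, but nothing links to it.
web = {
    "hub.example": ["news.example", "shop.example"],
    "news.example": ["hub.example", "blog.example"],
    "shop.example": ["hub.example"],
    "blog.example": ["hub.example"],
    "island.example": ["hub.example"],
}
order, arrivals = crawl(web, "hub.example")
print(order)
print(arrivals.most_common(1))
```

The page with the most backlinks among the crawled pages (`hub.example`) is reached most often, while `island.example`, which has no inbound links, is never discovered at all.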

Ranking

The spider collects data from its journey along the hyperlink highway, and ranks sites according to the perceived authority other sites grant them. The many elements which contribute to the complex and continuously-evolving algorithms used by search engines to rank pages are known as ranking factors or algorithmic ranking criteria. Google prefers the term signals.

Relevance

The search engines attempt to assess the relevance of the many millions of sites containing keyword matches to the query. This is done by assessing the intent of the query and the number of matching occurrences of the keywords, as well as the context in which they appear on a page.

If a keyword appears in the page title, or in a heading tag (h1, h2 ...), that page is ranked higher.
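As a rough sketch of this kind of scoring, the Python below counts keyword matches and gives matches in the title and headings more weight than matches in the body. The weights (5, 3 and 1), the page records and the `relevance` function are assumptions chosen for illustration. Real engines combine far more signals and do not publish their weights.

```python
import re

# Hypothetical weights: title and heading matches count more than body text.
WEIGHTS = {"title": 5.0, "headings": 3.0, "body": 1.0}


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance(page, query):
    """Weighted count of query keywords found in each field of a page."""
    terms = set(tokenize(query))
    score = 0.0
    for field, weight in WEIGHTS.items():
        words = tokenize(page.get(field, ""))
        score += weight * sum(1 for word in words if word in terms)
    return score


pages = [
    {
        "url": "cars.example",
        "title": "Jaguar cars",
        "headings": "Jaguar models",
        "body": "The jaguar range of cars.",
    },
    {
        "url": "zoo.example",
        "title": "Big cats",
        "headings": "Lions and tigers",
        "body": "The jaguar is a big cat. A jaguar hunts at night.",
    },
]
ranked = sorted(pages, key=lambda p: relevance(p, "jaguar"), reverse=True)
print([(p["url"], relevance(p, "jaguar")) for p in ranked])
```

Note that keyword counting alone still cannot tell the car from the animal. It only rewards pages where the word is prominent.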

Importance

Importance can also be thought of as popularity. The more references (hyperlinks) there are from other sites to a site, and the more authoritative those referring sites are perceived to be, the higher the importance ranking the search engine gives that site.

Greater authority is gained if the context of the referring site matches the content of the target site. For an SEO site, a link from a blog dedicated to SEO issues will carry more importance than a link from, say, an ad unrelated to the general content of the page it appears on.
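One way to picture topical matching is to scale each backlink's authority by how much the referring page's topics overlap with those of the target. The sketch below uses Jaccard similarity between keyword sets for this. That choice, the authority values and the site names are all assumptions made for the example, not a description of how any search engine measures context.

```python
def topical_overlap(source_topics, target_topics):
    """Jaccard similarity between two sets of topic keywords (0.0 to 1.0)."""
    source, target = set(source_topics), set(target_topics)
    if not source or not target:
        return 0.0
    return len(source & target) / len(source | target)


def importance(target_topics, backlinks):
    """Sum each referring page's authority, scaled by its topical overlap."""
    return sum(
        link["authority"] * topical_overlap(link["topics"], target_topics)
        for link in backlinks
    )


# Hypothetical target site about SEO, with two equally authoritative backlinks.
target = {"seo", "search", "ranking"}
backlinks = [
    {
        "from": "seo-blog.example",
        "authority": 0.8,
        "topics": {"seo", "search", "ranking", "links"},
    },
    {
        "from": "recipes.example",
        "authority": 0.8,
        "topics": {"cooking", "recipes"},
    },
]
print(round(importance(target, backlinks), 2))
```

Both referring sites have the same authority here, but only the SEO blog contributes to the total, because the recipe site shares no topics with the target.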

Search operators

Search engines offer a number of systems for refining search queries. These include:

  • [jacket shops -fur]
  • excludes the term 'fur' from search returns about 'jacket shops'. For obvious reasons, 'fur shops' are of no interest.

  • [shops +jacket]
  • the word 'jacket' must appear in the search results. This can be used to force the inclusion of words that would normally be ignored as unimportant, such as the definite article 'the'. It is also useful for disambiguation: [Andrew Bone +Sciencelibrary.info -painter] will return results about Andrew Bone the Sciencelibrary.info editor, and exclude results for the very fine painter of the same name.

  • ["science library"]
  • ensures the exact phrase 'science library' is searched for, a somewhat smaller set than the billions of results that 'science' and 'library' would return individually

  • [quantum OR quanta]
  • pages which contain references to at least one of the two keywords

  • [site:www.sciencelibrary.info]
  • the search will be conducted only in the nominated domain

    [site:www.google.com] : this will give an indication of the number of pages Google has indexed for that domain

    [site:info] : limits the search to the nominated TLD (top-level domain)

    [site:zumguy.com -www.zumguy.com] : includes all sub-domains other than www

  • [inurl:science]
  • ensures the word 'science' is in the url

    [allinurl:science library] : ensures both words appear in the url

  • [intitle:science]
  • ensures the word 'science' is in the page title

    [allintitle:science library] : ensures both words appear in the page title

  • [inanchor:science]
  • ensures the word 'science' is in the anchor text (the text used to refer to the page in a backlink)

    [allinanchor:science library] : ensures both words appear in the anchor text

  • [intext:Einstein]
  • ensures the word 'Einstein' is in the body text of a page

  • [ext:php] & [filetype:php]
  • either of these restricts the search to pages with the .php file extension

  • ["seo*compliance"]
  • will return any phrase which has 'seo' and 'compliance' with any word in between. e.g. 'seo design compliance' and 'seo structure compliance' are both returned

  • [related:www.sciencelibrary.info]
  • returns related pages. If a site links to the page, this query returns other sites linked to by that site

  • [info:www.sciencelibrary.info]
  • provides information about the page, such as the page title, description, related pages, and incoming links

  • [cache:www.sciencelibrary.info]
  • shows the version of the page last time Google crawled it

    This information is imported from our site www.sciencelibrary.info. For further information, please visit the site, or contact us.
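To make the operator syntax listed above concrete, here is a minimal sketch of how a query string might be split into its parts. It is an illustration only: the `parse_query` function, its result keys and the list of recognised filters are hypothetical, and it does not reproduce how any real search engine parses queries.

```python
import re

# A quoted phrase, or any run of non-space characters.
TOKEN = re.compile(r'"([^"]+)"|(\S+)')
# Operators treated as name:value filters in this sketch (an assumption).
FILTERS = ("site", "inurl", "intitle", "inanchor", "intext", "filetype", "ext")


def parse_query(query):
    """Split a query into plain terms, phrases, +/- terms, OR pairs, filters."""
    parsed = {"terms": [], "phrases": [], "required": [], "excluded": [],
              "either": [], "filters": {}}
    tokens = TOKEN.findall(query)
    i = 0
    while i < len(tokens):
        phrase, word = tokens[i]
        name, sep, value = word.partition(":")
        if phrase:
            parsed["phrases"].append(phrase)
        elif (word == "OR" and parsed["terms"]
              and i + 1 < len(tokens) and tokens[i + 1][1]):
            # [a OR b]: pair the previous plain term with the next word.
            parsed["either"].append((parsed["terms"].pop(), tokens[i + 1][1]))
            i += 1
        elif word.startswith("-") and len(word) > 1:
            parsed["excluded"].append(word[1:])
        elif word.startswith("+") and len(word) > 1:
            parsed["required"].append(word[1:])
        elif sep and name in FILTERS and value:
            parsed["filters"][name] = value
        else:
            parsed["terms"].append(word)
        i += 1
    return parsed


print(parse_query("jacket shops -fur"))
print(parse_query("quantum OR quanta"))
print(parse_query('"science library" site:www.sciencelibrary.info +jacket'))
```

Splitting the query this way lets each part be handled by a different step: excluded terms filter results out, filters restrict where matches may occur, and phrases require the words to appear together.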
