Past Pages
You may have reached this site via a search engine such as Google or Yahoo. Search engines are very useful tools and the internet would be a less useful environment without them. This simple article helps to explain a few of their features and a few of their search problems.

Most people imagine the search engine to work something like this:

1. You type in a search description.
2. The search engine searches the internet for matches.
3. The search engine returns a list of sites matching your description.

That is a very simplistic view of the process. Steps 1 and 3 are quite accurate. Step 2 is the messy and complicated bit. The search engine does not go out and search the internet. It searches its own database of what it thinks is out there on the internet.

Google, Yahoo and the rest of them have programs called 'robots', 'bots' or 'spiders' that wander around the internet examining the content of web sites already on their databases. Let's start with a site called 'www.pastpages.co.uk' that is already on the Google database. A spider crawls over the web site looking for any changes or any links to other web sites. If it finds a link it may log the name for future examination. It may not. If it does make a note and come back, it will examine the newly found web site and decide whether to log its details on the Google database. It may decide that the site is rubbish and not bother.

The saved details will be pages of the web site saved locally in a 'cache'. 'Cached' web pages are copies of a web page held not on the native web site but in the Google database. If the pages at 'www.pastpages.co.uk' are updated every day and Google caches the web site only every 6 weeks, there is a good chance of a search NOT showing the required or correct results.

Any search will only indicate what the search engine has recorded. There may be thousands of web sites containing what you are looking for, but they may not be known to the search engines. Any new web site has the uphill battle of getting recognised by search engines. Submissions must be made to all the large ones - sometimes multiple submissions. Even when a site is established, new pages are generated for fresh developments.
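The crawl-and-cache process described above can be sketched as a toy model. This is only an illustration, not how Google or any real search engine is built: the miniature `WEB` dictionary, the `.test` site names and the functions `crawl` and `search` are all hypothetical, and a real spider is far more selective and works to its own timetable.

```python
from collections import deque

# Hypothetical miniature "internet": URL -> (page text, outgoing links).
# Only www.pastpages.co.uk comes from the article; the rest are made up.
WEB = {
    "www.pastpages.co.uk": ("antique maps and prints", ["www.example-maps.test"]),
    "www.example-maps.test": ("old county maps", []),
    "www.unlinked-shop.test": ("antique maps for sale", []),  # nobody links here
}


def crawl(web, seeds):
    """Visit already-known sites, follow their links, and cache a copy of each page."""
    cache = {}
    frontier = deque(seeds)
    while frontier:
        url = frontier.popleft()
        if url in cache or url not in web:
            continue
        text, links = web[url]
        cache[url] = text        # the saved copy, not the live page
        frontier.extend(links)   # newly found sites are queued for a later visit
    return cache


def search(cache, description):
    """Return cached URLs whose saved text contains every search word."""
    wanted = description.lower().split()
    return sorted(
        url for url, text in cache.items()
        if all(word in text.lower().split() for word in wanted)
    )


cache = crawl(WEB, ["www.pastpages.co.uk"])
results = search(cache, "antique maps")
print(results)
```

In this sketch the unlinked shop matches the description but never appears in the results, because the search only ever looks at the cache and no spider has found a link leading to that site.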
The search engines have to be nudged to go and look at the new pages. They may do it, they may not. They will certainly take their time and do it at their convenience. Owners of web sites have no control over how and when their web pages will appear in search results. The more popular the site, the more regular the visits and updates. Sites like www.bbc.co.uk will be trawled by the search engines all the time. A village shoe shop may have its site visited once a year - in a good year!

Smaller search facilities use the databases of the larger ones, and even the larger ones search each other's results. There are only a few true search engines; the rest are parasites feeding from the main deep troughs of Google, Yahoo and MSN.

All this goes to illustrate that search engines do not necessarily have the latest findings. A web page may be updated, the page may be removed from the web site, or sometimes the whole web site may have gone. Search engines are not good at removing references to old or dead web pages.

What comes out of a search depends critically on what is entered as the descriptive text. A very vague and woolly description will result in thousands of possible matches. Refining the description will improve the quality of the results. Ultra refinement of the description may result in no matching finds. Juggling the order of the search words will bring different results.

A fixed text string in quotes will reduce the number of search results. For example, "Whistle Down the Wind" will eliminate most references to whistling or weather. An exact match must be made of everything within the double quotes.

Try the minus sign to reject certain associations, such as "Treasure Island" -Stevenson to eliminate the novel (or references to R.L. Stevenson) if you are trying to find a pub of that name. Better still would be "Treasure Island" pub -Stevenson. Note that this will fail if the pub's web site makes a reference to R.L. Stevenson. There are many tricks and tools like this for fine tuning web searches.

In summary:

- There is much more data out there on the internet than the search engines report.
- Search engines report on their saved data, not on the current internet.
- Search engines are slow and selective in what they save for reporting.
- What they do report is generally history and not necessarily current.
- What they do report may no longer exist.
- Google is not God.
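The quoted-phrase and minus-sign rules described earlier can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the matching logic of any real search engine: the functions `parse_query` and `matches` and the three sample page texts are hypothetical, and ranking is ignored entirely.

```python
import re


def parse_query(query):
    """Split a query into required terms (words or quoted phrases) and excluded words."""
    required, excluded = [], []
    # Try, in order: -word, "quoted phrase", bare word.
    for minus, phrase, word in re.findall(r'-(\S+)|"([^"]+)"|(\S+)', query):
        if minus:
            excluded.append(minus.lower())
        elif phrase:
            required.append(phrase.lower())
        else:
            required.append(word.lower())
    return required, excluded


def contains(text, term):
    """True if the term appears in the text as whole words, ignoring case."""
    return re.search(r"\b" + re.escape(term) + r"\b", text.lower()) is not None


def matches(text, query):
    """A page matches if it has every required term and none of the excluded words."""
    required, excluded = parse_query(query)
    return (all(contains(text, term) for term in required)
            and not any(contains(text, word) for word in excluded))


# Made-up page texts, used only to exercise the rules.
pages = {
    "novel": "Treasure Island is a novel by R.L. Stevenson",
    "pub": "The Treasure Island pub serves real ale",
    "pub_with_history": "The Treasure Island pub, named after the Stevenson novel",
}
query = '"Treasure Island" pub -Stevenson'
hits = [name for name, text in pages.items() if matches(text, query)]
print(hits)  # ['pub']
```

The third page shows the failure noted in the text: it is a genuine pub page, but it is rejected because it mentions Stevenson.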
The image below shows the Past Pages entry in the results of a Google search on 'Antique Maps'.

[Image: Google search result for 'Antique Maps']

Note the 'Cached' text, indicating that the result is from the Google archive, not the Past Pages web site. Such a vague description returns thousands of results, and the Past Pages entry points only to the home page of the web site.
The image below shows the result of a very specific Google search on 'Antique Print Anatomy Drake', hoping to unearth something on the 1707 James Drake medical prints.

[Image: Google search result for 'Antique Print Anatomy Drake']

Note the 'Cached' text, indicating that the result is from the Google archive, not the Past Pages web site. The result points to a specific page of the web site, 'Page18-Prints-Anatomy.htm'. Clicking the 'Cached' text will take you to Google's stored version. Clicking any of the Past Pages hyperlinks will take you to the genuine article at Past Pages.

Since that data was cached, the Past Pages anatomy section has expanded and there are now 3 pages of anatomy and medical prints. The references to Drake's Anatomy are now on web page 'Page18-Prints-Anatomy3.htm'. The Google direction to web page 'Page18-Prints-Anatomy.htm' will not find the Drake prints. The original page still exists, but the Drake material has been moved to page 3. Anticipating such problems, there are links to the new pages (2 and 3) from the original page.

The Google misdirection will not be rectified until the Google 'spider' comes back to the site and makes a fresh cache of the anatomy web pages. When that happens is variable and very unpredictable. Fresh entries on these pages will not show up on Google searches until the cache is updated. Google may be out of date.
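The misdirection described above can be reproduced with a small sketch. The two page names come from the text, but everything else is assumed for illustration: the page contents are placeholder words, and `snapshot` and `find` are hypothetical stand-ins for the spider's visit and the search itself.

```python
def snapshot(site):
    """Stand-in for a spider visit: copy the site's pages as they are right now."""
    return dict(site)


def find(pages, word):
    """Return the names of pages whose text contains the given word."""
    return sorted(name for name, text in pages.items() if word in text.split())


# The live site at the time of the spider's visit (placeholder page text).
live_site = {"Page18-Prints-Anatomy.htm": "anatomy prints drake 1707"}
cache = snapshot(live_site)

# Later the section expands and the Drake material moves to page 3.
live_site["Page18-Prints-Anatomy.htm"] = "anatomy prints see pages 2 and 3"
live_site["Page18-Prints-Anatomy3.htm"] = "anatomy prints drake 1707"

stale = find(cache, "drake")       # what the search engine still reports
actual = find(live_site, "drake")  # where the prints really are now

cache = snapshot(live_site)        # the spider finally comes back
fresh = find(cache, "drake")

print(stale, actual, fresh)
```

Until the second snapshot is taken, the search keeps pointing at the original page, which is why the links from that page to the new ones matter.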