# Robots Crawler file.
# ====================
#
# This file is read by all robots, not just Google's.
# It gives the Sitemap URL: a list of all pages to be indexed.
#
# It also lists all pages that robots are asked NOT to crawl,
# so this file is an 'exclusion' list.
#
# The exclusion list is only a set of requests, not orders,
# similar to a 'Private - Keep Out' sign on an unlocked door.
# Nasty robots will ignore all such requests.
#
#
# The original robots.txt standard has no 'Allow' statement
# (Google and RFC 9309 now support one, but this file does not use it).
# It does support a 'Sitemap' statement, which acts as an 'include' file,
# e.g. Sitemap: http://www.pastpages.co.uk/crawler.txt
# This tells robots to examine all URLs listed in that file.
#
#
# Types of Site Files:
#
#   sitemap.xml      Used by Google, and maybe a few others (submitted)
#   sitemap.xml.gz   Used by Google, and maybe a few others
#   urllist.txt      Used by Yahoo and a few others (submitted)
#   ror.xml
#   crawler.txt      Include file linked from this file (robots.txt)
#   crawler.htm      Site file linked from index.htm (Past Pages start)
#
#
# This file is used by automatic site file generators.
# They normally include every file they come across, unless told otherwise,
# so robots.txt should list all the files and folders they should ignore.
# It may take several passes to refine the exclusions.
#
# End of General Notes and Comments
# ===========================================================================
# Notes on wildcards.
#
# '*' matches any run of characters, and every rule is matched from the
# start of the URL path:
#
#Disallow: /*Kh     this Disallows URLs with 'Kh' anywhere in the path
#Disallow: /*Kh*    the same; a trailing '*' adds nothing
#Disallow: /Kh*     this Disallows URLs starting with '/Kh'
#Disallow: /*Kh$    this Disallows URLs ending with 'Kh'
# This will stop all the small image files from being indexed.
# Robots do not treat '?' as a wildcard: it is matched literally, because
# it is used extensively in search queries. 'Disallow: /*?' can therefore
# be used to suppress search URLs.
#
#Disallow: /*.jpg$  this Disallows jpg files. The '$' anchors the match
#                   to the very end of the URL.
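#
# The wildcard behaviour can be checked with a small script before a rule
# goes live. The Python sketch below assumes Google-style matching: '*' is
# any run of characters, a trailing '$' is the end of the URL, and everything
# else (including '?') is a literal, case-sensitive match from the start of
# the path. rule_to_regex and is_disallowed are hypothetical helper names,
# not part of any robot's real code.

```python
import re


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate one robots.txt Disallow path into a compiled regex.

    Assumes Google-style semantics: '*' matches any run of characters
    (including none), a trailing '$' anchors the end of the URL, and
    every other character, '?' included, is matched literally.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape everything, then turn each escaped '*' back into '.*'.
    body = re.escape(rule).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, rule: str) -> bool:
    """True if the URL path (plus any query string) is blocked by `rule`."""
    # re.match anchors only at the start, which gives robots.txt's
    # prefix-style matching; compiled patterns are case sensitive by default.
    return rule_to_regex(rule).match(path) is not None


for rule, path in [
    ("/*Kh", "/maps/Khartoum.htm"),   # 'Kh' anywhere: blocked
    ("/Kh*", "/maps/Khartoum.htm"),   # must start with '/Kh': not blocked
    ("/*.jpg$", "/img/map.jpg"),      # ends in .jpg: blocked
    ("/*.jpg$", "/img/map.JPG"),      # case differs: not blocked
    ("/*?", "/search?q=maps"),        # literal '?': blocked
]:
    print(f"{rule:10} {path:20} {is_disallowed(path, rule)}")
```

# Regular expressions are used here only for brevity; a matcher that walks
# the rule and the path character by character would behave the same way.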
#
# Note that rule paths are matched case-sensitively, so .jpg does not cover .JPG
#
# =========================================================================
# The sitemap listed below holds all useful pages on the site,
# and is a kind of 'Include' file.

Sitemap: http://www.pastpages.co.uk/crawler.txt

# .............................................
# The lines below are all exclusion statements.
# They eliminate the files that
#   (a) should not be included in any automated Site File (very important)
#   (b) should not be viewed by a robot (not so important)

User-agent: *
Disallow: /*.css$
Disallow: /*.asp$
Disallow: /*.jpg$
Disallow: /*.gif$
Disallow: /refresh*
Disallow: /site-gifs/
Disallow: /site-files/ephemera/
Disallow: /site-files/maps-africa/
Disallow: /site-files/maps-america-central/
Disallow: /site-files/Maps-america-north/
Disallow: /site-files/maps-america-south/
Disallow: /site-files/maps-asia/
Disallow: /site-files/maps-australia/
Disallow: /site-files/maps-europe/
Disallow: /site-files/maps-london/
Disallow: /site-files/maps-military/
Disallow: /site-files/maps-towns/
Disallow: /site-files/maps-uk/
Disallow: /site-files/maps-road/*.jpg$
Disallow: /site-files/maps-road/pages/
Disallow: /site-files/maps-road/infopages/roadmaps/
Disallow: /site-files/maps-road/infopages/*.jpg$
Disallow: /site-files/maps-road/infopages/index.htm
# Google's crawl reports a 404 (not found) for the non-existent file above.
Disallow: /site-files/maps-world,polar,misc/
Disallow: /site-files/prints-africa/
Disallow: /site-files/prints-america/
Disallow: /site-files/prints-anatomy/
Disallow: /site-files/prints-asia/
Disallow: /site-files/prints-australia/
Disallow: /site-files/prints-europe/
Disallow: /site-files/prints-history/
Disallow: /site-files/prints-illustrated/
Disallow: /site-files/prints-industry/
Disallow: /site-files/prints-military/
Disallow: /site-files/prints-misc/
Disallow: /site-files/prints-nature/
Disallow: /site-files/prints-peoples/
Disallow: /site-files/prints-science/
Disallow: /site-files/prints-uk/
Disallow: /site-files/prints-uk-london/
Disallow: /site-files/specials/
Disallow: /site-files/*.asp$
Disallow: /site-files/*.gif$
Disallow: /site-files/*.jpg$
Disallow: /site-files/*.txt$
Disallow: /site-files/*.php$
Disallow: /site-files/No_Frames.htm
Disallow: /site-files/PaymentView.htm
Disallow: /site-files/re-load.htm
Disallow: /site-files/refresh.htm
Disallow: /site-files/StyleDemo.htm
Disallow: /site-files/ThankYou.htm
Disallow: /site-files/empty_page.htm
Disallow: /site-files/form-pp-done.htm
Disallow: /site-files/form-pp-orders.htm
Disallow: /site-files/form-pp-enquiry.htm
Disallow: /site-files/form-pp-search.htm
Disallow: /site-files/PurchasePage.htm
Disallow: /Area51/
Disallow: /33/
Disallow: /BS/
Disallow: /Shop/
Disallow: /Patio/
Disallow: /Dev/
Disallow: /Forget-Me-Not/
Disallow: /View*/
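
# A site file generator (or a robot) reads this file line by line, which is
# why each directive must sit on its own line. The Python sketch below is a
# simplified, hypothetical parser (parse_robots is an illustrative name, not
# a real library call): it collects the Sitemap URLs and the Disallow rules
# for each User-agent, and ignores Allow and any other directives.

```python
from __future__ import annotations


def parse_robots(text: str) -> tuple[list[str], dict[str, list[str]]]:
    """Return (sitemap URLs, {user-agent: [Disallow rules]}) for robots.txt text."""
    sitemaps: list[str] = []
    groups: dict[str, list[str]] = {}
    agents: list[str] = []    # user-agents of the group currently being read
    in_agent_run = False      # True while consecutive User-agent lines appear
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue                          # blank or malformed line
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()                     # directive names are case-insensitive
        if key == "sitemap":
            sitemaps.append(value)            # Sitemap applies to the whole file
        elif key == "user-agent":
            if not in_agent_run:
                agents = []                   # a new group starts here
            agents.append(value)
            groups.setdefault(value, [])
            in_agent_run = True
        else:
            in_agent_run = False
            if key == "disallow" and value:   # an empty Disallow blocks nothing
                for agent in agents:
                    groups[agent].append(value)
    return sitemaps, groups


sample = "Sitemap: http://www.pastpages.co.uk/crawler.txt\nUser-agent: *\nDisallow: /*.css$\nDisallow: /Area51/\n"
print(parse_robots(sample))
# -> (['http://www.pastpages.co.uk/crawler.txt'], {'*': ['/*.css$', '/Area51/']})

# The same directives flattened onto one line parse as one odd user-agent.
print(parse_robots("User-agent: * Disallow: /*.css$ Disallow: /Area51/"))
# -> ([], {'* Disallow: /*.css$ Disallow: /Area51/': []})
```

# With this simple parser, a flattened file yields no usable rules at all;
# real robots may differ in detail, but one directive per line is required.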