Standards-based website construction and the Robots protocol in detail

1. Following the W3C standards

W3C is the abbreviation of World Wide Web Consortium. The W3C was founded in October 1994 at the Massachusetts Institute of Technology Laboratory for Computer Science. Its founder is Tim Berners-Lee, the inventor of the World Wide Web.

The W3C is a non-profit organization that creates web standards; standards such as HTML, XHTML, CSS, and XML are set by the W3C. W3C members (about 500 in all) include companies that produce technical products and services, content providers, user organizations, research laboratories, standards-setting bodies, and government departments, all working together to reach consensus on the development of the Web. For SEO practitioners, understanding the W3C chiefly means understanding the semantics of HTML, XHTML, and XML markup.

2. Website maps (sitemaps)

1. HTML sitemaps

An HTML sitemap is both a way of organizing a site and a link-building technique. An HTML sitemap is a bridge between the main sections of a website. It helps guide visitors through the site and encourages crawler programs to traverse the whole site. If a sitemap would have too many link entries, split it up: keep each sitemap page to no more than 100 links.

For a small site, a sitemap with plain-text links pointing to the channel pages and second-level category pages is enough; what matters more is making it very clear to users what the site can offer and where its distinguishing features lie. For a medium-sized site, it is also advisable to add text links to content pages in the sitemap; these links can be generated automatically, using the article title combined with the article's keywords as the anchor text, and distributed across multiple sitemap pages.
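As a sketch, an HTML sitemap page for a small site can be nothing more than a nested list of text links to channel and category pages. The section names and URLs below are made-up placeholders:

```html
<!-- sitemap.html: a minimal HTML sitemap (hypothetical site structure) -->
<ul>
  <li><a href="/news/">News</a>
    <ul>
      <li><a href="/news/tech/">Tech News</a></li>
      <li><a href="/news/sports/">Sports News</a></li>
    </ul>
  </li>
  <li><a href="/reviews/">Product Reviews</a></li>
  <li><a href="/downloads/">Downloads</a></li>
</ul>
```

For a medium-sized site, the same list would be extended with automatically generated links to individual content pages, each using the article title as anchor text.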

2. Google sitemaps

A Google sitemap serves a different purpose from an HTML sitemap: it is a sitemap made entirely for search engines. Google sitemaps are based on XML. The full name, Google Sitemaps, refers to Google's tool for website administrators; building a valid Google sitemap can effectively promote Google's inclusion of a site's pages. Google sitemaps are now in wide use across websites.
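The XML format introduced by Google Sitemaps (and since adopted as the shared sitemaps.org protocol) looks like the following minimal sketch; the URL and date values are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2009-01-01</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
```

Each `<url>` entry describes one page: its location, when it last changed, roughly how often it changes, and its priority relative to other pages on the same site.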

3. Robots.txt

The robots protocol is an announcement mechanism addressed to search-engine crawler programs (spiders). Website administrators and content providers sometimes have site content that they do not want exposed to robot crawling. To solve this problem, the robots community provides two mechanisms: one is robots.txt, and the other is the Robots META tag.

1. What is robots.txt?

Robots.txt is a simple text file in which you declare which parts of the site should not be visited by robots. In this way, part or all of the site's content can be kept out of search-engine indexes, or you can specify that search engines index only designated content.

When a search robot visits a site, it first checks whether a robots.txt file exists in the site's root directory. If it finds one, the robot determines the scope of its visit according to the contents of that file; if the file does not exist, the robot crawls along links without restriction.

Robots.txt must be placed in the root directory of the site, and the file name must be entirely lowercase.

For example (using a placeholder domain), the robots.txt URL corresponding to a website URL is formed like this:

Website URL: http://www.example.com/

Corresponding robots.txt URL: http://www.example.com/robots.txt

2. The syntax of robots.txt

A robots.txt file contains one or more records, separated by blank lines (with CR, CR/NL, or NL as the line terminator). Each line of a record has the following format:

"<field>:<optionalspace><value><optionalspace>"


The "#" character can be used for comments in this file, with the same conventions as in UNIX. A record in this file normally begins with one or more User-agent lines, followed by a number of Disallow lines. The details are as follows:


User-agent:
The value of this field is the name of a search-engine robot. If a robots.txt file contains more than one User-agent record, then more than one robot is bound by the protocol; the file must contain at least one User-agent record. If the value is set to *, the protocol applies to every robot, and in that case the record "User-agent: *" may appear only once in the file.


Disallow:
The value of this field is a URL that should not be visited. The URL can be a complete path or just a prefix; any URL beginning with the Disallow value will not be visited by the robot. For example, "Disallow: /help" blocks search engines from both /help.html and /help/index.html, while "Disallow: /help/" lets robots visit /help.html but not /help/index.html.

If a Disallow record is left empty, it means that every part of the website may be visited; a robots.txt file must contain at least one Disallow record. If /robots.txt is an empty file, the site is open to all search robots.
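The prefix-matching behaviour described above can be checked with Python's standard urllib.robotparser module. This is only a sketch; the rules, robot name, and URLs below are made-up examples:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: block two directories for all robots.
rules = [
    "User-agent: *",
    "Disallow: /cgi-bin/",
    "Disallow: /tmp/",
]

rp = RobotFileParser()
rp.parse(rules)

# Paths not matched by any Disallow prefix are allowed.
print(rp.can_fetch("Googlebot", "http://www.example.com/index.html"))      # True
print(rp.can_fetch("Googlebot", "http://www.example.com/tmp/cache.html"))  # False
```

A well-behaved crawler performs exactly this check before fetching any page from a site.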

Below are some of the main uses of robots.txt:

To block all search engines from visiting any part of the site:

User-agent: *
Disallow: /

To allow all robots full access:

User-agent: *
Disallow:

(Alternatively, create an empty /robots.txt file.)

To block all search engines from certain parts of the site (the /cgi-bin/, /tmp/, and /private/ directories in the example below):

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/

To block a particular search engine (BadBot in the example below):

User-agent: BadBot
Disallow: /

To allow only a particular search engine (WebCrawler in the example below):

User-agent: WebCrawler
Disallow:

User-agent: *
Disallow: /
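The "allow only one robot" pattern can likewise be verified with urllib.robotparser from Python's standard library; the robot names are the ones used in the example above, and the URL is a placeholder:

```python
from urllib.robotparser import RobotFileParser

# Rules from the example: WebCrawler may crawl everything,
# every other robot is shut out.
rules = [
    "User-agent: WebCrawler",
    "Disallow:",
    "",
    "User-agent: *",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("WebCrawler", "http://www.example.com/page.html"))  # True
print(rp.can_fetch("BadBot", "http://www.example.com/page.html"))      # False
```

The empty Disallow in the first record grants WebCrawler full access, while the catch-all second record denies everyone else.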