How to use a robots.txt file to block search engine crawling

Key point: SEO practitioners can freely control which content on a website spiders may visit by declaring rules in a robots.txt file. The major search engines comply with the Robots protocol. There have also been cases of SEOs using a robots.txt file to block a client's website.

Search engines use a program called a robot (also known as a spider) to visit web pages on the Internet automatically and collect page information.

You can create a plain text file named robots.txt on your website and declare in it the parts of the site that you do not want robots to visit. That way, part or all of the site's content can be kept out of search engine indexes, or search engines can be told to index only the content you specify. The robots.txt file must be placed in the root directory of the website.

When a search robot (sometimes called a search spider) visits a site, it first checks whether robots.txt exists in the site's root directory. If the file exists, the robot limits its visit according to the file's contents. If the file does not exist, the robot simply crawls along the links it finds.
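The check described above can be sketched with Python's standard urllib.robotparser module. This is only an illustration, not how any particular search engine works: the robots.txt text, the crawler name "ExampleBot" and the example.com URLs are all made-up placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would download the file
# from the site root instead of using a hard-coded string.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def load_rules(text):
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
# "ExampleBot" and the URLs are placeholders, not real crawler names or pages.
print(rules.can_fetch("ExampleBot", "http://www.example.com/private/a.html"))  # False
print(rules.can_fetch("ExampleBot", "http://www.example.com/index.html"))      # True

# An empty file places no limits, so every path may be crawled.
print(load_rules("").can_fetch("ExampleBot", "http://www.example.com/private/a.html"))  # True
```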

Format of the robots.txt file:

A robots.txt file contains one or more records, separated by blank lines (with CR, CR/NL, or NL as the line terminator). Each record has the following format:

<field>:<optionalspace><value><optionalspace>
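As a rough sketch of that record format, the following Python code splits a file into records of (field, value) pairs. The function names parse_line and parse_records are hypothetical helpers written for this illustration. The sketch assumes that everything after a # on a line is a comment and that field names are case-insensitive.

```python
def parse_line(line):
    """Split one line into (field, value), or return None if it has no field."""
    line = line.split("#", 1)[0].strip()  # drop any trailing comment
    if not line or ":" not in line:
        return None
    field, value = line.split(":", 1)
    return field.strip().lower(), value.strip()


def parse_records(text):
    """Group the lines of a robots.txt file into records separated by blank lines."""
    records, current = [], []
    for raw in text.splitlines():
        parsed = parse_line(raw)
        if parsed is None:
            # A truly blank line ends the current record; a comment line does not.
            if not raw.strip() and current:
                records.append(current)
                current = []
            continue
        current.append(parsed)
    if current:
        records.append(current)
    return records


# Hypothetical sample file used only for this illustration.
SAMPLE = """\
# sample file
User-agent: *
Disallow: /cgi-bin/   # scripts

User-agent: BadBot
Disallow: /
"""

for record in parse_records(SAMPLE):
    print(record)
```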

The # character can be used for comments in this file, following the same convention as in UNIX. A record usually begins with one or more User-agent lines, followed by a number of Disallow lines. The details are as follows:


User-agent:

This value gives the name of a search engine robot. If the robots.txt file contains several User-agent records, several robots are subject to the protocol. The file must contain at least one User-agent record. If the value is set to *, the protocol applies to every robot. The file may contain only one "User-agent: *" record.


Disallow:

This value describes a URL that should not be visited. It can be a complete path or only a prefix: any URL that begins with the Disallow value will not be visited by the robot. For example, "Disallow: /help" blocks search engines from both /help.html and /help/index.html, whereas "Disallow: /help/" allows robots to visit /help.html but not /help/index.html. If any Disallow record is empty, all parts of the website may be visited. The /robots.txt file must contain at least one Disallow record. If /robots.txt is an empty file, the website is open to all search engine robots.
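The prefix rule is easy to check with Python's standard urllib.robotparser module. This is a minimal sketch: blocked_paths is a hypothetical helper and "ExampleBot" is a made-up crawler name.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(disallow_value, paths):
    """Return the paths that a single Disallow value blocks for every robot."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", "Disallow: " + disallow_value])
    return [p for p in paths if not parser.can_fetch("ExampleBot", p)]


PATHS = ["/help.html", "/help/index.html"]

print(blocked_paths("/help", PATHS))   # ['/help.html', '/help/index.html']
print(blocked_paths("/help/", PATHS))  # ['/help/index.html']
print(blocked_paths("", PATHS))        # []  (an empty Disallow blocks nothing)
```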

Examples of robots.txt usage:

Example 1. Block all search engines from every part of the website:

User-agent: *
Disallow: /

Example 2. Allow all robots to visit (alternatively, create an empty /robots.txt file):

User-agent: *
Disallow:

Example 3. Block one particular search engine:

User-agent: BadBot
Disallow: /

Example 4. Allow only one particular search engine:

User-agent: Baiduspider
Disallow:

User-agent: *
Disallow: /

Example 5. A simple case. In this example the website restricts search engine access to three directories, that is, search engines will not visit these three directories. Note that each directory must be declared on its own line and must not be written as "Disallow: /cgi-bin/ /tmp/". Under the basic standard, the * after User-agent: has a special meaning ("any robot") and is not a general wildcard, so records such as "Disallow: /tmp/*" or "Disallow: *.gif" cannot appear in this file.

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
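To see why each directory needs its own Disallow line, here is a small sketch using Python's standard urllib.robotparser module. The helper can_crawl, the crawler name "ExampleBot" and the page paths are made up for the illustration. The results describe how Python's parser reads the file, and other crawlers may treat a malformed line differently.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, path, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Correct form from Example 5: one directory per Disallow line.
SEPARATE = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
"""

# Incorrect form: two directories squeezed onto one Disallow line.
COMBINED = """\
User-agent: *
Disallow: /cgi-bin/ /tmp/
"""

# Hypothetical page paths, printed as: path, allowed under SEPARATE, allowed under COMBINED.
for path in ("/cgi-bin/run.cgi", "/tmp/cache.html", "/~joe/notes.html", "/index.html"):
    print(path, can_crawl(SEPARATE, path), can_crawl(COMBINED, path))
```

With Python's parser, the combined line is read as one long path that matches neither directory, so nothing is actually blocked.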

Special robots.txt parameters:

Allowing Googlebot:

If you want to block every crawler other than Googlebot from visiting your pages, you can use the following syntax:

User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:

Googlebot follows the lines addressed to itself rather than the lines addressed to all crawlers.
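The way a crawler picks the record addressed to it can be sketched with Python's standard urllib.robotparser module. This only illustrates the selection rule and is not Googlebot's actual code. "OtherBot" and the page path are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Everyone is blocked by the general record, but Googlebot has its own record.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The record naming the crawler takes priority over the "*" record.
print(parser.can_fetch("Googlebot", "/page.html"))  # True
print(parser.can_fetch("OtherBot", "/page.html"))   # False
```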

The Allow extension:

Googlebot recognizes an extension to the robots.txt standard called Allow. Crawlers from other search engines may not recognize this extension, so check with the other search engines you are interested in. An Allow line works in exactly the same way as a Disallow line: simply list the directories or pages you want to allow.

You can also use Disallow and Allow together. For example, to block every page in a subdirectory except one, you can use the following entries:

User-agent: Googlebot
Disallow: /folder1/
Allow: /folder1/myfile.html

These entries block every page inside the folder1 directory except myfile.html.
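One way to picture how Allow and Disallow interact is the sketch below. It assumes the "most specific rule wins" behavior that Google documents for its crawler: the longest matching path decides, and Allow wins a tie. The function is_allowed is a hypothetical helper, not part of any library, and it ignores wildcards.

```python
def is_allowed(rules, path):
    """Decide whether `path` may be crawled.

    `rules` is a list of (directive, prefix) pairs for one crawler.
    Assumption: the longest matching prefix wins, and Allow wins a tie.
    """
    best_len, verdict = -1, True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            allow = directive.lower() == "allow"
            if len(prefix) > best_len or (len(prefix) == best_len and allow):
                best_len, verdict = len(prefix), allow
    return verdict


RULES = [
    ("Disallow", "/folder1/"),
    ("Allow", "/folder1/myfile.html"),
]

print(is_allowed(RULES, "/folder1/myfile.html"))  # True
print(is_allowed(RULES, "/folder1/other.html"))   # False
print(is_allowed(RULES, "/index.html"))           # True
```

Parsers that apply the first matching line in file order instead would give a different answer for myfile.html here, so the order and the crawler both matter.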

If you want to block Googlebot but allow another Google crawler (such as Googlebot-Mobile), you can use an Allow rule to let that crawler in. For example:

User-agent: Googlebot
Disallow: /

User-agent: Googlebot-Mobile
Allow: /

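Using * to match a sequence of characters:

For crawlers that support pattern matching, such as Googlebot, you can use an asterisk (*) to match a sequence of characters. For example, to block access to all subdirectories whose names begin with private, you can use the following entries:

User-agent: Googlebot
Disallow: /private*/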

To block access to all URLs that include a question mark (?), you can use the following entries:

User-agent: *
Disallow: /*?*

Using $ to match the end of a URL:

You can use the $ character to match the end of a URL. For example, to block URLs that end in .asp, you can use the following entries:

User-agent: Googlebot
Disallow: /*.asp$

You can combine this pattern matching with the Allow directive. For example, if a ? indicates a session ID, you may want to exclude every URL that contains one so that Googlebot does not crawl duplicate pages. However, URLs that end with a ? may be the version of the page that you do want included. In that case, you can set up the robots.txt file as follows:

User-agent: *
Allow: /*?$
Disallow: /*?

The "Disallow: /*?" line blocks any URL that contains a ? (more precisely, it blocks any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string).

The "Allow: /*?$" line allows any URL that ends with a ? (more precisely, it allows any URL that begins with your domain name, followed by any string, followed by a question mark, with no characters after the question mark).
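The * and $ patterns can be tried out with the sketch below, which translates a pattern into a regular expression. It assumes only the two special characters described above: * matches any sequence of characters and a trailing $ anchors the end of the URL. The function names and test paths are hypothetical, and Python's standard urllib.robotparser does not apply these wildcards itself.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then let each * match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def matches(pattern, url_path):
    """Return True if the URL path (including any query string) fits the pattern."""
    return pattern_to_regex(pattern).match(url_path) is not None


print(matches("/*?", "/page?id=1"))      # True: contains a question mark
print(matches("/*?", "/page"))           # False: no question mark
print(matches("/*?$", "/page?"))         # True: ends with a question mark
print(matches("/*?$", "/page?id=1"))     # False: characters follow the ?
print(matches("/*.asp$", "/a/b.asp"))    # True
print(matches("/*.asp$", "/a/b.aspx"))   # False
print(matches("/private*/", "/private1/x.html"))  # True
```

For a path such as /page? both the Allow and the Disallow pattern match, so the outcome depends on how the crawler ranks competing rules.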

Sitemap:

A newer way of supporting sitemaps is to include a link to the Sitemap file directly in the robots.txt file.

Like this:

Sitemap: /sitemap.xml

Note that the Sitemaps protocol expects the complete URL of the Sitemap file here, not just a path.

The search engine companies that have announced support for this so far are Google, Yahoo, Ask, and MSN.
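As a small illustration, Python's standard urllib.robotparser module can read Sitemap lines back out of a robots.txt file through its site_maps() method (available from Python 3.8). The robots.txt text and the example.com URL below are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content with a Sitemap line giving a full URL.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/

Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns None when the file has no Sitemap lines.
sitemaps = parser.site_maps() or []
print(sitemaps)  # ['http://www.example.com/sitemap.xml']
```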

Even so, I recommend also submitting your sitemap through Google Sitemaps, which offers many features for analyzing the state of your links.
