Google Webmaster Guide in Detail: How to Use Robots.txt

Core tip: The robots.txt file places restrictions on the search engine robots (called crawlers) that crawl the web. These crawlers are automated, and before they visit a page they check whether a robots.txt file exists that restricts their access to that particular page. If you want to keep certain content on your website from being indexed by search engines, robots.txt is a simple and effective tool. Here is a brief introduction to how to use it.
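As a rough sketch of that check (not any search engine's actual code), the snippet below uses Python's standard urllib.robotparser module with placeholder URLs: the crawler first fetches the site's robots.txt, then asks whether a given page may be crawled before requesting it:

from urllib import robotparser

# Fetch and parse the site's robots.txt (placeholder domain).
rp = robotparser.RobotFileParser()
rp.set_url("http://yourserver.com/robots.txt")
rp.read()

# Ask before visiting the page: only crawl it if robots.txt allows it.
page = "http://yourserver.com/tmp/report.html"
if rp.can_fetch("Googlebot", page):
    print("allowed to crawl:", page)
else:
    print("blocked by robots.txt:", page)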

Where to place the robots.txt file

Robots.txt is itself a plain text file. It must be located in the root directory of the domain and be named robots.txt. A robots.txt file placed in a subdirectory has no effect, because crawlers only look for the file in the domain's root directory. For example, http://yourserver.com/robots.txt is a valid location, while http://yourserver.com/mysite/robots.txt is not.
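To illustrate why only the root location counts, here is a small sketch (using Python's standard urllib.parse and the placeholder domain above) of how a crawler derives the robots.txt address from any page URL: it keeps the protocol and host and discards the path:

from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    # Keep only the scheme and host; replace the path with /robots.txt.
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("http://yourserver.com/mysite/page.html"))
# -> http://yourserver.com/robots.txt (never .../mysite/robots.txt)
print(robots_txt_url("https://yourserver.com/page.html"))
# -> https://yourserver.com/robots.txt (each protocol is looked up separately)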

Here is an example of a robots.txt file:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~name/

Using robots.txt to block or remove an entire website

To remove your website from search engines and prevent all crawlers from crawling it in the future, place the following robots.txt file in your server's root directory:

User-agent: *
Disallow: /

To remove your website from Google only, and only prevent Googlebot from crawling your site in the future, place the following robots.txt file in your server's root directory:

User-agent: Googlebot
Disallow: /

Each port must have its own robots.txt file. In particular, if you serve content over both http and https, each of these protocols needs its own robots.txt file. For example, to have Googlebot index all http pages but no https pages, you would use the robots.txt files below.

For the http protocol (http://yourserver.com/robots.txt):

User-agent: *
Allow: /

For the https protocol (https://yourserver.com/robots.txt):

User-agent: *
Disallow: /

Allowing all crawlers to access your pages

User-agent: *
Disallow:

(Alternative methods: create an empty /robots.txt file, or do not use a robots.txt file at all.)

Using robots.txt to block or remove individual pages

You can use a robots.txt file to prevent Googlebot from crawling pages on your website. For example, if you are creating a robots.txt file by hand to prevent Googlebot from crawling every page under a particular directory (for example, private), you can use the following robots.txt entry:

User-agent: Googlebot
Disallow: /private

To prevent Googlebot from crawling all files of a specific file type (for example, .gif), use the following robots.txt entry:

User-agent: Googlebot
Disallow: /*.gif$

To prevent Googlebot from crawling all URLs that contain a ? (more specifically, any URL that begins with your domain name, followed by any string, then a question mark, then any string), use the following entry:

User-agent: Googlebot
Disallow: /*?
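The entries above rely on Google's wildcard extensions: * matches any sequence of characters, a trailing $ anchors the pattern to the end of the URL, and everything else is a prefix match. The rough sketch below (an illustration, not Google's actual matcher) shows one way to test such patterns against URL paths:

import re

def blocked(pattern: str, path: str) -> bool:
    # A trailing '$' anchors the match to the end of the URL;
    # otherwise the rule behaves as a prefix match.
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    # '*' matches any run of characters; every other character is literal.
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in core)
    return re.match(regex + ("$" if anchored else ""), path) is not None

print(blocked("/private", "/private/notes.html"))   # True  (prefix match)
print(blocked("/*.gif$", "/images/logo.gif"))       # True  (ends with .gif)
print(blocked("/*.gif$", "/images/logo.gif?x=1"))   # False ($ anchors the end)
print(blocked("/*?", "/search?q=robots"))           # True  (contains a '?')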

Although we do not crawl or index the content of pages blocked by robots.txt, we may still crawl and index their URLs if we find them on other pages on the web. As a result, the URL of the page, along with other publicly available information such as the anchor text in links pointing to the site, may appear in Google search results. However, the content on your pages will not be crawled, indexed, or displayed.

As part of Webmaster Tools, Google provides a robots.txt analysis tool. It reads a robots.txt file the same way Googlebot does and reports results for Google user-agents (such as Googlebot). We strongly recommend using it. Before creating a robots.txt file, it is worth considering which content should be discoverable by users through search and which should not. Used sensibly in this way, robots.txt lets search engines bring users to your website while keeping private information out of their indexes.
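If you only need a quick local check before uploading the file, Python's standard urllib.robotparser answers the same basic question (may a given user-agent fetch a given path?). Note that, unlike Google's tool, it does not implement the * and $ wildcard extensions shown earlier. A small example with an assumed rule set:

from urllib import robotparser

rules = """\
User-agent: Googlebot
Disallow: /private
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

for path in ("/index.html", "/private/report.html"):
    print(path, "->", rp.can_fetch("Googlebot", path))
# /index.html -> True
# /private/report.html -> False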