Search engines unify the robots.txt standard

The three search engine giants compete fiercely, but now and then they also cooperate. Last year Google, Yahoo, and Microsoft worked together to support a unified Sitemaps standard. Two days ago the three made another simultaneous announcement, this time of a jointly supported robots.txt standard. Google, Yahoo, and Microsoft each published a post on their own official blogs listing the robots.txt directives and Meta tags that all three support, along with a few that are specific to each engine. A summary follows.

The robots.txt directives supported by all three include:

Disallow – tells the spider not to crawl certain files or directories. For example, the following code prevents spiders from crawling any file on the site:

User-agent: *

Disallow: /

Allow – tells the spider that it should crawl certain files. Allow and Disallow can be used together to tell the spider that most of a directory should not be crawled and only part of it should. For example, the following code keeps spiders away from the other files under the ab directory and lets them crawl only the files under cd:

User-agent: *

Disallow: /ab/

Allow: /ab/cd

$ wildcard – matches the end of a URL. For example, the following code allows spiders to access URLs that end in .htm:

User-agent: *

Allow: /*.htm$

* wildcard – tells the spider to match any sequence of characters. For example, the following code prohibits spiders from crawling all .htm files:

User-agent: *

Disallow: /*.htm

Sitemap location – tells the spider where your sitemap is. The format is: Sitemap: <sitemap_location>
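
To make the interplay of Disallow, Allow, * and $ more concrete, here is a minimal Python sketch of how a crawler might evaluate such rules. The function names pattern_to_regex and is_allowed are hypothetical, and the precedence rule (the longest matching pattern wins, with Allow winning a tie) is an assumption of this sketch, since the announcements summarized here do not spell out how conflicts are resolved.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then rejoin the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Decide whether `path` may be crawled.

    rules: list of (directive, pattern) tuples, e.g. ("Disallow", "/ab/").
    Assumption: the longest matching pattern wins; Allow wins a tie.
    """
    best = None  # (pattern length, allowed?)
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    # No rule matched: crawling is allowed by default.
    return True if best is None else best[1]


rules = [("Disallow", "/ab/"), ("Allow", "/ab/cd")]
print(is_allowed(rules, "/ab/other.htm"))    # blocked by Disallow: /ab/
print(is_allowed(rules, "/ab/cd/page.htm"))  # permitted by the longer Allow rule
```

A pattern without a trailing $ is treated as a prefix match, which is why /*.htm in this sketch also matches /index.html.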

The Meta tags supported by all three include:

NOINDEX – tells the spider not to index a given page.

NOFOLLOW – tells the spider not to follow the links on the page.

NOSNIPPET – tells the spider not to show a descriptive snippet in the search results.

NOARCHIVE – tells the spider not to show a cached snapshot.

NOODP – tells the spider not to use the title and description from the Open Directory.
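
As an illustration of how these tags are read, the following Python sketch collects the values of a page's robots Meta tag using only the standard library. The class name RobotsMetaParser and the helper robots_directives are hypothetical; the sketch assumes the directives appear as a comma-separated list in the content attribute of a meta tag named "robots" and ignores engine-specific tag names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser already lowercases tag and attribute names.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of lowercase robots directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW"></head></html>'
found = robots_directives(page)
print("noindex" in found, "nofollow" in found)
```

A crawler built along these lines would skip indexing when "noindex" is present and skip link extraction when "nofollow" is present.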

All three engines now support the directives and tags listed above. The wildcards, it seems, were not previously supported by Yahoo and Microsoft. Baidu now also supports Disallow, Allow, and the two wildcards. I have not found any official documentation on whether Baidu supports the Meta tags.

The Meta tags that only Google supports are:

UNAVAILABLE_AFTER – tells the spider when the page expires. After this date, the page should no longer appear in search results.

NOIMAGEINDEX – tells the spider not to index the images on the page.

NOTRANSLATE – tells the spider not to translate the page content.

Yahoo additionally supports:

Crawl-Delay – a robots.txt directive that lets you set a delay between the spider's requests, reducing how often it crawls.

NOYDIR – similar to the NOODP tag, but it refers to the Yahoo Directory rather than the Open Directory.

Robots-nocontent – tells the spider that the marked part of the HTML is not part of the page's content. Put another way, it tells the spider which parts are the page's main content (the content you want to be searchable).

MSN additionally supports: Crawl-Delay
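
For readers who want to see how a crawler might honor Crawl-Delay, here is a small Python sketch built on the standard library's urllib.robotparser (its crawl_delay method is available from Python 3.6). The helper name crawl_delay_for and the sample rules are hypothetical, and returning 0 when no delay is declared is a choice made for this sketch.

```python
from urllib.robotparser import RobotFileParser


def crawl_delay_for(robots_lines, user_agent):
    """Return the Crawl-delay (in seconds) that applies to `user_agent`.

    robots_lines: the robots.txt content as a list of lines.
    Returns 0 when no delay is declared for that agent.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    delay = parser.crawl_delay(user_agent)
    return 0 if delay is None else delay


sample = [
    "User-agent: Slurp",
    "Crawl-delay: 5",
    "",
    "User-agent: *",
    "Disallow:",
]
print(crawl_delay_for(sample, "Slurp"))
print(crawl_delay_for(sample, "Googlebot"))
```

A polite crawler would sleep for the returned number of seconds between successive requests to the same site.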

One more thing to keep in mind: a robots.txt file does not have to exist. If the request for it returns a 404 error, the spider is allowed to crawl all content. But if fetching the robots.txt file fails with a timeout or a similar error, the search engine may not index the site at all, because the spider cannot tell whether a robots.txt file exists or what it contains. That is different from knowing for certain that the file does not exist.
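
The distinction between "the file is definitely missing" and "the fetch failed" can be sketched in Python as follows. The function name robots_policy and its three return labels are hypothetical. Treating HTTP errors other than 404 the same way as timeouts is an assumption of this sketch, since the text above only describes the 404 and timeout cases.

```python
import socket
import urllib.error
import urllib.request


def robots_policy(url, opener=urllib.request.urlopen, timeout=10):
    """Fetch robots.txt and classify the outcome.

    Returns one of:
      ("rules", text)      the file was fetched; apply its rules
      ("allow_all", None)  404: the file does not exist, crawl everything
      ("defer", None)      timeout or other failure: the rules are unknown
    `opener` is injectable so the logic can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return "rules", response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return "allow_all", None
        # Assumption: any other HTTP error leaves the rules unknown.
        return "defer", None
    except (urllib.error.URLError, socket.timeout, TimeoutError):
        return "defer", None
```

A cautious crawler would hold off on crawling the site when the result is "defer" and retry the robots.txt fetch later, which is why an unreachable robots.txt can keep a site out of the index.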