Apr 5, 2013

The robots.txt file

The robots.txt file is a set of rules that search engines follow when they index your site. Where should the robots.txt file be placed? What record format and syntax does it support? How do you use the «ROBOTS» meta tag? What non-standard methods of controlling search engines exist? How do you avoid gross errors when writing a robots.txt file? These are the questions this article covers.

And so, here we go...

Purpose of the robots.txt file

The robots.txt file has existed for a long time: an agreement on its use was reached back in 1994. It is a plain text file containing instructions that search robots understand. For those who do not know:

  • A search robot is search engine software that indexes documents published on the Internet.
  • Indexing is the process of adding information about a site to the search engine's database.
  • Indexing is necessary so that users of search engines such as Google, Rambler, Yahoo, MSN, etc. can quickly find the information they need on the Web.

In practice, the instructions in robots.txt mostly come down to telling the search engine which files and directories of the site should not be indexed, i.e. not added to its database. Any site contains directories and files that hold no useful information for visitors. Indexing them can put additional load on the server and can even harm the site's ranking in search results.

The robots.txt file closes such directories and files to robots, which is a valuable service to everyone. Usually, directories with scripts, such as «cgi-bin», and other software directories are not indexed. The same goes for other directories and files that hold service information and are not meant to be indexed.

The format of the robots.txt file

To let search engines index your site, it is enough to create an empty robots.txt file and place it in the root folder of your website. That is where a search robot will look for it, so the file must be reachable at the path /robots.txt directly under your domain name.
The file must be named robots.txt and nothing else, in lower case, and it must be located in the root of your site. An empty file allows all search engines to index all the content of your website. It is worth mentioning that robots.txt does not actually prevent access to content and is only advisory: a robot that has been told to inspect every directory will ignore all the recommendations and go wherever it likes.
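
As a minimal sketch of the two points above (the file lives in the site root, and an empty file permits everything), the following uses Python's standard urllib modules. The host www.example.com, the bot name AnyBot and the helper robots_url are placeholders invented for illustration, not part of the original article.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt address for any page of a site."""
    return urljoin(page_url, "/robots.txt")


# An empty robots.txt contains no records at all.
empty = RobotFileParser()
empty.parse([])

location = robots_url("http://www.example.com/blog/post.html")
allowed = empty.can_fetch("AnyBot", "http://www.example.com/private/page.html")
```

Here location is the root-level address and allowed is True, because nothing in an empty file forbids anything.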

Syntax

To recommend that a robot not index a particular directory, the file contains one or more records, each line of which ends in a newline (CR, CR/NL or NL). Multiple records are separated by one or more blank lines. Each record must contain lines of the following form:
<field>:<optional space><value><optional space>
Here <field> is the name of the directive, which is not case sensitive, and <value> is the value the directive applies. There are not many directives: User-agent, Disallow, Host, and Sitemap.

The robots.txt file can include comments starting with "#" and ending with the end of the line.
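
The line format and comment rule described above can be sketched in a few lines of Python. The function name parse_line and the sample lines are assumptions made for this illustration; real crawlers handle more edge cases.

```python
def parse_line(line: str):
    """Split one robots.txt line into (field, value), or return None.

    Comments start with "#" and run to the end of the line; the field
    name is not case sensitive, so it is lower-cased here.
    """
    line = line.split("#", 1)[0].strip()
    if not line or ":" not in line:
        return None
    field, value = line.split(":", 1)
    return field.strip().lower(), value.strip()


sample = [
    "# rules for every robot",
    "USER-AGENT: *",
    "Disallow: /tmp/   # temporary files",
    "",
]
parsed = [parse_line(line) for line in sample]
```

Comment-only and blank lines come back as None, while the two directive lines become ("user-agent", "*") and ("disallow", "/tmp/").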

User-Agent

A record must begin with one or more «User-agent» lines.

  • The value of this field is the name of the robot to which the access rules apply.
  • If several robot names are listed, the rules are the same for all of them.
  • If the value of this field is the symbol "*", the rules apply to absolutely all search robots.
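
A small sketch of the second point, using Python's standard urllib.robotparser: two «User-agent» lines share one set of rules. The robot names AlphaBot, BetaBot and GammaBot and the /private/ path are hypothetical.

```python
from urllib.robotparser import RobotFileParser

SHARED_RULES = """\
User-agent: AlphaBot
User-agent: BetaBot
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(SHARED_RULES.splitlines())

page = "http://www.example.com/private/report.html"
alpha_ok = parser.can_fetch("AlphaBot", page)
beta_ok = parser.can_fetch("BetaBot", page)
gamma_ok = parser.can_fetch("GammaBot", page)  # not named in the record
```

Both named robots are refused the page, while a robot that is not named is not affected by this record.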

Disallow

The «User-agent» lines are followed by one or more «Disallow» lines. Each of them gives a path, or the beginning of a path, that the robot should not visit.

A record must contain at least one «User-agent» line and one «Disallow» line.
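
One detail worth illustrating is that a «Disallow» value is matched against the beginning of the path. The sketch below relies on Python's standard urllib.robotparser; the site www.example.com and the file names are placeholders.

```python
from urllib.robotparser import RobotFileParser

rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /tmp",
])

# "/tmp" is treated as a prefix, so it covers the directory and
# any file whose path merely starts with the same characters.
dir_ok = rules.can_fetch("AnyBot", "http://www.example.com/tmp/cache.html")
file_ok = rules.can_fetch("AnyBot", "http://www.example.com/tmp.html")
index_ok = rules.can_fetch("AnyBot", "http://www.example.com/index.html")
```

Writing /tmp/ with a trailing slash instead would close only the directory and leave /tmp.html open.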

Examples of the robots.txt file

Example 1:

In Example 1, the contents of the directories /cgi-bin/script/ and /tmp/ are closed from indexing.
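
The original listing for Example 1 has not survived, so the following is a reconstruction from the description, checked with Python's standard urllib.robotparser. The host name and file names are placeholders.

```python
from urllib.robotparser import RobotFileParser

EXAMPLE_1 = """\
User-agent: *
Disallow: /cgi-bin/script/
Disallow: /tmp/
"""

robots = RobotFileParser()
robots.parse(EXAMPLE_1.splitlines())

script_ok = robots.can_fetch("AnyBot", "http://www.example.com/cgi-bin/script/form.cgi")
tmp_ok = robots.can_fetch("AnyBot", "http://www.example.com/tmp/draft.html")
home_ok = robots.can_fetch("AnyBot", "http://www.example.com/index.html")
```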

Example 2:
In Example 2, the contents of the directory /tmp/ are closed from indexing, but the robot powersearch is allowed everything.
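
A likely shape for Example 2, reconstructed from the description rather than copied from the lost listing: a general record for all robots, then a separate record for powersearch with an empty «Disallow». The check uses Python's standard urllib.robotparser; OtherBot and the URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

EXAMPLE_2 = """\
User-agent: *
Disallow: /tmp/

User-agent: powersearch
Disallow:
"""

robots = RobotFileParser()
robots.parse(EXAMPLE_2.splitlines())

page = "http://www.example.com/tmp/draft.html"
other_ok = robots.can_fetch("OtherBot", page)
powersearch_ok = robots.can_fetch("powersearch", page)
```

The record that names a robot explicitly takes precedence over the "*" record for that robot.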

Example 3:
Example 3 forbids every search robot from indexing the entire site.
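
Example 3 comes down to a single record with «Disallow: /». The sketch below is a reconstruction checked with Python's standard urllib.robotparser; the bot name and URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

EXAMPLE_3 = """\
User-agent: *
Disallow: /
"""

robots = RobotFileParser()
robots.parse(EXAMPLE_3.splitlines())

home_ok = robots.can_fetch("AnyBot", "http://www.example.com/")
deep_ok = robots.can_fetch("AnyBot", "http://www.example.com/articles/2013/robots.html")
```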

Host

The «Host» directive is understood only by the Yandex robot; other robots simply ignore it. Add a line to your robots.txt that gives the name of your website that should be treated as its main mirror. This helps to avoid problems when mirrors are merged. Note that even if you want to let the robot index the site completely, the record must still contain at least one «Disallow» line.

An example for the Yandex robot would therefore contain a «User-agent: Yandex» line, an empty «Disallow:» line and a «Host:» line naming the main mirror.
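
«Host» is a non-standard extension that was documented by Yandex, so general-purpose parsers skip it. The sketch below shows a reconstructed record and a small helper that reads the value by hand; main_mirror and the host www.example.com are assumptions made for this illustration.

```python
from urllib.robotparser import RobotFileParser

HOST_EXAMPLE = """\
User-agent: Yandex
Disallow:
Host: www.example.com
"""


def main_mirror(robots_txt: str):
    """Return the value of the first Host line, or None if there is none."""
    for line in robots_txt.splitlines():
        field, _, value = line.split("#", 1)[0].partition(":")
        if field.strip().lower() == "host":
            return value.strip()
    return None


mirror = main_mirror(HOST_EXAMPLE)

# The standard-library parser does not know Host and skips that line.
parser = RobotFileParser()
parser.parse(HOST_EXAMPLE.splitlines())
yandex_ok = parser.can_fetch("Yandex", "http://www.example.com/any/page.html")
```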

Sitemap

This directive tells search engines where to find the site map. A site map is useful when your website contains thousands of pages, because it helps the search engine index them more quickly. If necessary, add a line of the form «Sitemap: <full URL of the site map>» to your robots.txt.
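
A short sketch of how such a line can be read back, assuming Python 3.8 or later, where urllib.robotparser gained the site_maps() method. The sitemap address is a placeholder.

```python
from urllib.robotparser import RobotFileParser

SITEMAP_EXAMPLE = """\
User-agent: *
Disallow:

Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SITEMAP_EXAMPLE.splitlines())

# site_maps() returns a list of URLs, or None when no Sitemap line exists.
sitemaps = parser.site_maps()
```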

Examples of the use of the robots.txt file

File Location
The file is served from the root of the site, at the path /robots.txt.

Example: Close the entire site from indexing by all robots
A single record with «User-agent: *» followed by «Disallow: /».

Example: Allow all robots to index the entire site
A single record with «User-agent: *» followed by an empty «Disallow:» line.

Or you can simply create an empty robots.txt file.

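Example: Close only a few directories from indexing
A record with «User-agent: *» and one «Disallow» line per directory, for instance the /cgi-bin/script/ and /tmp/ directories from Example 1.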

Example: Prevent only one robot from indexing the site
A record that names this robot in «User-agent» and contains «Disallow: /».

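Example: Allow one robot to index the site and block all the others
A record that names the allowed robot in «User-agent» with an empty «Disallow:» line, followed by a record with «User-agent: *» and «Disallow: /».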
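
The last two cases (one robot blocked, and one robot allowed while all others are blocked) can be sketched and checked with Python's standard urllib.robotparser. The names BadBot, GoodBot and OtherBot and the helper build are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def build(text: str) -> RobotFileParser:
    """Parse robots.txt text held in a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Case 1: only BadBot is shut out.
ban_one = build("User-agent: BadBot\nDisallow: /\n")

# Case 2: only GoodBot is let in.
allow_one = build(
    "User-agent: GoodBot\nDisallow:\n\n"
    "User-agent: *\nDisallow: /\n"
)

page = "http://www.example.com/index.html"
results = {
    "ban_one/BadBot": ban_one.can_fetch("BadBot", page),
    "ban_one/OtherBot": ban_one.can_fetch("OtherBot", page),
    "allow_one/GoodBot": allow_one.can_fetch("GoodBot", page),
    "allow_one/OtherBot": allow_one.can_fetch("OtherBot", page),
}
```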

Example: Close all files from indexing except one

This is not an easy task, because the original standard has no «Allow» directive (some search engines support it as an extension). The first option is to move all the files, except the one you want indexed, into a separate directory and close that directory with a single «Disallow» line.

The second option is to close each file individually, with one «Disallow» line per file.
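
Both options can be sketched as follows and checked with Python's standard urllib.robotparser. The directory name /hidden/ and the file names are invented for the illustration.

```python
from urllib.robotparser import RobotFileParser


def build(text: str) -> RobotFileParser:
    """Parse robots.txt text held in a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Option 1: everything except public.html was moved into /hidden/.
by_directory = build("User-agent: *\nDisallow: /hidden/\n")

# Option 2: every closed file is listed on its own line.
by_file = build(
    "User-agent: *\n"
    "Disallow: /private1.html\n"
    "Disallow: /private2.html\n"
)

site = "http://www.example.com"
public_1 = by_directory.can_fetch("AnyBot", site + "/public.html")
hidden_1 = by_directory.can_fetch("AnyBot", site + "/hidden/private1.html")
public_2 = by_file.can_fetch("AnyBot", site + "/public.html")
hidden_2 = by_file.can_fetch("AnyBot", site + "/private2.html")
```

The first option keeps robots.txt short; the second avoids moving files but has to be updated whenever a new file appears.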

An example of a robots.txt file for a WordPress blog

[Listing: robots.txt for a WordPress blog]
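
The author's original WordPress listing is not available. As a rough sketch only, and not a recommendation, a record of this kind might close the WordPress service directories /wp-admin/ and /wp-includes/; that choice of paths, the host and the sample URLs are all assumptions. The check uses Python's standard urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, not the article's original listing.
WORDPRESS_SKETCH = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
"""

robots = RobotFileParser()
robots.parse(WORDPRESS_SKETCH.splitlines())

site = "http://www.example.com"
admin_ok = robots.can_fetch("AnyBot", site + "/wp-admin/options.php")
post_ok = robots.can_fetch("AnyBot", site + "/2013/04/hello-world/")
```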

Meta tag ROBOTS

Sometimes it is necessary to close a single page from indexing. This is done with the «ROBOTS» meta tag.
In this simple example:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

the robot should neither index the document nor follow the links in it.
Unlike the Robot Exclusion Standard, where access is restricted by the site administrator, this is something the author of a page can do without the administrator.

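Where to place the ROBOTS meta tag: like any other meta tag, it goes in the HEAD section of the HTML page.

A few more examples of its use: the CONTENT value can combine INDEX or NOINDEX with FOLLOW or NOFOLLOW, for instance "NOINDEX, FOLLOW" or "INDEX, NOFOLLOW".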
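
To show how a robot might read this tag, here is a minimal sketch built on Python's standard html.parser. The class name RobotsMetaParser and the sample page are assumptions; a real crawler would be more tolerant of broken markup.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


PAGE = """<html><head>
<title>Private page</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head><body><p>Hello</p></body></html>"""

reader = RobotsMetaParser()
reader.feed(PAGE)
may_index = "noindex" not in reader.directives
may_follow = "nofollow" not in reader.directives
```

Tag and attribute names are matched without regard to case, so the upper-case form used in the article is recognised as well.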

Non-standard methods of controlling search engines

The robots.txt file restricts search engines' access to directories and files of a site, and the ROBOTS meta tag does the same at the page level. But what if the task is to prevent indexing of only part of the text or of particular links on a page? For this there are the noindex tag and the rel="nofollow" attribute of the A tag.

Example: a page fragment whose second sentence is wrapped in <noindex>...</noindex> and which contains a link with the attribute rel="nofollow".

In the example, the noindex tag asks the Yandex and Rambler search engines not to index the second sentence, and the nofollow attribute tells the Google robot not to follow the link. The rel="nofollow" attribute can be placed either before or after the URL, and it can be combined with other «rel» values, separated by a space. Google's robots do not understand the noindex tag, and using it also breaks the validity of the page's HTML code. If validity must be preserved, it is recommended to write the tag as HTML comments:

<!--noindex-->text that should not be indexed<!--/noindex-->
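
The handling of rel="nofollow" can be sketched with Python's standard html.parser. The class name LinkCollector, the sample fragment and its URLs are assumptions made for the illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Sort link targets by whether their rel attribute contains nofollow."""

    def __init__(self):
        super().__init__()
        self.follow = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel may hold several space-separated values.
        rel_values = (attributes.get("rel") or "").lower().split()
        target = self.nofollow if "nofollow" in rel_values else self.follow
        target.append(href)


FRAGMENT = """
<p>First sentence. <!--noindex-->Second sentence.<!--/noindex--></p>
<a rel="nofollow" href="http://www.example.com/a">before the URL</a>
<a href="http://www.example.com/b" rel="external nofollow">after the URL</a>
<a href="http://www.example.com/c">ordinary link</a>
"""

links = LinkCollector()
links.feed(FRAGMENT)
```

The attribute order does not matter, and nofollow is detected even when it shares the rel attribute with another value.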

Other methods of controlling indexing

There are other ways to block search engines' access to site content, for example with the web server module «mod_rewrite», programmatically with JavaScript, or with the .htaccess file. We will address these in future articles.

Additions and comments are welcome. See you soon!

4 comments:

  1. Actually that is superb lesson about robots.txt. There has clear description with some sources.

  2. Nice post!The information you have provided here is just useful for me.Thanks for sharing this useful material here with us...

  3. This article is successful article. Their include A to Z everything about robot.txt file. Actually any person can learn about with this article’s help.

  4. Actually this is helpful article for web field students and employers. I also am studying about SEO in these days. This article is teaching me about robot.txt file as well. Actually this article describe about primary level to advanced step.
