
And so, here we go...
Purpose of the robots.txt file
The robots.txt file has been around for a long time: the agreement about its use goes back to 1994. It is a plain text file containing instructions for search engine robots. For those who do not know:
- A search robot (crawler) is a search engine program that indexes documents published on the Internet.
- Indexing is the process of adding information about a site to the search engine's database.
- Indexing is what lets users of search engines such as Google, Rambler, Yahoo, MSN, etc. quickly find the information they need on the Web.
The robots.txt file tells these robots which directories and files they should not crawl, which is an invaluable service for everyone. Typically you do not want to index directories with scripts, such as «cgi-bin» and other program directories, as well as directories and files that contain proprietary or other information not intended for indexing.
The format of the robots.txt file
For a search engine to start indexing your site it is enough to create an empty robots.txt file and place it in the root folder of your website. That is where the search engine robot will look for it. The path to the file should be:

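For example, if your domain were your-site.com (used here purely as an illustration), the path would be:

    http://your-site.com/robots.txt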
The file must be named exactly robots.txt and nothing else, with the name in lower case, and it must sit in the root of your site. An empty file allows all search engines to index all the content of your website. It is worth mentioning that robots.txt does not actually prevent access to content; it is purely advisory. If a robot is set up to inspect all directories, it will ignore the recommendations and go wherever it knows how to go.
Syntax
To recommend that a robot not index a particular directory, the file contains one or more records, each line ending in a newline (CR, CR/NL or NL). Multiple records are separated by one or more blank lines. Each record consists of lines of the following form:

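In line with the original 1994 convention, each such line has the form:

    <field>:<optional space><value><optional space>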
Here <field> is the name of a directive (the field name is not case sensitive) and <value> is the value the directive applies. There are not many directives: User-agent, Disallow, Host, and Sitemap.
The robots.txt file can also include comments, which start with "#" and run to the end of the line.
User-Agent
A record should begin with one or more «User-agent» lines.
- The value of this field is the name of the robot whose access rights are being set.
- If several robots are listed, the rights will be the same for all of them.
- If the value of this field is the symbol "*", the rules apply to absolutely all search engines.
Disallow
Next come one or more lines with the «Disallow» directive.

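A few illustrative Disallow lines (these are fragments, not a complete record; the paths are made up for the example):

    # block an entire directory
    Disallow: /cgi-bin/
    # block a single file
    Disallow: /private.html
    # an empty value blocks nothing
    Disallow: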
Each record must contain at least one «User-agent» line and one «Disallow» line.
Examples of the robots.txt file
Example 1:

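A sketch of what Example 1 most likely contained, reconstructed from the description below:

    User-agent: *
    Disallow: /cgi-bin/script/
    Disallow: /tmp/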
In Example 1 the contents of the directories /cgi-bin/script/ and /tmp/ are closed to indexing for all robots.
Example 2:

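A reconstruction of Example 2 based on the description below (the robot name powersearch comes from the text):

    User-agent: *
    Disallow: /tmp/

    User-agent: powersearch
    Disallow: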
In Example 2 the contents of the directory /tmp/ are closed to indexing, but the powersearch spider is permitted everything.
Example 3:

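A reconstruction of Example 3:

    User-agent: *
    Disallow: /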
Example 3 prohibits every search engine from indexing the entire site.
Host
The «Host» directive is understood only by the Yandex robot; the other robots simply could not care less about it. Add a line to your robots.txt specifying the name of your website, which will point to its main mirror. This is a good thing that helps avoid problems with mirrors being glued together. Note that even if you want to allow Yandex to index the site completely, THE RECORD MUST CONTAIN AT LEAST ONE «DISALLOW» LINE:
An example for the Yandex robot:

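A minimal sketch, assuming the site's main mirror is www.your-site.com (the name is purely illustrative):

    User-agent: Yandex
    Disallow:
    Host: www.your-site.com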
Sitemap
This directive tells search engines where to find your sitemap. A sitemap is useful when your website contains thousands of pages, since it helps the search engine index them more quickly. If needed, add a line like the following to your robots.txt:

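For example (the URL is illustrative):

    Sitemap: http://your-site.com/sitemap.xml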
Examples of using the robots.txt file
File Location

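The file always sits at the site root, e.g. (the domain name is illustrative):

    http://your-site.com/robots.txt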
Example: Prevent the entire site from being indexed by all robots

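A standard two-line file does this:

    User-agent: *
    Disallow: /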
Example: Allow all robots to index the entire site

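An empty Disallow value allows everything:

    User-agent: *
    Disallow: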
Or you can simply create an empty robots.txt file.
Example: Close only a few directories to indexing

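For instance (the directory names are illustrative):

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /tmp/
    Disallow: /private/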
Example: Prevent indexing of the site for a single robot only

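For instance, to shut out one robot (the name BadBot is illustrative):

    User-agent: BadBot
    Disallow: /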
Example: Allow one robot to index the site and disable all the others

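For instance, allowing only Googlebot (chosen here just for illustration):

    User-agent: Googlebot
    Disallow:

    User-agent: *
    Disallow: /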
Example: Disable indexing of all files except one
This is not an easy task, since an «Allow» instruction does not exist in the original standard. One way is to move all files except the one you want indexed into a separate directory and disable that directory from being indexed, as shown below:

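For instance, with everything moved into a /docs/ directory (the name is illustrative):

    User-agent: *
    Disallow: /docs/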
The second option is to disable each file individually:

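For instance (the file names are illustrative):

    User-agent: *
    Disallow: /private.html
    Disallow: /old-page.html
    Disallow: /temp.html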
An example of a robots.txt file for a WordPress blog

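A common sketch for a WordPress blog; the exact set of directories varies from installation to installation, so treat this as an assumption rather than the author's original listing:

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-content/plugins/
    Disallow: /wp-content/themes/
    Disallow: /trackback/
    Disallow: /feed/
    Sitemap: http://your-blog.com/sitemap.xml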
Meta tag ROBOTS
There are times when it is necessary to disallow indexing of an individual page. This is done using the «ROBOTS» meta tag.
In this simple example:
    <meta name="robots" content="noindex, nofollow">
the robot should neither index the document nor follow the links it contains.
Unlike the Robots Exclusion Standard, where access restrictions are set by the site administrator in the root robots.txt file, here you can set them yourself at the page level.
Where to place the meta tag ROBOTS:

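The tag goes inside the <head> section of the page, for example:

    <html>
    <head>
    <meta name="robots" content="noindex, nofollow">
    <title>Page title</title>
    </head>
    <body>
    ...
    </body>
    </html>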
Just a few examples of its use:

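The possible combinations of the content values (index/noindex controls whether the page is indexed, follow/nofollow controls whether its links are followed):

    <meta name="robots" content="index, follow">
    <meta name="robots" content="index, nofollow">
    <meta name="robots" content="noindex, follow">
    <meta name="robots" content="noindex, nofollow">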
Non-standard methods of controlling search engines
The robots.txt file limits search engine access at the level of the site's files and directories, and the ROBOTS meta tag works at the level of a whole page. But what if the task is to prevent indexing of only part of the text, or of individual links, on a page? For this there are the noindex tag and the rel="nofollow" attribute of the A tag.
Example:

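A sketch of what such markup looks like (the text and the URL are made up for the example):

    This first sentence is indexed.
    <noindex>This second sentence is not indexed.</noindex>
    <a href="http://example.com/" rel="nofollow">a link the robot is asked not to follow</a>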
In the example, the noindex tag asks the Yandex and Rambler search engines not to index the second sentence, and the nofollow attribute tells the Google robot not to follow the link. The rel="nofollow" attribute can stand before or after the URL and can be combined with other «rel» values, written separated by a space. Google's robots do not understand the noindex tag, and using it also breaks the validity of the page's HTML code. If you need to keep the page valid, it is recommended to write it with the following syntax:

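The valid, comment-based form of the noindex tag:

    <!--noindex-->This text is not indexed.<!--/noindex-->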
Other methods of controlling indexing
There are other methods of blocking search engine access to a site's content, for example using the web server module «mod_rewrite», programmatically with JavaScript, or via an .htaccess file. We will address these topics in the future.
Additions and comments are welcome. See you soon!