Wordpress blog updates - Seo plugin, template tutorial, logo design: The robots.txt file

The robots.txt file is a set of rules that search engines follow when indexing your site. Where should the robots.txt file be placed? What record format and syntax does it support? How do you use the «ROBOTS» meta tag? What non-standard methods of controlling search engines exist? How can you avoid gross errors when writing a robots.txt file? These are the questions this article covers.

And so, here we go...

Purpose of the robots.txt file

The robots.txt file has been around for a long time. An agreement on its use was reached back in 1994. It is a plain text file containing instructions that search engine robots can understand. For those who do not know:

A search robot is a search engine program that indexes documents published on the Internet.
Indexing is the process of adding information about a site to the search engine's database.
Indexing is what lets users of search engines such as Google, Rambler, Yahoo, MSN, etc. quickly find the information they need on the Web.

In practice, the instructions in a robots.txt file mostly come down to telling the search engine which files and directories of the site should not be indexed, i.e. not added to its database. Any site contains directories and files that hold no useful information for its users. Indexing them can put additional load on the server and can even harm the site's ranking in search results.

The robots.txt file closes such directories and files to robots, which is an invaluable service to everyone. Directories with scripts, such as «cgi-bin», and other software directories are usually excluded from indexing, as are directories and files that contain proprietary or other information not intended to be indexed.

The format of the robots.txt file

To let search engines index your site, it is enough to create an empty robots.txt file and place it in the root folder of your website. That is where the search robot will look for it, so the file must be reachable at the path /robots.txt directly under the site's address.

The file must be named exactly robots.txt, in lower case, and nothing else. It must be located in the root of your site. An empty file allows all search engines to index all the content of your website. It is worth mentioning that the robots.txt file does not actually prevent access to content; it is purely advisory. If a robot has been instructed to inspect every directory, it will ignore all the prohibitions and go wherever it likes.
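As a rough illustration of the location rule, the sketch below derives the robots.txt address from any page address and checks that an empty file blocks nothing. It relies on Python's standard urllib modules; the host www.example.com and the robot name AnyBot are placeholders, not names from this article.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    # The leading "/" makes urljoin drop the page's own path and query.
    return urljoin(page_url, "/robots.txt")


# An empty robots.txt contains no records, so nothing is disallowed.
empty = RobotFileParser()
empty.parse([])
everything_allowed = empty.can_fetch("AnyBot", "http://www.example.com/any/page.html")
```

Note that this only models a well-behaved robot. As said above, nothing forces a robot to ask the file first.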

Syntax

To recommend that a robot not index a particular directory, the file holds one or more records made of lines ending in a newline (CR, CR/NL or NL). Multiple records are separated by one or more blank lines. Each record must contain one or more lines of the following form:

<field>:<optional space><value><optional space>

Here <field> is the name of a directive, which is not case sensitive, and <value> is the value the directive acts on. There are only a few directives: User-agent, Disallow, Host, and Sitemap.

The robots.txt file can include comments starting with "#" and ending with the end of the line.
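To make the line format concrete, here is a minimal sketch of how a single line could be split into a field and a value, with comments dropped. The function name parse_line is made up for this illustration, and real robots may handle edge cases differently.

```python
def parse_line(line: str):
    """Split one robots.txt line into (field, value); return None if it has no directive."""
    # Everything from "#" to the end of the line is a comment.
    line = line.split("#", 1)[0].strip()
    if ":" not in line:
        return None
    # Split on the first colon only, so values such as URLs stay intact.
    field, value = line.split(":", 1)
    # Field names are not case sensitive, so normalise them to lower case.
    return field.strip().lower(), value.strip()
```

For example, parse_line("User-Agent: *  # all robots") gives ("user-agent", "*"), while a line holding only a comment gives None.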

User-Agent

A record should begin with one or more lines with the «User-agent» field.

The value of this field is the name of the robot to which the access rules apply.
If several robots are listed, the rules will be the same for all of them.
If the value of this field is the symbol "*", the rules apply to all search engines.
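The following sketch shows one record shared by two robots and a second record for everyone else, checked with Python's standard urllib.robotparser. The robot names alphabot, betabot and otherbot and the directory names are invented for the example.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: alphabot
User-agent: betabot
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Both named robots share the first record.
alpha_private = parser.can_fetch("alphabot", "/private/page.html")
beta_private = parser.can_fetch("betabot", "/private/page.html")
# Any other robot falls back to the "*" record.
other_private = parser.can_fetch("otherbot", "/private/page.html")
other_tmp = parser.can_fetch("otherbot", "/tmp/file.txt")
```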

Disallow

This is followed by one or more lines with the «Disallow» directive.

A record must contain at least one «User-agent» line and one «Disallow» line.
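In the original 1994 convention the value of Disallow is matched as a path prefix, so a trailing slash changes what is closed. The sketch below illustrates this with urllib.robotparser; the helper name allowed, the /help paths and the robot name AnyBot are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, path: str) -> bool:
    """Return True if a robot obeying robots_txt may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("AnyBot", path)


# Without a trailing slash the rule matches every path starting with "/help".
PREFIX = "User-agent: *\nDisallow: /help"
# With a trailing slash the rule matches only what is inside the directory.
DIRECTORY = "User-agent: *\nDisallow: /help/"
```

With PREFIX both /help.html and /help/index.html are closed; with DIRECTORY only /help/index.html is.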

Examples of the robots.txt file

Example 1:

Example 1 closes the contents of the /cgi-bin/script/ and /tmp/ directories to indexing.
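A file matching the description of Example 1 could look like the text inside ROBOTS_TXT below. It is a sketch built from that description, not necessarily the exact listing, and the Python around it only checks the effect with the standard urllib.robotparser module. AnyBot is a made-up robot name.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Sketch of Example 1: two directories closed to every robot
User-agent: *
Disallow: /cgi-bin/script/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

script_allowed = parser.can_fetch("AnyBot", "/cgi-bin/script/run.cgi")
tmp_allowed = parser.can_fetch("AnyBot", "/tmp/cache.html")
index_allowed = parser.can_fetch("AnyBot", "/index.html")
```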

Example 2:

Example 2 closes the contents of the /tmp/ directory to indexing, but everything is permitted for the spider powersearch.
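One way to write what Example 2 describes is shown in ROBOTS_TXT below, again as a sketch derived from the description. An empty Disallow value means nothing is closed for that robot. The name otherbot is a placeholder for any robot other than powersearch.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/

User-agent: powersearch
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# powersearch has its own record with an empty Disallow, so /tmp/ is open to it.
powersearch_tmp = parser.can_fetch("powersearch", "/tmp/cache.html")
# Every other robot is covered by the "*" record.
other_tmp = parser.can_fetch("otherbot", "/tmp/cache.html")
```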

Example 3:

Example 3 prohibits every search engine from indexing any part of your site.
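A record that has the effect described in Example 3 can be as short as the two lines in ROBOTS_TXT below. The check uses urllib.robotparser with an invented robot name.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "/" is a prefix of every path, so the whole site is closed.
home_allowed = parser.can_fetch("AnyBot", "/")
page_allowed = parser.can_fetch("AnyBot", "/articles/robots.html")
```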

Host

The «Host» directive is used only by the Yandex robot; other robots, Google included, simply ignore it. Add this line to your robots.txt file and give it the name of your website, which marks that name as the main mirror. This is useful because it helps avoid problems when mirrors are merged. Note that even if you want to let Yandex index the site completely, the record MUST STILL CONTAIN AT LEAST ONE LINE WITH THE «DISALLOW» DIRECTIVE:

An example for the Yandex robot:
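Host is not part of the original standard, and Python's urllib.robotparser ignores it, so the sketch below reads the value by hand. The function name main_mirror, the robot name examplebot and the host www.example.com are all placeholders chosen for illustration; note the Disallow line that the record still carries.

```python
ROBOTS_TXT = """\
User-agent: examplebot
Disallow:
Host: www.example.com
"""


def main_mirror(robots_txt: str, robot: str):
    """Return the Host value from a record addressed to robot, or None."""
    in_record = False
    for raw in robots_txt.splitlines():
        if not raw.strip():
            in_record = False  # a blank line ends the current record
            continue
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent" and value.lower() in (robot.lower(), "*"):
            in_record = True
        elif field == "host" and in_record:
            return value
    return None
```

Calling main_mirror(ROBOTS_TXT, "examplebot") returns "www.example.com", while a robot the record does not address gets None.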

Sitemap

This directive tells search engines where to find the site map. A site map is useful when your website contains thousands of pages, because it helps the search engine index the site more quickly. If necessary, add the following lines to your robots.txt:
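As a sketch, a Sitemap line can sit next to an ordinary record, and urllib.robotparser (Python 3.8 or later) will report it through site_maps(). The sitemap address below is a placeholder, not the address of any real site map.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow:
Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the list of Sitemap values found in the file.
sitemaps = parser.site_maps()
```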

Examples of the use of the robots.txt file

File Location

Example: Close the entire site to indexing by all robots

Example: Allow all robots to index the entire site
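A record with an empty Disallow value is one way to express this. The sketch below checks it with urllib.robotparser; AnyBot is an invented robot name.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# An empty Disallow value closes nothing, so every path stays open.
page_allowed = parser.can_fetch("AnyBot", "/any/page.html")
```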

Or you can just create an empty robots.txt file.

Example: Close only a few directories to indexing

Example: Prevent indexing of the site for only one robot
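A record addressed to a single robot does this. In the sketch below badbot and otherbot are hypothetical robot names, and the check uses urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: badbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

badbot_allowed = parser.can_fetch("badbot", "/index.html")
# No record mentions other robots, so nothing is closed to them.
others_allowed = parser.can_fetch("otherbot", "/index.html")
```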

Example: Allow one robot to index the site and block all others
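Two records are enough for this: one for the favoured robot with an empty Disallow, and one for everybody else that closes the whole site. The names goodbot and otherbot in this sketch are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: goodbot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The named record takes precedence over the "*" record for goodbot.
goodbot_allowed = parser.can_fetch("goodbot", "/index.html")
others_allowed = parser.can_fetch("otherbot", "/index.html")
```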

Example: Close all files except one to indexing

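This is not an easy task, because the original standard has no «Allow» directive (some robots accept it as an extension, but you cannot rely on that everywhere). Move all the files except the one you want indexed into a separate directory and close that directory to indexing: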
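Assuming the files to hide have been moved into a directory called /hidden/ (a hypothetical name, as are the file names), the record might look like ROBOTS_TXT in this sketch:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /hidden/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The one file left outside the closed directory stays open.
kept_allowed = parser.can_fetch("AnyBot", "/public.html")
moved_allowed = parser.can_fetch("AnyBot", "/hidden/draft.html")
```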

The second option is to disallow each file individually:
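Listing every file by hand gets tedious, so this sketch generates the record from a list of paths. The helper disallow_each and the file names are made up for the example; the result is checked with urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser


def disallow_each(paths):
    """Build a record for all robots with one Disallow line per path."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in paths]
    return "\n".join(lines) + "\n"


ROBOTS_TXT = disallow_each(["/junk.html", "/old.html", "/draft.html"])

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

junk_allowed = parser.can_fetch("AnyBot", "/junk.html")
keep_allowed = parser.can_fetch("AnyBot", "/keep.html")
```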

An example of a robots.txt file for a WordPress blog
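As one possible starting point, the sketch below closes the /wp-admin/ and /wp-includes/ directories that a standard WordPress install contains. Which directories to close is a site-specific decision, so treat this as an assumption for illustration rather than a recommended configuration; the post path and robot name are placeholders.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

admin_allowed = parser.can_fetch("AnyBot", "/wp-admin/options.php")
post_allowed = parser.can_fetch("AnyBot", "/2013/04/hello-world/")
```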

Meta tag ROBOTS

There are times when it is necessary to keep a single page out of the index. This is done using the «ROBOTS» meta tag.
In this simple example:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

the robot should neither index the document nor follow the links it contains.
Unlike the Robot Exclusion Standard, where restricting access is up to the site administrator, this is something a page author can do without them.

Where to place the meta tag ROBOTS:

Just a few examples of the use:
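The tag belongs in the head section of the page. As a sketch, the code below parses a small page with Python's standard html.parser and collects the directives a robot would find; the class name RobotsMeta and the page content are invented for the example.

```python
from html.parser import HTMLParser


class RobotsMeta(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # Tag and attribute names arrive in lower case; the name value may not.
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives |= {d.strip().lower() for d in content.split(",") if d.strip()}


PAGE = """<html><head>
<title>Private page</title>
<meta name="robots" content="noindex, nofollow">
</head><body>...</body></html>"""

finder = RobotsMeta()
finder.feed(PAGE)
```

After feeding the page, finder.directives holds noindex and nofollow. Other commonly used values are index and follow, and the two kinds can be mixed, for example "noindex, follow".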

Non-standard methods of controlling search engines

The robots.txt file can restrict search engines' access to directories and files on the site, and the ROBOTS meta tag does the same at the page level. But what if the task is to keep only some text or some links on a page from being indexed? For this there are the noindex tag and the rel="nofollow" attribute of the A tag.

Example:

In the example, the noindex tag asks the Yandex and Rambler search engines not to index the second sentence, and the nofollow attribute tells the Google robot not to follow the link. The rel="nofollow" attribute can be placed before or after the URL and can be combined with other «rel» values, separated by a space. Google's robots do not understand the noindex tag, and using it also breaks the validity of the page's HTML code. If validity must be preserved, the following syntax is recommended:
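To show how the rel attribute is read, including several values separated by a space and an attribute placed before the URL, here is a small sketch with Python's standard html.parser. The class name NofollowLinks and the example.com links are invented for the illustration.

```python
from html.parser import HTMLParser


class NofollowLinks(HTMLParser):
    """Sort the links of a page by whether they carry rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.nofollow = []
        self.follow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel may hold several values separated by spaces.
        rel = (attr.get("rel") or "").lower().split()
        (self.nofollow if "nofollow" in rel else self.follow).append(href)


HTML = """<p>First sentence. <a href="http://example.com/a" rel="nofollow">A</a>
<a rel="external nofollow" href="http://example.com/b">B</a>
<a href="http://example.com/c">C</a></p>"""

links = NofollowLinks()
links.feed(HTML)
```

Here the first two links end up in links.nofollow and the third in links.follow.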

Other methods of controlling indexing

There are other ways to block search engines from site content, for example with the web server module «mod_rewrite», programmatically with JavaScript, or with an .htaccess file. We will address these in future articles.

Additions and comments are welcome. See you soon!

4 comments:

Men Formal Shirts, April 16, 2013 at 12:57 AM
Actually that is superb lesson about robots.txt. There has clear description with some sources.
Affordable Logo Design, April 17, 2013 at 5:19 AM
Nice post!The information you have provided here is just useful for me.Thanks for sharing this useful material here with us...
Find hand tufted rugs, April 21, 2013 at 11:43 PM
This article is successful article. Their include A to Z everything about robot.txt file. Actually any person can learn about with this article’s help.
Hire a limo for normal, April 22, 2013 at 12:37 AM
Actually this is helpful article for web field students and employers. I also am studying about SEO in these days. This article is teaching me about robot.txt file as well. Actually this article describe about primary level to advanced step.

Apr 5, 2013
