Webmaster Forums - Webmaster forum for HTML, PHP, ASP, CSS and more

Go Back   Webmaster Forums - Webmaster forum for HTML, PHP, ASP, CSS and more > Web Design Forum > HTML / CSS
User Name
Password

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
Old 07-27-2005, 07:30 PM   #1 (permalink)
angie
Member
 
Join Date: Jul 2005
Location: Nebraska
Posts: 33
Default Robots.txt expert needed

Hello,

MSN bot is going mad on my site. I have the ip xxx.xx.xx.xxx of the most offending ip address. How do I disallow this bot by ip in a robots.txt file. I want to be sure I am only disallowing this ip only and not other bots. Any help on this is appreciated.

cheers
angie is offline   Reply With Quote
Sponsored Links
Old 07-27-2005, 08:25 PM   #2 (permalink)
sandman
Junior Member
 
Join Date: Jul 2005
Posts: 34
Default

Hello - This may help

The "/robots.txt" file usually contains a record looking like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/

In this example, three directories are excluded.

Note that you need a separate "Disallow" line for every URL prefix you want to exclude -- you cannot say "Disallow: /cgi-bin/ /tmp/". Also, you may not have blank lines in a record, as they are used to delimit multiple records.

Note also that regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "Disallow: /tmp/*" or "Disallow: *.gif".

What you want to exclude depends on your server. Everything not explicitly disallowed is considered fair game to retrieve. Here follow some examples:

To exclude all robots from the entire server
User-agent: *
Disallow: /

To allow all robots complete access
User-agent: *
Disallow:

Or create an empty "/robots.txt" file.

To exclude all robots from part of the server
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/

To exclude a single robot
User-agent: BadBot
Disallow: /

To allow a single robot
User-agent: WebCrawler
Disallow:

User-agent: *
Disallow: /

To exclude all files except one
This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "docs", and leave the one file in the level above this directory:
User-agent: *
Disallow: /~joe/docs/

Alternatively you can explicitly disallow all disallowed pages:
User-agent: *
Disallow: /~joe/private.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html

Remember: This file must reside in your websites root directory or where your index.html file lives for example.

regards
sandman is offline   Reply With Quote
Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On

Points Per Thread View: 1.00
Points Per Thread: 11.00
Points Per Reply: 5.00



» Sponsors

» Links

» Affiliates
Web Hosting
Online Backup Reviews
Marketing Find
Merchant Select
SiteMap Builder
Host Compare
Dedicated Servers

» Links

» Sports Network
Paintball Forum
Football Forum
Hockey Forum
Golf Forum
Boxing Forum
Lacrosse Forum
Baseball Forum
SnowBoarding Forum
Soccer Forum
MMA Forum


All times are GMT -4. The time now is 09:26 PM.



LinkBacks Enabled by vBSEO 3.0.0 RC8
Webmaster Forums