robot help: SpiderMonkey robots.txt Fetcher
Every web site should have a robots.txt file in its root directory. Without one, each robot that requests the file generates a 404 error; with one, the site is more "robot-friendly". To manage how robots crawl your site, place this simple file at the top level of the domain (e.g., www.mobrien.com/robots.txt) as an adjunct to properly written META tags.
# EXAMPLE robots.txt
User-agent: *
# You can name a specific user-agent instead, but "*" (all robots) is usually best.
Disallow: /cgi-bin/
Disallow: /cgi-win/
Disallow: /tmp/
Disallow: /images/
Disallow: /includes/
Disallow: /public/~specific-user/
- - - - -
# EXAMPLE robots.txt to exclude a single robot
User-agent: Bad-Bot-From-Hades
Disallow: /
- - - - -
# EXAMPLE robots.txt to allow a single robot everything but forbid all other robots some specific paths.
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /mybirthdaysuitpictures/
Disallow: /reallyspecialimagesIdontwantpilfered/
- - - - -
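To see how a robot might read the last example, here is a minimal sketch using Python's standard `urllib.robotparser` module. The helper name `is_allowed` and the robot name `SomeOtherBot` are made up for illustration, and the blank line between the two records follows the convention of separating records with an empty line. Real robots may interpret rules somewhat differently, so treat this as a rough check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# The third example from above, with a blank line between the two records.
RULES = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /mybirthdaysuitpictures/
Disallow: /reallyspecialimagesIdontwantpilfered/
"""


def is_allowed(rules, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    `is_allowed` is a hypothetical helper, not part of any SpiderMonkey tool.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse from text, no network access
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    picture = "http://www.mobrien.com/mybirthdaysuitpictures/a.jpg"
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, is_allowed(RULES, agent, picture))
```

An empty `Disallow:` line means nothing is off limits, so Googlebot gets everything while every other robot is kept out of the two listed directories.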
Try SpiderMonkey's robots.txt Generator.
So what does yours look like?
- This SpiderMonkey resource will fetch and examine your robots.txt file from your web site, if it has one.
- If not, it will let you know.
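If you would rather check from your own machine, the fetch-and-report step described above can be approximated in a few lines of Python. This is only a sketch under stated assumptions, not how the SpiderMonkey fetcher itself works: the function names `robots_url` and `fetch_robots` are hypothetical, plain `http://` is assumed when no scheme is given, and only a 404 response is treated as "no robots.txt".

```python
from urllib.error import HTTPError
from urllib.parse import urlsplit
from urllib.request import urlopen


def robots_url(domain):
    """Build the robots.txt URL for a bare domain name or a full URL."""
    domain = domain.strip()
    if "://" not in domain:
        domain = "http://" + domain  # assumption: default to plain http
    parts = urlsplit(domain)
    # robots.txt always lives at the root, whatever path was entered
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def fetch_robots(domain, opener=urlopen, timeout=10):
    """Return (found, text); found is False when the server answers 404.

    `opener` is injectable so the function can be tried without a network.
    """
    url = robots_url(domain)
    try:
        with opener(url, timeout=timeout) as response:
            return True, response.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        if err.code == 404:
            return False, ""
        raise  # other HTTP errors are left for the caller to handle


if __name__ == "__main__":
    found, text = fetch_robots("www.mobrien.com")
    print(text if found else "No robots.txt file was found.")
```

Passing the opener as a parameter is a design choice for testability; in ordinary use the default `urlopen` is all you need.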
Please enter your domain name: