|
Our crawler (SpiderMonkey)
visits and checks URLs during off-peak server load times and
feeds the results to the SQL Cluster's index. The
main database is refreshed at least once every 30 days. The temp. database is crawled at least twice
a month. URLs submitted remotely by authorized services
do not appear in the temp. database.
SpiderMonkey
abides by the Robots Exclusion
Standard. Specifically,
SpiderMonkey adheres to the 1994 Robots
Exclusion Standard (RES). Where the 1996
proposed standard supersedes the 1994 standard, the proposed
standard is followed.
SpiderMonkey will
obey the first record in the robots.txt file with a User-Agent
containing "Spider_Monkey" or "SpiderMonkey". If there is no such record, it will
obey the first record with a User-Agent of
*. It is important that every web site have a robots.txt file in the root
directory to avoid numerous 404 errors and to make the site more
"robot-friendly".
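The record-selection rule described above can be illustrated with a short Python sketch. This is not SpiderMonkey's actual code: the function names (parse_records, select_record), the sample file, and the choice to match the User-Agent case-insensitively are all assumptions made for illustration. Records are split on blank lines, as in the 1994 standard.

```python
# Illustrative sketch only; names and case-insensitive matching are assumptions.
AGENT_TOKENS = ("spider_monkey", "spidermonkey")


def parse_records(text):
    """Split robots.txt text into records of (user_agents, disallow_paths)."""
    records = []
    agents, disallows = [], []
    for raw in text.splitlines():
        stripped = raw.strip()
        if stripped.startswith("#"):
            continue  # comment-only lines do not end a record
        line = stripped.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            # A blank line ends the current record.
            if agents:
                records.append((agents, disallows))
            agents, disallows = [], []
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agents.append(value)
        elif field == "disallow":
            disallows.append(value)
    if agents:
        records.append((agents, disallows))
    return records


def select_record(records):
    """Return the Disallow list of the record the crawler would obey, or None."""
    # First choice: the first record naming the crawler.
    for agents, disallows in records:
        if any(tok in a.lower() for a in agents for tok in AGENT_TOKENS):
            return disallows
    # Fallback: the first record with a User-Agent of *.
    for agents, disallows in records:
        if "*" in agents:
            return disallows
    return None


SAMPLE = """\
User-agent: *
Disallow: /private/

User-agent: SpiderMonkey
Disallow: /drafts/
"""

print(select_record(parse_records(SAMPLE)))  # → ['/drafts/']
```

In the sample file the SpiderMonkey record wins even though the * record appears first, because the named record is checked before the wildcard fallback.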
Free use of our "Robots.txt Generator"
We offer a resource for generating your robots.txt file quickly and easily. It lets you focus on content while the "robots.txt generator" takes care of syntax, which helps keep this important file accurate.
|