WXforum.net
Topic: New web crawler
linuxfreak
« on: January 21, 2012, 11:31:15 AM »

A crawler that was new to me hit my site, and hard! The company running the robot has a FAQ, and its explanation of how the robot scans your site seems fishy.

Quote
Why is your web crawler trying to access pages that don't exist on my website?

    Our web crawler attempts to extract links to valid web pages from javascript and other scripting languages. The crawler may misinterpret the information in these scripts and request a page that does not actually exist. These requests are attempts to retrieve valid web content, and are not an attempt to circumvent your webserver security.

In my books, what I saw in the server logs didn't look like a robot scan from Google or Bing but more like a site scraper hammering my site. And digging into the JavaScript, well, that is a bit sneaky.

Yes, I could have been nice and added their bot to the robots.txt on the site, but when I saw the scraping action of the bot, it earned a place in my user agent block list in the .htaccess file. Sorry, bad-mannered robot behaviour is something I don't support. Bye-bye!!
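
The post doesn't show the actual .htaccess rules, so here is only a rough Python sketch of the idea behind a user agent block list: a case-insensitive substring match against the User-Agent header. The token "panscient" and the names `BLOCKED_AGENT_TOKENS` and `is_blocked` are assumptions made for illustration, not taken from George's configuration.

```python
# Hypothetical block list; the real .htaccess entries are not shown in the post.
BLOCKED_AGENT_TOKENS = ["panscient"]


def is_blocked(user_agent, tokens=BLOCKED_AGENT_TOKENS):
    """Return True if any block-list token appears in the User-Agent string.

    The comparison is case-insensitive. A missing header (None) is treated
    as an empty string and therefore not blocked.
    """
    ua = (user_agent or "").lower()
    return any(token.lower() in ua for token in tokens)


if __name__ == "__main__":
    for agent in ["panscient.com", "Mozilla/5.0 (compatible; Googlebot/2.1)"]:
        print(agent, "->", "blocked" if is_blocked(agent) else "allowed")
```

A substring match like this is easy to maintain but only catches bots that identify themselves honestly; a scraper that fakes a browser User-Agent would slip past it.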

Here's their URL: http://panscient.com/index.htm

George

Added note: This bot grabbed my robots.txt at least 5 times within a few minutes!! Can you say "forgetful"?!!
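
For anyone wanting to check their own logs for the same behaviour, here is a minimal Python sketch that flags clients fetching /robots.txt repeatedly in a short window. It assumes Apache-style common/combined log lines; the function name `repeated_robots_fetches`, the 5-minute window, and the threshold of 5 are illustrative choices, not something from the original logs.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumes Apache common/combined log format:
# host ident user [21/Jan/2012:11:00:00 -0500] "GET /path HTTP/1.1" ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(?:GET|HEAD) (\S+)')


def repeated_robots_fetches(lines, window=timedelta(minutes=5), threshold=5):
    """Return the set of client hosts that requested /robots.txt at least
    `threshold` times within any span of length `window`."""
    hits = defaultdict(list)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group(3) != "/robots.txt":
            continue
        when = datetime.strptime(m.group(2), "%d/%b/%Y:%H:%M:%S %z")
        hits[m.group(1)].append(when)

    offenders = set()
    for client, times in hits.items():
        times.sort()
        # Slide over each run of `threshold` consecutive requests.
        for i in range(len(times) - threshold + 1):
            if times[i + threshold - 1] - times[i] <= window:
                offenders.add(client)
                break
    return offenders
```

Typical use would be something like `repeated_robots_fetches(open("access.log"))`, where the log path is whatever your server uses.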

Do a Google search on "panscient com bot" and see the bad write-ups this bot has received over the past 5 years. Bad bot!
« Last Edit: January 21, 2012, 11:57:18 AM by linuxfreak »

George
Davis VP2/FARS, VVP, WD, WL, WSWin, Cumulus, NexStorm, StrikeStar, NSLog, XPort(GPS), WASP2, DigitalAtmosphere, ScannerCast(WUradio), Intel Atom N330 dual-core, 2Gig ram, Windows XP Home SP3
CWOP - DW3112, PWS & WU - IONHAMIL2, AWEKAS - 5112, CWON, WML - WD01901