IWETHEY v. 0.3.0 | TODO

New Not sure if this belongs here or in "Networking"...
First of all... hi all! Glad the site is back up. For a while it looked like the place had completely disappeared...

Second... I wonder if any of you are familiar with this phenomenon.

I recently converted Ubersoft.net from its former static existence to a completely database-driven setup using Drupal. Almost immediately after doing that, my site was hammered and had to be taken offline so the rest of the sites on the server could keep operating. Right now I'm on a test server to see if I can get the site traffic under control...

One of the tools Drupal has is called "watchdog" -- essentially it lets you look at your log files to see what's going on: what files are being accessed and what site errors are being generated. By far, most of the reports I get in watchdog are 404 errors -- basically I'm being inundated with someone or something trying to access files in their old, static-site locations. Here's a small example:


warning  page not found  25 Mar 2007 - 11:01am  d/20070325.html                 Anonymous
warning  page not found  25 Mar 2007 - 11:00am  comic.rss                       Anonymous
warning  page not found  25 Mar 2007 - 10:59am  d/20070325.html                 Anonymous
warning  page not found  25 Mar 2007 - 10:55am  kpanic/index.html               Anonymous
warning  page not found  25 Mar 2007 - 10:54am  comics/hd20060720.png           Anonymous
warning  page not found  25 Mar 2007 - 10:52am  files/comics/hd/hd20070325.png  Anonymous
warning  page not found  25 Mar 2007 - 10:50am  kpanic                          Anonymous
warning  page not found  25 Mar 2007 - 10:50am  comics/hd20070325g.png          Anonymous
warning  page not found  25 Mar 2007 - 10:50am  comics/hd20070325e.png          Anonymous
warning  page not found  25 Mar 2007 - 10:50am  comics/hd20070325c.png          Anonymous
warning  page not found  25 Mar 2007 - 10:50am  comics/hd20070325a.png          Anonymous
warning  page not found  25 Mar 2007 - 10:48am  comics/hd20070325.png           Anonymous


This goes on for pages and pages, and you can see by the time interval that it just doesn't let up.
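Before writing any redirects, it helps to know which old locations get hit most, since those are the ones worth redirecting first. Below is a minimal Python sketch. It assumes the watchdog entries can be exported as tab-separated rows in the column order shown above (severity, message, timestamp, path, user). The rows are copied from that excerpt, and the function names are made up for illustration:

```python
from collections import Counter

# Rows copied from the watchdog excerpt; a real export would be read from a file.
# Assumed column order: severity, message, timestamp, path, user (tab-separated).
WATCHDOG_ROWS = """\
warning\tpage not found\t25 Mar 2007 - 11:01am\td/20070325.html\tAnonymous
warning\tpage not found\t25 Mar 2007 - 11:00am\tcomic.rss\tAnonymous
warning\tpage not found\t25 Mar 2007 - 10:59am\td/20070325.html\tAnonymous
warning\tpage not found\t25 Mar 2007 - 10:50am\tcomics/hd20070325g.png\tAnonymous
warning\tpage not found\t25 Mar 2007 - 10:50am\tcomics/hd20070325e.png\tAnonymous
warning\tpage not found\t25 Mar 2007 - 10:48am\tcomics/hd20070325.png\tAnonymous
"""


def missing_paths(rows):
    """Count how often each path was reported as 'page not found'."""
    counts = Counter()
    for line in rows.splitlines():
        fields = line.split("\t")
        if len(fields) >= 4 and fields[1] == "page not found":
            counts[fields[3]] += 1
    return counts


def by_top_directory(counts):
    """Roll paths up to their first segment, e.g. 'comics/x.png' -> 'comics'."""
    dirs = Counter()
    for path, n in counts.items():
        dirs[path.split("/", 1)[0]] += n
    return dirs


if __name__ == "__main__":
    counts = missing_paths(WATCHDOG_ROWS)
    print(counts.most_common(3))
    print(by_top_directory(counts).most_common())
```

Rolling the paths up by directory points at where a single pattern-based redirect would cover the most traffic.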

One suggestion someone made is that Google and other search sites are trying to re-index my site based on last-known file locations. But yesterday I modified my robots.txt file to exclude all the old, non-existent directories, and that hasn't helped.
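To check what a robots.txt actually fences off, Python's standard urllib.robotparser can read the file and answer "may a crawler fetch this path?". The rules below are a guess at the old locations taken from the watchdog log, not the site's real file. Keep in mind that robots.txt only asks well-behaved crawlers to stay out. It doesn't stop anything else from requesting those URLs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt covering old static-site paths seen in the log.
ROBOTS_TXT = """\
User-agent: *
Disallow: /d/
Disallow: /comics/
Disallow: /kpanic
Disallow: /comic.rss
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Disallow rules are prefix matches on the URL path, so "/comics/" blocks the
# old image directory but not the new "/comic/hd/..." pages.
for path in ["/d/20070325.html", "/comics/hd20070325.png",
             "/kpanic/index.html", "/comic/hd/archives/1996"]:
    print(path, "allowed" if parser.can_fetch("*", path) else "blocked")
```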

Getting this to stop would certainly reduce some of the resources my site is consuming... each time one of these requests comes in, my database tries to find a matching page before falling through to the 404 message.

Anyone have experience dealing with this?
"We are all born originals -- why is it so many of us die copies?"
- Edward Young
New might have to do with the fact that most engines
ignore the robots.txt file. The easiest way might be to present the old static pages until they quit.
thanx,
bill
Any opinions expressed by me are mine alone, posted from my home computer, on my own time as a free American, and do not reflect the opinions of any person or company that I have had professional relations with in the past 51 years. meep

reach me at [link|mailto:bill.oxley@cox.net|mailto:bill.oxley@cox.net]
New But then they'd never quit.
Hmmm.

You could go the pain-in-the-ass, track-down-and-hassle-people route, which may or may not work. Or you could set up some mechanism to black-hole their IP address after 'X' failures, preferably on a device BEFORE your web server gets hit.
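A rough sketch of the "X failures" idea follows, assuming Apache's common log format. The sample lines are made up (documentation-range addresses), and FAILURE_LIMIT is an arbitrary placeholder for X. The script only produces a list of addresses. Actually dropping their traffic would be a firewall rule on whatever sits in front of the web server, which this doesn't attempt. Also, bots that rotate addresses will slip past it.

```python
import re
from collections import Counter

# Apache "common" log format: host ident user [time] "request" status bytes
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) \S+')

FAILURE_LIMIT = 3  # placeholder for 'X'; choose a value that suits your traffic


def hosts_to_block(lines, limit=FAILURE_LIMIT):
    """Return client addresses with at least `limit` 404 responses, sorted."""
    failures = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group(2) == "404":
            failures[m.group(1)] += 1
    return sorted(host for host, n in failures.items() if n >= limit)


# Made-up sample lines for illustration only.
sample = [
    '192.0.2.10 - - [25/Mar/2007:10:50:01 -0500] "GET /comics/hd20070325a.png HTTP/1.1" 404 209',
    '192.0.2.10 - - [25/Mar/2007:10:50:02 -0500] "GET /comics/hd20070325c.png HTTP/1.1" 404 209',
    '192.0.2.10 - - [25/Mar/2007:10:50:03 -0500] "GET /comics/hd20070325e.png HTTP/1.1" 404 209',
    '198.51.100.7 - - [25/Mar/2007:10:52:00 -0500] "GET /node/1 HTTP/1.1" 200 5120',
    '198.51.100.7 - - [25/Mar/2007:10:54:00 -0500] "GET /kpanic HTTP/1.1" 404 209',
]

if __name__ == "__main__":
    print(hosts_to_block(sample))
```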

New Welcome back.
Can you determine the IPs that are hammering the site? I would think that you could temporarily block them until you get things back under control. Somehow...

Google supposedly [link|http://www.google.com/support/webmasters/bin/answer.py?answer=33570&topic=8846|respects robots.txt files]. I'm surprised that legitimate search engines would not.

There may be zombies out there that are hammering your site based on some perceived or real [link|http://www.secuobs.com/secumail/snsecumail/msg04920.shtml|Drupal vulnerability] as well.

My home connection is constantly being hammered by TCP flood attacks and port scans, and I don't even have a public site up. It seems to be something we just have to live with.

Best of luck.

Cheers,
Scott.
New Probably not bots - more likely fans
Bots would have a much higher hit rate. Trust me, I know.

If they are bots, filtering on IP address isn't going to help you - they switch addresses as soon as you lock them out.

I'd start serving 301s ("Moved Permanently"). Hopefully they'll get the hint.
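To show what a 301 actually sends back, here is a minimal sketch written as a Python WSGI app rather than Apache config, purely to expose the moving parts: the "301 Moved Permanently" status line and a Location header pointing at the new URL. The MOVED mapping is hypothetical. On Apache, a Redirect or RedirectMatch line produces the same kind of response without any code.

```python
from http import HTTPStatus

# Hypothetical one-entry map from an old static URL to its new home.
MOVED = {
    "/comics/hd20070325.png": "http://ubersoft.net/files/comics/hd/hd20070325.png",
}


def app(environ, start_response):
    """Tiny WSGI app: 301 for known old URLs, 404 for everything else."""
    path = environ.get("PATH_INFO", "")
    if path in MOVED:
        status = HTTPStatus.MOVED_PERMANENTLY  # 301, phrase "Moved Permanently"
        start_response(f"{status.value} {status.phrase}",
                       [("Location", MOVED[path])])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not Found"]


if __name__ == "__main__":
    # Call the app directly with a fake start_response to see the headers.
    def show(status, headers):
        print(status, headers)

    app({"PATH_INFO": "/comics/hd20070325.png"}, show)
```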




I4 NOW!


Impeach, Indict, Incarcerate, Inject
Bush, Cheney, Gonzales, Rumsfeld, Rove, Rice
New Re: Probably not bots - more likely fans
How do I serve those? I'm not familiar with how that works...
New This looks like a good article on it.
But I haven't done it, so hopefully someone else will review and pass comment.

[link|http://www.mcanerin.com/articles/301-redirect-apache.htm|http://www.mcanerin....direct-apache.htm]
New Here's the short version
I haven't read the article crazy linked on setting it up, and it depends on the webserver anyway, but I'll assume Apache. The basic idea is that you have a file with a set of regular expressions that identify the old URLs and munge them into the new URLs. The new URLs are returned to the browser with a code that says, "Don't use that URL, use this one instead."

Anything remotely automated will pick it up and stop hitting the old ones. If you think it's real people, then you'll want to redirect to an interstitial that says, "That page has moved, here's the new URL, you'll be automatically redirected in 10 seconds."
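For the human-visitor case, the interstitial can be a plain HTML page that uses a meta refresh to move on after a delay. Below is a Python sketch that builds such a page. The function name is made up, and the 10-second default comes from the post. How the old URLs get routed to this page is left to the server config.

```python
import html

REFRESH_SECONDS = 10  # "automatically redirected in 10 seconds"


def interstitial(new_url, seconds=REFRESH_SECONDS):
    """Build a 'this page has moved' page that refreshes to new_url."""
    safe = html.escape(new_url, quote=True)  # keep & and quotes HTML-safe
    return (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        f'<meta http-equiv="refresh" content="{seconds}; url={safe}">\n'
        "<title>Page moved</title>\n"
        "</head><body>\n"
        f'<p>That page has moved. The new address is <a href="{safe}">{safe}</a>.</p>\n'
        f"<p>You'll be redirected automatically in {seconds} seconds.</p>\n"
        "</body></html>\n"
    )


if __name__ == "__main__":
    print(interstitial("http://ubersoft.net/comic/hd/archives/1996"))
```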

The important part is that this all happens in the webserver, before it hits your DB. And it's just doing a regex match/replace, not a complex lookup. (The file can have multiple regexes.)
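The "file of regexes" idea can be sketched outside Apache as an ordered list of (pattern, replacement) rules where the first match wins. As far as I know, that is also how mod_alias treats Redirect/RedirectMatch lines in the same context. The two rules are the moves described later in this thread. The date-shaped filename pattern (eight digits plus an optional letter, as in hd20070325g.png) is an assumption.

```python
import re

# Hypothetical rule table; tried top to bottom, first match wins.
RULES = [
    # /comics/hd20070325g.png -> /files/comics/hd/hd20070325g.png
    (re.compile(r"^/comics/hd(\d{8}[a-z]?)\.png$"), r"/files/comics/hd/hd\1.png"),
    # every /d/hd1996....html page -> the single 1996 archive page
    (re.compile(r"^/d/hd1996.*\.html$"), "/comic/hd/archives/1996"),
]


def new_location(old_path):
    """Return the rewritten path, or None if no rule applies (plain 404)."""
    for pattern, replacement in RULES:
        if pattern.search(old_path):
            return pattern.sub(replacement, old_path, count=1)
    return None


if __name__ == "__main__":
    for path in ["/comics/hd20070325g.png", "/d/hd19960115.html", "/kpanic"]:
        print(path, "->", new_location(path))
```

Each lookup is a handful of regex tests, with no database involved.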
===

Kip Hawley is still an idiot.

===

Purveyor of Doc Hope's [link|http://DocHope.com|fresh-baked dog biscuits and pet treats].
[link|http://DocHope.com|http://DocHope.com]
New Oh, OK...
That sounds like what I want, then. So essentially it's just one file with a series of links pointing to other links? That's easier to do than I originally feared... as soon as I read up on the proper format.
New 124K - eek!
I created a 301 redirect list of all my help desk images and placed that in my .htaccess file...

Redirect 301 /comics/hd20010306.png http://files/comics/hd/hd20010306.png (etc., etc.)

1,400 lines later, the .htaccess file is now 124.7K, up from an original 5.7K!
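The jump is about what the line count predicts. Here is a back-of-the-envelope check in Python. It assumes each line looks like the example but with the target on http://ubersoft.net, since the post's example abbreviates the host.

```python
# One Redirect line shaped like the post's example; the http://ubersoft.net
# host is an assumption, since the example abbreviates the target.
line = ("Redirect 301 /comics/hd20010306.png "
        "http://ubersoft.net/files/comics/hd/hd20010306.png\n")

LINES = 1400                       # lines added to .htaccess
added_bytes = len(line) * LINES    # bytes per line times number of lines

print(len(line))             # 87
print(added_bytes)           # 121800
print(added_bytes / 1024)    # 118.9453125, i.e. about 119K
```

That accounts for almost all of the growth from 5.7K to 124.7K (about 119K). A single pattern-based line avoids this, because the file then grows with the number of patterns rather than the number of images.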

I'm having trouble rationalizing loading that up into my live site. That's an awfully big file.
New I wouldn't sweat it.
We have a 62K httpd.conf file on our main website, and the site is big enough to need its own load-balanced configuration with 10 web servers.

Wade.


Is it enough to love
Is it enough to breathe
Somebody rip my heart out
And leave me here to bleed
 
Is it enough to die
Somebody save my life
I'd rather be Anything but Ordinary
Please



-- "Anything but Ordinary" by Avril Lavigne.

· my ·
· [link|http://staticsan.livejournal.com/|blog] ·
· [link|http://yceran.org/|website] ·

New oh, that's right...
.htaccess is read by the SERVER, not by browsers.

Heh.

I knew that. :)
New RedirectMatch?
I think the way to cut down on the size of this file is to use RedirectMatch instead of Redirect... RedirectMatch lets you use regular expressions (wildcards) instead of exact paths.

Alas, I'm not quite familiar with how it's supposed to work, so this is a guess:

move all png files starting with "hd" from /comics to /files/comics/hd:

RedirectMatch 301 /comics/hd(.*)\.png$ http://ubersoft.net/files/comics/hd$1

... does that look right?
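Before loading a RedirectMatch line onto the live site, it's cheap to simulate it. The Python sketch below rests on two assumptions. First, that Python's re behaves like Apache's regex engine for patterns this simple. Second, that, as with RedirectMatch, the pattern is searched anywhere in the URL path (not anchored) and the target is rebuilt from $1..$9. The helper name is made up. The sketch compares the guess above with two variants:

```python
import re


def redirect_match(pattern, target, url_path):
    """Imitate RedirectMatch: unanchored search, then fill $1..$9 into target."""
    m = re.search(pattern, url_path)
    if m is None:
        return None  # no redirect; the request is handled normally
    return re.sub(r"\$(\d)", lambda g: m.group(int(g.group(1))) or "", target)


GUESS = (r"/comics/hd(.*)\.png$", "http://ubersoft.net/files/comics/hd$1")
HALF = (r"/comics/hd(.*)\.png$", "http://ubersoft.net/files/comics/hd/hd$1.png")
FIXED = (r"^/comics/hd(.*)\.png$", "http://ubersoft.net/files/comics/hd/hd$1.png")

OLD = "/comics/hd20010306.png"           # old static location
NEW = "/files/comics/hd/hd20010306.png"  # new location

if __name__ == "__main__":
    print(redirect_match(*GUESS, OLD))  # $1 holds only the date part
    print(redirect_match(*HALF, NEW))   # unanchored: the new URL matches again
    print(redirect_match(*FIXED, OLD))
    print(redirect_match(*FIXED, NEW))  # None: no second redirect
```

Two things show up. First, $1 holds only the date, so the target has to add back the "hd/hd" prefix and the ".png". Second, without a leading ^ the pattern also matches the new /files/comics/hd/... location. Once the target ends in .png, each redirect would therefore trigger another one.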
"We are all born originals -- why is it so many of us die copies?"
- Edward Young
New you might not even need the...
http://ubersoft.net part.

Though I've never done a mass relocation like that myself.
--
[link|mailto:greg@gregfolkert.net|greg],
[link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey
Freedom is not FREE.
Yeah, but 10s of Trillions of US Dollars?
SELECT * FROM scog WHERE ethics > 0;

0 rows returned.
New But are the wildcards used correctly?
That's the part I'm nervous about.

Also, if I wanted to redirect a number of files to a specific page:

"all html files in the 'd' directory starting with 'hd1996' to [link|http://ubersoft.net/comic/hd/archives/1996|http://ubersoft.net/...hd/archives/1996]"

would I just drop the "$1", i.e.

RedirectMatch 301 /d/hd1996(.*)\.html$ http://ubersoft.net/comic/hd/archives/1996

... and if I did that, would it be appropriate to use 301, or would it be more accurate to use one of the other redirect codes?

I'm pretty new at this... the stuff I've googled doesn't quite cover it, or the conversations and explanations assume much more familiarity with the subject than I have... and the Apache documentation, unfortunately, is rather like reading Sanskrit.
New Looks like it. According to current Apache docs:
[link|http://httpd.apache.org/docs/1.3/mod/mod_alias.html#redirectmatch|apache 1.3 mod_alias docs]

RedirectMatch (.*)\.gif$ http://www.anotherserver.com$1.jpg


Just add the parts you need.

But it appears you are good. Nothing a test wouldn't fix.
     Not sure if this belongs here or in "Networking"... - (cwbrenn) - (15)
         might have to do with the fact that most engines - (boxley) - (1)
             But then they'd never quit. - (crazy)
         Welcome back. - (Another Scott)
         Probably not bots - more likely fans - (tuberculosis) - (11)
             Re: Probably not bots - more likely fans - (cwbrenn) - (10)
                 This looks like a good article on it. - (crazy)
                 Here's the short version - (drewk) - (8)
                     Oh, OK... - (cwbrenn) - (7)
                         124K - eek! - (cwbrenn) - (6)
                             I wouldn't sweat it. - (static) - (1)
                                 oh, that's right... - (cwbrenn)
                             RedirectMatch? - (cwbrenn) - (3)
                                 you might not even need the... - (folkert) - (2)
                                     But are the wildcards used correctly? - (cwbrenn) - (1)
                                         Looks like it. According to current Apache docs: - (folkert)
