Post #350,098
11/3/11 4:22:10 PM
|
Is this htaccess tip wrong?
http://digwp.com/201...anonical-sitemaps
I get an error: "Firefox has detected that the server is redirecting the request for this address in a way that will never complete."
I think I understand what it's doing wrong, but don't know how (or if it's possible) to do it right.
--
Drew
|
Post #350,104
11/3/11 6:50:27 PM
|
working as designed
Any opinions expressed by me are mine alone, posted from my home computer, on my own time as a free American and do not reflect the opinions of any person or company that I have had professional relations with in the past 55 years. meep
|
Post #350,110
11/3/11 9:01:16 PM
|
But I *want* people to see the sitemap
A redirect that goes in a circle isn't helping. I know it's trying to redirect any sitemap requests anywhere other than the root down to the correct location in root, but this isn't doing it right.
--
Drew
|
Post #350,118
11/3/11 10:07:59 PM
|
no, they are deliberately doing an open loop
After 20 times around, HTTP browsers, wget, et al. bail out.
meep
|
Post #350,125
11/3/11 11:09:21 PM
|
Why would you do that?
--
Drew
|
Post #350,127
11/3/11 11:12:51 PM
|
Looks like a honeypot for bad bots.
|
Post #350,129
11/3/11 11:31:51 PM
|
But how do the *good* bots get it?
--
Drew
|
Post #350,111
11/3/11 9:07:12 PM
11/3/11 9:10:31 PM
|
Re: Is this htaccess tip wrong?
This is for sitemaps requested on certain vhosts/machines when the sitemap actually lives on a different machine/vhost.
Let's say you are www.example.com and all your high-bandwidth stuff is configured for wosh.example.com. You'd have to use something like this:
RedirectMatch 301 /sitemap\.xml$ http://wosh.example.com/sitemap.xml
RedirectMatch 301 /sitemap\.xml\.gz$ http://wosh.example.com/sitemap.xml.gz
You wouldn't put this on wosh.example.com itself, since that would end up being loopy!
You could also do this on the same host (i.e., www.example.com), as long as the pattern is anchored so it cannot match its own target:
RedirectMatch 301 ^/sitemap\.xml$ http://www.example.c...ation/sitemap.xml
RedirectMatch 301 ^/sitemap\.xml\.gz$ http://www.example.c...on/sitemap.xml.gz
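If the sitemap sits in the root of the same host and the goal is only to catch requests for it in subdirectories, one way to avoid the loop is to anchor the pattern so the root file never matches its own rule. A minimal sketch, assuming mod_alias is loaded and with www.example.com standing in for the real host:

```apache
# Assumption: www.example.com is a placeholder for the canonical host.
# "^/.+/" requires at least one directory level before the file name,
# so /sitemap.xml itself is never redirected and the loop cannot start.
RedirectMatch 301 ^/.+/sitemap\.xml$ http://www.example.com/sitemap.xml
RedirectMatch 301 ^/.+/sitemap\.xml\.gz$ http://www.example.com/sitemap.xml.gz
```

With this in place, a request for /blog/sitemap.xml would get a 301 to /sitemap.xml, while /sitemap.xml is served directly.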
Edited by folkert
Nov. 3, 2011, 09:10:31 PM EDT
|
Post #350,126
11/3/11 11:09:56 PM
|
So if this *is* on the machine the sitemap is on ...
... then I shouldn't do it?
--
Drew
|
Post #350,140
11/4/11 8:06:54 PM
11/4/11 8:08:07 PM
|
Correct! Unless...
You have the sitemap somewhere else...
Then you'd use the last example.
Edited by folkert
Nov. 4, 2011, 08:08:07 PM EDT
|
Post #350,146
11/4/11 10:25:55 PM
|
How 'bout this?
For canonical favicons I've got this:
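RewriteBase /
# Skip requests that are already for the canonical /favicon.ico
RewriteCond %{REQUEST_URI} !^/favicon\.ico$ [NC]
# Match any other path ending in /favicon or /favicons, with an optional image extension
RewriteCond %{REQUEST_URI} /favicon(s)?\.?(gif|ico|jpe?g?|png)?$ [NC]
RewriteRule (.*) http://cooklikeyourg...r.com/favicon.ico [R=301,L]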
This seems to be saying that any request that contains the string "favicon" but is not already "favicon.ico" in the root should get redirected to that one in the root. Seems like the same thing should work for the sitemap, yes?
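Carried over to the sitemap, the same two-condition pattern might look like the sketch below. It assumes mod_rewrite is available, that the sitemap lives in the document root, and that www.example.com is a placeholder for the real host. The negated condition is what keeps the root file from redirecting to itself.

```apache
RewriteEngine On
RewriteBase /

# Leave the canonical root copies alone so the redirect cannot loop.
RewriteCond %{REQUEST_URI} !^/sitemap\.xml(\.gz)?$
# Catch any other path that ends in sitemap.xml or sitemap.xml.gz.
RewriteCond %{REQUEST_URI} /sitemap\.xml(\.gz)?$
# %1 is the optional ".gz" captured by the condition directly above.
# Assumption: www.example.com is a placeholder host name.
RewriteRule .* http://www.example.com/sitemap.xml%1 [R=301,L]
```

A request for /feeds/sitemap.xml.gz would then be sent to /sitemap.xml.gz, and requests for the root files pass through untouched.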
--
Drew
|
Post #350,148
11/4/11 10:59:47 PM
|
??
how many favicons do you got?
meep
|
Post #350,149
11/4/11 11:12:47 PM
|
One, that's the point
Different browsers or robots might look for favicons with any of the standard image extensions, and may look for it in non-standard locations. What those rules do -- if I'm reading them right -- is take any request for an image file named "favicon.ext" except for "/favicon.ico" and return the image at "/favicon.ico".
--
Drew
|
Post #350,150
11/5/11 12:19:25 AM
|
Ummm...
In all the logs I've ever looked at... and trust me, I sometimes have many tens of GB per month.
Not once have I seen a single request for anything except "favicon.ico"
Back in the early days, when favicon wasn't standardized... sure.
It's not something you need to worry about.
I could go all the way back to 2000 for the place I work, grab all the logs, grep through them for something like "favicon.*", and exclude all "favicon.ico" (case insensitive)... I bet you I'd find the user agents to be scrubbers or unfriendly robots.
You see, if there is support for these OLD formats, that makes the VHOST/machine more intriguing, and it will therefore be hammered on for OLD OLD OLD exploits (to perchance come across an old PHP app or old CGI script), which just drives up your bandwidth.
It's just NOT something I'd even support, and I have argued with SEO twinks about it for days on end. Google doesn't require anything except a valid favicon.ico... and does *NOT* mark you off for it.
It's just like some SEOs thinking "#" at the end of the URL is a bad thing. No... it's ignored anyway, it's just a frickin' placeholder.
Just because someone says you *HAVE* to support it doesn't mean you should.
It's truly not something you want to even begin to support.
I've checked my Web Application Firewalls... and wouldn't you know it...
Surprisingly, there are optional rules to tell it to ignore requests for favicon.(NON-ico) and to not even reply to the request.
We don't have the optional rules enabled since they also block using "+" as a separator for certain things in our CMS.
It's really not worth it.
|
Post #350,152
11/5/11 12:46:06 AM
|
And just so we understand the numbers involved here...
I just went through the last year's worth of logs for all the domains we support and my personal ones.
Nearly 68 million hits for case-insensitive "favicon".
Of those, non-ico hits are about 95,000, and about 99% of them were for "favicon.gif".
They have all come from 13 networks marked "Filtered RIPE", which means they are privately operated and typically "research" related. Additionally, almost all of the requests *DO NOT* include a User Agent that is considered known, which means it's a machine scrubbing through.
Many of the "ico" ones have UAs of:
Mozilla/5.0 (compatible; YandexFavicons/1.0; +http://yandex.com/bots)
Mozilla/5.0 (Really Gmane.org's favicon grabber)
Or are similar scrapers.
|
Post #350,154
11/5/11 3:31:33 AM
|
Okay
So for a favicon request that's not .ico and in root, I shouldn't return anything. Is that also true of requests for sitemap.xml in the wrong place?
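In Apache terms, one reading of that advice is to drop the favicon redirect and let anything other than .ico fail. A sketch, assuming mod_alias is loaded; note that Apache still answers with a 404 here rather than staying silent, so this only approximates "not even reply":

```apache
# Answer 404 for favicon requests with a non-.ico image extension,
# in any directory. (?i) makes the match case-insensitive.
RedirectMatch 404 (?i)/favicon\.(gif|jpe?g|png)$
```

If no such files exist on disk, Apache already returns 404 without any rule; the line only makes the intent explicit.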
BTW these tips weren't for SEO benefit. These were presented as tuning your webserver for better performance ... although they might have been offered by a guy more into SEO than into sysadmining.
--
Drew
|
Post #350,158
11/5/11 1:02:48 PM
|
Sitemaps are known to be in the proper place...
Except for some CMS or Blogging machines that are "shared"...
Only if you are on something like "www.blogthis.info/yourblog" the sitemap for your "sub site" would be "www.blogthis.info/yourblog/sitemap.xml"
If they request a sitemap somewhere other than proper placement, they are either fishing for badly configured sites, or deliberately trying to probe you for what exactly you have setup and where things are...
Typically, this works on Windows IIS servers because of all the automagic stuff they do, but in all honesty, do you really want to support "Windows" nomenclature and functions on non-IIS servers?
In all honesty, you can add in all the "corrections for errors" you want, you are just feeding evidence for this stuff by rote and not with good reasoning.
If you really want to support this kind of stuff, you should look into the mod_speling module for Apache and get into using that properly. It'll help people much more than these 5000 rules from an SEO will. But mod_speling will definitely have an impact on performance if you get scanned/scraped by bad stuff trying to find your "myphpadmin" (just an example) pages.
It's a juggling act; all of these "protections" against the user getting lost sometimes make managing things just not worth the time and effort. You'd think that with all these really new and fully featured web browsers around nowadays, this stuff would just go away. But again, this is advice from 12 years ago on how to push all the wonky stuff that existed back then (including the drive away from ".gif") toward what is a standard today...
Think about how many people still use the browsers from 1999... not many, including people still on Windows 95/98/ME, Windows NT4, and Windows 2000. As it is, in order to even get around the internet now, you have to have a browser that supports many newer standards that weren't even thought of yet in 1999.
Take my opinion with a grain of salt and let it simmer a bit. Then do what you want.
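For reference, turning that module on takes only a couple of directives. A minimal sketch, assuming mod_speling is compiled in or loaded via LoadModule, and that the performance trade-off mentioned above is acceptable:

```apache
# Assumption: mod_speling is available on this server.
<IfModule mod_speling.c>
    # Try to correct a single misspelling or a capitalization error in the requested URL.
    CheckSpelling On
    # Optional: restrict corrections to upper/lower-case differences only.
    CheckCaseOnly On
</IfModule>
```

Leaving CheckCaseOnly off allows the broader one-character corrections, at the cost of more directory scanning per miss.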
|