IWETHEY v. 0.3.0 | TODO

Welcome to IWETHEY!

New So if this *is* on the machine the sitemap is on ...
... then I shouldn't do it?
--

Drew
New Correct! Unless...
You have the sitemap somewhere else...

Then you'd use the last example.
Edited by folkert Nov. 4, 2011, 08:08:07 PM EDT
New How 'bout this?
For canonical favicons I've got this:
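RewriteBase /

# Skip requests that are already for the canonical /favicon.ico
RewriteCond %{REQUEST_URI} !^/favicon\.ico$ [NC]
# Match any other favicon name, with or without one of the usual image extensions
RewriteCond %{REQUEST_URI} /favicon(s)?\.?(gif|ico|jpe?g?|png)?$ [NC]
RewriteRule (.*) http://cooklikeyourg...r.com/favicon.ico [R=301,L]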

This seems to be saying that any request that contains the string "favicon" but is not already "favicon.ico" in the root should get redirected to that one in the root. Seems like the same thing should work for the sitemap, yes?
--

Drew
New ??
how many favicons do you got?
Any opinions expressed by me are mine alone, posted from my home computer, on my own time as a free American and do not reflect the opinions of any person or company that I have had professional relations with in the past 55 years. meep
New One, that's the point
Different browsers or robots might look for favicons with any of the standard image extensions, and may look for it in non-standard locations. What those rules do -- if I'm reading them right -- is take any request for an image file named "favicon.ext" except for "/favicon.ico" and return the image at "/favicon.ico".
--

Drew
New Ummm...
In all the logs I've ever looked at... and trust me, I sometimes have many tens of GB per month.

Not once have I seen a single request for anything except "favicon.ico"

Back in the early days, when favicon wasn't standardized... sure.

It's not something you need to worry about.

I could go all the way back to 2000 for the place I work and grab all the logs and grep through them for something like "favicon.*" and exclude all "favicon.ico" (case insensitive)... bet you I'd find the user agent to be scrubbers or not friendly robots.

You see, if there is support for these OLD formats, that makes the VHOST/machine more intriguing and therefore will be hammered on for OLD OLD OLD exploits (to perchance come across an old PHP app or old CGI script) and just drive up your bandwidth.

It's just NOT something I'd even support, and I have argued with SEO twinks about it for days on end. Google doesn't require anything except a valid favicon.ico... and does *NOT* mark you off for it.

It's just like some SEOs thinking "#" at the end of the URL is a bad thing. No... it's ignored anyway, it's just a frickin' placeholder.

Just because someone says you *HAVE* to support it, doesn't mean you should.

It's truly not something you want to even begin to support.

I've checked my Web Application Firewalls... and wouldn't you know it...

Surprisingly, there are optional rules to tell it to ignore requests for favicon.(NON-ico) and to not even reply to the request.

We don't have the optional rules enabled since they also block using "+" as a separator for certain things in our CMS.

It's really not worth it.
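
For anyone without a Web Application Firewall, a rough mod_rewrite stand-in for the "ignore favicon.(NON-ico)" idea might look like the sketch below. This is an assumption about how one could do it, not the WAF rule mentioned above, and Apache will still answer (with a 404) rather than stay silent. The extension list is borrowed from the rules quoted earlier in the thread.

```apache
RewriteEngine On

# Leave the canonical /favicon.ico alone.
RewriteCond %{REQUEST_URI} !^/favicon\.ico$ [NC]

# Any other favicon.<ext> request gets a plain 404 instead of a redirect.
# With a non-3xx status on the R flag, the "-" substitution is dropped
# and rewriting stops as if L had been given.
RewriteRule (^|/)favicons?\.(gif|ico|jpe?g|png)$ - [NC,R=404,L]
```

The point of answering 404 rather than redirecting is that the odd requests get nothing that makes the host look like it supports the old formats.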
New And just so we understand the numbers involved here...
I just went through the last year's worth of logs for all the domains we support and my personal ones.

Nearly 68 million hits for case-insensitive "favicon".

Of those, non-ico hits are about 95,000, and about 99% of them were for "favicon.gif".

They all came from 13 networks with "Filtered RIPE", which means they are privately operated and typically are "research" related. Additionally, almost all of the requests *DO NOT* include a User Agent that is considered known, which means it's a machine scrubbing through.

Many of the "ico" ones have UAs of:

Mozilla/5.0 (compatible; YandexFavicons/1.0; +http://yandex.com/bots)

Mozilla/5.0 (Really Gmane.org's favicon grabber)


Or are similar scrapers.
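
If you would rather not grep a year of logs after the fact, one way to collect the same kind of evidence going forward is to send the odd favicon requests to their own log. This is a different technique from the grep described above, sketched under assumptions: the env variable name, the format nickname and the log file name are made up for illustration, and CustomLog only works in the server or vhost config, not in .htaccess.

```apache
# Combined-style format so the User-Agent ends up in the log line.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" favicon_fmt

# Flag every request whose path mentions "favicon" (case insensitive)...
SetEnvIfNoCase Request_URI "favicon" odd_favicon
# ...then clear the flag again for the canonical /favicon.ico.
SetEnvIfNoCase Request_URI "^/favicon\.ico$" !odd_favicon

# Only flagged requests are written to this (hypothetical) log file.
CustomLog "logs/odd_favicon.log" favicon_fmt env=odd_favicon
```

This needs mod_setenvif and mod_log_config loaded; the resulting file should contain only the non-canonical favicon requests along with whatever User Agent they sent.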
New Okay
So for a favicon request that's not .ico and in root, I shouldn't return anything. Is that also true of requests for sitemap.xml in the wrong place?

BTW these tips weren't for SEO benefit. These were presented as tuning your webserver for better performance ... although they might have been offered by a guy more into SEO than into sysadmining.
--

Drew
New Sitemaps are known to be in the proper place...
Except for some CMS or Blogging machines that are "shared"...

Only if you are on something like "www.blogthis.info/yourblog" would the sitemap for your "sub site" be "www.blogthis.info/yourblog/sitemap.xml".

If they request a sitemap somewhere other than proper placement, they are either fishing for badly configured sites, or deliberately trying to probe you for what exactly you have setup and where things are...

Typically, this works on Windows IIS servers because of all the automagic stuff they do, but in all honesty, do you really want to support "Windows" nomenclature and functions on non-IIS servers?

In all honesty, you can add in all the "corrections for errors" you want, you are just feeding evidence for this stuff by rote and not with good reasoning.

If you really want to support this kind of stuff, you should look into the mod_speling module for Apache and get into using that properly. It'll help people much more than these 5000 rules from an SEO will. But mod_speling will definitely have an impact on performance if you get scanned/scraped with bad stuff trying to find your "myphpadmin" (just an example) pages.
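
A minimal sketch of turning mod_speling on, assuming a stock Apache layout; the module path and the directory are placeholders, not anything taken from the site discussed here:

```apache
# Server or vhost config; the module path is a common default and may differ.
LoadModule speling_module modules/mod_speling.so

<Directory "/var/www/html">
    # Try to correct misspelled or miscapitalized URLs.
    CheckSpelling On
    # Restrict corrections to upper/lower-case mistakes only.
    CheckCaseOnly On
</Directory>
```

Dropping the CheckCaseOnly line allows the module's broader misspelling corrections as well, which is where the extra work per bad request comes from.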

It's a juggle; all of these "protections" against the user getting lost sometimes make managing things just not worth the time and effort. You'd think with all these really new and fully featured web browsers existing nowadays, this stuff would just go away. But again, this is advice from 12 years ago on how to push all the wonky stuff that existed back then (including the drive away from ".gif") to a standard today...

Think about how many people still use the browsers from 1999... not many, including people still on Windows 95/98/ME, Windows NT4 and Windows 2000. As is the case, in order to even get around the internet now... you have to have a browser that supports many newer standards that weren't even thought of yet in 1999.

Take my opinion with a Grain of salt and let it simmer a bit. Then do what you want.
     Is this htaccess tip wrong? - (drook) - (16)
         working as designed -NT - (boxley) - (5)
             But I *want* people to see the sitemap - (drook) - (4)
                 no, they are deliberately doing an open loop - (boxley) - (3)
                     Why would you do that? -NT - (drook) - (2)
                         Looks like a honeypot for bad bots. -NT - (static) - (1)
                             But how do the *good* bots get it? -NT - (drook)
         Re: Is this htaccess tip wrong? - (folkert) - (9)
             So if this *is* on the machine the sitemap is on ... - (drook) - (8)
                 Correct! Unless... - (folkert) - (7)
                     How 'bout this? - (drook) - (6)
                         ?? - (boxley) - (5)
                             One, that's the point - (drook) - (4)
                                 Ummm... - (folkert) - (3)
                                     And just so we understand the numbers involved here... - (folkert) - (2)
                                         Okay - (drook) - (1)
                                             Sitemaps are known to be in the proper place... - (folkert)
