Welcome to IWETHEY!

And just so we understand the numbers involved here...
I just went through the last year's worth of logs for all the domains we support and my personal ones.

Nearly 68 million hits for case-insensitive "favicon".

Of those, about 95,000 were non-.ico hits, and about 99% of them were for "favicon.gif".

They all came from 13 networks marked "Filtered RIPE", which means they are privately operated and typically "research" related. Additionally, almost all of the requests *DO NOT* include a known User Agent, which means it's a machine scrubbing through.
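For anyone who wants to run a similar tally on their own logs, here is a minimal sketch. It assumes Apache "combined" format log lines; the function name `tally_favicon_hits` and the sample lines are made up for illustration and are not the script or data behind the numbers above.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, where the request line
# ("GET /path HTTP/1.1") is the first double-quoted field.
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+)[^"]*"')


def tally_favicon_hits(lines):
    """Count favicon requests, non-.ico requests, and favicon.gif requests."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        # Drop any query string and compare case-insensitively.
        path = match.group(1).split("?", 1)[0].lower()
        if "favicon" not in path:
            continue
        counts["favicon"] += 1
        name = path.rsplit("/", 1)[-1]
        if not name.endswith(".ico"):
            counts["non_ico"] += 1
            if name == "favicon.gif":
                counts["favicon_gif"] += 1
    return counts


# Hypothetical sample lines, not taken from any real log.
sample = [
    '203.0.113.5 - - [01/Jan/2013:00:00:01 +0000] "GET /favicon.ico HTTP/1.1" '
    '200 1150 "-" "Mozilla/5.0 (compatible; YandexFavicons/1.0; +http://yandex.com/bots)"',
    '203.0.113.9 - - [01/Jan/2013:00:00:02 +0000] "GET /FAVICON.GIF HTTP/1.0" 404 209 "-" "-"',
    '203.0.113.9 - - [01/Jan/2013:00:00:03 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "-"',
]
counts = tally_favicon_hits(sample)
print(dict(counts))
```

On real logs you would pass an open file object instead of `sample`, since the function only needs an iterable of lines.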

Many of the "ico" ones have UAs of:

Mozilla/5.0 (compatible; YandexFavicons/1.0; +http://yandex.com/bots)

Mozilla/5.0 (Really Gmane.org's favicon grabber)


Or are similar scrapers.
Okay
So for a favicon request that's not .ico and in root, I shouldn't return anything. Is that also true of requests for sitemap.xml in the wrong place?

BTW these tips weren't for SEO benefit. These were presented as tuning your webserver for better performance ... although they might have been offered by a guy more into SEO than into sysadmining.
--

Drew
Sitemaps are known to be in the proper place...
Except for some CMS or Blogging machines that are "shared"...

Only if you are on something like "www.blogthis.info/yourblog" would the sitemap for your "sub site" be at "www.blogthis.info/yourblog/sitemap.xml".

If they request a sitemap somewhere other than its proper placement, they are either fishing for badly configured sites, or deliberately probing you for exactly what you have set up and where things are...
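To make "proper placement" concrete, here is a rough sketch of how a log review script might separate expected sitemap requests from probes. The function name `classify_sitemap_request` and the `site_roots` parameter are hypothetical; the roots you pass in are whatever your own setup treats as a site or sub-site root.

```python
def classify_sitemap_request(path, site_roots=("/",)):
    """Return 'not_sitemap', 'expected', or 'probe' for a request path.

    site_roots is an assumption for illustration: the directories where a
    sitemap.xml is legitimately served, e.g. ("/", "/yourblog/").
    """
    path = path.split("?", 1)[0]  # ignore any query string
    if not path.endswith("/sitemap.xml"):
        return "not_sitemap"
    directory = path[: -len("sitemap.xml")]  # keeps the trailing slash
    # Normalise roots so "/yourblog" and "/yourblog/" are treated alike.
    roots = {root if root.endswith("/") else root + "/" for root in site_roots}
    return "expected" if directory in roots else "probe"


print(classify_sitemap_request("/sitemap.xml"))
print(classify_sitemap_request("/yourblog/sitemap.xml", ("/", "/yourblog")))
print(classify_sitemap_request("/images/sitemap.xml"))
```

The point is only to flag requests in odd directories for a closer look, not to block anything automatically.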

Typically, this works on Windows IIS servers because of all the automagic stuff they do, but in all honesty, do you really want to support "Windows" nomenclature and functions on non-IIS servers?

In all honesty, you can add in all the "corrections for errors" you want, you are just feeding evidence for this stuff by rote and not with good reasoning.

If you really want to support this kind of stuff, you should look into the "mod_speling" module for Apache and get into using that properly. It'll help people much more than these 5000 rules from an SEO will. But mod_speling will definitely have an impact on performance if you get scanned/scraped by bad stuff trying to find your "myphpadmin" (just an example) pages.

It's a juggle; all of these "protections" against the user getting lost sometimes make managing things just not worth the time and effort. You'd think with all these really new and fully featured Web Browsers existing nowadays, this stuff would just go away. But again, this is advice from 12 years ago on how to push all the wonky stuff that existed back then (including the drive away from ".gif") to a standard, being applied today...

Think about how many people still use the browsers from 1999... not many, even counting people still on Windows 95/98/ME, Windows NT4 and Windows 2000. As it stands, in order to even get around the internet now... you have to have a browser that supports many newer standards that weren't even thought of yet in 1999.

Take my opinion with a grain of salt and let it simmer a bit. Then do what you want.
     Is this htaccess tip wrong? - (drook) - (16)
         working as designed -NT - (boxley) - (5)
             But I *want* people to see the sitemap - (drook) - (4)
                 no, they are deliberately doing an open loop - (boxley) - (3)
                     Why would you do that? -NT - (drook) - (2)
                         Looks like a honeypot for bad bots. -NT - (static) - (1)
                             But how do the *good* bots get it? -NT - (drook)
         Re: Is this htaccess tip wrong? - (folkert) - (9)
             So if this *is* on the machine the sitemap is on ... - (drook) - (8)
                 Correct! Unless... - (folkert) - (7)
                     How 'bout this? - (drook) - (6)
                         ?? - (boxley) - (5)
                             One, that's the point - (drook) - (4)
                                 Ummm... - (folkert) - (3)
                                     And just so we understand the numbers involved here... - (folkert) - (2)
                                         Okay - (drook) - (1)
                                             Sitemaps are known to be in the proper place... - (folkert)