
Welcome to IWETHEY!

New Arachnophobia
Had a few interesting eye openers over the past few days on how to be friendly to spiders...and why you would want this.

Spiders, of course, refers to search engine 'bots. Interesting that many of the same things that drive me nuts are also pretty hostile to search engines and their metrics. Functionally, Google is a blind man surfing the web. A multi-billionaire blind man with millions of friends who hang on his every word.

So, I present a few arachnophobia awards. Sites, designs, and practices that are remarkably toxic to spiders. The result is sites with a remarkably low Google profile.

First place in my book goes to [link|http://www.trilogyit.com/|TrilogyIT]. The home page is simply a Flash animation. It looks nice enough when viewed...with a Flash-enabled browser. For those of us who don't have, or don't enable, Flash, there's simply nothing there. Google makes [link|http://www.google.com/search?q=site%3Atrilogyit.com+job|no reference] to any content at the site, and only finds three [link|http://www.google.com/search?hl=en&q=link%3Awww.trilogyit.com|links to it]. Unsurprisingly, I had zero results from the company in my recent job search, despite several years working with one of the key members of the company in a prior life.

Another practice that's highly toxic to Google searches is [link|http://www.amazon.com/exec/obidos/tg/browse/-/523851/ref%3Dwg%F5rb%Fb5o/002-0718869-4578410|session IDs]. They break things in two ways. First, no two sweeps through a site are the same. Second, mangling the session string may break the URL (as in this case with Amazon -- the page linked originally was simply the default entry page). Session IDs are among several web tricks which result in sites that can produce a virtually unlimited number of pages. Because of the risk of turning these into spider traps (the spider enters and never leaves), many spiders simply avoid such sites.
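
To make the "no two sweeps are the same" problem concrete, here is a minimal Python sketch of how a crawler might collapse session variants of a URL into one canonical form. Everything in it is an assumption for illustration: the parameter names in `SESSION_PARAMS`, the `PATH_TOKEN` pattern (modeled loosely on the Amazon-style token in the link above), and the helper names `canonicalize` and `crawl_frontier`. It is not how Google or any particular spider actually works.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of query parameters that commonly carry session IDs.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}

# Assumed shape of a path-embedded session token (three digit groups),
# modeled on the Amazon-style URL; real sites vary widely.
PATH_TOKEN = re.compile(r"/\d{3}-\d{7}-\d{7}(?=/|$)")


def canonicalize(url: str) -> str:
    """Return the URL with session-like tokens stripped out."""
    parts = urlsplit(url)
    path = PATH_TOKEN.sub("", parts.path)
    query = urlencode(
        [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
         if k.lower() not in SESSION_PARAMS]
    )
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


def crawl_frontier(urls):
    """Yield each canonical URL once, so session variants are not revisited."""
    seen = set()
    for url in urls:
        key = canonicalize(url)
        if key not in seen:
            seen.add(key)
            yield key
```

Without some normalization like this, every visit hands the spider a fresh set of "new" URLs, which is exactly what turns a site into a trap. The catch is that the spider has to guess which part of the URL is the session token, and a wrong guess breaks the link.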

[link|http://judiciary.senate.gov/beta/|Text as images] is another classic faux pas. Google doesn't OCR (though it does [link|http://www.google.com/search?as_q=file&num=20&btnG=Google+Search&as_epq=&as_oq=&as_eq=&lr=&as_ft=i&as_filetype=pdf&as_qdr=all&as_occt=any&as_dt=i&as_sitesearch=&safe=off|handle PDFs] and other formats), so any keywords presented as graphics might as well be screen noise.

My thought is that, with all the noise about standards and accessibility, it's ultimately going to be search engines which drive these issues. Accessible, standards-compliant websites will be more valuable than any chrome -- Flash, animated GIFs, sound, Java, or JavaScript. What might make things particularly interesting is if Google stated a clear preference for W3C-conformant HTML and XHTML -- after all, it makes their spidering process easier. Might improve things around here....
--
Karsten M. Self [link|mailto:kmself@ix.netcom.com|kmself@ix.netcom.com]
[link|http://kmself.ix.netcom.com/|http://kmself.ix.netcom.com/]
What part of "gestalt" don't you understand?
New One gripe I have with that
My gripe is with the problem you mention with session IDs. I understand that, as commonly used, they can be spider traps, but they are also the only reliable way to do personalization without relying on cookies, Java, or JavaScript.

Okay, that's not true. You can custom-build each URL, but that's even worse for search engines than SIDs. And there's HTML authentication, but ... hmmm, what was wrong with HTML authentication again?
We have to fight the terrorists as if there were no rules and preserve our open society as if there were no terrorists. -- [link|http://www.nytimes.com/2001/04/05/opinion/BIO-FRIEDMAN.html|Thomas Friedman]
New HTML authentication
by which I think you mean Basic Authentication, more properly HTTP authentication. From a functionality point of view, the only thing wrong with it is that the userid:password pair is simply base64 encoded, which makes it easy to capture. That's easy to fix if you're willing to pay the overhead for an SSL server.

What makes it undesirable is the bad UI. Companies go to great lengths and spend large sums to make their sites look and behave a certain way. Basic Authentication pops up an ugly dialog box over which they have no control. That, plus the way it breaks the flow of the dialog, makes it a non-starter.

The advantage SIDs and friends have is that they don't require the user to do anything -- it's automatic. Even better, they don't require any knowledge of the user on the server side, so everyone is tracked equally. Because it is automatic and covers everyone, it doesn't break the UI that the company worked so hard to put together. Authentication is only needed for authorization or site personalization. For all their advantages, SIDs do have their downside too, as has been noted.
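
On the base64 point, a short Python sketch may help show why the encoding offers no protection: anyone who captures the `Authorization` header can reverse it in one call. The function names `basic_auth_header` and `decode_basic_auth` are made up for this illustration, and the credentials are the familiar sample pair from the HTTP authentication RFCs.

```python
import base64


def basic_auth_header(userid: str, password: str) -> str:
    """Build the Authorization header value a browser would send."""
    token = base64.b64encode(f"{userid}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic_auth(header: str) -> tuple[str, str]:
    """Recover (userid, password) from a captured header value."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    # base64 is an encoding, not encryption: decoding needs no key.
    userid, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return userid, password


header = basic_auth_header("Aladdin", "open sesame")
print(header)
print(decode_basic_auth(header))
```

That is why the SSL overhead is the price of using it safely: the header itself is effectively plaintext on the wire.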
Have fun,
Carl Forde
New Heh. Side note
In the last two weeks I've dropped a couple of grand based on Web advertising.

All of it went to companies whose sites don't use splash screens or Flash.

That trend will continue.
Regards,
Ric
     Arachnophobia - (kmself) - (3)
         One gripe I have with that - (drewk) - (1)
             HTML authentication - (cforde)
         Heh. Side note - (Ric Locke)
