Monday, May 28, 2007

Selective Page Indexing Directives

Following up on the new Yahoo "class" directive for isolating non-content on a web page, I have come to one conclusion -- the Big 3, and others, each have their own method of telling them what your page is about. Each method is often confusing and hard to implement. Factor in the most novice of web page creators, who probably will not have a clue, and you have chaos at its best.
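
For reference, Yahoo's approach -- as I understand it -- piggybacks on the ordinary class attribute, roughly like this (the "robots-nocontent" value is Yahoo's; the surrounding markup is just an illustration):

<div class="robots-nocontent">
  <!-- navigation, ads, and other boilerplate Yahoo should ignore -->
</div>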



I came across an obscure page which details the directives handed out by various search engines. The page is not all-encompassing, but it sheds a lot of light on how the major and minor search engines are competing for your HTML input. This input is always geared toward their own consumption, and if you do not use it, you will be at a disadvantage when it comes to how they index you.



In-page directives are always geared toward the SE provider that suggests them. Depending on the notoriety of the SE, they either get inserted into pages or they do not. Web page builders constantly have to keep up with the additions, and that attention usually does not extend to minor or startup search sites.



The playing field is not even when it comes to directives. Whoever is strongest usually prevails, helped along by the countless forum and blog posts that generally promote the biggest of the pack. You know who I am talking about.



The robots.txt Summit

Yahoo's indexing directive (ill-conceived, in my opinion) was born from a summit that was supposed to deal with the robots.txt method of directing SEs to the main content of web pages. During the summit, the idea of robots.txt evolving to meet current Internet needs was discussed. The term "evolve" went off on a tangent that I found hard to believe.



The simplicity of robots.txt has made it an easy concept for virtually every webmaster. Meta declarations in the Head of the document to NOINDEX or NOFOLLOW have been adopted and used by most.
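
For anyone new to them, the two existing mechanisms look roughly like this -- a robots.txt rule keeping all robots out of a directory (the path is just an example), and a Meta Robots tag in the Head of a page:

User-agent: *
Disallow: /private/

<meta name="robots" content="noindex, nofollow">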



The evolution of robots.txt into the Body of the page must be equally simple. There has to be a way of relaying "indexing" information to ALL robots. Proprietary tagging should not be allowed by any entity.



One Solution

In the beginning there was robots.txt, then the Meta Robots directive. Following the simplicity of both, we should have a solution that is on par with both concepts.



A new tag should be introduced -- <robots attr="directive">



The attributes can be discussed amongst the SE providers. My choices would follow the original concept of robots.txt and the Meta alternative -- Index and Follow.
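
Nobody has specified a syntax yet, so purely as a sketch, the attribute names and values below are my own hypothetical reading of the idea, mirroring the Meta Robots keywords:

<robots index="noindex" follow="nofollow">
  <!-- anything inside this element would be left out of the index and its links not followed -->
</robots>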



Adding to this concept, and to the prevalent need to identify "actual content", an attribute of "content" would be added. Possible directives could be: Content, NoContent.
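
Continuing the same hypothetical sketch, the "content" attribute would let a page mark its real article text and flag the boilerplate, for example:

<robots content="nocontent">
  <!-- navigation, ads, and other non-content -->
</robots>
<robots content="content">
  <!-- the actual article text -->
</robots>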



Surrounding your intent with a <robots> tag will not only make things simpler for us, it will make it easier for the SEs to drill down into our pages.



Also, if the original thinking of the SEs was to identify "content", then why not say so? Why identify non-content through nefarious means?


