Archive for October, 2004

As the world’s second most popular search tool, Yahoo moves a tremendous amount of traffic and is a very credible alternative to Google. Yahoo receives over 2.76 billion page views per day from hundreds of millions of unique users. It boasts over 157 million registered users enjoying mail, shopping and discussion groups and increasingly personalized search and news services. For the past two years, Yahoo, Google and MSN have been embroiled in a hard-fought battle for the loyalty of search engine users, forcing all three firms into the hyper-evolution we are witnessing today. Over the next three Wednesdays we are going to examine how the Big-3 spiders work, what they look for and how best to prepare your sites for multiple visits from the bots that rank them. Today, we are starting with Yahoo’s bot, Slurp.
Getting Found By Slurp
The first thing to know about Slurp is that like its better known cousin, Google-bot, Slurp “discovers” sites by following links from one site to another, reading and recording nearly everything it finds in its path. The majority of websites referenced by Yahoo were originally included in its database because they were accessed by Slurp following links from another site.
Yahoo suggests adding an inbound link to all pages in your site to guarantee those pages will be discovered by Slurp. They also recommend an internal sitemap linked to from the Index (or home) page of the site. To encourage Slurp to spend more time deep-crawling your content, Yahoo recommends the addition of “good authoritative links pointing into your site”, from highly reputable sources such as news sites, established business partners and other sites relevant to your business or service.
Manual submission of the site is only recommended if, for some reason, Slurp does not find the site on its own. This is increasingly rare, however, as server logs show Slurp is one of the most active spiders out there. In other words, if a site Slurp has already indexed links to your site, Slurp will almost certainly be visiting very soon. Webmasters should not have to pay submission fees to get into Yahoo’s index since, according to Yahoo’s Tim Mayer, 99% of Yahoo’s index is crawled by Slurp for free.
It is still important to make sure your site is ready to receive a visit from Slurp. To ensure Slurp is able to travel across your entire site, provide standard HREF text links as opposed to forms, Flash or JavaScript navigation tools. Webmasters are encouraged to avoid tracking and communication methods that rely on using cookies across every page of the site. If you have a database-driven site or a site that creates unique sessions for each user, avoid embedding session IDs in URLs. Lastly, use 404 pages to redirect users (and spiders) to the root (index) page if a page or site URL becomes invalid. Yahoo also asks webmasters of sites with shopping carts to use robots.txt exclusions to keep spiders out of the shopping cart pages.
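As a rough illustration of the session ID advice above, the sketch below strips session-style query parameters from a URL so that the links a spider sees stay stable from visit to visit. The parameter names in SESSION_PARAMS and the example URL are assumptions chosen for illustration; they do not come from Yahoo's guidelines.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names that often carry session IDs (an assumption).
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def strip_session_ids(url: str) -> str:
    """Return the URL with session-style query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# Example URL is made up for illustration.
print(strip_session_ids("http://www.example.com/catalog.php?item=42&PHPSESSID=abc123"))
```

A site would more likely keep session state in a cookie or on the server than rewrite URLs after the fact; the function only shows what a spider-friendly URL looks like once the session ID is gone.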
Where Your Site Has Been Included. Results May Vary…
Yahoo has seen enormous change over the past seven years. What started as a paid-inclusion, human-edited search directory has grown into the second largest database of indexed content. Yahoo is on the cutting edge of integrating several forms of media into its search offerings and will likely soon produce its own entertainment content, like an online HBO. Yahoo is flirting with the concept of becoming an infotainment portal again, but the core of its offerings remains firmly rooted in search.
Yahoo search results come in multiple formats including Yahoo-Local, Yahoo-products, MY-Yahoo (personalized results), nation-specific versions of Yahoo, and the standard Yahoo.com. One of Yahoo’s goals appears to be presenting individual search-users with results that best match their personal needs. For instance, Yahoo would like to present constantly updated geographic-specific references when a user searches for daily-use items such as groceries, repair-workers, real estate and other services one would normally use a telephone directory to find. Similarly, Yahoo wants to present the entire global database of references when a user searches for international news, trans-national products or vacation plans. Being certain your website gets served up for all levels of search, local, regional and global, will be important if you wish to serve a market larger than your general region or community.
Getting Rankings
Yahoo’s search engine ranks sites based on a formula that is very similar to the algorithms used by rivals Google and MSN. Yahoo values many of the same elements other search engines do including keyword enriched domain names, titles, meta tags, and content. Yahoo also values keywords found in the anchor text of internal links, though the effect at Yahoo is not as powerful as it is on Google.
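A quick way to confirm that a page actually carries the elements listed above is to parse it and report what is missing. The sketch below uses Python's standard html.parser module; the sample page and the choice of which tags to check are assumptions made for illustration, and this is not a description of how Yahoo reads a page.

```python
from html.parser import HTMLParser


class HeadTagParser(HTMLParser):
    """Collect the <title> text and named <meta> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and name:
            self.meta[name.lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def missing_elements(html: str) -> list:
    """Return the names of the basic on-page elements the page lacks."""
    parser = HeadTagParser()
    parser.feed(html)
    missing = []
    if not parser.title.strip():
        missing.append("title")
    for name in ("description", "keywords"):
        if not parser.meta.get(name, "").strip():
            missing.append("meta " + name)
    return missing


# Made-up sample page: it has a title and a description but no keywords tag.
SAMPLE_PAGE = """<html><head>
<title>Harbour View Hotel Rooms</title>
<meta name="description" content="Waterfront hotel rooms near the harbour.">
</head><body><p>Welcome.</p></body></html>"""

print(missing_elements(SAMPLE_PAGE))
```

Running the check across every page of a site gives a simple list of pages that still need titles or meta tags written.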
According to Yahoo, well-optimized pages and sites will continue to get good results across all versions of their search engine. By opening your site to Yahoo Slurp and performing well-planned optimization services across every page, a good SEO can nearly always achieve top placements on Yahoo. The trick is in offering Slurp the information it needs to read, record and rank your site. If that information is included on each page, a set of text-based links is woven through the site to provide easy passage for Slurp, and Yahoo is told what your business is, where your business is located, and who your business serves, your site should achieve strong rankings.
Due to the advent of personal, local, regional and global search results, it is highly recommended to add full contact and address information on every page of a site. This information should be as precise as possible and should include street address, unit or suite number, zip or postal code, state or province, county and country, full telephone information (including area codes), and if possible, the approximate longitude and latitude of your business location. (You can look up your longitude and latitude at Astro.com.)
When writing for Slurp, here are a few fundamentals:

  • Have a descriptive URL
  • Use keyword enriched titles on each page of the site
  • Place keyword enriched description and keywords meta tags on each page of the site
  • Use robots.txt files to keep Slurp out of your shopping cart or log in pages
  • Place keyword enriched text in the first paragraphs of your site-copy
  • Use HREF links to direct Slurp through each page of the site
  • Add a sitemap page and be certain there is a link from the index page to the sitemap
  • Be certain that geographic specific information is mentioned on each page of the site. Always have a contact page that also lists geographic specific information
  • Write a press release and send it to as many blogs, news-wires and press release sites as possible
  • Acquire strong, relevant incoming links from sites with topics similar to yours
  • Update your site frequently
  • Enjoy and value your placements
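The robots.txt item in the list above can be checked before a spider ever arrives. The sketch below feeds a small, hypothetical robots.txt to Python's standard urllib.robotparser module and asks whether a crawler identifying itself as Slurp may fetch a given path. The /cart/ and /login/ paths are invented for illustration, and the user-agent token "Slurp" is an assumption about how the spider is addressed in robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the paths are examples only.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /cart/
Disallow: /login/
"""


def allowed_for_slurp(path: str) -> bool:
    """Return True if the rules above let a 'Slurp' user agent fetch the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch("Slurp", path)


for path in ("/products/index.html", "/cart/checkout", "/login/"):
    print(path, allowed_for_slurp(path))
```

The same check can be pointed at a live file by calling set_url() and read() on the parser instead of parse().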

By Jim Hedger

I knew things were bad at DMOZ. But I guess I didn’t realize how bad, until I started eavesdropping on a few forums, and reading the avalanche of e-mails I received on the subject.

When it takes up to two years to get a web site listed, there’s a serious problem. When perfectly qualified web sites are rejected for no other reason than the fact the editor considers them serious competition to his or her own site, there’s a serious problem.

When you e-mail DMOZ about the status of your web site and don’t even receive a courtesy response to your questions, there’s a serious problem.

When you have egotistical DMOZ editors fighting each other to have their own web sites listed, there’s a serious problem. And quite frankly, I don’t see how the mess DMOZ has created can be fixed. With an apparently endless backlog of web sites waiting to be approved, how can they possibly catch up? The answer is: they can’t.

But this isn’t just a performance issue we’re talking about here; this is a morality issue. The very fact that it’s a matter of public record what DMOZ is doing speaks volumes about the character of many of their editors. After all, much of what I’ve written negatively about DMOZ came directly from the mouths and/or keyboards of DMOZ editors themselves. At least they claimed to be DMOZ editors. And for the life of me, I can’t imagine why anyone would want to own up to that dubious distinction, unless it were actually true.

This is what one DMOZ editor had to say. “Since I became an editor for DMOZ a few weeks ago (albeit for a tiny category) I have seen on the DMOZ editors board that there are a lot of good volunteers there who work hard to try to keep the directory up to date and useful. It’s a shame because there also seem to be a lot of editors there who are lazy, or who have let the “power” of being an editor go to their heads. (The people who DON’T ever post on the editor message boards, or update their categories, etc.)

I think some method to allow webmasters to check the status of their site submissions (and to know why their site gets rejected if it is something fixable, and the site is related to the category and not just a spam submission, etc) would be an excellent first step to improving the system. Unfortunately the editor management system seems to be circa 1998 … I am only guessing based on design/functionality, but I assume big changes are not coming any time soon.”

Even Google may have come to the realization that DMOZ may have finally run its course. Previously found via its own tab, the Open Directory has been demoted to the “more” page. This was Google’s explanation for the demotion. “We analyzed what people were using, and that had become less popular over time. As the web grows, directory structures get harder to use. It didn’t seem to be worth the real estate on the home page.” Ouch!

Demoting the directory may also be a way for Google to eventually distance itself from the Open Directory Project, which powers it. The volunteer-produced directory was added back in 2000, near the height of the Open Directory’s popularity.

Today, there are often complaints that the ODP has not kept up with submission demands. In addition, there have been delays in getting the most current data out in a format that ODP partners such as Google can use. Ultimately, any problem with the Open Directory, which is not in Google’s control, still reflects badly on Google.

I do have a solution to this whole DMOZ mess, if anyone wants to hear it. I say nuke the site from orbit, and put it out of its misery!

By Dean Phillips

So, what are they? Well, black holes are at the centre of every known galaxy and are like the eye of a storm.
Their centre seems calm and undisturbed, but at the edges of the eye, huge forces of nature are being exerted, ripping everything they contact to shreds. Black holes are immense gravitational wells from which nothing can escape, or at least that’s the theory, amended in part by Dr. Hawking a few weeks ago.
Google might like to be thought of as a ‘black hole of Internet search engines,’ consuming all the information that falls within its gravitational reach. The difference is that the information does escape and the web is not really ripped apart at the seams. Oh well, so much for that analogy.
But there really are holes in Google, Yahoo! and all other search engines that have nothing to do with the forces of nature. These holes have serious implications for the quality of search engine results, and therefore require the attention of your optimization efforts.
We shall begin the analysis with Google, the current technology leader in the search engine field. When a user visits the Google search engine and runs a search, they often enter complete phrases. This tendency is likely to become more common as speech to text becomes a reality. How Google treats these phrases demonstrates a fault within their algorithms, and a hole in the accuracy of their search results. When you include a common word in a phrase within the Google search box, it gives you the following message above the search results:
“‘For’ is a very common word and was not included in your search.” [Details]
If you click for details, you get the following explanation:
“Google ignores common words and characters such as “where” and “how”, as well as certain single digits and single letters, because they tend to slow down your search without improving the results. Google will indicate if a common word has been excluded by displaying details on the results page below the search box. If a common word is essential to getting the results you want, you can include it by putting a “+” sign in front of it. (Be sure to include a space before the “+” sign.)”
But, here’s where Google falls down. Visit Google right now. Open up 4 windows and in each window’s search box type the following queries:
Hotels New York
Hotels in New York
Hotels for New York
Hotels about New York
The words ‘in’, ‘for’ and ‘about’ all get the standard “This is a very common word and was not included in your search” message. Yet all four queries display entirely different results.
What is Google doing? I considered the possibility that I was pulling results from different data centres, so I ensured this was not the case. I then tried a variation on this search query, using the term “search engine optimization X hotels”, with the ‘X’ representing a blank space or one of the words ‘in’, ‘for’ or ‘about’. In this test, only where the ‘X’ represented a blank space did I get varying results. Still, by rights they ought to have all been identical.
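To make that expectation concrete: if the flagged words were simply dropped before matching, all four hotel queries would collapse to the same list of terms and should return the same results. The sketch below shows that reduction. The stop-word list and the normalization are assumptions for illustration only, and say nothing about how Google actually processes a query.

```python
# Hypothetical stop-word list: just the three words tested above.
STOP_WORDS = {"in", "for", "about"}

QUERIES = [
    "Hotels New York",
    "Hotels in New York",
    "Hotels for New York",
    "Hotels about New York",
]


def strip_stop_words(query: str) -> tuple:
    """Lower-case the query and drop stop words, keeping the word order."""
    return tuple(word for word in query.lower().split() if word not in STOP_WORDS)


# Every query reduces to the same tuple, so the set has a single member.
reduced = {strip_stop_words(query) for query in QUERIES}
print(reduced)
```

Under this naive model there is only one distinct query, which is why differing result pages suggest the "ignored" words are still influencing the outcome in some way.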
It occurred to me that perhaps Google was using different algorithms when it identified a place name in the search query by trying to understand the context of the query. That would be a logical move. I’m very familiar with software that comprehends the context of textual content. Could it be that Google is trying to apply some contextual filtering to their results? I then proceeded to try a garbage search. A search phrase with common words which really have no direct relevance, and therefore words which would never appear together logically:
“Room hotel tapestry highway lagoon”
Interestingly, Google had 1720 entries which matched this query, and the results varied depending on which of the X terms I inserted between any two of the words. Search results also varied if I moved the placement of the ignored word within the query. But is this context? A further test would be required. I put together 3 queries using the same terms, but with a common or ignored word inserted as follows:
Filing tax return(s)
Filing a tax return(s)
Filing of tax return(s)
In this case, I tried singular and pluralized searches, to ensure that poor grammar was not affecting the results. Results varied for each search. That’s not to say they were all entirely different, just that they varied. I tried a few other searches and received similar results. Most importantly, the results I received were all equally contextually correct, which was a relief.
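"Varied, but not entirely different" can be put on a scale. One simple option, offered here only as a sketch, is to compare two result lists by how many entries they share and whether the order matches. The result lists below are invented placeholders, not the listings actually returned for the tax return queries.

```python
def overlap(results_a, results_b) -> float:
    """Jaccard overlap: shared results divided by all distinct results."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def same_order(results_a, results_b) -> bool:
    """True only when both lists hold the same results in the same positions."""
    return list(results_a) == list(results_b)


# Made-up top results for two of the three query variants.
RESULTS_FILING = ["site-a.example", "site-b.example", "site-c.example", "site-d.example"]
RESULTS_FILING_A = ["site-b.example", "site-a.example", "site-e.example", "site-d.example"]

print(overlap(RESULTS_FILING, RESULTS_FILING_A))
print(same_order(RESULTS_FILING, RESULTS_FILING_A))
```

An overlap of 1.0 with a different order would mean the same sites reshuffled, while a low overlap would mean the extra word pulled in largely different pages.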
Some people have written to news groups and discussion boards that when Google comes across an ‘ignore’ word, it substitutes a wild card. However, if that were true, the various ignore words would all return the same results, and this is not the case. Therefore, it can be surmised that Google does not in fact ignore these words at all! It is more likely that Google is using some measure of context algorithm. This is logical. The technology exists and Google is known to have bought a UK firm last year which was developing such a technology. Our own firm uses software which uses contextual analysis in its algorithms.
Taking the analysis a step further, which other engines seem to have a grasp on context? Obviously, the places to look first were Google’s competitors: Yahoo!, Microsoft and AskJeeves.
AskJeeves sprang immediately to mind, as it had originated the concept of “phrase a question” type searching, thus it should logically have some context filtering in place. In fact, when I ran the ‘tax return’ query through the engine, I still received varying results, and very different results from Google’s, I might add. When multiple ‘ignore’ words were added to a query, results did not vary, which may indicate very limited filtering.
I then tried an alternate pair of queries: “diapers for baby” and “diapers on baby”. These should logically return different results: one recommending diapers, and one about how to put them on, keep them on, how they should look, and so on. Surprisingly, I received identical results for both queries. Context was not being properly filtered by the very search engine which first introduced the concept! I tried the same search on Google. While results were jumbled a bit, the top web sites were the same for both queries, just in varying order. With over 550,000 results to choose from, this would indicate Google too has a long way to go to fulfil the promise of contextually correct responses.
Next, I turned my attention to Yahoo! I was somewhat surprised to discover that Yahoo! does not seem to have -any- filtering in place. Results did not vary at all for the test searches run when the “ignore” words were inserted or removed. Yahoo! also did not identify these terms as being ignore terms in their results, but the fact that results were unchanged when the terms were added or deleted would indicate that they were omitted and Yahoo! does not have the necessary algorithms to allow it to comprehend the context of a search query.
Is context an area where Yahoo! seriously lags behind Google and others? If true, this points to a widening gap between the search engines in the future. Google is already positioning for speech to text devices; can intonation be far behind? Yahoo! has not demonstrated any evidence of making strides in either of these areas.
Lastly I looked at the new Microsoft engine. No contextual filtering in place. Since this search engine is still in beta, I cannot in all fairness comment on it being behind in a race where we have not yet seen the final product. Still, it’s something to keep in mind for the future.
Implications for SEO
The implication of contextual search on how your web site performs in the search engines is immense. It means that the nuances of how people search have to be better taken into account by all SEO firms.
In our firm we recognized that as the world moved to speech to text and as the web grew in size, context would be the next big differentiator in search results. This means that context is already recognized and taken into account both by our technicians and our technology when analyzing a web site, and optimizing it for search engines.
Working to improve your web site’s performance in the search engines now requires a comprehension of how people are actually phrasing search queries and using that knowledge to properly position the content on your site, to account for the idioms used by your target audience.
Ensure that you are using phrases in the way you hear people asking questions. Ensure you cover all the bases and get all possible variations. Get outside help if you need it, but don’t miss out on your opportunity to take advantage of the Black Holes out there.

By Richard Zwicky