Wednesday, April 30, 2008

Internet Search Engine Update

AlltheWeb has been busy adding full Boolean searching, redesigning its site, and looking toward a new owner. Full Boolean searching with AND, OR, and ANDNOT is now available on the advanced search page. These operators, along with nesting in parentheses, should be used only in the Boolean box on the advanced search page, or on the simple search page if the search type menu has been added through the customization option and the Boolean search type is selected. AlltheWeb also has a RANK operator that is supposed to boost the rank of results containing a given term, but it does not behave reliably.

AlltheWeb tries to identify appropriate language limits for users automatically, but it still offers an "Any Language" option, and the default language limit can be changed on the preferences pages. It has also introduced quick links, bookmark shortcuts, and search options for various Web browsers. These make it easy to search AlltheWeb directly from the address box, to search a term highlighted on a Web page by clicking a bookmark, and more. They are available under Help and Search Tools.

The AlltheWeb redesign banished banner ads, provides more readable results, uses new colors, and has added a URL Investigator. Enter a URL as a search term, and the results page can include page language, size, last update date, number of pages that link to the URL, number of pages that contain the term, number of pages at the site, subdomains at the site, Open Directory categories containing the site, and links to Easywhois and the Wayback Machine for the URL. With the redesign, a few features such as the document directory depth limit, the home page limit, and FAST topics were added. Also, the related searches and multimedia results have been moved from the right margin to the bottom.

Lastly, AlltheWeb and the rest of the FAST Web Search Unit (but not FAST's enterprise search) are being acquired by Overture with expected completion in April 2003.

AltaVista joins AlltheWeb in being bought out, and by the same company, Overture. This deal is also expected to close in April 2003. Overture is buying the whole AltaVista company, including its search-related patents and its enterprise search engine. As for search features, one recent change at AltaVista is that the wildcard or truncation symbol, the asterisk (*), is now simpler to use. It used to represent only zero to five extra characters, and a double asterisk (**) was needed for unlimited truncation; now a single * represents an unlimited number of characters. AltaVista is now the only major search engine that offers truncation. It can be used at the end of a term and internally, after at least three characters.
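The truncation rule can be modeled as a simple pattern match in which each asterisk stands for any run of word characters, including none. The Python sketch below is a toy model of the behavior described above, not AltaVista's implementation; the function name `truncation_to_regex` is hypothetical, and it assumes the three-character minimum applies to every asterisk in a term.

```python
import re


def truncation_to_regex(term: str) -> "re.Pattern":
    """Compile a search term containing '*' wildcards into a regex.

    Toy model (hypothetical helper): '*' matches zero or more word
    characters, and, as an assumption, the first '*' must follow at
    least three literal characters.
    """
    first_star = term.find("*")
    if 0 <= first_star < 3:
        raise ValueError(f"{term!r}: '*' must follow at least three characters")
    # Escape the literal pieces, then join them with an unbounded wildcard.
    literal_parts = [re.escape(part) for part in term.split("*")]
    return re.compile(r"\w*".join(literal_parts), re.IGNORECASE)


words = ["computer", "computing", "compact", "color", "colour"]
print([w for w in words if truncation_to_regex("comput*").fullmatch(w)])
# ['computer', 'computing']
print([w for w in words if truncation_to_regex("col*r").fullmatch(w)])
# ['color', 'colour']
```

Under the old behavior, "comput*" would have been limited to at most five extra characters; the unbounded `\w*` reflects the new single-asterisk rule.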

Go, the former Infoseek, had long since given up having its own database and search engine. Instead, it provided Overture search results, with the ranked advertisements placed above Inktomi results. Recently it switched from Overture to Google, and it still places Google-provided ranked advertisements above regular Google search results. In addition, Disney, Go's owner, seems to be looking into selling the Infoseek patents and technology that used to power Go.com.

Google has been active the past few months, but not with its usual search activities. Instead, it purchased a Web log company, Pyra Labs, the maker of the popular blog tool Blogger and of the Blog*Spot blog hosting site. The other major initiative at Google is yet another advertising program. Google Content-Targeted Advertising extends its advertising beyond search engine results, where Google has long shown the text ads labeled "sponsored links" at the top and in the right-hand margin, to non-search pages on other Web sites. These text ads are starting to appear on content sites such as HowStuffWorks, Knight Ridder Digital, Weather Underground, and Google Groups.

HotBot made a few changes to its new interface. It has added more advanced search features to its Teoma advanced search page: language, region, and date limits. However, the Inktomi and AlltheWeb advanced search pages have lost their directory depth limits.

MSN Search relaunched with less clutter and no banner ads. It now indexes PDF and Microsoft Office files, and the advanced search now has limits for HTML, PDF, Word, PowerPoint, and Excel documents. The Basic Search continues to display LookSmart directory results first, followed by Inktomi results, while the Advanced Search goes straight to the Inktomi database.

Northern Light is almost completely dead, though it is still sputtering along. Its news search stops updating with new content and then starts again, and the Web database at NLResearch.com has been up and down as well. The Special Collection is usually no longer searchable, and it seems likely that the whole system will soon stop functioning.

Yahoo! completed its acquisition of Inktomi. Yahoo! search results are still from Google at the time of this column, but many expect to see Inktomi results showing up soon on Yahoo! searches.

Tuesday, April 22, 2008


While Google is experimenting with crawling hidden Web pages by indexing HTML forms, Yahoo! has updated its search crawler to Slurp 3.0. Although Slurp 3.0 should not have major implications for webmasters, here are the changes it brings to the way websites are crawled.

First, Slurp 3.0 will crawl from a smaller set of IP addresses, still within the crawl.yahoo.net domain. Reverse DNS checks will continue to work. Yahoo! advises webmasters who identify its crawlers by IP address to switch to reverse DNS-based identification of Yahoo! Slurp, so that their sites are not dropped by the Slurp 3.0 crawler.
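A common way to do reverse DNS-based identification is a forward-confirmed lookup: resolve the visiting IP address to a hostname, check that the hostname falls under crawl.yahoo.net, then resolve that hostname back and confirm it returns the same IP. The Python sketch below assumes that procedure (the forward step is general practice, not something stated in the announcement); `is_yahoo_slurp` is a hypothetical name, the resolvers are injectable so the logic can be tried without network access, and the default forward lookup covers IPv4 only.

```python
import socket
from typing import Callable, Iterable

# Domain named in Yahoo!'s announcement; Slurp hosts should sit under it.
CRAWL_DOMAIN = "crawl.yahoo.net"


def _ptr_lookup(ip: str) -> str:
    """Reverse (PTR) lookup: IP address -> hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(host: str) -> Iterable[str]:
    """Forward lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_yahoo_slurp(
    ip: str,
    reverse: Callable[[str], str] = _ptr_lookup,
    forward: Callable[[str], Iterable[str]] = _forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check (hypothetical helper)."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    # Require a leading dot so "evilcrawl.yahoo.net" does not pass.
    if not host.endswith("." + CRAWL_DOMAIN):
        return False
    try:
        # Anyone can publish a PTR record claiming crawl.yahoo.net, so the
        # hostname must resolve back to the same IP before it is trusted.
        return ip in forward(host)
    except OSError:
        return False
```

On a live server you would call `is_yahoo_slurp(client_ip)` with the default resolvers; caching the result per IP is worthwhile, since two DNS round trips per request add noticeable latency.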

Second, Yahoo! Slurp 3.0 identifies itself with a new user-agent, “Yahoo!Slurp 3.0”. Existing robots.txt directives for “Slurp” or “Yahoo! Slurp” will continue to work, but directives for “Slurp 2.0” will no longer apply. Yahoo! therefore suggests that webmasters use the shorter form of the user-agent name, which is simply “Slurp”.
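To see why the short name is the safer choice, the Python sketch below uses the standard library's `urllib.robotparser`, which applies a User-agent group when its value, compared case-insensitively, is a substring of the crawler's name. That is only an approximation of how Slurp itself reads robots.txt. The `blocked` helper is hypothetical, and the string `Yahoo! Slurp 3.0` is an assumed product token for the new crawler.

```python
from urllib.robotparser import RobotFileParser

# Assumed product token for the new crawler; the exact string is a guess.
SLURP_3 = "Yahoo! Slurp 3.0"


def blocked(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if robots_txt disallows `agent` from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return not parser.can_fetch(agent, path)


# The same Disallow rule under three different User-agent names.
for group in ("Slurp", "Yahoo! Slurp", "Slurp 2.0"):
    rules = f"User-agent: {group}\nDisallow: /private/\n"
    print(group, "->", blocked(rules, SLURP_3, "/private/report.html"))
# Slurp -> True
# Yahoo! Slurp -> True
# Slurp 2.0 -> False
```

The "Slurp 2.0" group fails silently: the rule is still in the file, but it no longer matches the crawler, so the private path becomes crawlable.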