Thursday, June 19, 2008

Google News Search Leaps Ahead

Google has dramatically enhanced its news search service, serving up a portal of real-time news drawn from more than 4,000 sources worldwide.

Until recently, Google's news search has been competent, but less useful than other news-aggregating services such as AllTheWeb's News Search and Yahoo's Full Coverage. The new enhancements establish Google as one of the premier news-finding and filtering destinations on the web.

Like Yahoo's Full Coverage, Google News Search now looks like a portal, with links to the top headlines organized into categories such as Top Stories, World, Business, Sports and so on. Each category has its own area on the News Search home page, with headlines, descriptions and links for the top two or three stories.

"The page looks very different than the average Google page," said Marissa Mayer, Google product manager. That's because it's packed with headlines, descriptions, thumbnail photos and dozens of links to the sources of the articles online.

Unlike Yahoo Full Coverage, however, Google News Search isn't assembled by human editors who select and format the news. Google's process is fully automated. News stories are chosen and the page is updated without human intervention. Google crawls news sources constantly, and uses real-time ranking algorithms to determine which stories are the most important at the moment -- in theory highlighting the sources with the "best" coverage of news events.

Each top story is presented with a headline linked directly to the source. Beneath the headline are a short description, the name of the source, and the time the article was last crawled, ranging from a few minutes to several hours earlier.

Beneath the main headline and description are two full headlines from other sources, followed by four or five links to stories with only the name of the publication indicated. Finally, there are links to "related" stories from other sources.

This design makes it easy to quickly scan the headlines while having the option of reading multiple accounts of a story from different news sources -- from literally thousands of sources, for some stories.

Each major category has a link at the top of its respective section that allows you to scan news just within that category. Tabs on the upper left of each page also let you focus on the Top Stories, World, U.S., Business, Sci/Tech, Sports, Entertainment and Health categories.

Unlike many news aggregators that simply "scrape" headlines and links from news sites, Google's news crawler indexes the full text of articles. This approach offers several unique benefits.

For example, full-text indexing allows true searching, rather than just browsing of headlines. Creating a full-text index of news also allows Google to cluster related news stories around what Mayer calls a "centroid" of keywords. "A cluster is defined by a centroid of keywords, and all the articles have some of those key words in them," she said.

The process uses artificial intelligence in addition to traditional information retrieval techniques to match keywords with stories. Mayer says this approach to identifying related articles means that the relative importance of each article is "baked in," which is how the top sources for each story are selected.
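Google has not published the details of this clustering, but the centroid idea is easy to illustrate. Below is a minimal, hypothetical sketch in Python: each article becomes a bag-of-words vector, a cluster's centroid is the average of its members' vectors, and an article joins the first cluster whose centroid it sufficiently resembles. The similarity threshold is invented for the example.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term frequencies for one article."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Average the term weights of every article in a cluster."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})

def cluster(articles, threshold=0.3):
    """Greedy clustering around keyword centroids (threshold invented)."""
    clusters = []  # each entry is [centroid_vector, member_vectors]
    for text in articles:
        v = vectorize(text)
        for entry in clusters:
            if cosine(v, entry[0]) >= threshold:
                entry[1].append(v)
                entry[0] = centroid(entry[1])  # recompute the centroid
                break
        else:
            clusters.append([v, [v]])
    return clusters
```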

Other factors used in calculating the relevance of top and related stories include how recently articles were published, and the reputation of the source. When you actually do a search, these factors are also applied in addition to keyword analysis to determine how closely particular stories match your query.
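Google's actual formula is likewise unpublished; the sketch below only shows how keyword relevance, recency and source reputation might be blended into one score. The weights, the freshness half-life and the reputation table are all invented for illustration.

```python
import time

# Hypothetical reputation scores; Google has not disclosed how it
# rates sources, so these values are placeholders.
SOURCE_REPUTATION = {"example-wire.com": 0.9, "example-blog.net": 0.4}

def freshness(published_ts, half_life_hours=6.0):
    """Exponential decay: a story loses half its freshness score
    every half_life_hours hours."""
    age_hours = (time.time() - published_ts) / 3600.0
    return 0.5 ** (age_hours / half_life_hours)

def story_score(keyword_match, source, published_ts):
    """Blend keyword relevance, recency and source reputation
    (the 0.6 / 0.25 / 0.15 weights are invented)."""
    reputation = SOURCE_REPUTATION.get(source, 0.5)
    return (0.6 * keyword_match
            + 0.25 * freshness(published_ts)
            + 0.15 * reputation)
```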

On search results pages, a link allows you to override the default ranking by relevance and order results by date -- a feature that's particularly helpful for monitoring breaking news.

Google's decision to index the full text of news sources rather than simply scraping headlines posed a major challenge for implementing the new service. The vast diversity and typically cluttered design of online news sites make them more difficult to crawl and index than many other types of websites. "Article extraction has proven to be one of the most difficult aspects of the project," said Mayer.

Google crawls its 4,000 sources of news continuously and in real time. According to Mayer, the crawler continuously computes what's likely to change on each news source, and when the change is likely to occur. To expedite the discovery of new stories, the crawler tends to hit hub or major section pages frequently, to see what new links are there.
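Google has not described the scheduler's internals, but the behavior Mayer describes -- revisiting each page roughly as often as it tends to change -- can be sketched with a simple adaptive interval. The initial interval and the bounds below are invented.

```python
import heapq
import time

class RecrawlScheduler:
    """Toy scheduler: shorten a page's revisit interval when it changed
    since the last fetch, lengthen it when it did not."""

    def __init__(self):
        self.queue = []      # min-heap of (next_due_timestamp, url)
        self.interval = {}   # url -> current change-interval estimate (s)

    def add(self, url, initial_interval=900):
        self.interval[url] = initial_interval
        heapq.heappush(self.queue, (time.time(), url))

    def record_fetch(self, url, changed):
        i = self.interval[url]
        # Halve the interval on change, double it otherwise (bounded).
        self.interval[url] = max(60, i // 2) if changed else min(86400, i * 2)
        heapq.heappush(self.queue, (time.time() + self.interval[url], url))

    def next_due(self):
        """Return the next URL whose revisit time has arrived, if any."""
        if self.queue and self.queue[0][0] <= time.time():
            return heapq.heappop(self.queue)[1]
        return None
```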

While the news sources are crawled constantly and individual news stories are updated continuously, the entire set of displayed stories is "auto generated" every 15 minutes. A message in the upper right corner of the main news page indicates when it was last generated.

Google's updated news search is an exceptionally powerful tool for web users. It's still in beta, so expect a few rough edges, but all told it's one of the best news browsing and search portals currently operating on the web.

Google News Search
http://news.google.com

News Search Engines
http://searchenginewatch.com/links/news.html

Wednesday, June 18, 2008

Hitwise Data: Google Rules


Google Receives 68 Percent of U.S. Searches in May 2008

Search leader continues record growth - up 5 percent year-over-year;

Google accounted for 87 percent of searches in UK

NEW YORK, NY – June 10, 2008 – Google accounted for 68.29 percent of all U.S. searches in the four weeks ending May 31, 2008, Hitwise announced today. Yahoo! Search, MSN Search and Ask.com received 19.95 percent, 5.89 percent and 4.23 percent, respectively. The remaining 41 search engines in the Hitwise Search Engine Analysis Tool accounted for 1.63 percent of U.S. searches.
Percentage of U.S. Searches Among Leading Search Engine Providers

Domain              May-08     Apr-08     May-07
www.google.com      68.29%     67.90%     65.13%
search.yahoo.com    19.95%     20.28%     20.89%
search.msn.com       5.89% *    6.26% *    7.61% *
www.ask.com          4.23%      4.17%      3.92%

Note: Data is based on four-week rolling periods (ending 5/31/2008, 4/26/2008 and 5/26/2007) from the Hitwise sample of 10 million U.S. Internet users. * Includes executed searches on Live.com and MSN Search, but does not include searches on Club.Live.com.

Source: Hitwise

In the U.K. market, Google search properties (Google.co.uk and Google.com) accounted for 87.30 percent of all UK searches in May 2008, a 12 percent increase compared to May 2007. Yahoo! search properties accounted for 4.09 percent of UK searches, a 2 percent increase compared to April 2008. MSN search properties accounted for 3.72 percent and Ask search properties for 3.07 percent of searches; MSN increased 2 percent compared to April 2008 and Ask increased 6 percent.

Percentage of U.K. Searches Among Leading Search Engine Providers

Domain                  May-08     Apr-08     May-07
Google Properties       87.30%     87.69%     78.28%
Yahoo! Properties        4.09%      4.01%      8.58%
Microsoft Properties     3.72%      3.65%      5.46%
Ask Properties           3.07%      2.89%      4.96%

Note: Data is based on UK Internet usage over four-week rolling periods (ending 5/31/2008, 4/26/2008 and 5/26/2007) from the Hitwise sample of 8.4 million UK Internet users. The percentages for the search properties include the .uk and .com domains.

Source: Hitwise UK

Google an Increasing Source of Traffic to Key U.S. Industries
Search engines continue to be the primary way Internet users navigate to key industry categories. Comparing May 2008 to May 2007, the Travel, News and Media, Entertainment, Business and Finance, Sports, Online Video and Social Networking categories all showed double-digit increases in their share of traffic coming directly from search engines.

U.S. Category Upstream Traffic from Search Engines and Google - May 2008
Figures show each category's share of traffic arriving from search engines and from Google in May-08; the Change columns show the percentage change in that share from May-07 to May-08.

Category                    From Search Engines   Change   From Google   Change
Health and Medical          45.76%                3%       30.86%        5%
Travel                      34.81%                11%      24.26%        21%
Shopping and Classifieds    25.48%                2%       16.84%        8%
News and Media              21.70%                7%       14.53%        10%
Entertainment               24.33%                17%      15.76%        22%
Business and Finance        18.15%                14%      11.73%        22%
Sports                      13.09%                17%      8.81%         24%
Online Video*               29.94%                37%      20.78%        52%
Social Networking*          16.50%                18%      9.98%         21%

All figures are based on U.S. data from the Hitwise sample of 10 million Internet users.
* denotes custom category

Source: Hitwise

About Hitwise
Hitwise is the leading online competitive intelligence service. Only Hitwise provides its 1,400 clients around the world with daily insights on how their customers interact with a broad range of competitive websites, and how their competitors use different tactics to attract online customers.

Since 1997, Hitwise has pioneered a unique, network-based approach to Internet measurement. Through relationships with ISPs around the world, Hitwise’s patented methodology anonymously captures the online usage, search and conversion behavior of 25 million Internet users. This unprecedented volume of Internet usage data is seamlessly integrated into an easy to use, web-based service, designed to help marketers better plan, implement and report on a range of online marketing programs.

Monday, June 16, 2008

SEO activities on your website

1. Analysis & Research

* Keyword/Phrase analysis
* Keyword research using Wordtracker, Overture and Google Sets
* Competitive analysis
* Extensive Competitive Analysis for better search engine ranking performance
* Initial position analysis report
* Website usability analysis by a usability and copyediting expert
* Extensive Log file analysis
* Personalized Report Analysis and Monitoring

2. On-site (On-page) Optimization

* Homepage Optimization
* Meta tags placement
* Content fixing
* Monthly Manual Update to Optimized Content
* Fixing the text links
* Optimized Navigational Structure
* Site map for better crawling of your site
* Descriptive site map creation
* Link resource page creation
* Link exchange page creation
* Image Optimization
* SEO Copywriting
* Spell Checking
* HTML Validation Checking
* Browser Compatibility checking
* Website Load time checking
* Creation of robots file
* mod_rewrite / URL rewriting for dynamic sites, for better search engine crawling

3. Off-site (Off-page) Optimization

* Manual submission to all major search engines
* Semi-automatic submission to more than 200 Search Engines
* Resubmission of sites to certain search engines if necessary
* Submission to important paid inclusion directories
* Yahoo Directory Inclusion
* Submission to Dmoz directory
* Submission to more than 5000 free inclusion quality directories
* Re-optimization of site
* Reciprocal link building - 2way and 3way links
* Link Popularity through one way links
* Buying text link advertisements from relevant sites to increase the link popularity
* Article Submission
* Newsletter Submission
* Forum Posting
* Blog submission

4. Reports

* Monthly management plan
* Detailed Ranking Report
* Weekly updates & comprehensive Monthly Ranking Reports
* Submission Management with Reports
* Site Visibility Statistics Report
* Server Check and Link Check

5. Support

* Search Engine Algorithm Updates
* 100% Guaranteed Uptime during site Modifications
* Multiple CD Burned backups of Optimized pages and site
* Technical Support
* 24/7 Phone support and online support

Friday, June 13, 2008

Google Changes Definition of Doorway Pages

According to Search Engine Watch, Google has changed the way it defines 'Doorway Pages'.

The new definition at Google Webmaster Help Center for 'Doorway Pages':

Doorway pages are typically large sets of poor-quality pages where each page is optimized for a specific keyword or phrase. In many cases, doorway pages are written to rank for a particular phrase and then funnel users to a single destination.

Whether deployed across many domains or established within one domain, doorway pages tend to frustrate users, and are in violation of our Webmaster guidelines.

However, the cached version of the page still shows the old definition:

Doorway pages are pages specifically made for search engines. Doorway pages contain many links - often several hundred - that are of little to no use to the visitor, and do not contain valuable content. HTML sitemaps are a valuable resource for your visitors, but ensure that these pages of links are easy for your visitors to navigate. If you have a number of links to include, consider organizing them into categories or into multiple pages. But in doing so, ensure that they are intended for visitors to navigate the sections of your site, and not simply for search engines.

In the new version of the definition, key sentences, words and adjectives have been replaced with more generic terms. Discussion is underway at the Search Engine Watch Forum. It seems that Google has tweaked the definition in order to make the page read as more subtle and less technical.

Google IP Delivery, Geolocation and Cloaking

At the Google Webmaster Central Blog, Google has released some valuable information about web-serving techniques, especially as they relate to Googlebot. The post was written in response to the numerous requests Google had received for information about IP delivery, geolocation and cloaking techniques.

Geolocation: It is the practice of serving targeted or different content to users based on their location. Webmasters can determine a user's location from preferences stored in cookies, from information tied to the user's login, or from the IP address. For example, if your website is about theater, you can use geolocation techniques to highlight Broadway for a user in New York.


IP Delivery: It is the practice of serving targeted or different content to users based on their IP address, which can often be mapped to an approximate geographic location. IP delivery is quite similar to geolocation, so the techniques are almost the same.
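As an illustration only, IP delivery reduces to a lookup like the one below. The address ranges and page names are made up (they use reserved documentation ranges), and a real deployment would consult a GeoIP database rather than a hand-written table.

```python
import ipaddress

# Hypothetical prefix-to-region table (documentation address ranges).
REGIONS = {
    ipaddress.ip_network("203.0.113.0/24"): "new-york",
    ipaddress.ip_network("198.51.100.0/24"): "london",
}

def region_for(ip):
    """Map an IP address to a region, falling back to a default."""
    addr = ipaddress.ip_address(ip)
    for network, region in REGIONS.items():
        if addr in network:
            return region
    return "default"

def pick_content(ip):
    """Serve region-targeted content, e.g. Broadway listings for a
    New York visitor; everyone else gets the generic page."""
    pages = {"new-york": "broadway.html", "london": "west-end.html"}
    return pages.get(region_for(ip), "theater.html")

print(pick_content("203.0.113.42"))  # broadway.html
```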


Cloaking: It is the practice of serving different content to users than to Googlebot. Google's Webmaster Guidelines prohibit it, and if the file that Googlebot crawls differs from the file served to users, the webmaster falls into a high-risk category. A program such as md5sum can compute checksums to verify that the two files are identical, and diff can show exactly where they differ.
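A rough self-check along those lines: fetch the same URL once with a Googlebot user-agent and once with a browser user-agent, then compare checksums, the same comparison md5sum performs on files. The URL below is a placeholder, and legitimately dynamic pages (rotating ads, timestamps) can differ between fetches without any cloaking, so treat a mismatch as a prompt to diff the responses, not as proof.

```python
import hashlib
import urllib.request

def fetch(url, user_agent):
    """Download a page while presenting the given User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def looks_cloaked(url):
    """Compare MD5 checksums of the page as seen by Googlebot and by
    an ordinary browser."""
    bot = fetch(url, "Googlebot/2.1 (+http://www.google.com/bot.html)")
    user = fetch(url, "Mozilla/5.0")
    return hashlib.md5(bot).hexdigest() != hashlib.md5(user).hexdigest()

print(looks_cloaked("http://www.example.com/"))  # placeholder URL
```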

First Click Free: Webmasters who follow Google's First Click Free policy can include premium or subscription-based content in Google's web search index without violating Google's quality guidelines. They allow any user who finds a page through Google search to see the full text of that document, even without registering or subscribing; the user's first click into the content is free. If the user then jumps to another section of the website, the webmaster can block access to the premium or subscriber content with a login or payment request.
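Stripped to its core, First Click Free is a referrer check; the sketch below is a deliberate simplification (real implementations also track sessions and guard against abuse).

```python
from urllib.parse import urlparse

def came_from_google(referrer):
    """True when the visitor clicked through from a Google page."""
    host = urlparse(referrer or "").netloc.lower()
    return host == "google.com" or host.endswith(".google.com")

def allow_full_text(referrer, is_subscriber):
    """The first click from a Google result is free; any further
    premium click requires a subscription (or a login/payment page)."""
    return is_subscriber or came_from_google(referrer)

print(allow_full_text("http://www.google.com/search?q=news", False))  # True
print(allow_full_text("http://www.example.com/next-article", False))  # False
```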

There is also a thread at the Webmaster Help Group that should be quite interesting for all the webmasters out there.

Google, Yahoo! & Microsoft Talk About the 'Robots Exclusion Protocol'!

The Google Webmaster Central Blog, the Yahoo! Search Blog and the Microsoft Live Search Webmaster Center Blog have come out with quite informative documentation about the Robots Exclusion Protocol (REP). Last February, I put up a post informing our readers about Google's thoughts on the Robots Exclusion Protocol. Now all three companies have released REP feature documentation at the same time, which makes it much easier for users to learn which REP techniques each of the three search engines employs.

Here is what all three blogs are saying in unison:

Google Webmaster Blog:

For the last couple of years, Google, Yahoo! and Microsoft have been collaborating to bring webmasters essential tools. The REP features supported by all three search engines can be applied to all crawlers, or to specific crawlers by targeting them via specific user-agents, which is how a crawler identifies itself. The following are the major REP features currently supported by all three search engines.

For robots.txt directives:

* Disallow
* Allow
* Wildcard support
* Sitemaps location
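Python's standard-library robots.txt parser can demonstrate the Disallow and Allow directives. The rules and paths below are invented; note that this parser applies rules in file order by simple prefix match (which is why Allow is listed first), and the wildcard matching mentioned above is the search engines' own extension, not something the stdlib parser implements.

```python
import urllib.robotparser

# Invented rules for illustration.
RULES = """\
User-agent: *
Allow: /private/press/
Disallow: /private/
Sitemap: http://www.example.com/sitemap.xml
""".splitlines()

rp = urllib.robotparser.RobotFileParser()
rp.parse(RULES)

print(rp.can_fetch("*", "/private/press/faq.html"))  # True: Allow applies
print(rp.can_fetch("*", "/private/secret.html"))     # False: Disallow applies
print(rp.site_maps())  # ['http://www.example.com/sitemap.xml'] (Python 3.8+)
```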



For HTML META directives:

* NOINDEX META tag
* NOFOLLOW META tag
* NOSNIPPET META tag
* NOARCHIVE META tag
* NOODP META tag
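A crawler honors these directives by reading the robots META tag out of a page's head. Here is a small illustrative scanner built on Python's standard library; the sample page is hypothetical.

```python
from html.parser import HTMLParser

class RobotsMetaScanner(HTMLParser):
    """Collect robots META directives (noindex, nofollow, nosnippet,
    noarchive, noodp) the way a crawler might."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            for token in attrs.get("content", "").split(","):
                self.directives.add(token.strip().lower())

scanner = RobotsMetaScanner()
scanner.feed('<head><meta name="robots" content="noindex, noarchive"></head>')
print(scanner.directives)  # {'noindex', 'noarchive'}
```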



Yahoo! Search Blog:


Yahoo! supports the same common REP features used by Google and mentioned above. In addition, there are some Yahoo!-specific REP directives that Google does not support:

Crawl-Delay: Lets a site reduce the frequency with which a crawler checks it for new content.

NOYDIR META Tag: Similar to the NOODP META Tag above, but applies to the Yahoo! Directory instead of the Open Directory Project.

Robots-nocontent Tag: Lets you mark the non-content parts of a page so that the Yahoo! crawler can identify your main content and target the right pages on your site for specific search queries.



Microsoft Live Search Webmaster Center Blog:


Microsoft likewise supports the same common REP directives as Yahoo! and Google. As with Yahoo!, Microsoft also supports a REP feature that works with Microsoft and Yahoo! but not with Google:

Crawl-Delay: Lets a site reduce the frequency with which a crawler checks it for new content.



Over at his blog, Matt Cutts also mentions the common REP directives used by Google, Microsoft and Yahoo!. He has also written about some other informative online documents that Google has published over the past few weeks. Some of the most interesting posts by Google so far:

IP delivery/geolocation/cloaking: In this post, Google explains, with the help of a video, its web-serving techniques as they relate to Googlebot. The post covers IP delivery, geolocation and cloaking techniques.

Doorway Pages: Google has recently changed the definition for Doorway Pages at the Google Webmaster Help Center. This post provides the old and the new definition for the user to compare and understand the difference between the two.



This collaborative effort is all about giving webmasters a clear picture of the actual REP functionality. Keeping track of the techniques supported by each search engine is a very arduous task, so Yahoo!, Microsoft and Google have provided a consolidated overview of the actual similarities and differences in how the three major search engines implement REP features.

Google SERP Dancing Possible June 2008

According to WebmasterWorld, Google SERPs (search engine result pages) are returning fewer results for specific queries, pointing to a possible Google SERP update. Speculation is that the changes may stem from quality-control practices employed by Google, or possibly from human error.

Another concern is cache-related: cache dates are current, but the cached pages are about one to three months old. In some cases, cached pages aren't being displayed at all. On google.co.uk, users are experiencing ranking changes (increased rankings) that are being attributed to the quantity of links rather than their quality. Some reports suggest that a lot of irrelevant information is appearing on the first page of Google SERPs, information that is in no way related to the search query.

Let us see what the webmasters at WebmasterWorld have to say about this possible update:

“I've wondered too as to why some search terms are affected more than others and some result pages are changing around while other barely move.

Has anyone seen any relationship between how popular a search the term is and how much movement is going on?

As to when it will end I don't think we can predict as nothing quite like this has gone on before”

“I don’t know if anybody has reported this or they could be doing some major testing in my areas.

I’m seeing some dramatic across the board cuts for returned results for many keywords in my areas. Many keywords that once returned 850-950 results are now showing only 600-725 returned results. First page though is showing about the same amount of returned results as before which is somewhat deceptive. I had a feeling this was right around the corner. They’re applying more and more of the quality control features of Adwords to the natural results.”

“I see this pattern too. It might be the result of the "human editorial army" as well as the automated quality measures.“

“Has anybody seen where cache dates may be 1-3 months old but the page showing in the cache is current? This is taking into account that the new cache date could show up shortly but doesn't.”

“ I'm noticing the rapid rise of a few sites in the google.co.uk serps. On investigation using Yahoo site-explorer it looks like shear volume of backlinks of any quality trumps a lower number of quality links. Whoever has been playing with the UK geo filter recently seems to have turned off the "high quality" bit of the algorithm. Thus creating a field day for webmasters who exploit the low pay rates of 3rd World SEOs.

Since it is generally agreed that being linked to (except in certain extreme situations) cannot harm your site should we all be paying someone $200 to get 400 links from dodgy directories?”

Well, these unexpected changes certainly bear unmistakable similarities to SERP updates. For now, though, it would be wise to wait for Google's response.

Google SERP Updates June 2008

This month, members at WebmasterWorld are reporting that fewer results are being returned for specific queries. Suspicion falls on quality-control practices now being implemented, though others believe the "human editorial army" may be the reason behind this.

Other reported issues include cache problems: the cache date is current, but the cached page is one to three months old. Additionally, some say that cached pages are not showing up in search results at all (even with the site: operator).

On Google.co.uk, it looks like some rankings have risen as a result of the quantity of links rather than their quality. Members also say that a lot of junk unrelated to the queries being made is showing up on the first page. A number of UK members chime in that the SERPs there are "crazy" and irrelevant; one #1 ranking even returns a 404. Obviously, something seems fishy over there.

Monday, June 2, 2008

Google Updates Sitemaps

The webmaster-friendly project Google started over the summer has its own blog and some new features for its users. Google Sitemaps provides a tool that lets site publishers create a map that Google's spiders can use to index a site's content more effectively. On the official Google Blog, Grace Kwak posted about some of the new features in the Sitemaps service:

In our latest release, we provide even more interesting statistics that webmasters can use to improve the way their pages work with web crawlers, which will ultimately benefit their visitors.
I think the most fun are the new "query stats" -- they show top Google search queries that return pages from a site, as well as the top search queries that led users to click on a site. We've also enhanced the crawl errors we show, like specific HTTP errors Google runs into when crawling a page.
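The sitemap itself is just an XML file listing URLs in the sitemaps.org namespace. As a minimal sketch (the URLs and dates are hypothetical), Python's standard library can generate one:

```python
import xml.etree.ElementTree as ET

# Hypothetical pages; a real sitemap would enumerate the whole site.
PAGES = [
    ("http://www.example.com/", "2008-06-01"),
    ("http://www.example.com/about.html", "2008-05-20"),
]

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
urlset = ET.Element("urlset", xmlns=NS)
for loc, lastmod in PAGES:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = loc
    ET.SubElement(url, "lastmod").text = lastmod

ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8",
                             xml_declaration=True)
```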

Google posted some more details about the new stats available on the Sitemaps blog:

With query stats, we show you the top Google search queries that return pages from your site as well as the top queries that caused users to click on your site in the search results.

With crawl stats, you can see how we view crawled pages. You can see a distribution of the pages successfully crawled and the pages with errors as well as a distribution of PageRank for the pages in your site.

Page analysis shows you what we detect about the content and encoding of your pages.

Index stats provide an easy way for you to use our advanced search operators to return results about how we see the indexed pages of your site.

Mobile stats
You can now verify your mobile sites and see stats for them.

More detailed errors
Now you'll have more details about problems we had crawling your site. We report on 40 different types of errors in 5 categories.