Webmaster Papers





Playing in Googlebot's Sandbox with Slurp, Teoma, & MSNbot - Spiders Display Differing Personalities


There has been endless webmaster speculation and worry about the so-called "Google Sandbox" - the indexing time delay for new domain names - rumored to last for at least 45 days from the date of first "discovery" by Googlebot. This recognized listing delay came to be called the "Google Sandbox effect."

Speculation about the algorithmic elements behind this sandbox delay has ranged widely since the indexing delay was first noticed in spring 2004. Some believe it hinges on a single element of search engine optimization, such as linking campaigns. Link building has dominated the discussion, but others point to the size of a new site, its internal linking structure, or simply a fixed time delay as the most relevant algorithmic factors.

Rather than add to this speculation and further muddy the Sandbox, we'll look at a case study of a site on a new domain name established May 11, 2005, covering its site structure, submission activity, and external and internal linking. We'll see how search engine spider activity compares with indexing dates at the top four search engines.

Ready? We'll give dates and crawler action in daily lists and see how this all plays out on this single new site over time.

* May 11, 2005 - Basic text for a large site posted on a newly purchased domain name, going live by day's end. Search-friendly structure implemented with text links, making all content discoverable by robots. Home page updated daily as 10 new text content pages are added. Submitted site at Google's "Add URL" submission page.

* May 12 - 14 - No visits by Slurp (Yahoo's spider), MSNbot, Teoma (from Ask Jeeves) or Googlebot. Posted a link on WebSite101 to the new domain, Publish101.com.

* May 15 - Googlebot arrives and eagerly crawls 245 pages on the new domain after looking for, but not finding, the robots.txt file. Oooops! Gotta add that robots.txt file!

* May 16 - Googlebot returns for 5 more pages and stops. Slurp greedily gobbles 1480 pages and 1892 bad links! Those bad links were caused by our email masking, meant to keep out bad bots. How ironic that Slurp likes these.

* May 17 - Slurp finds 1409 more masking links and only 209 new content pages. MSNbot visits for the first time and asks for robots.txt 75 times during the day, but leaves when it finds that file missing! We finally get around to adding robots.txt by day's end, stopping Slurp from crawling the email masking links and letting MSNbot know it's safe to come in!

* May 23 - Teoma spider shows up for the first time and crawls 93 pages. Site gets slammed by BecomeBot, a spider that hits a page every 5 to 7 seconds and strains our resources with 2409 rapid-fire page requests. Added BecomeBot to the robots.txt exclusion list to keep 'em out.

* May 24 - MSNbot has not shown up in the week since finding the robots.txt file missing. Slurp shows up every few hours, looks at robots.txt and leaves again without crawling anything now that it is excluded from the email masking links. BecomeBot appears to honor the robots.txt exclusion but asks for that file 109 times during the day. Teoma crawls 139 more pages.

* May 25 - We realize that we need to reallocate server resources and redesign the database, which requires changes to URLs, which means all previously crawled pages are now bad links! We implement subdomains and wonder: what now? Slurp shows up and finds thousands of new email masking links because robots.txt was not moved to the new directory structures. Spiders get error pages on new visits. Scrambling to put out fires after the wide-ranging changes to the site, we miss this for a week. Spider activity is spotty for 10 days until we fix robots.txt.

* June 4 - Teoma returns and crawls 590 pages! No others.

* June 5 - Teoma returns and crawls 1902 pages! No others.

* June 6 - Teoma returns and crawls 290 pages. No others.

* June 7 - Teoma returns and crawls 471 pages. No others.

* June 8 - 14 - Odd spider behavior: crawlers look at robots.txt only.

* June 15 - Slurp gets thirsty, gulps 1396 pages! No others.

* June 16 - Slurp still thirsty, gulps 1379 pages! No others.
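The robots.txt lessons in the log above (the file was missing at first, BecomeBot had to be excluded, the email-masking links needed blocking, and the file did not follow the site into its new subdomains) can be checked offline before any crawler sees them. Below is a minimal sketch using Python's standard `urllib.robotparser` module. The `/email/` path, the sample page URLs and the `articles.publish101.com` subdomain are hypothetical stand-ins, since the article does not give the site's real paths.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "/email/" stands in for the site's
# email-masking links; the real paths are not given in the article.
ROBOTS_TXT = """\
User-agent: BecomeBot
Disallow: /

User-agent: *
Disallow: /email/
"""


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler requests for page_url.

    robots.txt is looked up per scheme and host, so every subdomain
    needs its own file at its root.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    checks = [
        ("BecomeBot", "http://publish101.com/articles/1.html"),  # hypothetical URL
        ("Slurp", "http://publish101.com/email/contact-42"),     # hypothetical URL
        ("Slurp", "http://publish101.com/articles/1.html"),
        ("msnbot", "http://publish101.com/articles/1.html"),
    ]
    for agent, url in checks:
        print(agent, url, "allowed" if rp.can_fetch(agent, url) else "blocked")
    print(robots_url("http://articles.publish101.com/a/b.html"))
```

Here `can_fetch` is given the bare product token (`Slurp`, `msnbot`); the standard parser compares only the part of the user-agent string before the first `/`, so a full browser-style string such as `Mozilla/5.0 (compatible; ...)` would fall through to the `*` group. Because robots.txt is fetched per host, each subdomain created in the May 25 restructuring would need its own copy, which is what `robots_url` illustrates.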

So we'll take a break here at the five-week point and note the very different behavior of the top crawlers. Googlebot visits once and looks at a substantial number of pages but doesn't return for over a month. Slurp finds bad links and seems addicted to them: it stops crawling good pages until robots.txt tells it to lay off the bad liquor, er, that is, links, and slaps Slurp to its senses. MSNbot visits looking for robots.txt and won't crawl any pages until the file tells it what NOT to do. Teoma just crawls like crazy, takes breaks, then comes back for more.
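The article does not say how the daily crawler counts were gathered; one common approach is to tally the web server's access log by user agent. The sketch below assumes Apache's "combined" log format and identifies each crawler by a lowercase substring of its User-Agent (the `CRAWLERS` tokens are assumptions). The sample lines are made up for illustration and use a documentation IP range. Robots.txt requests are counted separately from page requests, the distinction that mattered for MSNbot and BecomeBot above.

```python
import re
from collections import Counter, defaultdict

# Assumed lowercase User-Agent substrings for each crawler in the case study.
CRAWLERS = {
    "Googlebot": "googlebot",
    "Slurp": "slurp",
    "MSNbot": "msnbot",
    "Teoma": "teoma",
    "BecomeBot": "becomebot",
}

# Apache "combined" format:
# host ident user [day:time zone] "METHOD path PROTO" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def tally(lines):
    """Count crawler requests per day, keeping robots.txt hits separate."""
    per_day = defaultdict(Counter)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines not in combined format
        agent = m.group("agent").lower()
        for name, token in CRAWLERS.items():
            if token in agent:
                key = name + " robots.txt" if m.group("path") == "/robots.txt" else name
                per_day[m.group("day")][key] += 1
                break  # count each request once
    return per_day


# Made-up sample lines (192.0.2.0/24 is a documentation address range).
SAMPLE = [
    '192.0.2.1 - - [15/May/2005:10:00:00 -0700] "GET /robots.txt HTTP/1.1" 404 210 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.1 - - [15/May/2005:10:00:05 -0700] "GET /index.html HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.2 - - [17/May/2005:09:00:00 -0700] "GET /robots.txt HTTP/1.0" 404 210 "-" '
    '"msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
]

if __name__ == "__main__":
    for day, counts in sorted(tally(SAMPLE).items()):
        print(day, dict(counts))
```

A day on which a crawler shows only robots.txt hits and no page hits, like MSNbot on May 17, stands out immediately in this kind of tally.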

This behavior may mirror the differing personalities of the software engineers who designed these crawlers. Teoma is tenacious and hard-working. MSNbot is timid, needs instruction and some reassurance that it is doing the right thing, and picks up pages slowly and carefully. Slurp has an addictive personality and performs erratically on a random schedule. Googlebot takes a good long look and leaves. Who knows whether, or when, it will be back.

Now let's look at indexing by each engine. As of this writing on July 7, each engine shows differing indexing behavior as well. Google shows no pages indexed, although it crawled 250 pages nearly two months ago. Yahoo has three pages indexed in a clear aging routine and doesn't list any of the nearly 8,000 pages it has crawled to date (not all itemized above). MSN has 187 pages indexed despite crawling fewer pages than any of the others. Ask Jeeves has crawled more pages to date than any other search engine, yet has not indexed a single page.

Each engine will show the number of pages indexed if you use the query operator "site:publish101.com" (without the quotes): MSN 187 pages, Ask none, Yahoo 3 pages, Google none.

Daily activity in the three weeks since June 16, not listed above, has not varied dramatically: Teoma crawls a bit more than the other engines, Slurp is erratically up and down, and MSNbot slowly gathers 30 to 50 pages daily. Googlebot is absent.

Our linking campaign has been minimal: posts to discussion lists, a couple of articles and some blog activity. Looking back over this period, it is apparent that a listing delay is actually quite sensible from the search engines' point of view. Our site restructuring and bobbled robots.txt implementation seem to have abruptly stalled crawling, but the indexing behavior of each engine reflects a distinctly different policy for each major player.

The sandbox is apparently not just Google's playground, but it is certainly tiresome after nearly two months. I think I'd like to leave for home, have some lunch and take a nap now.

Back to class before we leave for the day kiddies. What did we learn today? Watch early crawler activity and be certain to implement robots.txt early and adjust often for bad bots. Oh yes, and the sandbox belongs to all search engines.

Mike Banks Valentine is a search engine optimization specialist who operates http://WebSite101.com and will continue reporting on this case study chronicling search indexing of http://Publish101.com
