Friday, October 29, 2004

Google reaches 6 billion

After Google stock almost reached $200 in trading today, Google's other index, the result count for a search on "the", passed the 6 billion mark.

Tuesday, October 26, 2004

The future look of MSN?

The Search Engine Watch Blog has some screenshots of what may be the next MSN look.

In one of them you can see a PC tab and the Search Builder box (an advanced search) open! Is an MSN desktop search coming soon?

In another, clicking Result Ranking brings up three sliders, like those used on Yahoo Shopping.

Google for president

A very funny piece in the Stanford Daily Online Edition!

Sunday, October 24, 2004

MapReduce: the programming model that gave us the Florida update?

A new paper (PDF) on MapReduce, the programming model Google uses to process and generate large data sets, is now on Google Labs. I found this passage in it:
We wrote the first version of the MapReduce library in February of 2003, and made significant enhancements to it in August of 2003, including the locality optimization, dynamic load balancing of task execution across worker machines, etc.

The next Google index update after those enhancements came in mid-November 2003, and it was the most talked-about update ever. Coincidence?
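For readers new to the model, here is a minimal single-process sketch of the MapReduce idea using the classic word-count example: a map function emits (key, value) pairs, the pairs are grouped by key, and a reduce function combines each group. The function names are hypothetical, and the sketch deliberately leaves out everything the quoted passage is proud of (distribution across worker machines, locality optimization, dynamic load balancing).

```python
from collections import defaultdict
from typing import Callable, Iterable


def map_words(doc_id: str, text: str) -> Iterable[tuple[str, int]]:
    # Map phase: emit (word, 1) for every word in the document.
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: list[int]) -> int:
    # Reduce phase: combine all the values emitted for one key.
    return sum(counts)


def map_reduce(inputs: dict[str, str],
               mapper: Callable[[str, str], Iterable[tuple[str, int]]],
               reducer: Callable[[str, list[int]], int]) -> dict[str, int]:
    # Shuffle phase: group every intermediate value by its key.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    # Reduce each group; sorting the keys keeps the output deterministic.
    return {k: reducer(k, groups[k]) for k in sorted(groups)}


docs = {
    "doc1": "the florida update",
    "doc2": "the google index update",
}
print(map_reduce(docs, map_words, reduce_counts))
# {'florida': 1, 'google': 1, 'index': 1, 'the': 2, 'update': 2}
```

In the real system the map calls and the reduce calls run in parallel on many machines; the programmer only writes the two small functions.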

Friday, October 22, 2004

Fidelity: $325 million in less than two months with Google stock?

Remember this post from September 10 (its link is now broken)?

Here is a new link to the same story:
Google Stock Gobbled by Fidelity Investments

Fidelity Investments, the world’s largest mutual fund manager, bought $549 million of stock in search engine Google’s IPO, about 23 percent of the shares offered during the initial public offering. Fidelity reported in a filing with the U.S. Securities and Exchange Commission that it now holds 5.21 million Google Class A shares. That’s about 16 percent of Class A stock and 1.9 percent of Google’s total shares outstanding.

The Boston Herald reports

“We don’t comment on buying or selling of individual securities, nor on SEC filings,” said a Fidelity spokesman yesterday. Fidelity’s fund managers tend to be shrewd, but on balance conservative, stock pickers. “Fidelity has a very good research department, and a particularly good group of technology analysts,” adds John Bonannzio, commentator at Wellesley-based research firm Fidelity Insight.

The Herald adds that Fidelity could be sitting on a $100 million dollar profit in a month - wow.

That last line is the best! Not $100 million, but at least $325 million in profit in around 40 days, if you look at these numbers:

5.21 million shares (bought in August and September) x $110* = $573.1 million
5.21 million shares x $172.43** = $898.4 million

* Highest price reached since the IPO at the time the purchase was reported
** Closing price today
Difference: at least $325.3 million in less than two months!!!
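The same calculation in a few lines of Python, so the rounding is easy to check. It assumes, as the post does, that every share cost at most $110, which makes the result a lower bound on the paper gain.

```python
shares = 5.21e6          # Class A shares reported in Fidelity's SEC filing
max_buy_price = 110.00   # assumed upper bound on the purchase price per share
close_price = 172.43     # closing price on the day of the post

cost_upper_bound = shares * max_buy_price
value_today = shares * close_price
gain_lower_bound = value_today - cost_upper_bound

print(f"cost  <= ${cost_upper_bound / 1e6:,.1f} million")  # $573.1 million
print(f"value  = ${value_today / 1e6:,.1f} million")       # $898.4 million
print(f"gain  >= ${gain_lower_bound / 1e6:,.1f} million")  # $325.3 million
```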

Thursday, October 21, 2004

Google: a 437,115 percent increase since 1999

From the Deloitte Technology Fast 500 (PDF):

Is it enough to measure success by revenues alone? How about when your company name actually becomes a hip verb, when your site is used for everything from getting the lowdown on possible love interests to finding critical medical information? Or when your IPO becomes the cause célèbre of the business world?

Google is an Internet phenom if ever there was one. But unlike so many earlier phenoms that tanked, Google has revenues. Indeed, they posted $220,000 in revenues in 1999 and $961,874,000 in 2003, an increase over five years of 437,115 percent.
A play on the word googol, which refers to the numeral 1 followed by 100 zeros, Google is now an unlikely household name. The search engine company first opened its doors in 1998, and while in beta mode, answered 10,000 search queries a day. Word got out and in one year, the service was answering more than half a million queries daily and had Red Hat as its first commercial search customer. Oh, and they tossed the beta label, as well.
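A quick check of Deloitte's figure: a percent increase is the change divided by the starting value, times 100. Revenue grew about 4,372-fold, which is an increase of about 437,115 percent once the starting value itself is subtracted.

```python
revenue_1999 = 220_000
revenue_2003 = 961_874_000

# Percent increase = (new - old) / old * 100
increase_pct = (revenue_2003 - revenue_1999) / revenue_1999 * 100
print(f"{increase_pct:,.0f} percent")  # 437,115 percent
```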

Wednesday, October 20, 2004

Yahoo slowly extends its tentacles into China

A few more partnerships in China for Yahoo! When you see that Internet connections already reach 6 million users in Beijing alone, you realize how huge the market there is!!!

Tuesday, October 19, 2004

Gates: PC will replace TV, TV will become a giant Google

Not a bad article, and the first paragraph made me laugh:
Microsoft founder and chairman Bill Gates must see Google everywhere he looks these days. He must even see Google when he closes his eyes, and enters that lucid dreaming state from which all of Microsoft's great strategies eventually emerge. What he sees at that moment, we imagine, is a Tellytubby landscape that looks a lot like the Windows XP default wallpaper - perhaps with Chairman Bill himself as the sun. But bouncing across this happy vista are the red, green and blue colored balls that have rolled out of the Google playpen.
But after reading the rest, for one of the first times I agree with a writer at The Register:

Meanwhile, and this is even more astonishing, digital TV has become an object of widespread derision in the UK. For the first time in its history, the word "digital" has negative brand connotations. Such is the pushback against glitchy digital TV streams, full of drop outs and hiccups, and hard-to-use controls, that people are beginning to clamor for the analog signal to remain on. "Digital" now means "crap", which should give lazy marketeers some pause for thought. TV is becoming associated with the kinds of problems people associated with PCs. It's true a few programme formats lend themselves naturally to some form of interactivity: particularly live TV which invites vox pop polls or comments. But these gimmicks actually get in the way of programming with a conventional narrative pull: such as a movie, a drama or a footie match.

That reminded me of a party a few days ago with friends I had not seen for a while. They were between 28 and 45 years old, and most of them are fairly new (1 to 3 years) to "new" technology. I was shocked to see the whole party focus on a joystick:

They were completely addicted to an Xbox boosted with a 200 GB drive, full of MP3s, video clips and recorded TV shows, all handled by an easy-to-use GUI program. Almost 75% of the people there already had a DVD player and used computers at work, but they were totally obsessed with this Xbox hooked to the TV, which gives you total control with a joystick. It made me a bit sick to watch the TV and the Xbox take control of the party ;-)

Warning about French trademark keyword use in Overture

Overture's French keyword suggestion page now shows this warning about registered trademarks:

Remarque 2: Vous pouvez vérifier si un mot clé correspond à une marque déposée en vous rendant sur le site de l'INPI*.

Translation: Note 2: You can check whether a keyword matches a registered trademark by visiting the INPI* site.

*Institut National de la Propriété Industrielle (National Institute of Industrial Property)

I hope they will distinguish between France and the other French-speaking countries! Will Google AdWords show the same warning after its French trials?

Microsoft Catalog Index

A little-known Microsoft service (Windows 2000 and up) is the Indexing Catalog. It is not very easy to set up, and like GDS and CDS it builds an index before you can use it. After that, it shows you this form to run your queries:

Well, it's fast, and that's it. In a test query comparing MCI with a normal (slow) search using Microsoft's built-in search tool, MCI found only 8 of the 19 files: it missed 7 Word documents and 4 Word backup documents. But it is the only tool I have seen so far that also searches CSS files.

The lack of a nice GUI, the number of clicks needed to reach it, and a glitch in the Advanced Query Syntax help screen gave me a very bad first impression.

Thanks to Dodgers Webstractions.

I will come back to this soon in a comparison of desktop search tools!

Monday, October 18, 2004

Power searchers equal power buyers

Good news for the SEO and SEM industry. At the Web 2.0 Conference, the presentation (PowerPoint) by Gian M. Fulgoni of ComScore revealed that power users of search engines are clearly better customers than light searchers or non-searchers.

Here are two of the slides from that presentation:

Also notice the huge increases in some categories of goods:

Google IM protocol already embedded in GDS

It seems that a new protocol, "google_im://", is already in the GDS code. I was not far off!
The fact that GDS does not find some other files may also be explained by its use of Microsoft's "index.dat" file.

Friday, October 15, 2004

GDS + Firefox + Slogger index even what Google can't see

I just tested the Slogger Firefox extension, which saves every kind of page (PHP, WebSphere, ASP, etc.) as HTML in a folder of your choice, where Google Desktop can crawl it. It works amazingly well. Even this page, generated by WebSphere, which will never appear in the Google index because of its uncrawlable URL and structure, now shows up in my Google Desktop queries!

With this extension, Google Desktop can now see ALL the pages you browse, and you can get their text back with a GDS search. Yes!

Here is a Google query for a few words from the page linked above: no results point to it, because Google cannot crawl that page. But with GDS teamed up with Firefox and the Slogger extension, I can see it in my "Results Stored on your Computer".

Google Desktop: Security Warning

If you install Google Desktop with the default settings and then check your bank account on the Web, there is a good chance that all your account information will be indexed by the desktop application.

If you then search for your bank's name, you will probably see the cached page, with your credit or debit balance, at the top of your screen. Frightening!!!

You should uncheck this box quickly. At the very least, Google Desktop should display a warning each time you enter a secure (HTTPS) zone, especially because it is not easy to find this information on your hard disk and delete it.

Can anybody tell me why the people at Google leave this option checked by default in the preferences?

Update: I know that anyone a bit tech-savvy could find all this information on my computer anyway; it is just that GDS makes it so easy and fast, even without specifically looking for private information!!!
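The warning suggested above is simple to express in code. Here is a hypothetical sketch of such a policy, not how Google Desktop actually works: before indexing a visited page, the indexer looks at the URL scheme and skips anything served over HTTPS unless the user has explicitly opted in.

```python
from urllib.parse import urlparse


def should_index_page(url: str, index_secure_pages: bool = False) -> bool:
    """Hypothetical indexer policy: skip HTTPS pages unless the user opts in."""
    scheme = urlparse(url).scheme.lower()
    if scheme == "https" and not index_secure_pages:
        # A real tool could also display a warning to the user here.
        return False
    # Only ordinary Web pages are candidates for indexing at all.
    return scheme in ("http", "https")


print(should_index_page("https://bank.example.com/account"))  # False
print(should_index_page("http://www.example.com/news.html"))  # True
```

Making `index_secure_pages` default to `False` is the opt-in behavior the post asks for.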

Google Desktop second impression: Too soon

Here are some tests I did this morning on the files it sees on the Web and on your hard drive (hd):
  • .htm (web yes / hd no)
  • .html?parameter (web no)
  • .html# (web yes)
  • .shtml (web yes / hd yes)
  • .net files without a filename (web no / hd not tested)
  • .php (web no / hd no)
  • .asp (web sometimes* / hd no)

*It works if your URL is simple, like filename.asp, or has clear parameters (&src=web or even .asp?cid=11623), but not if it looks like .asp?uniqueid=21633.

Contrary to some statements I have read here and there on forums:

  • It can crawl drives other than C: if your disk is partitioned (mine goes up to G:), and it may even find other USB drives (not verified by me)
  • It can show you more than 10 results for your files: just add &num=100 at the end of the URL
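Adding &num=100 by hand works, but if you build result URLs in a script it is safer to set the parameter properly, replacing any existing num value. A minimal sketch with Python's standard urllib.parse; the localhost search URL is a made-up placeholder, and only the num parameter comes from the tip above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_num_results(url: str, num: int = 100) -> str:
    # Drop any existing num parameter, keep the others in order, then append num.
    parts = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != "num"]
    params.append(("num", str(num)))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical local search URL, for illustration only.
print(with_num_results("http://localhost/search?q=invoice"))
# http://localhost/search?q=invoice&num=100
```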

Thursday, October 14, 2004

Google Desktop first impression: Frustration

Why?

Give a bit more transparency when you invade privacy

When a program scans my hard disk, I like to know what stage it is at, and at least see the names of the files it is parsing in a status bar. That should be the minimum of good manners!

Opt-out instead of opt-in
Help us improve Google Desktop Search by sending non-personal usage data and crash reports.

Here again Google remains vague and doesn't offer even a minimum of clarification!

Time to crawl

Twice as long as Copernic to crawl fewer files!

Partial file crawling

It seems to have the same 101 KB limit as on the Web. That makes me really mad.
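To see what such a limit means in practice, here is a hypothetical sketch of an indexer that only reads the first 101 KB of each file: any word that appears only after that point simply cannot be found. Treating "101k" as 101 × 1024 bytes, and assuming GDS truncates exactly this way, are both assumptions.

```python
LIMIT_BYTES = 101 * 1024  # assumed cap, mirroring the 101 KB Web limit


def indexed_words(data: bytes, limit: int = LIMIT_BYTES) -> set[str]:
    # Only the first `limit` bytes are tokenized; the rest of the file is ignored.
    text = data[:limit].decode("utf-8", errors="ignore")
    return set(text.lower().split())


# About 200 KB of filler text with a unique word at the very end.
document = b"filler " * (200 * 1024 // 7) + b"needle"
words = indexed_words(document)
print("filler" in words, "needle" in words)  # True False
```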

HTML crawling

Yes, but only if your file has a .html extension; if it's .htm, you can't see it! That really sucks!

IE only for Web browsing history

Even when Firefox or Opera is your default Web browser, it does not pick up this program's hottest feature from them: the Web history!

Meta Indexing of your HTML files


Date of file indexed

Web history file dates are OK. But every file on my computer that I have not opened since today's indexing shows the same date: August 5??? Yet a few months ago, transferring 30k files from an English NT 4 machine with international date settings (dd/mm/yy) to my new computer running a French version of XP with dates as (yy/mm/dd) caused no problem at all!!!

I think I will stop here, take a cold shower, and try a bit more tomorrow before uninstalling it!

Wednesday, October 13, 2004

Election 2004 and PPC campaigns

In the third week of August I was wondering why nobody in US politics was using PPC campaigns. I finally found the answer today in this article, which looked at the same question:
Google apparently has a policy that bans ads that include "language that advocates against an individual, group or organization"—so attack ads won't fly.

Given what I have seen and heard of the US campaign, I now understand why we don't see any ads ;-)

Tuesday, October 12, 2004

First impression on Exalead

The database seems smaller than they claim, but it handles over-optimized sites and search engine spammers very well. Results are very nice for clean little sites and fair to good for small-to-medium sites (200 to 1k pages), but poorer for big sites and for government Web sites. The clustering is far behind Clusty, and Web site location is just average, like every other service (with current technology it is very hard to do better for now).

But it is still in beta, and it already gives very good results considering how little time they had to put it together (results that differ from all the other "new" sources), along with very good new tools and options to dig deeper and a not-so-bad user interface (it could be a bit more minimalist). With its many good advanced search features not seen anywhere else, you should keep an eye on it.

For Google's sake, I hope Exalead will not merge or make a deal with Vivisimo, because if they get better clustering, merge their databases, or crawl deeper with a small algorithm tweak for the biggest sites, the Goog will soon fall below $100 ;-)

Google Related Searches coming back soon?

A few months ago I noticed this new line in Google's robots.txt:
Disallow: /relpage/

My first thought was that Google would come back with related searches or a clustering engine, but I was not sure, because "relpage" is also used in some database query languages.
But now, with the screenshot taken two days ago and posted today on Marc Duval's blog, we have proof that Google will bring it back soon.
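Here is how a well-behaved crawler interprets that rule, using Python's standard urllib.robotparser module. The two-line robots.txt below is a minimal excerpt assumed for illustration, not Google's full file.

```python
from urllib.robotparser import RobotFileParser

# Minimal excerpt assumed for illustration; the real file has many more rules.
robots_txt = """\
User-agent: *
Disallow: /relpage/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Anything under /relpage/ is off limits; other paths are unaffected.
print(parser.can_fetch("*", "http://www.google.com/relpage/?q=test"))  # False
print(parser.can_fetch("*", "http://www.google.com/search?q=test"))    # True
```

A Disallow line only tells crawlers to stay away from a path, which is why a new entry like this one hints that something lives there.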

Monday, October 11, 2004

New Exalead in Beta

Tons of options in this new version of the French search engine, which now has an index of a billion pages. Here is a list of its features:
  • Related Terms
  • Related Categories
  • Web Site Location
  • Document Type
  • Bookmark Site
  • View Site in a Frame
  • One or Three Columns Thumbnail Views
  • Restrict Query to 11 Languages
  • Restrict Query to Country
  • Restrict Query to Four Types of Office Documents, or RTF and TXT
  • Site: Search
  • Stemming Search
  • Phonetic Search
  • Approximate Spelling Search
  • Sorting by Date or Relevance
  • Prefix search
  • Pattern Search

Exalead supports regular expression patterns. Patterns are introduced by a slash ('/') character. Within a regular expression, '.' is a special character that can represent any character, '*' stands for character repetition, '|' stands for 'or', and parentheses are used to group characters. '?' is placed at the end of a character group to make it optional.

Searches for documents with words that match the pattern s.ren..pi.y: this can be very useful for finishing your crossword puzzles!
Searches for documents containing any of the following: mpg, mpg1, mpg2, or mpg3.
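Exalead's pattern syntax closely resembles ordinary regular expressions, so both examples can be tried with Python's re module. The two patterns below are my own translation of the examples (s.ren..pi.y for the crossword clue, mpg(1|2|3)? for the file extensions), Python's engine is of course not Exalead's, and fullmatch is used to mimic matching whole words.

```python
import re

# Python equivalents (an assumption) of the two Exalead examples above.
crossword = re.compile(r"s.ren..pi.y")  # '.' matches any single character
mpeg = re.compile(r"mpg(1|2|3)?")       # '(...)' groups, '|' means or, '?' makes it optional

words = ["serendipity", "serenity", "mpg", "mpg2", "mpg4", "mpeg"]
print([w for w in words if crossword.fullmatch(w)])  # ['serendipity']
print([w for w in words if mpeg.fullmatch(w)])       # ['mpg', 'mpg2']
```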

Time and Search Engine: Short term memory

A really good study of Google and AltaVista and their long-term memory problems, caused by the way Web documents are time-stamped in their databases.

Friday, October 08, 2004

Google's client parameter gives far fewer results

When you use the Firefox search box, Google adds the client parameter to the URL, and sometimes another one named rls. When the &client= parameter is present, you get far fewer results. Here is a query for the word "the" (2.5 billion results); without the &client= parameter you get 5.9 billion results!!!
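If you want to compare the two result counts yourself, the extra parameters can be stripped from a copied URL with Python's standard urllib.parse. A minimal sketch; the client and rls values in the example URL are placeholders, not the exact values Firefox sends.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_params(url: str, names: tuple[str, ...] = ("client", "rls")) -> str:
    # Drop the listed query parameters and keep everything else in order.
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in names]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Placeholder parameter values, for illustration only.
url = "http://www.google.com/search?q=the&client=firefox&rls=example"
print(strip_params(url))  # http://www.google.com/search?q=the
```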

The beginning of the end of RSS?

If Yahoo really starts this project, I think it will hurt them, or hurt RSS if other search engines go in the same direction. I hope they find another way to make money instead of corrupting the first easy way to make the Web more semantic!

Thursday, October 07, 2004

Wednesday, October 06, 2004

Launch of Snap

The Snap search engine was launched today by Idealab, the people behind Overture. Here is my first impression:

What I liked:
  • Transparency of the stats they give you
  • Clustering
  • Refining the results
  • Sorting on multiple criteria

What I didn't like:
  • Framed results page
  • The small database
  • Relevance of results driven by Web popularity
  • Speed

For now it will be a good metrics tool for me, but if the database grows and they fix relevance and the framed results, it will be a very good competitor.

Tuesday, October 05, 2004

Finally, a reverse phone and address lookup for Canada in French

A really nice tool for those of us up north, and in French for the first time!

Michael Moritz : I'm Feeling Lucky

The story of Sequoia Capital venture capitalist Michael Moritz, who put $12.5 million into Google in 1999.

Moritz is too modest to say how much he has made personally from the sale - he is reported to have received up to $280 million in cash, plus stock worth more than a $1 billion - although he dismisses reports he is now a billionaire as "wickedly overstated".

Google's best secret: We're a dog company

Google has put up a new investor section on its site. And yes, they are a dog company ;-)
We have nothing against cats, per se, but we're a dog company, so as a general rule we feel cats visiting our campus would be fairly stressed out.

Monday, October 04, 2004

Web mining, XML and their integration into search engines?

Nice interview with Mr. Laurie Lock Lee, principal knowledge management consultant at Computer Sciences Corporation in Australia. The last paragraphs are the best:
If you could command sites like Google to make one or two big changes, what might you require?

Clearly textual clustering or summarization, which is what we are seeing from companies like Vivisimo, Autonomy, Semio, and Verity. This technology on top of a Google-type search engine, which reaches even further into the rich data sources available over the internet, will make the web even more useful than it is today.

Are there major trends in WM standardization that you think people should be watching more closely?

XML is already the de-facto standard for improving the ability for machines to interpret textual repositories. I would anticipate that XML will become as pervasive as HTML. We in fact may still see a de facto standard emerge.

I hope so, because since 1999 XML seems to me to be used only to produce sites from databases, not really to structure the Web so that machines can more easily interpret the pages produced.

Sunday, October 03, 2004

Why doesn't Google encourage the use of RSS?

Only 7% of the sources crawls have XML feeds. I'd estimate that only a few hundreds of the top 3,000 newspapers we crawl have RSS support.

After reading the post on the Topix blog, I'm a bit surprised that no engine except Yahoo clearly encourages the use of RSS on our sites. RSS is not only useful for news sites, but for many sections of a site. Pages like Press Releases, Portfolio, Calendar, Careers, New Products, Investors and many more should each have an RSS feed.

You save bandwidth, and your clients save a lot of time: they don't revisit a page that has not changed since their last visit, and they don't have to sort through the flood of email that comes from subscribing to your press releases.

Why not give bonus ranking points, a special category, or a logo on the SERPs (as Yahoo does) to those who take the time to give search engines a more structured feed? If Google's crawler sees that feed, it doesn't have to run complex algorithms to calculate the "weight" of the words in the page, which saves computing time! Why not encourage these structured feeds?
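To show how little work such a feed takes, here is a minimal sketch that builds an RSS 2.0 feed for a press release section with Python's standard xml.etree.ElementTree. The company name, URLs and item are hypothetical; a real site would regenerate the feed whenever the section changes.

```python
import xml.etree.ElementTree as ET


def build_rss(title: str, link: str, description: str,
              items: list[dict[str, str]]) -> str:
    # RSS 2.0 layout: <rss version="2.0"><channel> with title, link,
    # description, followed by one <item> per entry.
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for entry in items:
        item = ET.SubElement(channel, "item")
        # Copy only the optional item fields that were supplied.
        for tag in ("title", "link", "description", "pubDate"):
            if tag in entry:
                ET.SubElement(item, tag).text = entry[tag]
    return ET.tostring(rss, encoding="unicode")


feed = build_rss(
    "Example Corp press releases",  # hypothetical company and URLs
    "http://www.example.com/press/",
    "Latest press releases from Example Corp",
    [{"title": "New product launched",
      "link": "http://www.example.com/press/2004-10-03.html",
      "pubDate": "Sun, 03 Oct 2004 09:00:00 GMT"}],
)
print(feed)
```

Each item carries its own title, link and date, which is exactly the kind of structure a crawler would otherwise have to guess from the page.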