28th November 2007, 6 min read

Is PageRank just good marketing?

7 thoughts on “Is PageRank just good marketing?”

Sérgio Nunes says:

November 28, 2007 at 12:11 pm

Hi again,

Sorry for the lack of details about me. My name is Sérgio Nunes and I’m a PhD student in the field of WebIR.

Also sorry for the lack of a proper reference on my statement. This is a recent experimental work by Marc Najork that delves into this issue:

“HITS on the Web: How does it Compare?”
http://research.microsoft.com/research/pubs/view.aspx?0rc=p&type=Publication&id=1734
Fernando Diaz says:

November 28, 2007 at 12:31 pm

IR folks long suspected PageRank to be a red herring, but this was not confirmed until the last few years. The reference I like to use comes from MSR and was published at WWW06:

M. Richardson, A. Prakash, and E. Brill, “Beyond pagerank: machine learning for static ranking,” in WWW ’06: Proceedings of the 15th international conference on World Wide Web, (New York, NY, USA), pp. 707–715, ACM Press, 2006.

The authors demonstrate that structure-independent features, combined with a page’s popularity, significantly outperform PageRank. Informal conversations with engine architects and SEO folks confirm this.

It’s helpful to interpret these results in the context of a random walk on the web graph. PageRank is the stationary distribution of a random walker on the web graph. In situations where you have no knowledge about page visitation, this is a reasonable surrogate. However, in the presence of real user data (gathered through a toolbar or OS), the random walk model seems less attractive than models which incorporate visitation data.
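To make the "stationary distribution of a random walker" reading concrete, here is a minimal sketch in Python. It uses plain power iteration on a tiny made-up graph; the function name `pagerank`, the toy graph, and the choice to spread the rank of dead-end pages uniformly are my assumptions for illustration, not anything taken from the papers cited above.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    damping: probability that the walker follows a link; with
             probability 1 - damping it jumps to a uniformly random page.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from the uniform distribution

    for _ in range(iterations):
        # Rank sitting on pages with no out-links is spread uniformly
        # (an assumption; other conventions exist).
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for u in nodes:
            targets = links.get(u, [])
            for v in targets:
                # Each out-link receives an equal share of the page's rank.
                new_rank[v] += damping * rank[u] / len(targets)
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by every other page.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

On this toy graph the page with the most in-links also ends up with the highest score, which is the kind of agreement between in-degree and PageRank that the thread is discussing; a real comparison would of course need real web data.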

That said, it also seems likely that the actual effectiveness of search engines has more to do with using massive amounts of click data to train classic IR features and query triage schemes.
Peter Turney says:

November 28, 2007 at 4:27 pm

Interesting post. I used Google Scholar to find all citations of “Predicting fame and fortune: Pagerank or indegree”. Google found 16 citations:

http://scholar.google.com/scholar?hl=en&lr=&cites=5736996577557537352

I skimmed some of the citations, and two seemed particularly relevant: (1) Hits on the web: how does it compare? (2) Beyond PageRank: Machine Learning for Static Ranking. I was about to post this comment, when I saw that two previous comments gave exactly the same two references. Now I’m posting this comment anyway, to say that Google PageRank may be bogus, but Google Scholar seems to work just fine. 🙂
Panos Ipeirotis says:

December 1, 2007 at 10:44 pm

Just to offer an anecdotal (and unconfirmed) piece of information: it is claimed that the original PageRank was not exactly the one described in the WWW97 paper.

In the plain vanilla implementation, the underlying model of PageRank corresponds to a “random surfer” that follows hyperlinks with probability 0.85 and otherwise (with probability 0.15) gets bored and jumps to a random page. I have heard that in the actual implementation, the random surfer jumps only to pages in the “edu” domain. (This idea is similar to the TrustRank algorithm.)
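For readers who want to see what a restricted jump would change, here is a rough sketch in Python. It is only an illustration of the idea in the rumor above, not a description of what Google actually does: the function name `pagerank_restricted_teleport`, the toy graph, and the rule that dead-end pages also jump into the trusted set are all my own assumptions.

```python
def pagerank_restricted_teleport(links, teleport_pages, damping=0.85, iterations=100):
    """Power-iteration PageRank where random jumps land only on teleport_pages.

    links: dict mapping each page to the list of pages it links to.
    teleport_pages: set of pages the bored surfer is allowed to jump to.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    teleport = [p for p in nodes if p in teleport_pages]
    if not teleport:
        raise ValueError("teleport_pages must contain at least one page in the graph")

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Dead-end pages are assumed to jump into the trusted set as well.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        jump_mass = (1.0 - damping) + damping * dangling
        new_rank = {node: 0.0 for node in nodes}
        for p in teleport:
            new_rank[p] = jump_mass / len(teleport)
        for u in nodes:
            targets = links.get(u, [])
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        rank = new_rank
    return rank


# Hypothetical graph: two .com pages that only link to each other,
# and an .edu page that links to a dead-end .org page.
toy_web = {
    "a.com": ["b.com"],
    "b.com": ["a.com"],
    "c.edu": ["d.org"],
    "d.org": [],
}
scores = pagerank_restricted_teleport(toy_web, teleport_pages={"c.edu"})
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

In this sketch the two .com pages, which cannot be reached from the trusted page, end up with essentially no score even though they link to each other. That is the same intuition behind seed-based schemes such as TrustRank.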

Of course, since 1996 many things have changed and today there are so many other factors that are taken into consideration during ranking that it is almost certain that PageRank is mainly a marketing tool.
Jean Véronis says:

December 2, 2007 at 2:49 pm

I agree that PageRank has become mainly a marketing tool. However, there is a flaw in Upstill’s work. He doesn’t compare in-degree with PageRank but with the score given in Google’s Toolbar, called “PageRank”. Nobody knows what this score is exactly. In particular, nothing proves that it is the real “pure” PageRank as described in the original PageRank paper. I suspect that it is (a downgraded version of) the score that Google uses for ranking, which is a mixture of many factors, in which PageRank plays some (unknown) role.
Daniel Lemire says:

December 2, 2007 at 7:02 pm

Interesting observation, Jean, but the paper by Najork et al. (HITS on the Web: How does it Compare?) supports the claim that PageRank is not even as accurate as in-degree.
Jean Véronis says:

December 3, 2007 at 3:35 am

True. My comment was not in defence of PageRank. The simple fact that Google needs to supplement it with several dozen other criteria shows that it is not ideal 😉 In a way, Upstill said something right with a disputable methodology.