The Google Way of Science - The Growing Importance of Data

sah · on June 29, 2008

This whole idea strikes me as extremely confused. Data mining is just observation. Any extrapolation from data is a theory. Anderson and this author seem to be surprised that you can get so far with an incomplete theory that doesn't explain everything. That seems obvious to me -- remember classical mechanics?

Here's a particularly bad example:

"When you misspell a word when googling, Google suggests the proper spelling. How does it know this? How does it predict the correctly spelled word? It is not because it has a theory of good spelling, or has mastered spelling rules. In fact Google knows nothing about spelling rules at all."

"Spelling rules" are a heuristic extrapolation just like the one Google is making, but are probably less accurate! Why is a set of rules designed to be memorable and useful to a human more of a theory than rules that you have to write down, and need a computer to use?
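
To make that concrete, here is a minimal sketch of a corrector that is nothing but an extrapolation from word counts. It is not Google's actual system (the quote gives no details of that); the toy corpus and the names `edits1` and `suggest` are made up for illustration. The rule "prefer the most frequent known word within one edit" is still a rule, just one that needs a computer to apply.

```python
from collections import Counter
import re

# Toy corpus standing in for observed text; purely illustrative data.
CORPUS = """
the theory of spelling is not needed when the data is large
people spell the word theory correctly far more often than not
data about spelling beats rules about spelling
"""

def words(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Observation step: count how often each word was seen.
COUNTS = Counter(words(CORPUS))

def edits1(word):
    """Return every string one delete, transpose, replace, or insert away."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def suggest(word):
    """Extrapolation step: pick the most frequent known word near the input."""
    if word in COUNTS:
        return word  # already a word we have observed
    candidates = [w for w in edits1(word) if w in COUNTS]
    if not candidates:
        return word  # nothing observed nearby; leave the input alone
    # Break frequency ties alphabetically so the result is deterministic.
    return max(candidates, key=lambda w: (COUNTS[w], w))

print(suggest("theroy"))
print(suggest("speling"))
```

Swap in a bigger corpus and the same few lines get more accurate without anyone writing down "i before e".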

ntoshev · on June 29, 2008

The author defines a theory as something simple enough to be understood by humans.

If you drop this requirement, machine-learned theories are a big deal nonetheless.

sah · on June 29, 2008

Sure, they are. But I think it's confusing to characterize machine learning as an alternative to the scientific method; it's an example of the scientific method.

lutorm · on June 29, 2008

I also doubt that we will be able to do without theory. While you can analyze huge datasets and look for correlations, you probably need a theory to give you an idea of what data you should collect. Imagine trying to build the Large Hadron Collider without any particle physics theory to guide the design. How do you decide how to build it and what you should look for? If you build it on a hunch, you might stumble onto some really interesting things, but in most cases it will probably be an expensive collection of nothing special...

tyn · on June 29, 2008

"There was no theory of Chinese, no understanding. Just data. (If anyone ever wanted a disproof of Searle's riddle of the Chinese Room, here it is.)"

Shouldn't this be a 'proof' instead of 'disproof'?

hugh · on June 29, 2008

Well,

(a) You can't prove or disprove a riddle

(b) The Chinese Room isn't a riddle, it's a thought experiment

(c) You can't prove or disprove a thought experiment

(d) You might be able to prove or disprove the point that Searle was trying to make by using the Chinese Room, except his point was about consciousness (or even Consciousness), not about machine translation, so ultimately the parenthetical statement seems to be pretty meaningless no matter whether you make it "proof" or "disproof"

tyn · on June 30, 2008

Searle expands on consciousness in his Chinese Room paper, but the main point of the thought experiment is that you can have a conversation that sounds intelligent without having any understanding at all of what you say and what you are being told (in other words: passing the Turing test does not imply intelligence).

hugh · on June 30, 2008

You're probably right, it's probably more about intelligence than consciousness (I haven't read it for a long time).

Anyway, it's certainly not about machine translation.

schtog · on June 29, 2008

Yes, I noticed the same thing. I doubt it was a slip by the author, though; he didn't seem to know any better.

pixcavator · on June 29, 2008

The author should have applied "Google's spell checker" to his own article: "this emerging method... will compliment established theory-driven science".