Sunday, July 22, 2012

Google's Hybrid Research + Development Model

Last Updated on July 23rd 2012

Here is a very interesting recent paper from the ACM's flagship magazine, CACM (July 2012): "Google’s Hybrid Approach to Research". The page also has an embedded video of around 5 minutes in which the authors give their views. It is worth watching, IMHO.

Some important points of the paper from my perspective:
  1. "Research results come not only from universities, but also from companies, both large and small. The way research results are disseminated is also evolving and the peer-reviewed paper is under threat as the dominant dissemination method. Open source releases, standards specifications, data releases, and novel commercial systems that set new standards upon which others then build are increasingly important."

  2. Google does research and development together, with R&D (or R&E) teams usually writing production or near-production code from day one! [Ravi: That's awesome!]

  3. "Typically, a single team iteratively explores fundamental research ideas, develops and maintains the software, and helps operate the resulting Google services— all driven by real-world experience and concrete data."

  4. Google's CS research follows a "Hybrid Research Model" in which research teams are encouraged to strike the right balance between research and engineering activities. The right balance can vary greatly. [Ravi: That's quite fuzzy. But the message that Google treats engineering/software development as a vital part of its research model comes through clearly.]

  5. The paper has some information about Google's research efforts, for example Google Translate and the Google File System.

  6. Google publishes research work in academic publications "at increasing rates (from 13 papers published in 2003, to 130 in 2006, to 279 in 2011)."

  7. Google feels that academic publications are "by no means the only mechanism for knowledge dissemination: Googlers have led the creation of over 1,000 open source projects, contributed to various standards (for example, as editor of HTML5), and produced hundreds of public APIs for accessing our services."

  8. Google has "chosen to organize computer science research differently at Google by maximally connecting research and development. This yields not only innovative research results and new technologies, but also valuable new capabilities for the company."
--- end Google Hybrid Research + Development Model Paper - my perspective points ---

Some additional points regarding Google from a friend

Please note that some of the points mentioned below may have been covered in the above paper itself.
  1. Google seems to care about applied research, not pure — a Googler needs to be able to articulate why his/her research will substantially benefit millions of users.
  2. Google research is short-to-medium term: a few years at most.
  3. Google tries to break research down into a number of intermediate deliverables that each have commercial value.
  4. A research project may impact users, or it may advance theoretical knowledge, or ideally both.
  5. They don't build elaborate research prototypes. Focus is on real systems with real data and production-quality code. So research is often a component of a production-oriented larger project rather than being a separate research project in itself.
Necessary components of the Google model:
  1. Smart engineers
  2. Ability for individuals, or entire teams, to transition to or from the research organization
  3. Distributed computing infrastructure that lets a small team use tens of thousands of servers, which enables large-scale experiments
  4. A billion users
  5. Lots of money
The friend reiterated that publishing (academic) papers is only one way to distribute knowledge.

Indian CS Academic Research vs. Google Hybrid CS Research

I (Ravi) find Google's hybrid approach very interesting, as it is in sharp contrast to what I have seen in Indian academic CS research. Very often the craze is to produce a 'paper', and the 'research' stops there. I have not come across many instances of research efforts from Indian CS or IT academia which went beyond the 'paper' and were translated into semi-real-life software that could then be handed over to interested software companies for real-life implementation. Maybe I am not that well informed. If Indian CS academia does have a hybrid research + development model, then perhaps such models should be given publicity. Anyway, I got put off Indian academic CS research by this mind-set in which 'paper' publication is the only goal, and I took a decision to steer away from such 'paper' production oriented research.

The big problem with this kind of 'paper' production research is that, most of the time, it is out of touch with real-world software. There are so many academic conferences around that getting such an out-of-touch paper published is no big deal. Now, I am not saying that such papers contain false information. No, not at all. They are certainly valid within a very-small-prototype world. The question is whether the approach used in that very-small-prototype world makes sense for real-world software. Most readers of such papers would have the same question and may do no more than note the approach used in the paper. If, instead, academic CS research were able to combine research with development in some small way, academic CS papers would have a lot more value. Just imagine such academic papers carrying a download link to their open source code and data. A reader who is interested in the approach could simply download the software and data and, if he likes it, could even consider doing further research on top of it.

I think Indian CS academia should carefully study Google's Hybrid Research + Engineering model and see whether it can pick up certain practices from Google and adapt them for its own use.
