The day after the famous interview aired, the complete transcript was available to read online. On Facebook, I saw that one of my college juniors had written a simple shell script to count the number of occurrences of certain words. This by itself was quite insightful, and I knew I could take it to the next level without much effort by using R and a few of its packages.

I’ve uploaded the R script to GitHub. The basic flow of the script is as follows:

  1. Separate Rahul's and Arnab's parts of the conversation into 2 buckets
  2. Remove extra spaces
  3. Remove punctuation
  4. Convert the text to lower case
  5. Remove the stop words
  6. Convert the text to a term-document matrix
  7. Rank the words based on their occurrences
  8. Generate the word cloud and the top 5 words for each speaker
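
To make the steps above concrete, here is a minimal sketch in base R. It is not the script on GitHub: the sample lines, the "Speaker: text" layout, and the tiny stop word list are all made up for illustration, and the real script may well rely on text mining packages instead of the base functions used here. The only package call, `wordcloud::wordcloud`, is guarded so the rest runs without it.

```r
# Hypothetical transcript lines in "Speaker: text" form (not the real transcript).
transcript <- c(
  "Arnab: Rahul, thank you   for this interview.",
  "Rahul: The system needs to change. We must empower women.",
  "Arnab: What is your view on the system?",
  "Rahul: Women, youth and the system: that is my focus."
)

# 1. Separate the two speakers into buckets (split at the first colon).
speaker <- sub(":.*$", "", transcript)
text    <- sub("^[^:]*:[[:space:]]*", "", transcript)
buckets <- sapply(split(text, speaker), paste, collapse = " ")

# Illustrative stop word list; a real analysis would use a much fuller one.
stop_words <- c("the", "is", "to", "for", "this", "that", "and", "my",
                "on", "we", "what", "your", "you", "must", "thank")

tokenize <- function(x) {
  x <- gsub("[[:space:]]+", " ", x)          # 2. remove extra spaces
  x <- gsub("[[:punct:]]", "", x)            # 3. remove punctuation
  x <- tolower(x)                            # 4. convert to lower case
  words <- strsplit(trimws(x), " ", fixed = TRUE)[[1]]
  words[nzchar(words) & !(words %in% stop_words)]  # 5. remove stop words
}
tokens <- lapply(buckets, tokenize)

# 6. Term-document matrix: one row per term, one column per speaker.
terms <- sort(unique(unlist(tokens)))
tdm <- vapply(tokens,
              function(tk) as.integer(table(factor(tk, levels = terms))),
              integer(length(terms)))
rownames(tdm) <- terms

# 7. Rank the words by occurrence and keep the top 5 per speaker.
top5 <- lapply(colnames(tdm),
               function(s) head(sort(tdm[, s], decreasing = TRUE), 5))
names(top5) <- colnames(tdm)
print(top5)

# 8. Draw a word cloud for one speaker, only if the wordcloud package is installed.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  rahul <- tdm[, "Rahul"]
  rahul <- rahul[rahul > 0]
  wordcloud::wordcloud(names(rahul), rahul, min.freq = 1)
}
```

Keeping one column per speaker in the matrix means the same ranking code works for both buckets, and comparing the two columns side by side is what makes the differences in vocabulary visible.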

Here is the word cloud.