We’re building a dystopia just to make people click on ads, Zeynep Tufekci [TED talk]


Experiments show that what the algorithm picks to show you can affect your emotions. But that’s not all. It also affects political behavior.


Assorted links – Data Science with R


last updated: 2015-08-29

References & Most helpful commands

Tutorials & Handy packages

Hands-on dplyr tutorial for faster data manipulation in R
Interactive Visualizations From R Using rCharts
rMaps – Interactive Maps from R (GitHub repo) (requires “devtools” from CRAN)
Using R for Psychological Research – Personality Project, William Revelle
DataCamp courses
Try R by Code School (on codeschool)
Introduction to R, Leada

Visualization Packages

see Assorted links – Data Visualization (to be published later)


Tidy Data, Hadley Wickham [PDF]


Big Data & Society – Open-access journal

Hacks for better productivity

Sublime and R

Using Sublime Text 2 for R
Using R in Sublime Text 3


Video (training) courses

Introduction to Data Science with R, Garrett Grolemund, O’Reilly Media

Lists of Resources by others

Data Mining

Scraping Twitter and Web Data Using R – Pablo Barbera

Numerical Analysis
Data Sources

see Assorted links – Data sources (to be published later)

If you’d like to contribute to this list, please leave your suggestions in the comments below.

R language Development 1997-2015


The open-source world keeps surprising me. It is remarkable how people spread across the globe meet and collaborate on open-source projects, and how the results can surpass commercially available products. One such example is the development of the R statistical programming language.

Watch the video below and observe how its development since 1997 is similar to the work of ants and bees constructing colonies and hives.

Map of Universities offering Data Science degrees


Below is a map, created by Ali Rebaie, of universities offering degrees in Data Science. It is based on data from this GitHub repo. You can contribute using this Google spreadsheet.

[via Ali Rebaie]

Rscript to customize the R environment


A while ago I published a post on how to install some basic packages in R. This post goes further by sharing an R script that installs many popular R packages.

I wrote the script to be run after a fresh installation of Ubuntu. It is called by an Ubuntu customization script (yet to be published) that it forms part of.

Below is a Gist. For the repo click here.
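
The Gist is not reproduced in this text. As a rough idea of what such a script can look like, here is a minimal sketch, not the original script: the package list, the CRAN mirror URL, and the helper names missing_packages() and install_missing() are all assumptions made for illustration.

```r
# Hypothetical package list; the original script's list is not shown here.
wanted <- c("ggplot2", "dplyr", "devtools")

# Return the packages in `wanted` that are not in `installed`.
missing_packages <- function(wanted,
                             installed = rownames(installed.packages())) {
  setdiff(wanted, installed)
}

# Install whatever is missing. The mirror URL is an assumption.
install_missing <- function(wanted, repos = "https://cloud.r-project.org") {
  todo <- missing_packages(wanted)
  if (length(todo) > 0) {
    install.packages(todo, repos = repos)
  }
  invisible(todo)
}
```

Saved under a name such as install-packages.R (hypothetical) with a final line install_missing(wanted), a script like this could be run from a shell with Rscript install-packages.R. Checking what is already installed first keeps repeated runs cheap.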

Big data: predicting future city solutions, Rand Hindi, CEO Snips


R – Labels inside ggplots using directlabels


The other day I generated the following figure with ggplot2


using the following code:

ggplot(dat, aes(x = Year, y = log10(sum), group = id, colour = id)) +
 geom_point() +
 geom_smooth() +
 labs(x = "", y = "")

Note that I used the “group” argument to plot both curves on the same figure. Similarly I used the “colour” argument to colorize each curve differently.

But instead of a legend I wanted to have labels on or near the curves. To do that I resorted to the “directlabels” package.

First I installed its dependency “quadprog”, then “directlabels” itself, and loaded ggplot2 and directlabels:

install.packages("quadprog") # dependency for directlabels
install.packages("directlabels", repos = "http://r-forge.r-project.org")

library(ggplot2)      # load "ggplot2"
library(directlabels) # load "directlabels"

To plot the figure, I used the same code as before, but this time I assigned the plot to “p” and then passed it on to direct.label().

p <- ggplot(dat, aes(x = Year, y = log10(sum), group = id, colour = id)) +
 geom_point() +
 geom_smooth() +
 labs(x = "", y = "")

direct.label(p) # replace the legend with labels on the curves

As you can see, the direct.label() function removed the legend and replaced it with labels on the curves:


This is a really useful package.
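
The data frame “dat” used above is not included in this post. To try the same steps end to end, here is a minimal sketch with made-up data: the values, the ids "A" and "B", and the helper name labelled_plot() are assumptions for illustration only, and the function needs ggplot2 and directlabels to be installed.

```r
# Synthetic stand-in for the post's `dat`: two series over 15 years.
years <- 2000:2014
dat <- data.frame(
  Year = rep(years, times = 2),
  id   = rep(c("A", "B"), each = length(years)),
  sum  = c(10^(1 + 0.10 * seq_along(years)),
           10^(2 + 0.05 * seq_along(years)))
)

# Build the same plot as in the post and hand it to direct.label().
# Hypothetical helper; requires ggplot2 and directlabels.
labelled_plot <- function(dat) {
  p <- ggplot2::ggplot(dat, ggplot2::aes(x = Year, y = log10(sum),
                                         group = id, colour = id)) +
    ggplot2::geom_point() +
    ggplot2::geom_smooth() +
    ggplot2::labs(x = "", y = "")
  directlabels::direct.label(p)
}
```

With both packages installed, print(labelled_plot(dat)) should draw the two curves with labels in place of the legend.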

If you found this post helpful please give it a like or share it somewhere in the digital universe.