Map of Universities offering Data Science degrees

Below is a map, created by Ali Rebaie, of universities offering degrees in Data Science. It is based on data from this GitHub repo, and you can contribute using this Google spreadsheet.

[via Ali Rebaie]

Rscript to customize the R environment

A while ago I published a post on how to install some basic packages in R. This post goes further by sharing with you an Rscript (as part of another Ubuntu customization script) to install many popular R packages.

I wrote the Rscript to be run after a fresh installation of Ubuntu. It is called by the Ubuntu customization script (yet to be published) and installs a set of basic and popular R packages.

Below is a Gist. For the repo click here.


####################################
## R environment customization script
# to automate package installation
# repo is maintained at http://bit.ly/r-customize-script
####################################
## To run execute in a terminal:
# Rscript r-customize.R # depends on R being installed
####################################
## Some relevant links
# Rstudio's Quick list of useful R packages: http://bit.ly/useful_R_packages
####################################
## Basic packages
#################
install.packages("devtools")
library(devtools) # to install from source (e.g. GitHub)
install.packages("downloader")
install.packages("checkpoint")
install.packages("rJava")
install.packages("xlsxjars")
install.packages("xlsx")
install.packages("data.table")
install.packages(c("Hmisc", "jpeg"))
install.packages("RJSONIO") # also required for "WDI"
# DataTables see: https://rstudio.github.io/DT/
# installs an R interface to the Js DataTables
# will ask to select a server
if (!requireNamespace('htmlwidgets') || packageVersion('htmlwidgets') <= '0.3.2')
  install_github('ramnathv/htmlwidgets')
install_github('rstudio/DT')
install.packages("xtable")
# Web scraping
##############
install.packages("XML") # read & create XML docs
install.packages("rvest") # XML & httr wrappers to make it easy to download & manipulate html & xml.
install.packages(c("httr", "rjson")) # required for "Rfacebook"
install.packages("jsonlite")
install.packages("RCurl")
## Data Wrangling
#################
install.packages(c("dplyr", "reshape2"))
install.packages("tidyr")
install.packages("sqldf") # Manipulate R data frames using SQL
## Swirl
# Learn R, in R. http://swirlstats.com
install.packages("swirl")
# install_github("swirldev/swirl") # latest development version
# Visualization
###############
install.packages("ggplot2")
install.packages("ggvis")
install.packages("gridExtra")
# R interface to dygraphs
if (!requireNamespace('htmlwidgets'))
  install_github('ramnathv/htmlwidgets')
install_github('rstudio/dygraphs')
# Shiny Apps
install_github('rstudio/shinyapps')
# download("https://github.com/rstudio/shinyapps/archive/master.tar.gz", "shinyapps.tar.gz")
# install.packages("shinyapps.tar.gz", repos = NULL, type = "source")
# Plotly
install_github("ropensci/plotly")
# download("https://github.com/ropensci/plotly/archive/master.tar.gz", "plotly.tar.gz")
# install.packages("plotly.tar.gz", repos = NULL, type = "source")
install.packages("maptools") # for shapefiles
# install.packages("rgeos") # required by maptools
# rMaps (still under development) # https://rmaps.github.io/
# rCharts required for some (experimental) features
if (!requireNamespace('rCharts'))
  install_github('ramnathv/rCharts@dev')
install_github('ramnathv/rMaps')
# Google Vis
install.packages("googleVis")
## Leaflet
# R package to create interactive web-maps based on the Leaflet JavaScript library
install.packages("leafletR")
# install_github("chgrl/leafletR")
install_github("rstudio/leaflet") # by Rstudio
# Documents
###########
# for Knitr
install.packages("yaml"); install.packages("htmltools"); install.packages("rmarkdown")
# Slidify & Libraries
install_github('ramnathv/slidify')
install_github('ramnathv/slidifyLibraries')
# Spatial & GIS
###############
install.packages("sp") # classes and methods for spatial data
install.packages("maptools") # Tools for Reading and Handling Spatial Objects
install.packages("maps") # Draw Geographical Maps
install.packages("ggmap") # Spatial Visualization with Google Maps and OpenStreetMap
install.packages("raster")
install.packages("mapdata")
install.packages("mapproj")
install.packages("gpclib")
install.packages("rgdal")
install.packages("RgoogleMaps")
install.packages("rgeos")
install.packages("rasterVis")
# Connections
#############
# API's
install.packages("streamR") # Access to Twitter Streaming API via R # github: https://github.com/pablobarbera/streamR
install.packages("Rfacebook") # provides an interface to the Facebook API
# Connect to Databases
install.packages("DBI") # database interface (DBI) definition for communication between R and relational database management systems
install.packages("RMySQL") # DBI-compliant Interface to MySQL and MariaDB Databases
install.packages("dbConnect") # Provides a graphical user interface to connect with databases that use MySQL
# Data sources
install.packages("Quandl")
install.packages("WDI") #github.com/vincentarelbundock/WDI
# Big Data
##########
# Packages to deal with datasets larger than RAM
install.packages("bigmemory") # Manage massive matrices with shared memory and memory-mapped files
# Medical packages
install.packages("oro.dicom")
# Machine Learning & Predictive Modeling
########################################
## caret – Classification And REgression Training
install.packages("caret")
install.packages("e1071") # needed when fitting a model in caret
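
The script above reinstalls every package each time it runs. As a minimal sketch of one way to avoid that, the helper below installs only the packages that are missing. The name install_missing and its arguments are hypothetical and not part of the original script, and the CRAN mirror URL is just one common choice. Setting a mirror up front is worthwhile because Rscript runs non-interactively and install.packages() may fail if no mirror has been selected.

```r
# Hypothetical helper (not part of the original script): install only the
# packages that are not already present in the library.
install_missing <- function(pkgs,
                            installed = rownames(installed.packages()),
                            install_fun = install.packages) {
  to_install <- setdiff(pkgs, installed)
  if (length(to_install) > 0) {
    install_fun(to_install)
  }
  # Return the names that were requested for installation, invisibly.
  invisible(to_install)
}

# Rscript runs non-interactively, so choose a CRAN mirror explicitly.
# The URL is an assumption; any CRAN mirror would do.
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Example usage (commented out so nothing is downloaded by accident):
# install_missing(c("devtools", "data.table", "dplyr"))
```

Passing the installer as an argument keeps the helper easy to try out without touching the network, for example by supplying a function that only prints the package names.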

Big data: predicting future city solutions, Rand Hindi, CEO Snips

http://youtu.be/8VD-lwvWVGY

R – Labels inside ggplots using directlabels

The other day I generated the following figure with ggplot2

[Figure: points and smoothed curves of log10(sum) against Year, one colour per id, with a legend]

using the following code:

ggplot(dat, aes(x = Year, y = log10(sum), group = id, colour = id)) +
 geom_point() +
 geom_smooth() +
 labs(x = "", y = "")

Note that I mapped the “group” aesthetic to id so that both curves are drawn on the same figure. Similarly, I mapped the “colour” aesthetic to id so that each curve gets its own colour.

But instead of a legend I wanted to have labels on or near the curves. To do that I resorted to the “directlabels” package.

First I installed directlabels, after installing its dependency “quadprog”, and then loaded ggplot2 and directlabels:

install.packages("quadprog") # dependency for directlabels
install.packages("directlabels", repos = "http://r-forge.r-project.org")

library(ggplot2)
library(directlabels) # load "directlabels"

To plot the figure I proceeded as before, except that I assigned the plot to “p” and then passed it to direct.label():

p <- ggplot(dat, aes(x = Year, y = log10(sum), group = id, colour = id)) +
 geom_point() +
 geom_smooth() +
 labs(x = "", y = "")
direct.label(p)

As you can see the direct.label() function took care of the legend and replaced it with labels on the curves:

[Figure: the same plot with the legend replaced by labels on the curves]

This is a really useful package.
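
To try the steps above without the original data, here is a minimal sketch. It assumes a made-up data frame that reuses only the column names from the post (Year, sum, id); the values are invented purely for illustration. It also passes "last.points", one of the positioning methods that ships with directlabels, to show that the label placement can be chosen explicitly.

```r
# Made-up illustrative data: only the column names (Year, sum, id)
# come from the post, the values are invented.
dat <- data.frame(
  Year = rep(2000:2009, times = 2),
  id   = rep(c("A", "B"), each = 10),
  sum  = c(10^seq(1, 2, length.out = 10), 10^seq(2, 2.5, length.out = 10))
)

# Plot only when both packages are available.
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("directlabels", quietly = TRUE)) {
  library(ggplot2)
  library(directlabels)
  p <- ggplot(dat, aes(x = Year, y = log10(sum), group = id, colour = id)) +
    geom_point() +
    geom_smooth() +
    labs(x = "", y = "")
  # "last.points" places each label at the right-hand end of its curve.
  print(direct.label(p, "last.points"))
}
```

Calling direct.label(p) without a method, as in the post, lets the package pick a default placement instead.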

If you found this post helpful please give it a like or share it somewhere in the digital universe.

Installing Some Basic R Packages in Ubuntu

The following is how I configured my R workspace (and RStudio). I first shared it on the forums of Coursera’s “Getting and Cleaning Data” course.

First make sure that your R version is 3.0 or later. If it is not, update it by following this Stack Overflow question.
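
As a quick sketch, the version check can be done from within R itself. The variable name r_is_3_plus is only illustrative.

```r
# getRversion() returns the running R version as an object that can be
# compared directly against a version string.
r_is_3_plus <- getRversion() >= "3.0.0"

if (!r_is_3_plus) {
  message("R ", as.character(getRversion()),
          " is older than 3.0.0; consider updating.")
}
```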

Java for rJava

Install Java (needed for rJava) first from a terminal:

sudo apt-get install openjdk-6-jre

which will install the OpenJDK 6 runtime (openjdk-6-jre).
If this doesn’t work, install all of the OpenJDK 6 packages:

sudo apt-get install openjdk-6-*

Alternatively, you might prefer OpenJDK 7:

sudo apt-get install openjdk-7-*

You should find that it is installed using this command:

Continue reading