Assorted links – Data Science with R

last updated: 2015-08-29

References & Most helpful commands

Short R Reference Card (PDF)
R commands
Knitr Reference Card
Advanced R (wiki), Hadley Wickham
Programming in R, UC Riverside
Introduction to R, A First Course in R (PDF), University of Notre Dame
MATLAB commands in numerical Python (NumPy), as well as Octave & R (PDF)
Numerical Analysis & Statistics: MATLAB, R, NumPy – a side-by-side reference sheet
data.table:
data.table intro
data.table faq
Exploratory Data Analysis with data.table (videos)
data.table cheat sheet
Cheatsheets (Data Wrangling with dplyr & tidyr, R Markdown, Shiny)
R Documentation
Resources to help you learn and use R, Institute for Digital Research and Education (IDRE), UCLA

Tutorials & Handy packages

Hands-on dplyr tutorial for faster data manipulation in R
Interactive Visualizations From R Using rCharts
rMaps – Interactive Maps from R (GitHub repo) (requires “devtools” from CRAN)
Using R for Psychological Research – Personality Project, William Revelle
DataCamp courses
Try R by Code School (on codeschool)
Introduction to R, Leada

Visualization Packages

see Assorted links – Data Visualization (to be published later)

Papers

Tidy Data, Hadley Wickham [PDF]

Journals

Big Data & Society – Open-access journal

Hacks for better productivity

Sublime and R

Using Sublime Text 2 for R
Using R in Sublime Text 3

Books

Cookbook for R (formerly R Cookbook) {website} {preview} (Amazon)
The Art of R Programming (Amazon)
R in Action (Amazon; 2nd edition)
Advanced R (Amazon)
Practical Data Science with R (Amazon; Manning)
An Introduction to Statistical Learning with Applications in R (ISL) {free copy} (Amazon) (for a broad audience, including the non-mathematically trained)
The Elements of Statistical Learning: Data Mining, Inference, and Prediction. (ESL) {free copy} (Amazon) (for the mathematically trained)
OpenIntro: OpenIntro Statistics, Intro Stat with Randomization and Simulation
Statlect – The digital textbook on probability and statistics

Video (training) courses

Introduction to Data Science with R, Garrett Grolemund, O’Reilly Media

Lists of Resources by others

Data Mining

Scraping Twitter and Web Data Using R – Pablo Barbera

Numerical Analysis

Matrices and matrix computations in R, idre, UCLA
Numerical & Statistical Analysis, Using R, Alastair Sanderson
[book] Using R for Numerical Analysis in Science and Engineering, Victor A. Bloomfield, CRC Press
[book] Introduction to Scientific Programming and Simulation using R, Owen Jones et al., CRC Press
[package] Numerical Mathematics

Interoperability

Fortran
Python: RPy, RSPython (Win deficiency)
Mathematica: Built-in Integration with R

Data Sources

see Assorted links – Data sources (To be published later)

If you’d like to contribute to this list, please leave your suggestions in the comments below.

Rscript to customize the R environment

A while ago I published a post on how to install some basic packages in R. This post goes further and shares an R script that installs many popular R packages in one go.

I wrote the script to be run after a fresh installation of Ubuntu. It is called by a larger Ubuntu customization script (yet to be published), but it can also be run on its own with Rscript.

Below is a Gist. For the repo click here.

	####################################
	## R environment customization script
	# to automate package installation
	# repo is maintained at http://bit.ly/r-customize-script
	####################################
	## To run execute in a terminal:
	# Rscript r-customize.R # depends on R being installed
	####################################
	## Some relevant links
	# Rstudio's Quick list of useful R packages: http://bit.ly/useful_R_packages
	####################################

	## Basic packages
	#################
	install.packages("devtools")
	library(devtools) # to install from source (e.g., GitHub)
	install.packages("downloader")
	install.packages("checkpoint")
	install.packages("rJava")
	install.packages("xlsxjars")
	install.packages("xlsx")
	install.packages("data.table")
	install.packages(c("Hmisc", "jpeg"))
	install.packages("RJSONIO") # also required for "WDI"

	# DataTables see: https://rstudio.github.io/DT/
	# installs an R interface to the JS DataTables library
	# will ask to select a server
	# only htmlwidgets is conditional; DT is always installed
	if (!requireNamespace('htmlwidgets') || packageVersion('htmlwidgets') <= '0.3.2')
	  install_github('ramnathv/htmlwidgets')
	install_github('rstudio/DT')


	install.packages("xtable")

	# Web scraping
	##############
	install.packages("XML") # read & create XML docs
	install.packages("rvest") # XML & httr wrappers to make it easy to download & manipulate html & xml.
	install.packages(c("httr", "rjson")) # required for "Rfacebook"
	install.packages("jsonlite")
	install.packages("RCurl")

	## Data Wrangling
	#################
	install.packages(c("dplyr", "reshape2"))
	install.packages("tidyr")
	install.packages("sqldf") # Manipulate R data frames using SQL

	## Swirl
	# Learn R, in R. http://swirlstats.com
	install.packages("swirl")
	# install_github("swirldev/swirl") # latest development version

	# Visualization
	###############
	install.packages("ggplot2")
	install.packages("ggvis")
	install.packages("gridExtra")
	# R interface to dygraphs
	if (!requireNamespace('htmlwidgets'))
	  install_github('ramnathv/htmlwidgets')
	install_github('rstudio/dygraphs')


	# Shiny Apps
	install_github('rstudio/shinyapps')
	# download("https://github.com/rstudio/shinyapps/archive/master.tar.gz", "shinyapps.tar.gz")
	# install.packages("shinyapps.tar.gz", repos = NULL, type = "source")

	# Plotly
	install_github("ropensci/plotly")
	# download("https://github.com/ropensci/plotly/archive/master.tar.gz", "plotly.tar.gz")
	# install.packages("plotly.tar.gz", repos = NULL, type = "source")

	install.packages("maptools") # for shapefiles
	# install.packages("rgeos") # required by maptools

	# rMaps (still under development) # https://rmaps.github.io/
	# rCharts required for some (experimental) features
	if (!requireNamespace('rCharts'))
	  install_github('ramnathv/rCharts@dev')
	install_github('ramnathv/rMaps')

	# Google Vis
	install.packages("googleVis")

	## Leaflet
	# R package to create interactive web-maps based on the Leaflet JavaScript library
	install.packages("leafletR")
	# install_github("chgrl/leafletR")
	install_github("rstudio/leaflet") # by RStudio


	# Documents
	###########
	# for Knitr
	install.packages("yaml"); install.packages("htmltools"); install.packages("rmarkdown")

	# Slidify & Libraries
	install_github('ramnathv/slidify')
	install_github('ramnathv/slidifyLibraries')

	# Spatial & GIS
	###############
	install.packages("sp") # classes and methods for spatial data
	install.packages("maptools") # Tools for Reading and Handling Spatial Objects
	install.packages("maps") # Draw Geographical Maps
	install.packages("ggmap") # Spatial Visualization with Google Maps and OpenStreetMap
	install.packages("raster")
	install.packages("mapdata")
	install.packages("mapproj")
	install.packages("gpclib")
	install.packages("rgdal") # package names are case- and spelling-sensitive
	install.packages("RgoogleMaps")
	install.packages("rgeos")
	install.packages("rasterVis")

	# Connections
	#############

	# API's
	install.packages("streamR") # Access to Twitter Streaming API via R # github: https://github.com/pablobarbera/streamR
	install.packages("Rfacebook") # provides an interface to the Facebook API

	# Connect to Databases
	install.packages("DBI") # database interface (DBI) definition for communication between R and relational database management systems
	install.packages("RMySQL") # DBI-compliant Interface to MySQL and MariaDB Databases
	install.packages("dbConnect") # Provides a graphical user interface to connect with databases that use MySQL

	# Data sources
	install.packages("Quandl")
	install.packages("WDI") #github.com/vincentarelbundock/WDI

	# Big Data
	##########
	# Packages to deal with datasets larger than RAM
	install.packages("bigmemory") # Manage massive matrices with shared memory and memory-mapped files

	# Medical packages
	install.packages("oro.dicom")

	# Machine Learning & Predictive Modeling
	########################################
	## caret – Classification And REgression Training
	install.packages("caret")
	install.packages("e1071") # needed when fitting a model in caret

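One thing to watch when running the script through Rscript: a non-interactive session may not be able to prompt for a CRAN mirror, and every run reinstalls packages that are already present. A minimal sketch of one way around both, assuming two hypothetical helpers (`missing_packages()` and `install_if_missing()`, neither of which is part of the gist) and using https://cloud.r-project.org purely as an example mirror:

```r
# Hypothetical helper (not in the original script): return the entries of
# `pkgs` that are not installed in any library on .libPaths().
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Hypothetical helper: install only what is missing, with an explicit mirror
# so that no interactive mirror selection is needed.
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }
  invisible(to_install)  # names of the packages an install was attempted for
}

# Usage: "stats" and "utils" ship with R, so nothing is downloaded here.
install_if_missing(c("stats", "utils"))
```

The same call could wrap any of the CRAN package vectors above, for example `install_if_missing(c("dplyr", "reshape2"))`; the GitHub installs would still go through `install_github()`.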

R – Labels inside ggplots using directlabels

The other day I generated a figure with ggplot2 using the following code:

ggplot(dat, aes(x = Year, y = log10(sum), group = id, colour = id)) +
 geom_point() +
 geom_smooth() +
 labs(x = "", y = "")

Note that I used the “group” argument to plot both curves on the same figure. Similarly I used the “colour” argument to colorize each curve differently.
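The post does not show `dat` itself. To try the code, a stand-in data frame with the three columns the plot uses (`Year`, `sum`, `id`) is enough; the values below are made up purely for illustration:

```r
# Hypothetical stand-in for `dat`; the real data are not shown in the post.
years <- 2000:2014

dat <- data.frame(
  Year = rep(years, times = 2),                        # same years for both series
  id   = rep(c("A", "B"), each = length(years)),       # one label per curve
  sum  = c(10^seq(2, 3, length.out = length(years)),   # series A
           10^seq(3, 5, length.out = length(years)))   # series B
)

# Two groups of 15 rows each; `id` is what `group` and `colour` map to.
table(dat$id)
```

With a data frame shaped like this, each distinct value of `id` becomes one curve and one colour in the plot.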

But instead of a legend I wanted to have labels on or near the curves. To do that I resorted to the “directlabels” package.

First I installed its dependency “quadprog”, then directlabels itself, and loaded ggplot2 and directlabels:

install.packages("quadprog") # dependency for directlabels
install.packages("directlabels", repos = "http://r-forge.r-project.org")

library(ggplot2)
library(directlabels) # load "directlabels"

To plot the figure, I used the same code as before, but this time I assigned the plot to “p” and passed it to direct.label().

p <- ggplot(dat, aes(x = Year, y = log10(sum), group = id, colour = id)) +
 geom_point() +
 geom_smooth() +
 labs(x = "", y = "")
direct.label(p)

As you can see the direct.label() function took care of the legend and replaced it with labels on the curves:

This is a really useful package.

If you found this post helpful please give it a like or share it somewhere in the digital universe.

Installing Some Basic R Packages in Ubuntu

The following is how I configured my R workspace (and RStudio). I first shared it on the forums of Coursera’s “Getting and Cleaning Data” course.

First make sure that R is version 3 or later. If not, update it according to this Stack Overflow question.
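A quick way to check the version from within R itself, as a small sketch (the variable names are just for illustration):

```r
# Compare the running R version against 3.0.0.
r_version <- getRversion()               # a numeric_version object
is_r3_or_newer <- r_version >= "3.0.0"   # TRUE for any 3.x or later release

if (is_r3_or_newer) {
  message("R ", as.character(r_version), " is recent enough.")
} else {
  message("R ", as.character(r_version), " is older than 3.0.0; update it first.")
}
```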

Java for rJava

Install Java (needed for rJava) first from a terminal:

sudo apt-get install openjdk-6-jre

which installs the OpenJDK 6 runtime.
If this doesn’t work, install all the OpenJDK 6 packages (which include openjdk-6-jdk):

sudo apt-get install openjdk-6-*

Alternatively, you might prefer OpenJDK 7 (including openjdk-7-jdk):

sudo apt-get install openjdk-7-*

You should find that it is installed using this command:

Ibrahim El Merehbi
