Anaconda Python needs cleaning every once in a while


As a user of Anaconda Python, I have been receiving (Ubuntu) system warnings about low free space in my home directory. Investigating the cause, I found that Anaconda keeps several versions of each package in its pkgs directory, which had grown to more than 14 GB. After cleaning, it is about 3 GB.

The second largest directory was my mail in Thunderbird.

So it would be wise, especially if you are limited in disk space, to clean Anaconda. The commands I used are as follows:

conda clean --all
conda update conda
# just to make sure nothing is broken and
# your environment is updated
# (on conda 4.4 and later, use: conda activate <your-environment>)
source activate <your-environment>
conda update --all
conda clean --all
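
To see how much space is actually involved before and after cleaning, a small script can total up the size of a directory. The following is a minimal sketch, not part of conda itself: the helper names `dir_size_bytes` and `human_readable` are my own, and the path `~/anaconda3/pkgs` is only an assumed install location, so adjust it to wherever your Anaconda lives.

```python
import os


def dir_size_bytes(path):
    """Total size in bytes of regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # File vanished or is unreadable; ignore it.
                pass
    return total


def human_readable(num_bytes):
    """Format a byte count using binary multiples (1 KB = 1024 B)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"


if __name__ == "__main__":
    # Assumed location of the package cache; change it to match your setup.
    pkgs = os.path.expanduser("~/anaconda3/pkgs")
    if os.path.isdir(pkgs):
        print(pkgs, human_readable(dir_size_bytes(pkgs)))
    else:
        print("No pkgs directory found at", pkgs)
```

Run it once before and once after `conda clean --all` to compare. Note that files hard-linked into several places are counted each time they appear, so the total can overstate the real disk usage.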

Amazing NASA simulation of Solar Wind Stripping the Martian Atmosphere


The following video by NASA’s Scientific Visualization Studio simulates the Martian atmosphere being stripped away by the incoming solar wind.

More videos and images can be found here.

Mars is a cold and barren desert today, but scientists think that in the ancient past it was warm and wet. The loss of the early Martian atmosphere may have led to this dramatic change, and one of the prime suspects is the solar wind. Unlike Earth, Mars lacks a global magnetic field to deflect the stream of charged particles continuously blowing off the Sun. Instead, the solar wind crashes into the Mars upper atmosphere and can accelerate ions into space. Now, for the first time, NASA’s MAVEN spacecraft has observed this process in action – by measuring the speed and direction of ions escaping from Mars. This data visualization compares simulations of the solar wind and Mars atmospheric escape with new measurements taken by MAVEN.

Matlab – Symbolic & Function Handles


Suppose you want to define a function in Matlab, plot it, and differentiate it. This can be done in two ways. Let’s demonstrate both methods on the function

f(x) = x - 3 * log(x)

whose derivative is

f'(x) = 1 - \frac{3}{x}

The first uses function handles (i.e., numerically): they take numerical values as input and return numerical values as output, so the inputs must be defined before the handle is evaluated. Here’s an example:

x = linspace(0.5, 5.0)'; % range of x as column vector
f = @(x) x - 3 * log(x); % function handle: numerical in, numerical out
fx = f(x); % numerical values of f on the range x
df = diff(fx) ./ diff(x); % finite-difference derivative (one element shorter than x)

The other is symbolic (i.e., the way you would do it in a math class):

syms x % define symbolic variables
f = x - 3 * log(x); % symbolic function
df = diff(f, x); % gives the symbolic derivative

which returns

f = x - 3*log(x)
df = 1 - 3/x

But in this form you cannot pass numerical inputs to the function directly, and hence can’t plot it with plot(). To do so, convert the symbolic function to a function handle, which is easy using matlabFunction():

f_handle = matlabFunction(f); % convert symbolic fn to a handle
df_handle = matlabFunction(df); % same for the symbolic derivative
f_handle(2); % value of f @ x = 2
x = linspace(0.5, 5.0)'; % define range for x as column vector
plot(x, f_handle(x) ) % plot f on the range x

which returns

f_handle = @(x)x-log(x).*3.0
df_handle = @(x)-3.0./x+1.0
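
As a cross-check of the numerical approach, here is a minimal sketch in Python rather than Matlab (my choice of language for illustration only). It compares the finite-difference derivative with the analytical one, f'(x) = 1 - 3/x. The helper names `linspace` and `forward_diff` are my own; they only mimic Matlab's `linspace` and `diff(f) ./ diff(x)` under the assumption of 100 evenly spaced points.

```python
import math


def f(x):
    """f(x) = x - 3*log(x), the function used in this post."""
    return x - 3 * math.log(x)


def df_exact(x):
    """Analytical derivative f'(x) = 1 - 3/x."""
    return 1 - 3 / x


def linspace(start, stop, n=100):
    """Evenly spaced points from start to stop, inclusive."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def forward_diff(xs, ys):
    """Difference quotients between neighbours; one element shorter than xs."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]


xs = linspace(0.5, 5.0)
ys = [f(x) for x in xs]
df_num = forward_diff(xs, ys)

# Each difference quotient is most accurate at the midpoint of its interval,
# so compare against the exact derivative evaluated there.
mids = [(xs[i] + xs[i + 1]) / 2 for i in range(len(xs) - 1)]
max_err = max(abs(d - df_exact(m)) for d, m in zip(df_num, mids))

print("points:", len(xs), "derivative samples:", len(df_num))
print("largest deviation from 1 - 3/x:", max_err)
```

The numerical derivative has one sample fewer than x, which is why it has to be plotted against the midpoints (or against x with one end point dropped).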

Best Practices for Scientific Computing


A summary of a very interesting paper on “Best Practices for Scientific Computing” I read a year ago.

andrea cirillo's blog

I reproduce below the principles from the amazing paper Best Practices for Scientific Computing, published in 2012 by a group of US and UK professors. The main purpose of the paper is to “teach” good programming habits, shared by professional developers, to people who weren’t born developers and became developers only for professional purposes.

Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently.

Best Practices for Scientific Computing

  1. Write programs for people, not computers.

    1. a program should not require its readers to hold more than a handful of facts in memory at once
    2. names should be consistent, distinctive and meaningful
    3. code style and formatting should be consistent
    4. all aspects of software development should be broken down into tasks roughly an hour long
  2. Automate repetitive tasks.

    1. rely on the computer to repeat tasks
    2. save recent commands in…


Assorted links – Data Science with R


last updated: 2015-08-29

References & Most helpful commands

Tutorials & Handy packages

Hands-on dplyr tutorial for faster data manipulation in R
Interactive Visualizations From R Using rCharts
rMaps – Interactive Maps from R (GitHub repo) (requires “devtools” from CRAN)
Using R for Psychological Research – Personality Project, William Revelle
DataCamp courses
Try R by Code School (on codeschool)
Introduction to R, Leada

Visualization Packages

see Assorted links – Data Visualization (to be published later)


Tidy Data, Hadley Wickham [PDF]


Big Data & Society – Open-access journal

Hacks for better productivity

Sublime and R

Using Sublime Text 2 for R
Using R in Sublime Text 3


Video (training) courses

Introduction to Data Science with R, Garrett Grolemund, O’Reilly Media

Lists of Resources by others

Data Mining

Scraping Twitter and Web Data Using R – Pablo Barbera

Numerical Analysis

Data Sources

see Assorted links – Data sources (To be published later)

If you’d like to contribute to this list, please leave your suggestions in the comments below.