
Data science: First Harvard edX classes

After doing the self-study described in Data Science: An introduction to neural networks, I looked for an online class to learn about data science in depth. I settled on the Data Science Professional Certificate series sponsored by Harvard University on the edX platform. So far I have completed three of the nine classes in the series.

Sample ggplot2 plot from my Data Science class, comparing income distributions in various parts of the African continent for the years 1970 and 2010

The classes use the R programming language and consist of video lectures, assessments (tests) done via the DataCamp platform, and an online Data Science text. The classes cost $49 each if you take them for credit. So far I am enjoying them a lot and making good progress on the material.

I work on the classes for a couple of hours at a time, several days a week, and each class has taken me about a month to complete.

The first two classes cover the basics of the R language and the RStudio environment, followed by an introduction to the dplyr and ggplot2 packages for R.

The dplyr package adds a set of data transformation functions to R that make working with data frames much less of a burden than plain R syntax, and it lets you chain those functions together into pipelines with the %>% pipe operator. The ggplot2 package is a powerful data visualization package that lets you build plots in layers and display data in a large number of ways.
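To give a flavor of how the two packages fit together, here is a minimal sketch of a dplyr pipeline feeding a ggplot2 plot. It is not taken from the course material. It assumes the dplyr and ggplot2 packages are installed, and it uses the mtcars dataset that ships with R purely as a stand-in. The names summary_by_cyl, kpl, and mean_kpl are my own choices for illustration.

```r
library(dplyr)
library(ggplot2)

# mtcars ships with R; it is used here only as a stand-in dataset.
summary_by_cyl <- mtcars %>%
  filter(hp > 100) %>%                  # keep cars with more than 100 horsepower
  mutate(kpl = mpg * 0.425144) %>%      # convert miles per gallon to km per litre
  group_by(cyl) %>%                     # one group per cylinder count
  summarize(mean_kpl = mean(kpl),       # average fuel economy within each group
            n = n())                    # number of cars in each group

# Bar chart of the summarized values, one bar per cylinder count.
p <- ggplot(summary_by_cyl, aes(x = factor(cyl), y = mean_kpl)) +
  geom_col() +
  labs(x = "Cylinders", y = "Mean fuel economy (km/l)")

print(p)
```

Each step in the pipeline takes a data frame and hands a new one to the next step, so the code reads top to bottom in the order the transformations happen. The plot is then built up in layers with the + operator.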

The next classes focus on probability, inference, and modeling.