Data science: A comparison of interpreted languages for AI and data science

I have taken several online classes over the past year from both edX and Coursera. Each class used an interpreted programming language for its exercises and labs. The three languages I used were:

  • R – for data analytics and visualization
  • Python (with numpy) – for neural networks
  • Octave – for machine learning

The R language is designed for statistical computing and graphics. Its workhorse data structure is the data frame, a table whose columns may hold different types.

Python is a general-purpose programming language that, extended with the numpy package, handles data-intensive computing such as machine learning algorithms that rely heavily on computations with large matrices.

Octave is designed for numerical computation and has many features that let the user perform matrix calculations easily and efficiently.

All three languages are available in free, open-source versions that run on all the popular operating system platforms. I used the macOS version of each.

Initially I assumed that, since the languages are all interpreted, they would struggle with large numerical problems. It turns out, however, that each language provides a package or built-in capability that performs numerical calculations very efficiently in compiled code called from the interpreter.
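A rough sketch of this effect in Python with numpy (assuming numpy is installed): the same element-wise multiplication is done once with an interpreted Python loop and once with a single vectorized numpy call, which dispatches to compiled C code. The specific sizes and timing approach here are illustrative, not from the original experiment.

```python
import time
import numpy as np

n = 1_000_000
a = list(range(n))
b = list(range(n))

# Interpreted path: the loop body is executed by the Python interpreter.
start = time.perf_counter()
loop_result = [x * y for x, y in zip(a, b)]
loop_time = time.perf_counter() - start

# Compiled path: one call into numpy's C implementation.
xs = np.arange(n, dtype=np.float64)
start = time.perf_counter()
vec_result = xs * xs
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.4f} s, numpy: {vec_time:.4f} s")
```

On a typical machine the numpy version finishes in a small fraction of the loop's time, which is the same mechanism that makes the matrix experiments below fast.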

As an example, here is a table from an experiment I ran to determine how long (in seconds) each language takes, on my MacBook Pro laptop, to invert square matrices of various sizes.

  Matrix size   R        Python   Octave
  100×100       0.016    0.0005   0.0033
  1000×1000     1.16     0.073    0.162
  2000×2000     8.35     0.437    1.06
  5000×5000     121.10   5.97     14.6
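The Python column of such an experiment can be reproduced along these lines (a sketch, not the exact script used for the table above; the random seed and matrix sizes are my own choices):

```python
import time
import numpy as np

def time_inversion(n, seed=0):
    """Invert a random n x n matrix and return (elapsed seconds, A, A_inv)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    start = time.perf_counter()
    A_inv = np.linalg.inv(A)  # dispatches to compiled LAPACK routines
    elapsed = time.perf_counter() - start
    return elapsed, A, A_inv

for n in (100, 1000):
    elapsed, _, _ = time_inversion(n)
    print(f"{n}x{n}: {elapsed:.4f} s")
```

The R and Octave measurements would use their built-in equivalents, `solve(A)` and `inv(A)` respectively, timed the same way.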

The largest matrix contains 25,000,000 floating-point numbers! To me, that is impressive performance from an interpreted language.