*Last updated on August 27, 2020 by Dan Nanni*

Get serious with scientific computing in Linux by learning to use NumPy. NumPy is a Python-based open-source scientific computing package released under the BSD license that serves as a free yet powerful alternative to proprietary packages (such as MATLAB) that come with licensing fees. The numerous built-in data analysis tools, extensive documentation, and detailed examples render NumPy an ideal package for use in intensive scientific computing applications. This tutorial will highlight several features of NumPy to demonstrate its capabilities and ease of use.

NumPy offers a vast array (pun intended!) of features, including (but certainly not limited to) the following:

- Multidimensional array objects
- Conversion from Python lists and tuples to NumPy arrays (and vice versa)
- Importing data from text files
- Math (arithmetic, trigonometry, exponents, logarithms...)
- Random sampling (Normal, uniform, binomial, Poisson distributions...)
- Statistics (mean, standard deviation, histograms...)
- Fourier transforms (discrete, inverse, multidimensional)
- Linear algebra (dot product, eigenvalues, solving systems of linear equations...)
- Matrices (sum, product, transpose...)
- Writing data to text files
- Integration into existing Python workflows and scripts
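As a small illustration of the list and tuple conversion mentioned above, the following sketch converts nested Python tuples into an array and back again. The variable names and sample values are hypothetical and chosen only for this example.

```python
import numpy as np

# Hypothetical sample data; any nested list or tuple of numbers works.
rows = [(1, 2, 3), (4, 5, 6)]

arr = np.array(rows)   # nested tuples/lists -> 2-D NumPy array
back = arr.tolist()    # NumPy array -> nested Python lists

print(arr.shape)       # (2, 3)
print(back)            # [[1, 2, 3], [4, 5, 6]]
```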

NumPy offers an advantage over other free scientific computing packages (such as GNU Octave, released under the GNU General Public License) because you can create Python workflows that use NumPy together with any other Python package, giving you a wide variety of tools that are all controlled and connected via Python. Additionally, NumPy's syntax is inherently Pythonic, allowing you to break away from MATLAB-like syntax (used in GNU Octave) and apply your Python skills.

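To install NumPy on Linux, run the command that matches your distribution.

On Debian, Ubuntu, or Linux Mint:

    $ sudo apt-get install python-numpy

On Fedora, CentOS, or RHEL:

    $ sudo yum install numpy

On recent releases of these distributions, which ship Python 3, the package is typically named `python3-numpy` instead.

You must have Python installed (generally installed by default) in order to use NumPy.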
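One quick way to confirm the installation is to import the package and print its version. This is a minimal sketch that assumes NumPy is importable from your default Python interpreter; the version string it prints depends on your system.

```python
import numpy as np

# If the import succeeds, NumPy is installed for this interpreter.
print(np.__version__)
```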

This tutorial will provide several examples that demonstrate how to use NumPy:

- Basic array arithmetic and comparisons
- Importing data from a comma-delimited text file
- Sampling uniformly between two values

In these examples we will use NumPy from the command line via an interactive Python shell. Begin by starting an interactive Python shell, then import the NumPy library with the `import` statement, assigning `np` as a reference to the `numpy` library:

    $ python
    Python 2.7.3
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import numpy as np

Define a NumPy array object named `A` that has three rows, each of which contains three 32-bit integer values. Print the contents of the array by entering the name of the array object.

    >>> A = np.array([[2, 2, 2], [4, 4, 4], [6, 6, 6]], np.int32)
    >>> A
    array([[2, 2, 2],
           [4, 4, 4],
           [6, 6, 6]], dtype=int32)
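It can help to inspect a few attributes of an array once it exists. The sketch below rebuilds the same `A` and prints its `shape`, `dtype`, `ndim`, and `itemsize`. These are standard array attributes, although the tutorial itself does not rely on them.

```python
import numpy as np

A = np.array([[2, 2, 2], [4, 4, 4], [6, 6, 6]], np.int32)

print(A.shape)     # (3, 3): three rows, three columns
print(A.dtype)     # int32
print(A.ndim)      # 2: the array is two-dimensional
print(A.itemsize)  # 4: bytes per 32-bit integer element
```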

Define a second NumPy array object named `B` that has three rows, each of which contains three 32-bit integer values:

    >>> B = np.array([[1, 1, 1], [5, 5, 5], [10, 10, 10]], np.int32)
    >>> B
    array([[ 1,  1,  1],
           [ 5,  5,  5],
           [10, 10, 10]], dtype=int32)

Define a third array as the sum of the first two arrays:

    >>> C = A + B
    >>> C
    array([[ 3,  3,  3],
           [ 9,  9,  9],
           [16, 16, 16]], dtype=int32)
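Addition is only one of the element-wise operators. As a hedged sketch using the same `A` and `B`, the following contrasts element-wise subtraction and multiplication with the matrix product computed by `dot`. The names `diff`, `prod`, and `matprod` are introduced here for illustration only.

```python
import numpy as np

A = np.array([[2, 2, 2], [4, 4, 4], [6, 6, 6]], np.int32)
B = np.array([[1, 1, 1], [5, 5, 5], [10, 10, 10]], np.int32)

diff = B - A        # element-wise subtraction
prod = A * B        # element-wise product (not a matrix product)
matprod = A.dot(B)  # matrix product of A and B

print(diff)     # rows: [-1 -1 -1], [1 1 1], [4 4 4]
print(prod)     # rows: [2 2 2], [20 20 20], [60 60 60]
print(matprod)  # rows: [32 32 32], [64 64 64], [96 96 96]
```

Note that `*` multiplies matching elements, so it gives a different result from `dot` for the same inputs.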

Determine which of the values in the third array are greater than 10:

    >>> C > 10
    array([[False, False, False],
           [False, False, False],
           [ True,  True,  True]], dtype=bool)
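The Boolean array produced by a comparison can also be used to select or count values. The sketch below assumes the same contents for `C` as in the example and uses hypothetical names (`mask`, `selected`, `count`, `rows`) for the intermediate results.

```python
import numpy as np

C = np.array([[3, 3, 3], [9, 9, 9], [16, 16, 16]], np.int32)

mask = C > 10                   # Boolean array with the same shape as C
selected = C[mask]              # 1-D array of the values where mask is True
count = np.count_nonzero(mask)  # number of True entries in the mask
rows = mask.any(axis=1)         # True for each row with at least one value > 10

print(selected)  # [16 16 16]
print(count)     # 3
print(rows)      # [False False  True]
```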

Consider a file called `data.txt` that contains the following comma-delimited data:

    1.0,2.0,3.0,4.0,5.0
    600.0,700.0,800.0,900.0,1000.0

You can manually create a NumPy array object that contains these data, but that means you need to type in each value individually. With very large datasets, this can be quite tedious and error-prone. Character-delimited data from text files can easily be imported into NumPy arrays.

Define an array named `D` that contains the data from the `data.txt` file, and specify that the data to be imported are 64-bit floating-point numbers separated (delimited) with commas:

    >>> D = np.loadtxt('data.txt', dtype=np.float64, delimiter=',')
    >>> D
    array([[    1.,     2.,     3.,     4.,     5.],
           [  600.,   700.,   800.,   900.,  1000.]])

This feature of NumPy can save a tremendous amount of time that would otherwise be spent manually defining NumPy array objects. If you can format your data as a character-delimited text file, importing it into a NumPy array takes a single call to `loadtxt`.
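To try the import without preparing a file by hand, the following self-contained sketch writes the same two rows to a temporary file, loads them with `np.loadtxt`, and writes them back out with `np.savetxt`. The temporary directory, the `out.txt` file name, and the `fmt="%.1f"` format are assumptions made for this example.

```python
import os
import tempfile

import numpy as np

# Create the sample file in a temporary directory (location is arbitrary).
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "data.txt")
with open(path, "w") as f:
    f.write("1.0,2.0,3.0,4.0,5.0\n600.0,700.0,800.0,900.0,1000.0\n")

# Import the comma-delimited values as 64-bit floats.
D = np.loadtxt(path, dtype=np.float64, delimiter=",")
row_means = D.mean(axis=1)  # mean of each row
print(D.shape)              # (2, 5)
print(row_means)            # [  3. 800.]

# Write the array back out and reload it to confirm the round trip.
out_path = os.path.join(tmpdir, "out.txt")
np.savetxt(out_path, D, delimiter=",", fmt="%.1f")
D2 = np.loadtxt(out_path, delimiter=",")
print(np.array_equal(D, D2))  # True
```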

Suppose you want to generate 100 randomly-sampled values between 0.0 and 1.0 using a uniform probability distribution (all values between 0.0 and 1.0 have an equal chance of being selected). This is easily performed as follows, with the 100 samples stored in a NumPy array object called `E`:

    >>> E = np.random.uniform(0.0, 1.0, 100)
    >>> E
    array([ 0.90319756,  0.39696831,  0.87253663,  0.2541832 ,  0.09188716,  0.41019978,
            0.87418001,  0.13551479,  0.60185788,  0.8717379 ,  0.91012149,  0.9781284 ,
            0.97365995,  0.95618329,  0.25079489,  0.94314188,  0.92708129,  0.64377239,
            0.27262929,  0.63310245,  0.7315558 ,  0.53799042,  0.04425291,  0.1377755 ,
            0.69068289,  0.9929916 ,  0.56488252,  0.25588388,  0.81735705,  0.98430142,
            0.38541288,  0.81925846,  0.23941429,  0.9996938 ,  0.49898967,  0.87731326,
            0.41729317,  0.08407739,  0.09734557,  0.23217088,  0.29291853,  0.09453821,
            0.05676644,  0.97170175,  0.25987992,  0.11203194,  0.68670969,  0.77228168,
            0.85391461,  0.96315244,  0.34276206,  0.8918815 ,  0.93095419,  0.33098585,
            0.71910359,  0.73351498,  0.20238829,  0.75232483,  0.12985561,  0.13185072,
            0.99842567,  0.78278125,  0.1550288 ,  0.03083502,  0.34190622,  0.1755099 ,
            0.67803282,  0.31715532,  0.29491133,  0.35878659,  0.46047523,  0.27475024,
            0.24985922,  0.5595999 ,  0.14831301,  0.20137857,  0.79864609,  0.81361761,
            0.22554692,  0.84947817,  0.48316828,  0.8848909 ,  0.27639724,  0.02182878,
            0.95491984,  0.31427821,  0.6760356 ,  0.27305986,  0.73480237,  0.9581474 ,
            0.5614434 ,  0.12382754,  0.42856939,  0.69581633,  0.39598608,  0.86023031,
            0.59549305,  0.41717616,  0.70233037,  0.66019342])
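Because the samples are random, your values will differ from those shown here. If you need repeatable results, one option is to seed the generator first. The sketch below uses `np.random.seed` with an arbitrary seed of 42 (an assumption for illustration) and checks that the same seed reproduces the same samples.

```python
import numpy as np

np.random.seed(42)  # arbitrary seed chosen for illustration
E1 = np.random.uniform(0.0, 1.0, 100)

np.random.seed(42)  # same seed, so the same sequence is produced
E2 = np.random.uniform(0.0, 1.0, 100)

print(np.array_equal(E1, E2))  # True
# Samples are drawn from the half-open interval [0.0, 1.0).
print(E1.min() >= 0.0 and E1.max() < 1.0)  # True
```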

We can perform a sanity check for these results using NumPy's `histogram` tool. For the present example, we expect that approximately 50% of the sampled values will lie between 0.0 and 0.5, and that the remaining 50% will lie between 0.5 and 1.0 (given that we have two bins of equal width defined by lower and upper limits of 0.0 and 1.0, respectively):

    >>> np.histogram(E, bins=2, range=(0.0, 1.0))
    (array([49, 51]), array([ 0. ,  0.5,  1. ]))

Our expectations are verified given that the `histogram` tool indicates that 49 out of the 100 samples (49%) lie in the first bin (0.0 to 0.5) and that 51 out of the 100 samples (51%) lie in the second bin (0.5 to 1.0).
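With only 100 samples the split will rarely be exactly 50/50. As a rough sketch, the loop below repeats the same check with larger sample sizes to show the fractions settling near 0.5. The seed and the sample sizes are arbitrary choices made for this example.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the run is repeatable

for n in (100, 10000, 1000000):
    samples = np.random.uniform(0.0, 1.0, n)
    # Two equal-width bins: [0.0, 0.5) and [0.5, 1.0]
    counts, edges = np.histogram(samples, bins=2, range=(0.0, 1.0))
    # Fraction of samples that fell into each bin
    print(n, counts / float(n))
```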

This tutorial provides an overview of the features of the NumPy scientific computing package, and uses several examples to demonstrate how easy it is to learn and use. Documentation and examples for the NumPy package can be found at the official site.


Please note that this article is published by Xmodulo.com under a Creative Commons Attribution-ShareAlike 3.0 Unported License. If you would like to use the whole or any part of this article, you need to cite this web page at Xmodulo.com as the original source.
