How to use NumPy for scientific computing in Linux

Last updated on August 27, 2020 by Dan Nanni

Get serious about scientific computing on Linux by learning to use NumPy. NumPy is an open-source scientific computing package for Python, released under the BSD license, that serves as a free yet powerful alternative to proprietary packages (such as MATLAB) that carry licensing fees. Its many built-in data analysis tools, extensive documentation, and detailed examples make NumPy well suited to intensive scientific computing applications. This tutorial highlights several features of NumPy to demonstrate its capabilities and ease of use.

Features

NumPy offers a vast array (pun intended!) of features.

NumPy also has an advantage over other free scientific computing packages (such as GNU Octave, released under the GNU General Public License): you can build Python workflows that combine NumPy with any other Python package, giving you a wide variety of tools that are all controlled and connected through Python. Additionally, NumPy's syntax is inherently Pythonic, so you can break away from the MATLAB-like syntax used in GNU Octave and apply your Python skills.

Installation of NumPy

To install NumPy on Linux, run the command for your distribution:

On Debian or Ubuntu:

$ sudo apt-get install python-numpy

On Fedora or CentOS:

$ sudo yum install numpy

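You must have Python installed (it is generally installed by default) in order to use NumPy. The package names above install NumPy for Python 2; on newer releases that ship only Python 3, the package is typically named python3-numpy.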
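To confirm that the installation worked, a quick check along the following lines can help. This is a minimal sketch, assuming NumPy is importable from your default Python interpreter; the variable name version is only illustrative.

```python
# Quick post-install check: import NumPy and report its version.
import numpy as np

version = np.__version__  # version string of the installed NumPy
print("NumPy version:", version)
```

If the import fails with an ImportError, NumPy was most likely installed for a different Python interpreter than the one you started.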

NumPy Examples

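This tutorial provides three examples that demonstrate how to use NumPy: basic array arithmetic and comparison, importing data from a comma-delimited text file, and sampling uniformly between two values.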

In these examples we will use NumPy from the command line via an interactive Python shell. Begin by starting an interactive Python shell, then import the NumPy library with the import command, assigning np as a reference to the numpy library:

$ python
Python 2.7.3
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np

Example 1: Basic Array Arithmetic and Comparison

Define a NumPy array object named A that has three rows, each of which contains three 32-bit integer values. Print the contents of the array by entering the name of the array object.

>>> A = np.array([[2, 2, 2], [4, 4, 4], [6, 6, 6]], np.int32)
>>> A
array([[2, 2, 2],
       [4, 4, 4],
       [6, 6, 6]], dtype=int32)
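The "three rows of three 32-bit integers" description can be checked directly from the array's attributes. The following is a small sketch that rebuilds the same array in a standalone script and inspects it:

```python
import numpy as np

# Same array as in the interactive session above.
A = np.array([[2, 2, 2], [4, 4, 4], [6, 6, 6]], np.int32)

print(A.shape)     # (rows, columns) -> (3, 3)
print(A.dtype)     # element type -> int32
print(A.itemsize)  # bytes per element -> 4
```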

Define a second NumPy array object named B that has three rows, each of which contains three 32-bit integer values:

>>> B = np.array([[1, 1, 1], [5, 5, 5], [10, 10, 10]], np.int32)
>>> B
array([[ 1,  1,  1],
       [ 5,  5,  5],
       [10, 10, 10]], dtype=int32)

Define a third array as the sum of the first two arrays:

>>> C = A + B
>>> C
array([[ 3,  3,  3],
       [ 9,  9,  9],
       [16, 16, 16]], dtype=int32)

Determine which of the values in the third array are greater than 10:

>>> C > 10
array([[False, False, False],
       [False, False, False],
       [ True,  True,  True]], dtype=bool)
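A boolean array like the one above is often used as a mask to select or count the matching elements. The sketch below repeats the arrays from this example as a standalone script; the names mask, large, and count are illustrative and not part of the original session.

```python
import numpy as np

A = np.array([[2, 2, 2], [4, 4, 4], [6, 6, 6]], np.int32)
B = np.array([[1, 1, 1], [5, 5, 5], [10, 10, 10]], np.int32)
C = A + B                            # element-wise sum

mask = C > 10                        # boolean array with the same shape as C
large = C[mask]                      # 1-D array of the elements where mask is True
count = int(np.count_nonzero(mask))  # number of True entries

print(large)  # [16 16 16]
print(count)  # 3
```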

Example 2: Importing Data from a Comma-Delimited Text File

Consider a file called data.txt that contains the following comma-delimited data:

1.0,2.0,3.0,4.0,5.0
600.0,700.0,800.0,900.0,1000.0

You can manually create a NumPy array object that contains these data, but that means you need to type in each value individually. With very large datasets, this can be quite tedious and error-prone. Character-delimited data from text files can easily be imported into NumPy arrays.

Define an array named D that contains the data from the data.txt file, and specify that the data to be imported are 64-bit floating-point numbers separated (delimited) with commas:

>>> D = np.loadtxt('data.txt', dtype=np.float64, delimiter=',')
>>> D
array([[    1.,     2.,     3.,     4.,     5.],
       [  600.,   700.,   800.,   900.,  1000.]])

This feature of NumPy can save a tremendous amount of time that would otherwise be spent manually defining NumPy array objects. If you can format your data of interest into a character-delimited text file then importing these data into a NumPy array is easily accomplished through a single command.
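To try the import without creating a file first, the same data can be fed to np.loadtxt from an in-memory text buffer. This is a sketch under that assumption: io.StringIO stands in for data.txt, and row_means is an illustrative name.

```python
import io
import numpy as np

# Stand-in for data.txt so the example runs without a file on disk.
text = io.StringIO("1.0,2.0,3.0,4.0,5.0\n600.0,700.0,800.0,900.0,1000.0\n")

D = np.loadtxt(text, dtype=np.float64, delimiter=',')

print(D.shape)              # (2, 5): two rows, five columns
row_means = D.mean(axis=1)  # average of each row
print(row_means)            # 3.0 for the first row, 800.0 for the second
```

Once the data are in an array, whole-row or whole-column statistics such as the mean are a single method call.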

Example 3: Sampling Uniformly Between Two Values

Suppose you want to generate 100 randomly-sampled values between 0.0 and 1.0 using a uniform probability distribution (all values between 0.0 and 1.0 have an equal chance of being selected). This is easily performed as follows, with the 100 samples stored in a NumPy array object called E:

>>> E = np.random.uniform(0.0, 1.0, 100)
>>> E
array([ 0.90319756,  0.39696831,  0.87253663,  0.2541832 ,  0.09188716,
        0.41019978,  0.87418001,  0.13551479,  0.60185788,  0.8717379 ,
        0.91012149,  0.9781284 ,  0.97365995,  0.95618329,  0.25079489,
        0.94314188,  0.92708129,  0.64377239,  0.27262929,  0.63310245,
        0.7315558 ,  0.53799042,  0.04425291,  0.1377755 ,  0.69068289,
        0.9929916 ,  0.56488252,  0.25588388,  0.81735705,  0.98430142,
        0.38541288,  0.81925846,  0.23941429,  0.9996938 ,  0.49898967,
        0.87731326,  0.41729317,  0.08407739,  0.09734557,  0.23217088,
        0.29291853,  0.09453821,  0.05676644,  0.97170175,  0.25987992,
        0.11203194,  0.68670969,  0.77228168,  0.85391461,  0.96315244,
        0.34276206,  0.8918815 ,  0.93095419,  0.33098585,  0.71910359,
        0.73351498,  0.20238829,  0.75232483,  0.12985561,  0.13185072,
        0.99842567,  0.78278125,  0.1550288 ,  0.03083502,  0.34190622,
        0.1755099 ,  0.67803282,  0.31715532,  0.29491133,  0.35878659,
        0.46047523,  0.27475024,  0.24985922,  0.5595999 ,  0.14831301,
        0.20137857,  0.79864609,  0.81361761,  0.22554692,  0.84947817,
        0.48316828,  0.8848909 ,  0.27639724,  0.02182878,  0.95491984,
        0.31427821,  0.6760356 ,  0.27305986,  0.73480237,  0.9581474 ,
        0.5614434 ,  0.12382754,  0.42856939,  0.69581633,  0.39598608,
        0.86023031,  0.59549305,  0.41717616,  0.70233037,  0.66019342])

We can perform a sanity check for these results using NumPy's histogram tool. For the present example, we expect that approximately 50% of the sampled values will lie between 0.0 and 0.5, and that the remaining 50% will lie between 0.5 and 1.0 (given that we have two bins of equal width defined by lower and upper limits of 0.0 and 1.0, respectively):

>>> np.histogram(E, bins=2, range=(0.0, 1.0))
(array([49, 51]), array([ 0. ,  0.5,  1. ]))

The result matches our expectations: the histogram tool reports that 49 of the 100 samples (49%) lie in the first bin (0.0 to 0.5) and that 51 of the 100 samples (51%) lie in the second bin (0.5 to 1.0).
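Because the samples are random, the exact counts will differ from run to run. For a repeatable experiment you can seed the generator. The sketch below assumes NumPy 1.17 or later, where np.random.default_rng is available; the seed value 12345 is arbitrary.

```python
import numpy as np

# Seeded generator so the same samples are drawn on every run.
rng = np.random.default_rng(seed=12345)
E = rng.uniform(0.0, 1.0, 100)  # 100 samples from the interval [0.0, 1.0)

# Two equal-width bins: [0.0, 0.5) and [0.5, 1.0]
counts, edges = np.histogram(E, bins=2, range=(0.0, 1.0))
print(counts)  # two counts that sum to 100
print(edges)   # bin edges: 0.0, 0.5, 1.0
```

On older NumPy versions, calling np.random.seed before np.random.uniform serves the same purpose.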

Summary

This tutorial provides an overview of the features of the NumPy scientific computing package, and uses several examples to demonstrate how easy it is to learn and use. Documentation and examples for the NumPy package can be found at the official site.

Please note that this article is published by Xmodulo.com under a Creative Commons Attribution-ShareAlike 3.0 Unported License. If you would like to use the whole or any part of this article, you need to cite this web page at Xmodulo.com as the original source.
