Python Basics

Marcel Scharth (The University of Sydney Business School)


This tutorial is an introduction to the essentials of the Python programming language for business analytics students. I assume that you followed the instructions for installing Python and have your notebook ready to go.

1. Getting Started

To get started, you can use your notebook as a calculator. For example:

In [1]:
2 + 2
Out[1]:
4
In [2]:
3/2
Out[2]:
1.5

Exercise: identify the use of the following arithmetic operators: +, -, *, /, **, %.

The following statement assigns a value to the variable x. Because the variable does not yet exist, the assignment statement creates it.

In [3]:
x = 5
x
Out[3]:
5
In [4]:
x + 2
Out[4]:
7

Exercise: identify what the syntax x += 2 does (we say that += is an assignment operator).

The print function allows you to display output.

In [5]:
print('For truth is always strange; stranger than fiction.') # Lord Byron (the # starts a comment)
For truth is always strange; stranger than fiction.
In [6]:
x = 10
print(x)
10

The print function is a built-in function which is part of the core of the Python programming language. Another example of a built-in function is abs, which computes the absolute value of a number.

In [7]:
abs(-2)
Out[7]:
2
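
Python has many other built-in functions. As a quick illustration, here are three more that you are likely to use often. The variable names and values are made up for this sketch.

```python
# A few other built-in functions (illustrative values only)
largest = max(3, 7, 5)        # largest of the arguments
smallest = min(3, 7, 5)       # smallest of the arguments
rounded = round(3.14159, 2)   # round to two decimal places

print(largest)
print(smallest)
print(rounded)
```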

2. Modules

The Python language by design has a small core. Most of the functionality that we need is in modules or packages that we need to explicitly load into our session. There are two ways to do this: we can load the entire module (or a submodule), or only the specific function that we need.

In [8]:
import math
math.sqrt(4)
Out[8]:
2.0
In [9]:
from math import sqrt 
sqrt(4)
Out[9]:
2.0
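
A common variation on the first approach is to load a module under a shorter alias. The sketch below uses the standard math module and the alias m purely for illustration; the alias can be any valid name.

```python
import math as m   # load the module under the (arbitrary) alias m

root = m.sqrt(16)  # same function as math.sqrt, accessed through the alias
print(root)
print(m.pi)        # modules can also contain constants
```

A short alias saves typing while still making it clear which module each function comes from.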

We will use a number of different Python libraries throughout this course, including Pandas (data processing), Matplotlib (plotting), Seaborn (to make plots elegant), StatsModels (statistics), NumPy (scientific computing), and Scikit-Learn (machine learning).

3. Data Types

3.1 Boolean variables

The most basic data type is a Boolean variable, which can be either True or False.

In [10]:
x = False
print(x)
False
In [11]:
x = 2 > 0
print(x)
True

Exercise: identify the use of the following comparison operators: ==, !=, >=, <=.

Exercise: explain the code below in detail. You may want to break this down into four steps.

In [12]:
x = 3 % 2 == 0
print(x)
False

In numerical expressions, a False is automatically converted to zero and a True is converted to one. For example:

In [13]:
x = True
y = 2*x
print(y)
2
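
One practical use of this conversion is counting how many conditions hold. The following is a minimal sketch with made-up comparisons.

```python
# Booleans behave like 0 and 1 in arithmetic
a = 5 > 3      # True
b = 2 > 4      # False
c = 7 > 1      # True

count = a + b + c   # number of comparisons that are True
print(count)
```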

3.2 Numbers

There are two main built-in numerical data types, signed integers (int) and floating point real values (float).

In [14]:
x = 1 
type(x)
Out[14]:
int
In [15]:
x = 1.0
type(x)
Out[15]:
float

Sometimes, we need to do an explicit type conversion (typecasting), as the next example shows.

In [16]:
a = 2
x = a > 0
y = int(x)
print(y)
1
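
The same idea applies to other built-in types. The sketch below uses the built-in functions int, float and str with illustrative values. Note that converting a float to an integer drops the decimal part rather than rounding.

```python
n = int('42')      # string to integer
z = float('3.5')   # string to float
t = int(3.9)       # float to integer: the decimal part is dropped, not rounded
s = str(10)        # number to string

print(n + 1)
print(z * 2)
print(t)
print(s + ' items')   # + joins two strings
```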

3.3 Strings

String variables represent text data.

In [17]:
sentence = 'For truth is always strange; stranger than fiction.' 
type(sentence)
Out[17]:
str

As we are going to see in a text analytics application, Python has sophisticated capability for string manipulation.
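
As a small taste of this, the sketch below applies a few standard string methods to the sentence above. The replacement word 'data' is only an illustration.

```python
sentence = 'For truth is always strange; stranger than fiction.'

upper = sentence.upper()                     # upper-case copy of the string
swapped = sentence.replace('truth', 'data')  # copy with a substring replaced
words = sentence.split()                     # split on whitespace into a list of words

print(upper)
print(swapped)
print(words[:3])
```

These methods return new objects and leave the original string unchanged.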

4. Data Structures

In computer science, a data structure is a way to store and organise data for efficient retrieval and modification. The four basic data structures that we will use in Python are lists, dictionaries, tuples and arrays. We introduce the first three in this section and return to arrays in Section 8.

4.1 Lists

A list is a sequence of values. The values in a list, known as elements or items, can be of any type. To create a list, we enclose the elements in brackets [ ].

In [18]:
a = [ ] # empty list
b = [1, 2, 5, 10] # list of four numbers
cities = ['Sydney', 'Melbourne', 'Brisbane']
c = [2, 4, 'Sydney'] # list mixing different variable types.

There are several list methods and operations that you should be familiar with. The append method adds a new element to the end of the list.

In [19]:
cities.append('Perth')
print(cities)
['Sydney', 'Melbourne', 'Brisbane', 'Perth']

The len function counts the number of items in a list. It also works for counting the number of items in other types of containers.

In [20]:
len(cities)
Out[20]:
4

We retrieve elements by passing the numerical index. What is crucial for you to know is that numerical indexes start from zero in Python. Here are some examples:

In [21]:
cities[0] # first element
Out[21]:
'Sydney'
In [22]:
cities[2] # third element
Out[22]:
'Brisbane'
In [23]:
cities[-1] # last element
Out[23]:
'Perth'

Often, we need to retrieve a slice of a list. This can be a bit confusing initially, so here are several examples.

In [24]:
cities[:2] # first two elements/all elements up to index 1 
Out[24]:
['Sydney', 'Melbourne']
In [25]:
cities[1:3] # elements in indexes 1 to 2 (the element in index 3 is not part of the slice)
Out[25]:
['Melbourne', 'Brisbane']
In [26]:
cities[1:] # all elements from index 1 onwards
Out[26]:
['Melbourne', 'Brisbane', 'Perth']
In [27]:
cities[-2:] # last two elements
Out[27]:
['Brisbane', 'Perth']
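
A slice can also take a third value, the step size. The sketch below redefines the cities list so that it can be run on its own.

```python
cities = ['Sydney', 'Melbourne', 'Brisbane', 'Perth']

every_second = cities[::2]       # from start to end, taking every second element
reversed_cities = cities[::-1]   # a negative step walks the list backwards

print(every_second)
print(reversed_cities)
```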

The + operator concatenates lists.

In [28]:
a = [1, 2]
b = [3, 5, 10]
a + b
Out[28]:
[1, 2, 3, 5, 10]

The in expression allows us to check if a certain item is present in a list.

In [29]:
'Sydney' in cities
Out[29]:
True
In [30]:
'Copenhagen' in cities
Out[30]:
False

It is also useful to know how to sort lists. The sorted function returns a new sorted list and leaves the original object unchanged.

In [31]:
sorted(cities)
Out[31]:
['Brisbane', 'Melbourne', 'Perth', 'Sydney']

In contrast, the sort() method will modify the list itself by sorting it.

In [32]:
print(cities)
cities.sort()
print(cities)
['Sydney', 'Melbourne', 'Brisbane', 'Perth']
['Brisbane', 'Melbourne', 'Perth', 'Sydney']
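
Both sorted and sort accept optional arguments that control the ordering. As a sketch, the example below uses the reverse and key arguments of sorted on a fresh copy of the cities list.

```python
cities = ['Sydney', 'Melbourne', 'Brisbane', 'Perth']

descending = sorted(cities, reverse=True)  # new list in reverse alphabetical order
by_length = sorted(cities, key=len)        # new list ordered by the length of each name

print(descending)
print(by_length)
print(cities)   # the original list is unchanged
```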

4.2 Dictionaries

A dictionary is a collection of key-value pairs. We create a dictionary by providing the key-value pairs within curly brackets { }. For example, in the dictionary below the keys are the names of the cities and the values are the population of each city.

In [33]:
population = {'Sydney': 5230330, 'Melbourne': 4936349, 'Brisbane' : 2462637}

We retrieve a value by referring to the key.

In [34]:
population['Sydney']
Out[34]:
5230330

Another way to create a dictionary is as follows.

In [35]:
address = {} # empty dictionary
address['country'] = 'Australia'
address['state'] = 'NSW'
address['postcode'] = 2006
print(address)
{'country': 'Australia', 'state': 'NSW', 'postcode': 2006}
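
Returning to the population dictionary, the following sketch shows a few other common operations: listing the keys and values, checking whether a key exists, and looking up a key that may be missing. The variable names are illustrative.

```python
population = {'Sydney': 5230330, 'Melbourne': 4936349, 'Brisbane': 2462637}

keys = list(population.keys())       # the keys, in insertion order
values = list(population.values())   # the corresponding values
has_perth = 'Perth' in population    # the in expression checks the keys
missing = population.get('Perth')    # get returns None rather than raising an error

print(keys)
print(values)
print(has_perth)
print(missing)
```

By contrast, population['Perth'] would raise a KeyError, because that key is not in the dictionary.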

4.3 Tuples

A tuple is an immutable list: we can neither modify the elements of a tuple nor insert or remove items from it. We usually create a tuple by enclosing the elements in parentheses ( ).

In [36]:
a = (1, 2, 'cat', 'dog')
print(a)
(1, 2, 'cat', 'dog')
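
To see immutability in action, the sketch below attempts to change the first element of a tuple. It uses a try/except block, which we have not covered, only so that the example runs to the end instead of stopping at the error.

```python
a = (1, 2, 'cat', 'dog')

try:
    a[0] = 10                    # attempt to modify the first element
except TypeError as error:
    message = str(error)         # keep the error message for display
    print('Not allowed:', message)

print(a)   # the tuple is unchanged
```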

It's also possible to create a tuple without the parentheses in the syntax, though this can make the code less clear.

In [37]:
a = 1, 2, 'cat', 'dog'
print(a)
(1, 2, 'cat', 'dog')

A useful operation is tuple unpacking, shown in the next two examples.

In [38]:
numbers = (1, 2)
a, b = numbers
print(a)
print(b)
1
2
In [39]:
[*numbers, 3]
Out[39]:
[1, 2, 3]
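
Two further uses of unpacking are worth knowing. The sketch below, with illustrative values, swaps two variables in one line and then collects the remaining items of a tuple with a starred name.

```python
a = 1
b = 2
a, b = b, a   # the right-hand side is packed into a tuple, then unpacked
print(a, b)

first, *rest = (10, 20, 30)   # the starred name collects the remaining items in a list
print(first)
print(rest)
```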

5. For Loops

Often, we need to traverse a list and run code that takes each item as an input. We use a for block to do this.

In [40]:
cities = ['Sydney', 'Melbourne', 'Brisbane', 'Perth']

for city in cities:
    print(city)
Sydney
Melbourne
Brisbane
Perth

There are two important details to note in this syntax. First, the for loop would work with any variable name in place of city, as long as we use it consistently. Choosing a meaningful name, however, makes the code more Pythonic (clean and readable).

Second, each iteration of the loop runs the code in the indented part of the block, below the for statement. Python uses the indentation to determine which lines belong to the block, so it must be consistent within the block. The convention is four spaces, which the editor adds automatically.

Here's another example.

In [41]:
numbers = [1, 2, 5, 10]

for number in numbers:
    x = number**2
    print(x) 
    
print('The code then continues from here') # outside the for block
1
4
25
100
The code then continues from here
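
A common pattern is to update a variable inside the loop so that it accumulates a result. As a minimal sketch, the example below sums the squares of the same numbers. The variable name total is an illustrative choice.

```python
numbers = [1, 2, 5, 10]

total = 0                   # running total, updated inside the loop
for number in numbers:
    total += number**2      # add the square of the current item

print(total)                # runs once, after the loop has finished
```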

For loops are applicable to any iterable objects. We commonly write loops over a numerical range, as the next two examples show.

In [42]:
for i in range(3):
    print(i)
0
1
2
In [43]:
for i in range(1, 11, 2): # starts at 1, ends before 11, step size 2
    print(i)
1
3
5
7
9

The enumerate function is useful for looping over a list while keeping track of the index of each item:

In [44]:
cities = ['Sydney', 'Melbourne', 'Brisbane', 'Perth']

for i, city in enumerate(cities):
    print(f'City {i}: {city}')
City 0: Sydney
City 1: Melbourne
City 2: Brisbane
City 3: Perth
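
If you would rather count from one, enumerate accepts an optional start argument. The sketch below stores the labels in a list (the name labels is illustrative) instead of printing them one by one.

```python
cities = ['Sydney', 'Melbourne', 'Brisbane', 'Perth']

labels = []
for i, city in enumerate(cities, start=1):   # count from 1 instead of 0
    labels.append(f'City {i}: {city}')

print(labels)
```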

6. Functions

In programming, a function is a piece of code that (optionally) takes inputs, performs a set of instructions, and (optionally) returns an output.

In [45]:
def square(x):
    return x**2

y = square(4)
print(y)
16

Here's an example of a function that has no input or output.

In [46]:
import time 

def today():
    date = time.strftime("%d/%m/%Y")
    print(f'Today is {date}')
    
today()
Today is 13/01/2020

When calling a function, we can use positional and keyword arguments. In this next example, we use positional arguments only, which means that Python will assign 2 and 3 to the parameters x and p respectively.

In [47]:
def power(x, p):
    return x**p

y = power(2,3)
print(y)
8

The next example does exactly the same, but based on keyword arguments.

In [48]:
y = power(x=2,p=3)
print(y)
8

When using keyword arguments, the inputs do not need to be in any particular order.

In [49]:
y = power(p=3,x=2)
print(y)
8

We can also mix positional and keyword arguments, but in this case the positional arguments need to come first.

In [50]:
y = power(2, p=3)
print(y)
8

Many functions that you will be using have default arguments. It's important for you to pay attention to these default values and ask if they make sense for your current application.

In [51]:
def hello(name='user'):
    print(f'Hello {name}!')

hello('John')
hello()
Hello John!
Hello user!
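
Default arguments combine naturally with keyword arguments. As a sketch, here is a variant of the power function in which the exponent defaults to 2. The choice of default is an assumption made for this illustration.

```python
def power(x, p=2):
    # p defaults to 2, so power(x) squares x unless told otherwise
    return x**p

squared = power(3)       # uses the default p=2
cubed = power(3, p=3)    # overrides the default

print(squared)
print(cubed)
```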

7. If Statements

An if statement evaluates if an expression is True or False, and executes different code accordingly. For example, suppose that we want to code a function to calculate the absolute value of a number, defined as

\begin{equation} |x|=\begin{cases} x & \text{if $x\geq0$}\\ -x & \text{if $x<0$}. \end{cases} \end{equation}
In [52]:
def absolute(x):
    if x >= 0:
        return x
    else:
        return -x

y = absolute(-2)
print(y)
2
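
When there are more than two cases, we can add elif (short for "else if") branches between the if and the else. The function below is a hypothetical example, not part of the course code, that returns the sign of a number.

```python
def sign(x):
    if x > 0:
        return 1
    elif x < 0:      # checked only if the first condition is False
        return -1
    else:            # reached only if both conditions are False
        return 0

print(sign(-2))
print(sign(0))
print(sign(3.5))
```

Python checks the conditions from top to bottom and runs only the first branch whose condition is True.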

As another example, below we code a function that raises a customised error message if the input is invalid.

In [53]:
def log(x):
    if x <= 0:
        raise ValueError('Wake up mate!! The log of zero or a negative number does not exist.')
    else:
        return math.log(x)

log(2)
Out[53]:
0.6931471805599453

Now, try taking the log of zero and see what happens.
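
If you would rather handle the error than let it stop the program, one option is a try/except block. The sketch below repeats the log function (with a shortened error message) so that it runs on its own. The helper safe_log is a hypothetical name introduced only for this illustration.

```python
import math

def log(x):
    if x <= 0:
        raise ValueError('The log of zero or a negative number does not exist.')
    else:
        return math.log(x)

def safe_log(x):
    # hypothetical helper: returns None instead of stopping on invalid input
    try:
        return log(x)
    except ValueError as error:
        print(f'Could not compute log({x}): {error}')
        return None

result = safe_log(0)
print(result)
```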

8. Arrays

NumPy is the fundamental package for scientific computing in Python. NumPy arrays are data structures that we use to represent and store vectors and matrices. This is very important to us, because all learning algorithms in this course are based on operations on vectors and matrices which typically happen behind the scenes.

For example, consider the following vector.

\begin{equation} a = \begin{pmatrix} 5 \\ -2 \\ -3\end{pmatrix} \end{equation}

We represent this vector in Python as follows.

In [54]:
import numpy as np

a = np.array([5, -2, -3])
a
Out[54]:
array([ 5, -2, -3])

In the same way, we use a two-dimensional NumPy array to represent a matrix. Consider for example the following square matrix.

\begin{equation} B=\begin{bmatrix} 1 & 2 \\ 5 & -4 \\ \end{bmatrix} \end{equation}

We create it as follows:

In [55]:
# the input is a list, where each item is itself a list representing a row 
B = np.array([[1,2],[5,-4]])
B
Out[55]:
array([[ 1,  2],
       [ 5, -4]])

The ndim and shape attributes allow us to retrieve the number of dimensions of a NumPy array and its size along each dimension.

In [56]:
a.ndim # a is a one-dimensional array 
Out[56]:
1
In [57]:
B.ndim # B is a two-dimensional array 
Out[57]:
2
In [58]:
a.shape # a has three elements along the first and only dimension
Out[58]:
(3,)
In [59]:
B.shape # B has two rows (first dimension) and two columns (second dimension)
Out[59]:
(2, 2)

Retrieving elements and slices works similarly to what we do for lists, except that we need to keep track of multiple dimensions. Here are some examples.

In [60]:
B[1,0] # element in the second row, first column
Out[60]:
5
In [61]:
B[:,1] # second column
Out[61]:
array([ 2, -4])
In [62]:
B[-1,:] # last row
Out[62]:
array([ 5, -4])

9. Programming Tips

  1. Programming is a completely logical process. Your code will only generate the correct result if it is entirely correct, both in terms of the syntax and the logical consistency of what you are trying to do. Otherwise, you get an error message or the wrong result. This will happen all the time, whatever your level of programming ability. Just go back and find out where the problem is. Troubleshooting is a very important skill.

  2. Read the error messages. Students very often ask why their code is not working when the error message already says what the problem is.

  3. Not even a full unit on programming would be able to cover all scenarios. You should use the package documentation and internet searches frequently to find out how to do things and fix problems. It's important to first conceptualise and articulate clearly what you're trying to do. You can then work out how to implement it in Python.
