Python for Business Analytics
The University of Sydney Business School
Welcome to the Python resource page for Business Analytics students at the University of Sydney Business School. The purpose of this page is to provide key information and supplementary material for students using Python in Business Analytics units.
- Setting up Python
- Getting started with Python and Jupyter Notebook
- Other environments
- Resources for learning Python
Setting up Python
You can easily and quickly install Python on your personal computer. Python is free and open source, and you do not need a license to use it, even for commercial purposes. This flexibility is one of the main reasons why Python is so popular as a general-purpose language, and why it has become one of the most popular languages for data science (together with R).
Our units use the core Python installation plus a collection of libraries to support tasks such as scientific computing, data management, data visualisation, statistical analysis, and machine learning. You can get almost everything you need in one go by downloading and installing the Anaconda distribution provided by Anaconda, Inc. (formerly Continuum Analytics). Follow the instructions on their website and install the latest version, which at the time of writing comes with Python 3.7.
To interact with Python, we use Jupyter Notebook, a browser-based interface that is simple to use and has many useful features. Jupyter is included in the Anaconda distribution. You can also use Jupyter notebooks without any installation through Kaggle Kernels (now called Kaggle Notebooks), a cloud environment built for data science.
Anaconda also works as a convenient package manager. To install and update packages, open Anaconda Prompt (follow the link for instructions if you are not sure how to do this). Once the prompt is open, try the command:
If you are taking one of my courses on statistical learning or machine learning, please follow these instructions to install the additional required packages.
Some students prefer to manage packages from the Anaconda Navigator.
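Whichever way you manage packages, a quick way to confirm that Python can actually find them is a short check run from a notebook or console. This is a minimal sketch rather than part of the official setup instructions: the package list is an assumption based on the libraries used on this page, and the version is read from each package's `__version__` attribute, which most scientific packages provide.

```python
import importlib

# Packages used in the examples on this page (an assumed list; extend as needed)
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn"]


def check_packages(names):
    """Return a dict mapping each package name to its version, or None if missing."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            # Fall back to "unknown" if a package does not expose __version__
            found[name] = getattr(module, "__version__", "unknown")
    return found


for name, version in check_packages(REQUIRED).items():
    status = version if version is not None else "NOT INSTALLED"
    print(f"{name:<12} {status}")
```

Any package reported as not installed can then be added with the package manager as described above.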
Getting started with Python and Jupyter Notebook
On Windows, you can launch Jupyter Notebook in your default browser from the Start menu as follows:
Alternatively, you can open Anaconda Prompt or Terminal (macOS/Linux) and enter:
You will probably want to change the start-up folder. Personally, I like to create a separate folder and Jupyter link for every new project. If you want to follow this approach, create a folder and link for your current course at this stage.
Once Jupyter opens, you will see the Jupyter dashboard as below. The main body lists all files in the directory from which you launched Jupyter. At the top right, click New and then Python 3 to open a new notebook.
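To check from inside a notebook which folder it is working in, you can run something like the sketch below. It uses only the standard library; the set of file extensions it lists is an illustrative assumption, so adjust it to the files you keep for your course.

```python
from pathlib import Path

# A notebook's working directory is the folder that contains it. For a notebook
# created at the top level of the dashboard, that is the folder Jupyter was
# launched from.
start_folder = Path.cwd()
print("Working directory:", start_folder)

# Collect notebooks and common data files in that folder (top level only).
# The suffixes are an assumption for illustration.
suffixes = {".ipynb", ".csv", ".xlsx"}
project_files = sorted(
    p.name for p in start_folder.iterdir()
    if p.is_file() and p.suffix.lower() in suffixes
)
for name in project_files:
    print(" ", name)
```

Relative file paths in your code (for example, when loading a data file) are resolved against this working directory, which is one practical reason to keep each course in its own folder.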
You are now ready to start coding. The basic elements of Jupyter Notebook are the cells, as in the next figure. The contents of each cell are interpreted as code by default. You can type (or copy and paste) as much code as you like into a single cell. Press Shift + Enter when you are ready to run it. As a first step, try using a cell as a calculator to see it working. You can always run a cell again.
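For the calculator exercise, you could paste a few lines like the ones below into a cell and press Shift + Enter. The figures in the second part are made up purely for illustration.

```python
# Basic arithmetic. In a notebook, only the value of the last expression in a
# cell is displayed automatically, so print() is used to show each result.
print(3 + 5 * 2)    # 13: multiplication is evaluated before addition
print((3 + 5) * 2)  # 16: parentheses change the order
print(7 / 2)        # 3.5: "/" always returns a float
print(7 // 2)       # 3: "//" is floor division
print(2 ** 10)      # 1024: "**" is exponentiation

# Variables store results so later calculations can reuse them
revenue = 1200
cost = 900
profit_margin = (revenue - cost) / revenue
profit_margin       # displayed as 0.25 when this is the last line of a cell
```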
You should familiarise yourself with the top menu to get an overview of the basic functionality of the notebook and useful keyboard shortcuts. For example, in the drop-down list where you initially see "Code", you can change a cell to Markdown so that you can write notes (alternatively, press Esc and then M). Markdown cells accept HTML code and mathematics typed in LaTeX, which are rendered when you run the cell. For practical information, you can consult this cheatsheet.
As a practical step, try to run the code below, which loads the pandas package for data management. From previous experience, an error may arise on some computers with international settings due to encoding issues. This is a minor, fixable problem. One solution for Mac computers is in this Stack Overflow thread. There may be simpler fixes, but I would need to see your computer. Please let me know if the problem persists so that I can help you. It is important that you do this before the first face-to-face session.
import pandas as pd
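If the import succeeds, a slightly fuller check is to build and summarise a tiny table. This is a minimal sketch that needs nothing beyond pandas itself; the column names and figures are made up for illustration.

```python
import pandas as pd

# Print the installed version to confirm that the import worked
print("pandas version:", pd.__version__)

# A tiny DataFrame with made-up figures, used only to check that pandas runs
sales = pd.DataFrame({
    "region": ["North", "South", "East", "West"],
    "revenue": [120.0, 95.5, 143.2, 88.0],
})
print(sales)

# Column-wise aggregation: total revenue across all regions
total_revenue = sales["revenue"].sum()
print("Total revenue:", total_revenue)
```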
To get an idea of what we can do in this environment, run the following code snippet, which is an illustration from the Seaborn documentation.
%matplotlib inline
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

def sinplot(flip=1):
    # Draw six phase-shifted sine waves with decreasing amplitude
    x = np.linspace(0, 14, 100)
    for i in range(1, 7):
        plt.plot(x, np.sin(x + i * .5) * (7 - i) * flip)

import seaborn as sns
sns.set_style('whitegrid')  # white background with grid lines
sinplot()
plt.show()
If all goes well, you will see a figure like the one below. Do not worry about understanding the details for now; you will have mastered all this by the end of the unit. It is useful to note, however, that the first line makes the figure appear in the notebook itself rather than in a separate window. Once you have run it, figures will keep appearing in the notebook until you revert to the default backend by running "%matplotlib qt" or restart the notebook's kernel. Try experimenting with this.
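While experimenting, you can ask matplotlib which backend is active, so that you can see the effect of switching with "%matplotlib inline" and "%matplotlib qt". This is a minimal sketch; `matplotlib.get_backend()` returns the backend name as a string, and the exact names differ between matplotlib and Jupyter versions, so no particular output is shown here.

```python
import matplotlib

# Report the backend matplotlib is using in the current session.
# Run this again after switching backends to see the name change.
backend = matplotlib.get_backend()
print("Current matplotlib backend:", backend)
```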
Other environments
This section is optional and more helpful after you get some experience with Python and Jupyter Notebook.
Even though Jupyter Notebook is an excellent interface for learning Python and a great tool for communication that is widely adopted by professionals, it may not fulfill all your needs as you become a more advanced user.
An alternative workflow is to combine a text editor for writing code with a console for running it and working interactively with Python. For example, you may write a set of Python instructions in a text file known as a script (with a .py extension in our case) and use the console to run it and see the output. As the name suggests, an integrated development environment (IDE) combines both of these functionalities in one environment. Visual Studio Code (VS Code) has been gaining traction as a great IDE option for Python.
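To make the script workflow concrete, here is a minimal example of what such a file might contain. The file name analysis.py and the sales figures are hypothetical. From Anaconda Prompt or Terminal you would run it with `python analysis.py`; from an IPython-based console, such as the Qt console described below, `%run analysis.py` runs it and keeps its variables available afterwards.

```python
"""analysis.py: a minimal example script (hypothetical file name)."""


def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def main():
    # Made-up monthly sales figures used only for illustration
    monthly_sales = [210, 185, 240, 198]
    print("Average monthly sales:", average(monthly_sales))


# This block runs when the file is executed as a script, but not when the
# file is imported as a module from another script or notebook.
if __name__ == "__main__":
    main()
```

Keeping the logic in functions and the entry point under `if __name__ == "__main__":` lets you reuse the same functions from a notebook by importing the file.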
Another option is to use a separate console and text editor. The Jupyter Qt console is a good option that comes with the Anaconda installation. You can either find it in Anaconda Navigator or open it by typing the following in an Anaconda Prompt or Terminal:
I like to tweak this slightly to work with a dark background and default to a larger font. As before, you can create a link for convenience.
Try copying, pasting, and running the code from before in the Qt console. A key difference is that you only need to press Enter to run single-line commands (but still Shift + Enter for multiple lines).
At a very basic level, even Notepad works as a text editor for coding, but you will want a more sophisticated option with features such as syntax highlighting and auto-indentation. I use Sublime Text, which requires a license (even though it has an unlimited trial period). Atom was a good free choice, but it has since been discontinued; VS Code, mentioned above, is a free alternative. Another simple option if you are just getting started is to use Jupyter Notebook as a text editor. You can do this by creating a text file instead of a Python notebook on the main dashboard and then specifying Python as the language in the menu.
There are often too many choices when it comes to Python. This is a good thing, as it reflects the enthusiasm of developers in creating tools and packages, but it can be a distraction and even a barrier for beginners (compared to R, for example, for which RStudio is the clear-cut IDE choice). I suggest that you follow the recommended setup without worrying too much about the details. If you are coding and running programs, you are making progress.
Resources for learning Python
Even though I provide some basic tutorials above, I recommend that you learn Python more systematically. Here are some great options:
Kaggle Learn has free online micro-courses oriented towards data science.
Quantitative Economics online lectures by Thomas J. Sargent and John Stachurski. A fast way to get started with Python for those who already have experience with other languages such as MATLAB or R (some knowledge of econometrics is helpful).
Think Python: How to Think Like a Computer Scientist (Second Edition) by Allen Downey (free online text). For students who are new to coding and are interested in developing a more fundamental understanding of programming.
It is important to note that no activities in Business Analytics units are intended as a replacement for a programming unit (I have advocated for such a unit to be offered as part of the Business School curriculum, but without success so far). Our focus is on using Python as a practical tool for data science, but there is much more to programming. Students who plan to code professionally are encouraged to study programming at the School of Information Technologies.