This free guide is written by Micah Wilson.
Don’t run from the Python!
A lot of people (including me until a couple of years ago) shy away from programming in the belief that it is too hard or time-consuming to pick up. This guide will show you that programming basic scripts is simple, and something that will benefit you a lot throughout your research career (and improve your interaction with computers more generally).
If you are a PhD student using a computer, then you are a programmer already! You are inputting commands (e.g. clicking through an analysis in SPSS, typing numbers or formulas), and expecting some kind of output (e.g. an ANOVA to run or a spreadsheet to calculate a formula). In fact, if you have ever written formulas in Excel, then you have already learnt the basics of a simple programming language! Many of you will be familiar with Syntax in SPSS; again, this is just a simple kind of programming language.¹ In its simplest form, programming is just telling a computer how to perform actions by giving it instructions.
Quick Start Install
Python has traditionally come preinstalled on Mac systems. To test this, open the program called “Terminal”, type python and press Enter. You are now running the Python programming language!
However, you will likely want to do more than just run basic commands, and you might want to install add-ons to your programming setup called modules. We don’t want to modify the version that came with your operating system, so let’s install a fresh new version.
Go to https://www.python.org/downloads/ and install Python 3.5.2 for your system. It comes with a program called IDLE. You should start by learning the latest version of Python for now.
Generally with Python, you write your code in a plain text file, and then run that code from the Terminal. The code you run may calculate a simple number or even run a game. However, as a beginner, you can use IDLE, which allows you to both write and run your code in the same interface!
Once you have all this set up, you are ready to start programming.
Quick Start Resources
At this point, you may be asking yourself: what am I actually meant to do here? Well, it’s a tough question. I suggest you start with some basic interactive tutorials (listed below), and once you have understood them, move on to some of the more advanced (but less interactive) tutorials. Which one of these you choose depends on your own style, but I reckon all of them are super good!
Interactive Tutorials:
Codecademy - a great starting resource that will take you through step by step and give you feedback along the way. I don’t think you need to finish the entire tutorial if you want to start going down your own path. It won’t be possible or useful to learn everything that Python can do.
LearnPython - another good interactive tutorial
Advanced Tutorials
Google Code - another good website for learning how to program on your own computer. Better once you understand the basics.
Tutorials Point - a good go-to when you are confused
Official Docs - a very readable guide
Useful Free Tools (Mac Users)
iTerm2
iTerm2 is a terminal replacement for Mac. It has a lot more features than the basic Terminal.
Official iTerm2 Website
Hack Font
A really good font for programming and plain text editing
Official Hack Repo
Oh My ZSH
By default, the Mac terminal runs bash, which is fine for most uses. However, you can install Oh My ZSH, a framework for the more powerful zsh shell, which will make your life a lot easier.
Oh My ZSH Site
Atom Editor
Atom is a super powerful text editor, and it is really good for writing Python. I could write an entire guide on just using Atom if this is something you are interested in.
Atom Editor
First Steps
Programming boils down to two basic concepts: input and output. You input data (e.g. text files, spreadsheets, lists of words, numbers, keyboard presses), and then write code which tells the computer to perform actions in order to output a desired result (e.g. a text file with certain words replaced, a message to appear, a reorganised spreadsheet). Although some code can be very complex and hard to understand at first, code can also be very simple, such as the Python 2.7 print command, print "Hello World", which just prints that phrase: Hello World.
Installing Python
Before we go into any detail, I suggest you install Python 2.7 and follow along with the guide.
To install Python, you need to visit https://www.python.org/ and download the version relevant for your operating system. You will have a choice between Python 3.X.X and Python 2.7.X. This guide assumes you are using Python 2.7.X and I recommend downloading that to start with.
There are a few important differences between the two versions to be aware of. Python 3.X.X is the latest release and will continue to be developed for the foreseeable future, whilst Python 2.7.X development will stop at some point. There are also a few differences in syntax between the two versions. However, importantly for you, Python 2.7.X currently has better third-party library support, meaning it is less likely you will need to work out how to get an older piece of code working.
You will also want a decent Python IDE (integrated development environment). This is the software where you can write and run your code. To start with, the IDLE editor that comes with Python will be fine. However, in the future you may consider getting PyCharm https://www.jetbrains.com/pycharm/, an IDE that is free for researchers and students.
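To make the syntax difference concrete, here is a small sketch written for Python 3 (the version recommended in the Quick Start section). The comments note what Python 2.7 would do instead; the two behaviours shown, print becoming a function and / giving a decimal result, are the ones beginners usually run into first.

```python
# In Python 3, print is a function, so the text goes inside brackets.
# (Python 2.7 also accepts the bracket-free form: print "Hello World")
print("Hello World")

# The / operator also behaves differently between the two versions.
true_division = 7 / 2    # 3.5 in Python 3; Python 2.7 gives 3
floor_division = 7 // 2  # 3 in both versions: the remainder is dropped
print(true_division)
print(floor_division)
```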
What is Python?
Python is one of the many programming languages out there. Python is a great first language to learn because it is simple, human readable, and quite powerful for most basic needs. Python is also rapidly becoming one of the most popular languages in the world, and this is a huge benefit for you. It means there is a big community of Python users around the globe: chances are that someone has either already written code that solves your problem (these packages are called modules, which we will cover in a later section) or has run into the same issue as you (and a simple Google search will reveal countless threads of people posting solutions).
The best way to learn Python is by doing. As such, this guide is only intended to supplement, rather than replace, other learning resources you find online. My main aim is to show that programming is actually quite easy, and to demonstrate a conceptual example of how you can use programming. I recommend completing at least the first few lessons from one of the following sites:
Basics
In this section, I will run you through six basic concepts of Python (you will likely already be familiar with these in some form). As we work through each one, we will build a working example of a simple data extraction script. For some concepts, I’ll just demonstrate their usage rather than work them into our example. Feel free to follow along on your own computer.
- Operators
- Basic Commands
- Variables
- Conditionals
- Loops
- Modules
Note: in my examples, the code comes first, followed by the expected output from running it (output lines often start with >).
Files, Printing and Operators
All Python files have the extension .py. Try creating a blank text document with the extension .py and opening it with your IDE. Once you have done this, you can enter a basic command. The print command will print whatever string or variable you place after it, which comes in handy when debugging and getting feedback from your code. Give this a try now.
Just like Excel, Python offers a bunch of operators to manipulate data with. In the examples below, a = 10 and b = 20:
Operator  Name            Description                                  Example
+         Addition        Adds values. Can also join strings.          a + b = 30
-         Subtraction     Subtracts values.                            a - b = -10
*         Multiplication  Multiplies values.                           a * b = 200
/         Division        Divides values.                              b / a = 2
%         Modulus         Divides values and returns the remainder.    b % a = 0
**        Exponent        Raises a value to a power.                   a ** b = 10 to the power of 20
==        Equality        Checks whether both sides are equal.         (a == b) is False
These operators are useful for more than just manipulating numbers. For instance, the + operator can be used to join (concatenate) multiple strings together.
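The operators in the table can be tried out directly. This sketch assumes a = 10 and b = 20 (the values implied by the table's examples) and adds a string-joining example; the strings are illustrative only. It runs under Python 3, where b / a gives 2.0 rather than 2.

```python
# Values implied by the examples in the table
a = 10
b = 20

total = a + b        # 30
difference = a - b   # -10
product = a * b      # 200
quotient = b / a     # 2.0 in Python 3 (2 in Python 2.7)
remainder = b % a    # 0, because 20 divides evenly by 10
power = a ** b       # 10 to the power of 20
same = (a == b)      # False

# The + operator also joins (concatenates) strings
greeting = "Hello" + " " + "World"
print(greeting)      # prints: Hello World
```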
Comments
In Python, you can comment your code with #. In the code below, the line starting with # is a comment. Comments are not parsed (read) by the Python interpreter, meaning you can add notes to your scripts (or switch lines off by commenting them out) without breaking your beautiful code. It is a very good habit to comment your code thoroughly, because when you return after a long break, you may find it hard to decipher what the code was intended for or how it works. It is also good practice to place a space after the # sign.
# This is a comment
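A slightly longer sketch of how comments behave: Python skips everything from the # to the end of the line, so a comment can sit on its own line, follow code on the same line, or temporarily switch a line of code off. The variable name and value are made up for illustration.

```python
# This whole line is a comment, so Python ignores it
reaction_time = 432  # a comment can also follow code on the same line

# Putting # in front of code "comments it out", so it does not run:
# reaction_time = 0

print(reaction_time)  # prints 432, because the line above was skipped
```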
Variables
Now that we have our first script, let’s begin implementing some variables. We know what variables are in psychology: they are things we measure or set, and that can vary. Similarly, in programming, variables are things we can set or change. Variables can be static and set by the user, or dynamic and updated based on calculations. In Python, we define a variable with a name, an = sign and a value. For example:
# Define variables about Sam
id_var = "Sam Smith"
gender_var = "Male"
age_var = 22
print id_var,gender_var,age_var
# Print this same information in a more human readable format
print "Participant:",id_var,"Age:",age_var,"Gender:",gender_var
Would produce this output:
Sam Smith Male 22
Participant: Sam Smith Age: 22 Gender: Male
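The static/dynamic distinction above can be sketched like this: two values are typed in by hand, and a third is calculated from them. The trial values are made up, and dividing by 2.0 (rather than 2) keeps the decimal part in Python 2.7 as well as Python 3.

```python
# Static variables: values we set ourselves
trial_1 = 124
trial_2 = 321

# A dynamic variable: calculated from the other variables
mean_rt = (trial_1 + trial_2) / 2.0
print(mean_rt)  # prints 222.5

# Assigning a new value replaces the old one...
trial_2 = 400
# ...but mean_rt only changes when we calculate it again
mean_rt = (trial_1 + trial_2) / 2.0
print(mean_rt)  # prints 262.0
```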
Variables need not be just numbers or strings; they can hold a range of other objects, for instance a list of objects. We can define a variable as a list³ in one of two ways:
# Define a list of variables, method 1: square brackets.
sam_reaction_times = [124, 321, 632, 231, 321, 452, 123, 123]
# Define a list of variables, method 2: list() converts a tuple
# (values in round brackets) into a list. This overwrites method 1.
sam_reaction_times = list((124, 423, 632, 231, 321, 452, 123, 123))
# Print second reaction time in list
print sam_reaction_times[1]
423
What you may have noticed is that we accessed a specific item in the list sam_reaction_times. Python uses zero-based indexing, which means that if we wanted the first item in the list, we would need to call sam_reaction_times[0]. Note that if you define a variable a second time, it will overwrite the data from the first definition; this is why the output above is 423 (from method 2) rather than 321 (from method 1).
We can also define lists of lists (and lists of lists of lists, if you wanted to) by simply placing a list within a list. Note, however, that you now need one index to get an inner list and two indexes to get an individual item within it.
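A short sketch of zero-based indexing using method 1's list: the first item is at index 0, so the last item sits at one less than the length of the list. len() is a built-in function that counts the items.

```python
sam_reaction_times = [124, 321, 632, 231, 321, 452, 123, 123]

first_rt = sam_reaction_times[0]   # 124: counting starts at 0
second_rt = sam_reaction_times[1]  # 321

n_trials = len(sam_reaction_times)          # 8 items...
last_rt = sam_reaction_times[n_trials - 1]  # ...so the last index is 7: 123

# sam_reaction_times[8] would raise an IndexError,
# because there is no ninth item
```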
sam_reaction_times = [[124, 321, 632],[231, 321, 452],[341, 123, 123]]
print sam_reaction_times[0]
print sam_reaction_times[0][0]
> [124, 321, 632]
> 124
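As a sketch, suppose (hypothetically) that each inner list holds one block of three trials. One index selects a block, two indexes select a single trial, and built-in functions such as sum() and len() work on an inner list just as they do on any other list.

```python
# Hypothetical layout: one inner list per block of three trials
sam_reaction_times = [[124, 321, 632], [231, 321, 452], [341, 123, 123]]

block_2 = sam_reaction_times[1]                 # [231, 321, 452]
trial_3_of_block_2 = sam_reaction_times[1][2]   # 452
block_2_total = sum(sam_reaction_times[1])      # 231 + 321 + 452 = 1004
n_blocks = len(sam_reaction_times)              # 3 inner lists
```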
Conditionals
Conditionals are simple logical statements we can use to trigger actions or events if certain conditions are met. Python’s two Boolean (True/False) values are True and False (case sensitive). Let’s test out a few expressions:
5 == (3 + 2)
3 == 3
2 == 1
"Sam" == "Smith"
"Sam" == "Sam"
> True
> True
> False
> False
> True
I mentioned that we can also perform specific actions if certain conditions are met. if statements allow you to do this by testing certain criteria and then, depending on the outcome, performing certain actions. Using elif (short for "else if") you can specify additional clauses. Finally, you can use else to do something when none of the conditions are met.
At this stage, it is important to note that Python uses white space (indentation) for flow control. In simple terms, when you instruct your program to carry out a series of commands, the indentation (tab stop) at the start of each line tells Python which commands belong together, and which actions are related to which conditions.
Let’s start with a simple example of printing the word True if 1 is equal to 1.
if 1==1:
    print "True"
> True
Great, everything is working as intended. However, if we remove the tab stop (indentation) before the print command, the program will not work. This is because (as the error message below shows) Python expects the line after an if statement to be indented, to mark it as part of the conditional.
if 1==1:
print "True"
> File "", line 2
>     print "True"
> ^
> IndentationError: expected an indented block
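To see how indentation groups commands, here is a sketch with an if/else where each branch has two indented lines, followed by an unindented line. The reaction time and the 500 ms cut-off are made up for illustration; four spaces per indentation level is the usual convention.

```python
reaction_time = 332  # hypothetical reaction time in ms

if reaction_time > 500:
    # Both indented lines belong to the if, so both run only when it is True
    status = "slow"
    print("This trial was slow")
else:
    # These indented lines belong to the else
    status = "fast"
    print("This trial was fast")

# This line is not indented, so it runs whichever branch was taken
print("Finished checking the trial")
```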
In the example below, I introduce a few new commands you may not have seen before. Instead of telling you what they are, I encourage you to Google them. This way, you will begin to learn how to read other people’s scripts and how to search for solutions yourself.
# defines a list of reaction times
reaction_time = [332,223,432,642,231]
# Logical tests
if len(reaction_time) == 5 and reaction_time[0] > 1:
    print "Valid number of reaction times and first time is valid"
elif len(reaction_time) < 5 and reaction_time[0] < 1:
    print "Wrong number of reaction times, first time is invalid"
else:
    print "Big Error: an invalid participant"
> Valid number of reaction times and first time is valid
Intro to Loops
Example Program
Before I get into any more detail, I’m going to present a simple (but time-consuming) problem, and show you how Python could solve it in seconds. Imagine you have a folder containing 400 text files, each holding one participant’s qualitative responses to three questions about Star Wars. Each file may look something like the following:
PARTICIPANT 1, MAY 21ST 2015 3:41PM
Q1: What is your favourite Star Wars movie?
I like The Phantom Menace the best because Jake Matthew Lloyd is such an amazing actor.
Q2: How has the recent Star Wars marketing affected your purchasing habits?
I ate wayyyy more star wars oranges. Star Wars oranges are so much better than navel.
Q3: What Star Wars product would you like to see available?
Star wars shampoo would be great. Luke always had such good hair.
If you wanted to extract the answers to each of these questions by manually copying and pasting them from all 400 files into a spreadsheet, it would take you hours. With Python, we can simply tell the computer to do that copying and pasting for us.
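Here is a minimal sketch of how that could look. It is written for Python 3 and assumes every file follows the layout above: each question line starts with Q, a number and a colon, and the answer is the next non-blank line. The function names, the .txt extension and the CSV output are my assumptions for illustration. It uses a few things not covered in detail yet (loops, plus the os and csv modules that ship with Python), so don’t worry if not every line makes sense.

```python
import csv
import os


def extract_answers(text):
    """Return a dict such as {"Q1": "...", "Q2": "..."} for one file's text.

    Assumes each question line starts with "Q<number>:" and that the
    answer is on the next non-blank line.
    """
    # Keep only non-blank lines, with surrounding spaces removed
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    answers = {}
    # Stop one line early: a question on the very last line has no answer
    for i, line in enumerate(lines[:-1]):
        label = line.split(":", 1)[0]  # e.g. "Q1" from "Q1: What is..."
        if label.startswith("Q") and label[1:].isdigit():
            answers[label] = lines[i + 1]
    return answers


def collect_answers(folder, out_path, questions=("Q1", "Q2", "Q3")):
    """Write one CSV row per .txt file in folder: file name, then answers."""
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        writer.writerow(["file"] + list(questions))  # header row
        for name in sorted(os.listdir(folder)):
            if not name.endswith(".txt"):
                continue  # skip anything that is not a text file
            with open(os.path.join(folder, name)) as in_file:
                answers = extract_answers(in_file.read())
            # A missing answer becomes an empty cell rather than an error
            writer.writerow([name] + [answers.get(q, "") for q in questions])
```

Assuming the 400 files sit in a folder called responses (a hypothetical name), calling collect_answers("responses", "answers.csv") would write one row per participant file, ready to open in Excel.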
Modules are little packages of code that people have written for you (for free). Virtually all your programming will require the use of modules. For example, if you just wanted to pick a random item from a list like this: animals = ["dogs", "cats", "chickens", "goats", "Darth Vader"], then all you need to do is import the random module and call its choice function like this: random.choice(animals).
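The random.choice example as a complete script: a module has to be imported before you can use it, and random is one of the modules that comes with Python, so there is nothing extra to install.

```python
import random  # load the random module that ships with Python

animals = ["dogs", "cats", "chickens", "goats", "Darth Vader"]

# choice() returns one item from the list, picked at random
pick = random.choice(animals)
print(pick)  # a different animal may appear each time you run it
```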
import os  # this imports a module called os which
Let's start our first script by creating a commented header that describes the file.
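A sketch of what such a header could look like, followed by a first use of os. The file name, the header fields and the use of os.listdir() on the current folder are my suggestions rather than a fixed format; the <...> parts are placeholders to fill in.

```python
# extract_answers.py  (hypothetical file name)
# Purpose: collect the Q1-Q3 answers from every participant text file
#          in a folder and save them into one spreadsheet (CSV) file.
# Author:  <your name>
# Date:    <today's date>

import os  # os lets Python work with files and folders

# List the names of everything in the current folder ("."),
# sorted alphabetically so the order is predictable
file_names = sorted(os.listdir("."))
print(len(file_names))  # how many files and folders were found
```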
1. Some people get picky and call programming languages like Python "scripting languages", because you do not need to compile a Python script separately for every different computer you want to run it on, as you do with languages like C++. However, I am going to ignore these nerds and use the term “programming language” as I please.
2. Note: Software Carpentry is a great resource for new programmers. However, it is a little more advanced and assumes more knowledge than this guide or the above links.
3. A list (in some languages, an array) is an ordered collection of objects. List items in Python are separated with a comma. Understanding the power and usage of lists will be critical in getting the most out of Python.