Sequences
Learning Objectives
- Identify and use different kinds of basic sequences: strings, ranges, lists and tuples
- Understand the limitations and restrictions of each type of sequence
- Understand the difference between mutable and immutable sequences
- Use an index to access a value in a sequence
- Use slicing to obtain a subsequence
Overview
There is at times a magic in identity of position. -- E.M. Forster, A Room with a View
She knew all the indices to the idle lonely. -- Joan Didion, Play It as It Lays
Our data types so far have been quite lonely: a single number as an int
or a float
, or a single truth value as a bool
. We can build interactive programs with just a few numbers to describe a square, but we find ourselves in a position where we have to declare a new variable for each individual value that we want to store and manipulate. In this section, we'll learn about new sequence data types that serve as containers for multiple pieces of information. Our introduction to sequences will come with a closer look at a familiar data type: str
.
Strings as Sequences of Characters
Strings are Sequences
As we learned before, the str
data type is how we represent text in Python. We can create a string by writing out a literal as a bunch of characters placed between a pair of the quotation marks of your choice:
vocabulary_word = "vermiculate"
A string is defined not just by the characters it contains, but by the order in which those characters are stored. The words relatives and versatile are anagrams of each other—they contain the same characters—but they are not the same words and they are not the same strings!
a = "relatives"
b = "versatile"
print(a == b) # prints False!
In Python, a sequence is a kind of data type that is capable of storing several pieces of information in a specific order. We see that a str
value fits this description: it is capable of storing arbitrarily many characters and it does so in a specific order. We are limited to storing just one kind of data in a string (characters), and we'll see that this is actually restrictive compared to other sequence types in Python.
Since strings know the order of the characters that make them up, we can actually access individual characters one at a time using indexing.
Strings are Indexable Sequences
A sequence is indexable in the sense that we can refer to its first value, its second value, and so on, all the way up to its final value. In Python, the indices that we use start at 0
(not 1
), meaning that when we diagram the index of each character in a string, it looks like this:
"indexing"
01234567
Notice that indexing is a word with eight letters (that is, len("indexing") == 8
is True
) and since we start counting at 0
, the index of the last character is 7
. That difference between the length of a string and the index of its last element is present in strings of different lengths:
"short" # 5 characters long
01234 # biggest index: 4
"lengthy" # 7 characters long
0123456 # biggest index: 6
In general, for a string and for all sequences, the index of the n
th element is n - 1
. The first element lives at index 0
, the fourth element lives at index 3
, the 653rd element lives at index 652
, and the last element in a string of n
characters lives at index n - 1
.
Now that we've belabored the point 1000 times (of which the final time would be found at index 999
...), we can confidently move on to the indexing operator in Python. For a sequence s
of any type, you can access the element at position i
inside of that sequence by writing s[i]
.
For example, in "Travis Q. McGaha"
, the Q
is the eighth letter and so it lives at index 7
. Therefore, to pull that character out of the larger string, we can do the following.
full_name = "Travis Q. McGaha"
middle_initial = full_name[7]
print(middle_initial) # prints 'Q'
It's worth noting that the type of middle_initial
in this case is still str
—individual characters in Python still count as strings themselves. In any case, we could expand the example to get Travis' first and last initials too:
full_name = "Travis Q. McGaha"
middle_initial = full_name[7] # Q
first_initial = full_name[0] # T
last_initial = full_name[10] # M
The indices that are valid for a sequence of length n
always range from 0
to n - 1
. An empty sequence, like the empty string ""
, is one that has a length of 0
and therefore has no valid indices to speak of.
If you try to access an index that is not valid (because it is too big), you will crash your program with an IndexError
:
>>> "HSS"[100]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: string index out of range
An interesting feature is that python also allows for negative indexes: my_seq[-1]
would be the last character of the sequence my_seq
. For a sequence of length n
, the allowable negative indices range from -1
to -n
:
>>> "miso"[-1]
'o'
>>> "miso"[-2]
's'
>>> "miso"[-3]
'i'
>>> "miso"[-4]
'm'
However, an index that is 'too negative' will still result in an IndexError
:
>>> "miso"[-100]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: string index out of range
Strings are Concatenatable Sequences
Since each initial is just a str
, we can concatenate them all together using the +
operator. This is actually another common feature of Python sequences: two sequences of the same type can usually be concatenated together. (There are some exceptions.)
full_name = "Travis Q. McGaha"
middle_initial = full_name[7] # Q
first_initial = full_name[0] # T
last_initial = full_name[10] # M
full_initials = first_initial + middle_initial + last_initial
print(full_initials) # prints "TQM"
Strings are Sliceable Sequences
Now that we are more comfortable with sequences and their indices, we can interrogate a claim recently made to me by a bumper sticker on a passing car: that there is no earth without art. I absolutely agree with the implied argument that there's not much point in being here on this planet without being able to express ourselves creatively. But perhaps there's something more literal here to investigate:
"earth"
12345
If we look at the indices of the characters in the string "earth"
, we can see that the bumper sticker was also correct in a sequency sense: characters 2
through 4
of "earth"
do literally spell out "art"
. Pulling one sequence out of another is something that we'll often want to do in our programs—not just when we have a pun that we want to belabor—and Python has a syntax for doing it. In the context of strings, if we want to obtain a subsequence of a string s
including all characters starting at index i
and stopping before index j
, then we can do that by writing s[i:j]
. For instance:
print("earth"[1:4]) # prints "art" 🖌️
print("earth"[0:3]) # you also can't have "earth"
# without "ear" 👂
Notice that in each case we are excluding the character at the end position. "earth[1:4]"
gives "art"
, which is the subsequence consisting of characters at positions 1
, 2
, and 3
only. This takes some getting used to.
A positive feature of including the starting index but excluding the latter is that you can pretty easily calculate how many characters you're getting: s[i:j]
will always have a length of i - j
characters.
Another implication of this design choice means that if you want to get a subsequence containing all characters including the last one, the stopping index is one larger than the highest index of any element actually present in the sequence:
title = "crossroads"
# all three examples below give exactly the same value
roads_one = title[5:10]
roads_two = title[5:len(title)]
roads_three = title[5:]
print(roads_one) # prints "roads"
print(roads_one == roads_two == roads_three) # prints True
This last version—title[5:]
—is a useful syntactical shorthand for getting all characters in title
starting at position 5
and going all the way to the end of the string. In fact, we can do something similar for shorthand when taking all characters up until a certain position:
title = "crossroads"
# both examples below give exactly the same value
cross_one = title[0:5]
cross_two = title[:5]
print(cross_one) # prints "cross"
print(cross_one == cross_two) # prints True
It's also possible to take non-contiguous slices—slices that skip over a fixed number of elements between selections from the larger sequence. Thus, the full slicing syntax emerges:
If you want every
k
th element of a strings
starting at indexi
and ending at indexj
, you can write:
s[i:j:k]
Broadly, the "formula" is [start:stop:step]
. Let's take a look at an example. If we want to obtain just "BC"
as a slice, we can write the following expression:
>>> interwoven_alphabet = "AaBbCc"
>>> interwoven_alphabet[2:5:2]
'BC'
How do we get "BC"
? Writing down the indices of the string helps to illuminate:
AaBbCc
012345
We start taking characters at position 2
based on the start
. That's "B"
. Then, we take a step of 2
over to position 4
. That's "C"
. We take one more step of size 2
over to position 6
—but 6
is greater than 5
, which is our stopping position. We take "B"
and "C"
and nothing else, making our slice.
Take another example:
>>> interwoven_alphabet[0:6:3]
'Ab'
We'll start at position 0
, taking "A"
. We'll take a step of 3
over to position 3
and take "b"
. We'll take another step of 3
over to position 6
, which is our stopping index. We're done, and "Ab"
is the output slice.
We can extend this even further with negative step sizes: it gets a bit tricky, but it can be useful. In these cases, we'll actually have start
values that are greater than our stop
values; this works since we'll be stepping backwards from start
to stop
:
>>> "devolve"[4:0:-1]
'love'
We start at position 4
, which is "l"
. We take a step of -1
to position 3
, which is "o"
. Another step of -1
to position 2
, which is "v"
. Stepping by -1
to position 1
gives us e
, and then we take one more step of -1
over to position 0
, which is our stop
position. We're done, and "love"
is the result. Awww 💖.
Taking slices in reverse is a little tricky to get the hang of. For example, there's a big difference between "devolve"[3:0:-1]
and "devolve"[3::-1]
—try it out! This is mostly a niche application anyway, but it's worth bringing up for an idiom that comes in handy from time to time. A slice of [::-1]
is shorthand for reversing a sequence.
>>> "evol"[::-1]
'love'
>>> "pots"[::-1]
'stop'
Membership in Strings
Strings support the use of the in
keyword to ask if one string is a subsequence of another string. In particular, we know that we can find "art" in "earth"
—that expression evaluates to True
—but "at" in "earth"
is False
. For two strings s
and t
, s in t
is True
when you can find s
as an uninterrupted sequence of characters in t
. Some corollary properties:
- if
len(s) > len(t)
, thens in t
is alwaysFalse
s in s
is alwaysTrue
"" in t
is also alwaysTrue
Immutability and Strings
While we can index, slice, concatenate and do many other operations with strings, we cannot actually modify them. In technical terms, the str
datatype is said to be immutable.
But wait, I can set my variable to "abcd"
, then modify it to "AB"
and it works! Isn't that changing the string?
No, it actually is not! Consider the following code samples:
Valid code
>>> my_var = "abcd"
>>> print(my_var)
abcd
>>> my_var = "New string"
>>> print(my_var)
New string
Invalid code
>>> my_var = "abcd"
>>> print(my_var)
abcd
>>> my_var[0] = "d"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
The difference is very subtle, but crucial for understanding immutability:
- In the first sample, we are replacing the string stored in
my_var
(namely"abcd"
) with another string (namely"New string"
). We are modifyingmy_var
, and Python is happy. - In the second sample, we are trying to modify the actual string stored in
my_var
by trying to change its first character, and Python throws an error.
Expressed in another way, we are allowed to reassign variables that store strings. But we cannot modify the contents (i.e. characters) of the stings themselves!
From this, you can understand that all of the operations we do on strings don't modify the string, but actually give you a new, possibly different string:
>>> string_one = "abcd"
>>> print(string_one)
abcd
>>> string_two = string_one[1:2]
>>> print(string_one)
abcd
>>> print(string_two)
b
Strings are not the only immutable data type - ranges and tuples are also part of this club, as you will see below.
Taking Inventory
A string's identity is based on the characters it contains and the order in which those characters belong. The ordering of the characters allows us to count them, starting from 0
, using indices. We can use indexing to query a string for a character stored at a particular position. We can extend indexing to slicing by defining a range of indices that we want to pull characters from. And we can check for membership inside of strings using the in
keyword, deciding whether one string is present inside of another. Each of these properties of a string is held in common with the other sequences we'll introduce next.
Ranges
If str
is the immutable datatype for sequences of characters, then we can think of the range
as the corresponding immutable type used for sequences of numbers. A range is an ordered sequence of numbers defined by a start point, stop point, and step size. Whereas a string can feature characters in any order, ranges are more narrowly constrained. They are defined by a start
, stop
, and step
parameter. (Sound familiar?)
Creating Ranges
range(n)
The simplest ranges can be created by omitting both the start
and step
, which will default to 0
and 1
, respectively. Writing range(10)
gives us a sequence of all int
values from 0
up until 9
. range(100)
is a bit bigger, containing all numbers from 0
up until 99
. A smaller range might be range(3)
, containing just 0
, 1
, and 2
. In each case, when creating a range from 0
to some stop value of n
, the resulting range is a sequence of n
different numbers in ascending order.
Stopping & Stepping
Like slices, ranges can be customized more fully using the start
and step
. Also like with slices, we can provide a negative step
size to count down from our start
to our stop
.
Contents | Expression |
---|---|
0, 1, 2, 3, 4 | range(5) |
1, 2, 3, 4, 5 | range(1, 6) |
1, 3, 5 | range(1, 6, 2) |
0, 10, 20, 30, 40, 50 | range(0, 51, 10) |
empty! | range(6, 0) |
6, 5, 4, 3, 2, 1 | range(6, 0, -1) |
The procedure for determining the contents of a range from its start/stop/step
is the same as before: our first number will be start
, and we'll step
by a fixed amount until we cross over to the other side of stop
.
Have a hard time remembering your times tables? You can list all multiples of a number in a certain range using the step
parameter. range(0, 100, 9)
gives all multiples of 9
between 0
and 99
, whereas range(0, 100, 13)
does the same for multiples of 13
.
Lists
While ranges are very useful for iteration (see the chapter on loops), they are not great for storing data. What if you wanted to remember all the PennIDs in CIS 1100? Or the tasks you have to complete tomorrow? Or the first 5000 prime numbers?
Lists are the answer! Simply put, a list is a mutable, ordered collection of things. You can add things to it, you can remove things from it, you know if an element comes before or after another, and the elements inside a list don't even have to have the same data type.🤯
Creating a list
The simplest of lists is the empty list, denoted as a pair of square brackets []
with nothing in between. If we want a list that contains the number 1, we would define it as [1]
. If we want a list that contains multiple elements, we separate them by commas: [1, "Harry", 50]
.
>>> empty_list = []
>>> print(empty_list)
[]
>>> one_elem = [1]
>>> print(one_elem)
[1]
>>> multiple_elems = [1, "Harry", 50]
>>> print(multiple_elems)
[1, 'Harry', 50]
Working with a list
Basic sequence operations
Lists support most of the operations that strings support. Consider the list my_list = ["a", "b", 23, 55, 70]
for the following examples:
- Getting the length:
len(my_list) # evaluates to 5
- Getting the i-th element:
my_list[2] # evaluates to 23
- Slicing:
my_list[1:3] # evaluates to ['b', 23]
- Concatenation:
my_list + [22, "a"] # evaluates to ['a', 'b, 23, 55, 70, 22, 'a']
- Membership:
23 in my_list # evaluates to True
Note: When using a string, we were able to check both if a character is in a string (
"c" in "chocolate"
) and if a sequence of characters is in a string ("la" in "chocolate"
). However, since lists can store any type, including other lists, the expression[23, 55] in my_list
will not check if the elements 22 and 55 are inmy_list
, but rather if the list[23, 55]
is an element inmy_list
.
Since lists are mutable, i.e. we can modify them, they also support some extra operations, which are detailed below.
List specific operations
The first way to modify a list is to add an element to it. This can be done either by appending it to the end of the list, or inserting it at a particular index:
>>> lst = []
>>> lst.append(13) # add the element 13 to the end
>>> print(lst)
[13]
>>> lst.append("string") # add the element "string" to the end
>>> print(lst)
[13, 'string']
>>> lst.append([1, 2]) # add the element [1, 2] to the end
>>> print(lst)
[13, 'string', [1, 2]]
>>> lst.append(7) # add the element 7 to the end
>>> print(lst)
[13, 'string', [1, 2], 7]
>>> print(len(lst))
4
>>> lst.insert(2, 66) # insert the element 66 at index 2
>>> print(lst)
[13, 'string', 66, [1, 2], 7]
The second list-specific operation is changing an element at a particular index:
>>> lst = ['a', 'b', 'c', 'd', 'e']
>>> print(lst)
['a', 'b', 'c', 'd', 'e']
>>> lst[2] = 50 # change the element at index 2 to the value 50
>>> print(lst)
['a', 'b', 50, 'd', 'e']
The third thing we are allowed to do with a list is to extend it by appending to it all the elements from another list:
>>> print(lst)
['a', 'b', 50, 'd', 'e']
>>> lst.extend([20, 21])
>>> print(lst)
['a', 'b', 50, 'd', 'e', 20, 21]
We are also allowed to remove elements from a list in a few different ways:
-
lst.remove(x)
removes the first occurrence ofx
fromlst
. Will cause aValueError
error ifx
is not inlst
>>> print(lst) ['a', 'b', 50, 'd', 'e', 20, 21] >>> lst.remove(50) >>> print(lst) ['a', 'b', 'd', 'e', 20, 21] >>> lst.remove(50) Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: list.remove(x): x not in list
-
lst.pop(i)
removes and returns the element at positioni
. If no index is specified, it will default to removing and returning the last element>>> lst = ["a", "b", "c"] >>> removed_item = lst.pop(1) >>> print(lst) ['a', 'c'] >>> print(removed_item) b >>> removed_item = lst.pop() >>> print(lst) ['a'] >>> print(removed_item) c
-
lst.clear()
removes all elements from the list>>> lst = ["h", "a", "p", "p", 1] >>> lst.clear() # lst is now an empty list >>> print(lst) []
Recap
Lists are one of the most versatile data types in Python, as they can be used to represent many real-world concepts, are mutable, can store any data type within them, and are easy to work with. However, more specialized data types are still worth knowing and understanding, as they solve particular problems much better compared to lists.
Tuples
As we just saw, lists are great and all, but not every job should be solved with a multi-tool. This is where we introduce our last sequence type, the tuple
. They are used to represent immutable ordered pairings of elements. Think student names, emails, their birthdays, and any other information we might want to to hold about them in our program.
Creating a tuple
In the same way that the simplest list is the empty list, wrote as []
, the simplest tuple is the empty tuple, wrote as ()
. If we want a tuple with more elements, we similarly separate them using commas:
>>> my_tpl = (1100, "sharry", "Aug 29") # tuple with 3 elements
>>> print(my_tpl)
(1100, 'sharry', 'Aug 29')
>>> print(len(my_tpl))
3
>>> my_tpl = ("20",) # tuple with 1 element
>>> print(my_tpl)
('20',)
>>> print(len(my_tpl))
1
Notice that if we want a tuple with a single element, we have to add a comma after it. Without the comma, surrounding a value with parentheses does not place it in a tuple, since the parentheses now have just their "arithmetic meaning" of isolating parts of an expression:
>>> not_a_tpl = ("element")
>>> type(not_a_tpl)
<class 'str'>
>>> print(not_a_tpl)
element
>>> my_tpl = ("element",)
>>> type(my_tpl)
<class 'tuple'>
>>> print(my_tpl)
('element',)
Working with a tuple
As with other sequences, we can:
- Get the length of a tuple using
len
- Index into tuples:
("a", 20, 15, "b")[2] # evaluates to 15
- Concatenate two tuples:
(20, "a") + ("b", 30) # evaluates to the new tuple (20, 'a', 'b', 30)
- Check for membership in a tuple:
"a" in ("bac", 0.5) # evaluates to False
- Slice a tuple:
(10, 20, 30, 40)[1:3] # evaluates to (20, 30)
The main advantage of tuples is their immutability - they're basically unchangeable lists - which makes them perfect for representing collections of related data. If, for instance, you had to design a program that computes final grades for CIS 1100 students, you'd probably represent each student using a tuple, and then make a list containing all the students. Now there's no way to mix things up, or change grades by mistake!
Summary table
There's a lot to learn about sequences, especially since they're one of the first tools we use to represent real-world things in code. However, the essentials are succinct enough to fit in the table below:
Type | Element type | Mutable | Example |
---|---|---|---|
str | characters (strings of len 1) | False | "Hello" 'Cis 1100' |
list | anything | True | [] [1, 2, "a"] |
tuple | anything | False | () (1,) , ("Harry", True) |
range | int | False | range(3) - outputs 0, 1, 2range(1, 5, 2) - outputs 1, 3 |