Strings in Python
By Angela C
March 1, 2021
Reading time: 4 minutes.
Strings represent text of any kind. In Python, strings must be wrapped in quotes, either single or double. Triple quotes can be used to create multiline strings.
Strings are immutable.
Strings are sequences and indexed as such.
There are many string methods, including .upper() to convert to uppercase, .lower() to convert to lowercase, and .count() to count the number of occurrences of a substring.
text1 = "This is a string"
text2 = " and this is another string"
text1 + text2
'This is a string and this is another string'
text3 = """This is a multiline string
Therefore you can write multiline text.
The string must be wrapped in triple quotes.
"""
text3
'This is a multiline string\nTherefore you can write multiline text.\nThe string must be wrapped in triple quotes.\n'
String methods
"hello World".count('o')
2
"Hello World".upper()
'HELLO WORLD'
"hello World".lower()
'hello world'
"hello World".title()
'Hello World'
"Hello world".replace("world", "universe")
'Hello universe'
.strip() removes leading and trailing whitespace.
mystring = " hello there and a very big space"
mystring.strip()
'hello there and a very big space'
"Using the find method to find the index of the first occurence for the matching string".find("method")
15
"Using the rfind method to find the index of the first occurence for the matching string from the end".rfind("string")
81
"to be or not to be".find("be")
3
"to be or not to be".rfind("be")
16
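The examples above only cover substrings that are present. As a small sketch of the other case (the variable names and the missing word "question" are illustrative only), find and rfind return -1 when nothing matches, while the related index method raises an error instead:

```python
phrase = "to be or not to be"

# find returns the lowest matching index, rfind the highest.
first_be = phrase.find("be")
last_be = phrase.rfind("be")

# Both return -1 when the substring is not present.
missing = phrase.find("question")

# index behaves like find but raises ValueError instead of returning -1.
try:
    phrase.index("question")
    index_raised = False
except ValueError:
    index_raised = True

print(first_be, last_be, missing, index_raised)
```

Checking for -1 (or using the in keyword first) avoids accidentally treating "not found" as a valid position.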
"to be or not to be".startswith('t')
True
String formatting
"hello".rjust(10, ' ')
'     hello'
"hello".ljust(10, ' ')
'hello     '
"123".zfill(10)
'0000000123'
"Hello {} child and your {} family".format("dear", "wonderful")
'Hello dear child and your wonderful family'
'{0} {1} {3} {2}'.format("welcome", "to", "home","my").title()
'Welcome To My Home'
"Welcome to our {adj} shop".format(adj="new")
'Welcome to our new shop'
F-strings
Everything inside the curly brackets is evaluated as a Python expression, so arithmetic and function or method calls can be placed inside the curly brackets.
adj1, adj2 = "new", "exciting"
f"Welcome to my {adj1} and {adj2} shop"
'Welcome to my new and exciting shop'
hours = 24
days = 7
f"We are open for {hours * days} hours every week"
'We are open for 168 hours every week'
shop = "The new sweet shop"
adj = "fantastic"
f"My {adj} new shop is called {shop.title()}."
'My fantastic new shop is called The New Sweet Shop.'
Formatting mini-language
String objects have a format method that substitutes formatted arguments into a string, producing a new string.
Add a colon : after the expression in curly brackets, followed by the mini-language notation.
For example:
{0:.2f} formats the first argument as a floating-point number with two decimal places.
{1:s} formats the second argument as a string.
{2:d} formats the third argument as an integer.
pct = .20
first_customers = 500
value = 100
f"There will be a discount of {pct:.1%} for the first {first_customers} customers every Monday with everything less than €{value:.2f}"
'There will be a discount of 20.0% for the first 500 customers every Monday with everything less than €100.00'
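The positional specifiers described above ({0:.2f}, {1:s}, {2:d}) can be tried directly with str.format. This is a minimal sketch; the values 3.14159, "pi", 42 and 19.5 are made up for illustration.

```python
# Positional fields, each followed by a colon and a format specification:
#   {0:.2f} -> first argument as a float with two decimal places
#   {1:s}   -> second argument as a string
#   {2:d}   -> third argument as an integer
template = "{0:.2f} | {1:s} | {2:d}"
line = template.format(3.14159, "pi", 42)
print(line)  # 3.14 | pi | 42

# The same specification works after an expression in an f-string.
price = 19.5
label = f"{price:.2f}"
print(label)  # 19.50
```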
Strings are sequences
"Python treats strings as sequences."[0]
'P'
"Python treats strings as sequences."[0:10]
'Python tre'
"Python treats strings as sequences."[:10]
'Python tre'
"Python treats strings as sequences."[10:]
'ats strings as sequences.'
"string" in "You can use the 'in' keyword to check if one string contains another string"
True
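Because strings are sequences, the other usual sequence operations apply too. The following sketch, with illustrative variable names, reuses the sentence from the examples above to show len, negative indexing, slicing with a step, and iteration:

```python
sentence = "Python treats strings as sequences."

length = len(sentence)            # number of characters
last_char = sentence[-1]          # negative indices count from the end
every_other = sentence[0:6:2]     # indices 0, 2 and 4
reversed_word = sentence[5::-1]   # walk backwards from index 5 to the start

# Iterating over a string yields one character at a time.
upper_count = sum(1 for ch in sentence if ch.isupper())

print(length, last_char, every_other, reversed_word, upper_count)
```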
Strings can be concatenated using `+`
```python
text1 + text2
'This is a string and this is another string'
```
Some useful string methods
While many built-in Python objects, such as lists and dictionaries, are mutable, strings (and tuples) are not. You cannot modify a string in place; string methods return a new string instead.
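A short sketch of what immutability means in practice (the variable names are illustrative): assigning to a position in a string fails, while methods and concatenation build new strings and leave the original untouched.

```python
greeting = "hello World"

# Item assignment is not supported on str, so this raises TypeError.
try:
    greeting[0] = "H"
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Methods such as replace return a new string; the original is unchanged.
fixed = greeting.replace("hello", "Hello")

# Building a new string from slices is another way to get a changed copy.
rebuilt = "H" + greeting[1:]

print(assignment_failed, greeting, fixed, rebuilt)
```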
str.partition(sep)
Split the string at the first occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing the string itself, followed by two empty strings.
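The behaviour described above can be checked on plain Python strings. This is a minimal sketch; the strings "a-b-c" and "report_2021.csv" are made-up examples.

```python
# Separator found: (before, separator, after)
parts = "1958M01".partition("M")
print(parts)  # ('1958', 'M', '01')

# Only the first occurrence of the separator is used.
first_only = "a-b-c".partition("-")
print(first_only)  # ('a', '-', 'b-c')

# Separator not found: the whole string, then two empty strings.
not_found = "1958".partition("M")
print(not_found)  # ('1958', '', '')

# Tuple unpacking gives the pieces names (hypothetical file name).
stem, _, extension = "report_2021.csv".partition(".")
print(stem, extension)
```

Because the result always has exactly three elements, unpacking it never fails, even when the separator is missing.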
str.partition is useful when searching through a directory for files whose names contain a particular pattern.
It is also useful for splitting a dataframe column into new columns based on the part before the separator, the separator itself and the part after the separator.
For example, the Month column below consists of a year and a month separated by an 'M'.
| | Month | VALUE |
|---|---|---|
| 0 | 1958M01 | 160.2 |
| 1 | 1958M01 | 95.6 |
| 2 | 1958M01 | ... |
Use str.partition to split the Month column into a year column and a month column.
df[['year', 'month']] = df['Month'].str.partition('M')[[0,2]]
This creates two new columns:
| | Month | VALUE | year | month |
|---|---|---|---|---|
| 0 | 1958M01 | 160.2 | 1958 | 01 |
| 1 | 1958M01 | 95.6 | 1958 | 01 |
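To try this end to end, here is a small self-contained sketch. It assumes pandas is installed and rebuilds only the first two rows shown in the table, so it is a stand-in for the real dataframe rather than the original data load.

```python
import pandas as pd

# Stand-in for the dataframe in the text (first two rows only).
df = pd.DataFrame({"Month": ["1958M01", "1958M01"], "VALUE": [160.2, 95.6]})

# str.partition returns a DataFrame with columns 0 (before),
# 1 (the separator) and 2 (after).
parts = df["Month"].str.partition("M")

# Keep the part before and the part after the separator.
df[["year", "month"]] = parts[[0, 2]]

print(df)
```

Note that the new year and month columns hold strings, so they would need converting (for example with astype(int)) before doing arithmetic on them.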
This achieves the same result as using str.split. Set expand=True to return the pieces as separate columns.
df[['year', 'month']] = df['Month'].str.split('M', n=1, expand=True)
Use pop to remove the original column if it is not needed once the new columns are created.
df[['year', 'month']] = df.pop('Month').str.split('M', n=1, expand=True)
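The sketch below compares the two approaches on the same two-row stand-in dataframe (an assumption for illustration, not the original data) and shows the effect of pop on the column list.

```python
import pandas as pd

df = pd.DataFrame({"Month": ["1958M01", "1958M01"], "VALUE": [160.2, 95.6]})

# Both approaches produce the same year and month pieces.
via_partition = df["Month"].str.partition("M")[[0, 2]]
via_split = df["Month"].str.split("M", n=1, expand=True)
same_pieces = bool((via_partition.to_numpy() == via_split.to_numpy()).all())

# pop returns the column and removes it from df in one step.
df[["year", "month"]] = df.pop("Month").str.split("M", n=1, expand=True)

print(same_pieces)
print(list(df.columns))
```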
Pandas string partitioning using str.partition
The pandas str.partition method splits each string in a column into three parts using the given separator. It searches each string for the separator. If the separator is found, the result contains the part before the separator, the separator itself, and the part after it. By default the three parts are returned as three columns of a dataframe.
This is especially useful when splitting a URL into parts.
For example, I have a dataframe containing URLs to datasets from the Central Statistics Office (CSO) PxStat database.
Each URL follows the same format:
"https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.ReadDataset/DHA09/CSV/1.0/en"
df['url'].str.partition('/CSV/') will split each string into 3 parts: the part before '/CSV/' in position 0, '/CSV/' in position 1 and the part after '/CSV/' in position 2.
For the URL above, the parts are "https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.ReadDataset/DHA09", "/CSV/" and "1.0/en".
Each of the three parts can be retrieved using indexing (from 0 to 2).
To further split the first part of the URL, call str.partition again.
df['url'].str.partition('/CSV/')[0]\
    .str.partition('https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.ReadDataset/')
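To see what the chained call does to a single value, here is a sketch using plain str.partition on the example URL. The names prefix and dataset_code are my own labels for illustration, not names from the dataframe.

```python
url = "https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.ReadDataset/DHA09/CSV/1.0/en"
prefix = "https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.ReadDataset/"

# First partition: everything before '/CSV/', the separator, everything after.
before, sep, after = url.partition("/CSV/")

# Second partition on the first part: the prefix sits at the very start,
# so the "before" piece is empty and the "after" piece is what remains.
empty, _, dataset_code = before.partition(prefix)

print(after)
print(dataset_code)
```

In the pandas version, the remaining piece would therefore be found in position 2 of the second partition result.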