WHAT WE ARE GOING TO LEARN, LET'S KEEP AN EYE :
You all have basic knowledge about Python strings. You know that Python strings are characters enclosed in quotes of any type - single quotation marks, double quotation marks and triple quotation marks. You have also learnt things like - an empty string is a string that has 0 characters (i.e., it is just a pair of quotation marks) and that Python strings are immutable. You have used strings in earlier chapters to store text type of data.
You know by now that strings are sequences of characters, where each character has a unique position-id/index. The indexes of a string run from 0 to (length-1) in forward direction and -1, -2, -3, ..., -length in backward direction.
L2 TRAVERSING A STRING :
You know that individual characters of a string are accessible through the unique index of each character. Using the indexes, you can traverse a string character by character. Traversing refers to iterating through the elements of a string, one character at a time. You have already traversed through strings, though unknowingly, when we talked about sequences along with for loops. To traverse through a string, you can write a loop like -
>>> name = "superb"
>>> for ch in name :
...     print(ch, '-', end = ' ')
...
The above code will print :
s - u - p - e - r - b -
The information that you have learnt till now is sufficient to create wonderful programs to manipulate strings. Consider the following programs that use the python string indexing to display strings in multiple ways.
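One way to picture index-based traversal is to walk the same string with positive and negative indexes together. The sketch below is only an illustration; the helper name mirrored_pairs is hypothetical and not part of Python.

```python
def mirrored_pairs(text):
    """Pair each character with the one at the mirrored negative index.

    mirrored_pairs is a hypothetical helper used only for illustration.
    """
    pairs = []
    for i in range(len(text)):
        # text[i] walks left to right; text[-(i + 1)] walks right to left
        pairs.append((text[i], text[-(i + 1)]))
    return pairs


name = "superb"
for front, back in mirrored_pairs(name):
    print(front, back)
```

Each printed line shows the character at forward index i next to the character at backward index -(i+1), so the first line pairs 's' with 'b'.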
L3 STRING OPERATORS :
In this section, you'll be learning to work with various operators that can be used to manipulate strings in multiple ways. We'll be talking about basic operators + and *, membership operators in and not in, and comparison operators (all relational operators) for strings.
A) BASIC OPERATORS -
The two basic operators of strings are : + and *. You have used these operators as arithmetic operators before for addition and multiplication respectively. But when used with strings, the + operator performs concatenation rather than addition and the * operator performs replication rather than multiplication. Let us see how.
Also, before we proceed, recall that strings are immutable i.e., un-modifiable. Thus every time you perform something on a string that changes it, python will internally create a new string rather than modifying the old string in place.
- String Concatenation Operator + -
The + operator creates a new string by joining the two operand strings, e.g.,
"tea" + "pot"
Will result into
"teapot"
For example -
Expression          Will result into
'1' + '1'           '11'
"a" + "0"           'a0'
'123' + 'abc'       '123abc'
Let us see how concatenation takes place internally. Python creates a new string in the memory by storing the individual characters of first string operand followed by the individual characters of second string operand. Original strings are not modified as strings are immutable; new strings can be created but existing cannot be modified.
Caution :
Another important thing that you need to know about + operator is that this operator can work with numbers and strings separately for addition and concatenation respectively, but in the same expression, you cannot combine numbers and strings as operands with a + operator. For example -
2 + 3 = 5 (addition)
'2' + '3' = '23' (concatenation)
But the expression
'2' + 3 raises a TypeError (invalid)
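When a number and a string do have to be combined, one common approach is to convert one operand explicitly first. The following is a minimal sketch using the built-in str() and int() functions; the variable names are chosen only for illustration.

```python
count = 3
label = "items: "

# str() turns the number into a string so that + can concatenate
message = label + str(count)
print(message)

# int() turns a numeric string into a number so that + can add
total = int("2") + 3
print(total)

# Mixing the two types directly raises TypeError
try:
    result = "2" + 3
except TypeError as err:
    result = None
    print("TypeError:", err)
```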
- String Replication Operator * -
When the * operator is used with numbers (i.e., when both operands are numbers), it performs multiplication and returns the product of the two number operands.
To use the * operator with strings, you need two types of operands - a string and a number, i.e., as number*string or string*number.
Here the string operand tells the string to be replicated and the number operand tells the number of times it is to be repeated; Python will create a new string that is that many repetitions of the string operand. For example,
5*"hello/"
It will return : "hello/hello/hello/hello/hello/"
Caution :
Another important thing that you need to know about * operator is that this operator can work with numbers as both operands for multiplication and with a string and a number for replication respectively, but in the same expression, you cannot have strings as both the operands with a * operator. For example -
2 * 3 = 6 #valid multiplication
"2" * 3 = "222" #valid replication
but, "2" * "3" #invalid
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    "2" * "3"
TypeError: can't multiply sequence by non-int of type 'str'
=> Working of Python * operator :
Operand datatypes     Operation of *     Example
numbers               multiplication     9 * 9 = 81
string, number        replication        "#" * 3 = "###"
number, string        replication        3 * "#" = "###"
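As a small illustration of replication in practice, the sketch below builds an underline that matches the length of a heading, and also shows what a zero or negative count gives. The variable names are illustrative only.

```python
title = "Report"

# One dash per character in the title
underline = "-" * len(title)
print(title)
print(underline)

# A count of zero (or a negative count) gives an empty string
empty = "ab" * 0
negative = "ab" * -2
print(repr(empty), repr(negative))
```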
B) MEMBERSHIP OPERATORS :
There are two membership operators for strings(in fact for all sequence types). These are in and not in. We have talked about them in previous chapter, briefly. Let us learn about these operators in context of strings. Recall that :
in : Returns True if a character or a substring exists in the given string; False otherwise
not in : Returns True if a character or a substring does not exist in the given string; False otherwise
Both membership operators(when used with strings), require that both operands used with them are of string type, i.e.,
<string> in <string> ("12" in "xyz")
<string> not in <string> ("12" not in "xyz")
For example :
"a" in "heya" will give True
"jap" in "heya" will give False
"jap" in "japan" will give True
"jap" in "Japan" will give False
"jap" not in "Japan" will give True
"123" not in "xyz" will give True
The in and not in operators can also work with string variables. Consider this :
>>> sub = "help"
>>> string = "helping hand"
>>> sub2 = 'HELP'
>>> sub in string
True
>>> sub2 in string
False
>>> sub not in string
False
>>> sub2 not in string
True
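Since in and not in are case-sensitive, a case-insensitive check needs both operands brought to the same case first. The following is a minimal sketch of that idea; contains_ignoring_case is a hypothetical helper name, and it relies on the lower() method discussed later in this chapter.

```python
def contains_ignoring_case(sub, text):
    """Return True if sub occurs in text, ignoring letter case.

    contains_ignoring_case is a hypothetical helper for illustration.
    """
    # Convert both operands to lowercase before the membership test
    return sub.lower() in text.lower()


string = "helping hand"
print("HELP" in string)                          # plain, case-sensitive test
print(contains_ignoring_case("HELP", string))    # case-insensitive test
print(contains_ignoring_case("jap", "Japan"))
```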
C) COMPARISON OPERATORS :
Python's standard comparison operators i.e., all relational operators (<, <=, >, >=, ==, !=) apply to strings also. The comparisons using these operators are based on the standard character-by-character comparison rules for Unicode (i.e., dictionary order). Thus, you can make out that
"a" == "a" will give True
"abc" == "abc" will give True
"a" != "abc" will give True
"A" != "a" will give True
"ABC" == "abc" will give False
"abc" != "Abc" will give True
Equality and non-equality in strings are easier to determine because they go for exact character matching for individual letters, including the case (upper-case or lower-case) of the letter. But for other comparisons like less than (<) or greater than (>), you should know the following piece of useful information.
As internally python compares using Unicode values(called ordinal value), let us know about some most common characters and their ordinal values. For most common characters, the ASCII values and Unicode values are the same. The most common characters and their ordinal values are -
Characters        Ordinal Values
'0' to '9'        48 to 57
'A' to 'Z'        65 to 90
'a' to 'z'        97 to 122
'a' < 'A' will give False (because the Unicode value of lowercase letters is higher than uppercase letters; hence 'a' is greater than 'A', not lesser)
'ABC' > 'AB' will give True
'abc' <= 'ABCD' will give False (because letters of 'abc' have higher ASCII values compared to 'ABCD'.)
'abcd' > 'abcD' will give True (because strings 'abcd' and 'abcD' are same till first three letters but the last letter of 'abcD' has lower ASCII)
Thus, you can say that python compares two strings through relational operators using character-by-character comparison of their Unicode values.
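The character-by-character rule can be seen in a few direct comparisons and in how sorted() orders a list of strings. This is just an illustrative sketch; the sample words are made up.

```python
# Comparisons are decided at the first position where the strings differ
print("apple" < "apply")    # True: 'e' (101) is less than 'y' (121)
print("Zebra" < "apple")    # True: 'Z' (90) is less than 'a' (97)
print("app" < "apple")      # True: a prefix is smaller than the longer string

# sorted() uses the same comparisons, so digits come first,
# then uppercase letters, then lowercase letters
words = ["banana", "Apple", "apple", "Banana", "42"]
ordered = sorted(words)
print(ordered)
```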
- Determining Ordinal/Unicode Value of a Single Character :
Python offers a built-in function ord() that takes a single character and returns the corresponding ordinal Unicode value. It is used as per following format -
ord(<single-character>)
Let us see how, with the help of some examples -
To know the ordinal value of letter 'A', you'll write ord('A') and python will return the corresponding ordinal values -
>>> ord('A')
65
But you need to keep in mind that ord() function requires single character string only. You may even write an escape sequence enclosed in quotes for ord() function.
The opposite of the ord() function is chr(), i.e., while ord() returns the ordinal value of a character, chr() takes the ordinal value in integer form and returns the character corresponding to that ordinal value. The general syntax of the chr() function is -
chr(<int>) (the ordinal value is given in integer)
Have a look at some examples -
>>> chr(65)
'A'
>>> chr(97)
'a'
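Because ord() and chr() are inverses of each other, they can be combined to move from one character to another by arithmetic on ordinal values. The sketch below is only illustrative; shift_char is a hypothetical helper name.

```python
def shift_char(ch, offset):
    """Return the character whose ordinal value is ord(ch) + offset.

    shift_char is a hypothetical helper used only for illustration.
    """
    return chr(ord(ch) + offset)


print(shift_char('A', 1))     # 65 + 1 = 66, which is 'B'
print(shift_char('a', -32))   # 97 - 32 = 65, which is 'A'
print(ord('\n'))              # an escape sequence is a single character: 10

# chr(ord(ch)) always gives back the original character
for ch in "aZ9":
    print(ch, ord(ch), chr(ord(ch)))
```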
L4 STRING SLICES :
As an English term, you know the meaning of word 'slice' which means - 'a part of'. In the same way, in Python, the term 'string slice' refers to a part of the string, where strings are sliced using a range of indices.
That is, for a string, say name, if we give name[n : m] where n and m are integers and legal indices, Python will return a slice of the string containing the characters at indices n, n+1, n+2 ... till m-1. Let us understand this with the help of examples. Say we have a string namely word storing 'amazing' i.e.,
          0    1    2    3    4    5    6
word -    a    m    a    z    i    n    g
         -7   -6   -5   -4   -3   -2   -1
Then,
word[0 : 7] will give 'amazing'
word[0 : 3] will give 'ama'
word[2 :5 ] will give 'azi'
word[-7 : -3] will give 'amaz'
word[-5 : -1] will give 'azin'
From the above example, one thing must be clear to you -
In a string slice, the character at last index (the one following colon(:)) is not included in the result.
In a string slice, you give the slicing range in the form [<begin-index>:<last>]. If, however, you skip either of the begin-index or last, python will consider the limits of the string i.e., for missing begin-index, it will consider 0 (the first index) and for missing last value, it will consider length of the string. For example -
word[: 7] will give 'amazing'
word[: 5] will give 'amazi'
word[3 :] will give 'zing'
word[5 :] will give 'ng'
The string slice refers to a part of the string s[start:end] that is the elements beginning at start and extending up to but not including end.
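A few commonly needed slices, written out for the same string 'amazing', may help fix the begin/end rule. This is a minimal sketch; the variable names are illustrative only.

```python
word = "amazing"

first_three = word[:3]      # indices 0, 1, 2
last_three = word[-3:]      # indices -3, -2, -1
middle = word[2:5]          # indices 2, 3, 4 (5 is not included)
without_ends = word[1:-1]   # everything except the first and last character

print(first_three, last_three, middle, without_ends)
```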
- Interesting Inference :
Using the same string slicing technique, you will find that
For any index n, s[:n] + s[n:] will give you original string s.
This works even for n negative or out of bounds. Let us prove this with an example. Consider the same string namely word storing 'amazing'.
>>> word[3:], word[:3]
('zing', 'ama')
>>> word[:3] + word[3:]
'amazing'
>>> word[:-7], word[-7:]
('', 'amazing')
>>> word[:-7] + word[-7:]
'amazing'
You can give a third (optional) index (say n) in a string slice too. With that, every nth character, counting from the begin-index, will be taken as part of the slice; a negative n takes characters in backward direction. E.g., for word = 'amazing', look at the following examples.
>>> word[1:6:2]
'mzn'
>>> word[-7:-3:3]
'az'
>>> word[: :-2]
'giaa'
>>> word[: :-1]
'gnizama'
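One familiar use of the step index is reversing a string with [::-1], for instance to test whether a word reads the same in both directions. The following is a minimal sketch of that idea; is_palindrome is a hypothetical helper name.

```python
def is_palindrome(text):
    """Return True if text reads the same forwards and backwards.

    is_palindrome is a hypothetical helper used only for illustration.
    """
    # text[::-1] is the whole string taken with a step of -1, i.e. reversed
    return text == text[::-1]


print(is_palindrome("level"))
print(is_palindrome("amazing"))

# A positive step of 2 takes the characters at indices 0, 2, 4, 6
every_other = "amazing"[::2]
print(every_other)
```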
- Another Interesting Inference Is :
=> An index out of bounds causes an error with strings, but slicing a string outside the bounds does not cause an error.
s = 'Hello'
print(s[5])
will raise IndexError : string index out of range
But if you give
s = "Hello"
print(s[4 : 8])
print(s[5 : 10])
the output will be
o
i.e., letter 'o' followed by an empty string in the next line. The reason behind this is that when you use an index, you are accessing a constituent character of the string, thus the index must be valid, and an out of bounds index causes an error as there is no character to return from the given index. But slicing always returns a subsequence, and an empty sequence is a valid sequence. Thus, when you slice a string outside the bounds, it can still return an empty subsequence, and hence Python gives no error and returns an empty subsequence.
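The difference between indexing and slicing outside the bounds can be handled in code as sketched below. char_at is a hypothetical helper, shown only as one possible way to get a default value instead of an error.

```python
def char_at(text, index, default=""):
    """Return text[index], or default when the index is out of bounds.

    char_at is a hypothetical helper used only for illustration.
    """
    try:
        return text[index]
    except IndexError:
        # Indexing past the end raises IndexError; slicing would not
        return default


s = "Hello"
print(repr(char_at(s, 4)))    # a valid index
print(repr(char_at(s, 5)))    # out of bounds, so the default is returned
print(repr(s[4:8]))           # a slice running past the end is still allowed
print(repr(s[5:10]))          # a slice fully outside the bounds is empty
```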
L5 STRING FUNCTIONS AND METHODS :
Python also offers many built-in functions and methods for string manipulation. You have already worked with one such function, len(), in earlier chapters. In this chapter, you will learn about many other powerful built-in string methods of Python used for string manipulation. Every string object that you create in Python is actually an instance of the str class (you need not do anything specific for this; Python does it for you, as str is a built-in type). The string manipulation methods that are being discussed below can be applied to a string as per the following syntax :
<stringobject>.<method name>()
In the following table we are referring to <stringobject> as string only (no angle brackets) but the meaning is intact i.e., you have to replace string with a legal string (i.e., either a string literal or a string variable that holds a string value).
A) Python's Built-In String Manipulation Functions And Methods :
1) The len() Function :
It returns the length of its argument string, i.e., it returns the count of characters in the passed string. The len() function is a Python built-in function. It is used as per the following syntax -
len(<string>)
>>> len('hello')
5
>>> name = 'maria'
>>> len(name)
5
2) The capitalize() Method :
It returns a copy of the string with its first character capitalized and the remaining characters converted to lowercase. It is used as per the syntax :
<string>.capitalize()
>>> 'true'.capitalize()
'True'
>>> 'i love my india'.capitalize()
'I love my india'
3) The count() Method :
It returns the number of occurrences of the substring sub in string(or string [start:end] if these arguments are given). It is used as per the syntax :
<string>.count(sub[, start[, end]])
To count the occurrences of 'ab' in 'abracadabra' - in the whole string, between indices 4 and 8, and from index 6 onwards.
>>> 'abracadabra'.count('ab')
2
>>> 'abracadabra'.count('ab', 4, 8)
0
>>> 'abracadabra'.count('ab', 6)
1
4) The find() Method :
It returns the lowest index in the string where the substring sub is found within the slice range of start and end. Returns -1 if sub is not found. It is used as per the syntax -
<string>.find(sub[, start[, end]])
>>> string = 'it goes as - ringa ringa roses'
>>> sub = 'ringa'
>>> string.find(sub)
13
>>> string.find(sub, 15, 22)
-1
>>> string.find(sub, 15, 25)
19
5) The index() Method :
It returns the lowest index where the specified substring is found. It works like find(), except that find() returns -1 if sub is not found, whereas index() raises an exception, ValueError, if sub is not found in the string. It is used as per the syntax -
<string>.index(sub[, start[, end]])
To find the index of the 1st occurrence of 'ab' in 'abracadabra' - in the whole string, from index 6 onwards, and between indices 4 and 8.
>>> 'abracadabra'.index('ab')
0
>>> 'abracadabra'.index('ab', 6)
7
>>> 'abracadabra'.index('ab', 4, 8)
Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    'abracadabra'.index('ab', 4, 8)
ValueError: substring not found
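The two ways of reporting "not found" lead to two different coding styles: checking for -1 with find(), or catching ValueError with index(). The sketch below shows both; find_all and index_or_none are hypothetical helper names used only for illustration.

```python
def find_all(text, sub):
    """Return a list of every index at which sub starts in text."""
    positions = []
    start = text.find(sub)
    while start != -1:                      # find() signals "not found" with -1
        positions.append(start)
        start = text.find(sub, start + 1)   # continue after the last match
    return positions


def index_or_none(text, sub):
    """Return the first index of sub in text, or None if it is absent."""
    try:
        return text.index(sub)
    except ValueError:                      # index() signals "not found" by raising
        return None


print(find_all('abracadabra', 'ab'))
print(index_or_none('abracadabra', 'ab'))
print(index_or_none('abracadabra', 'xy'))
```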
6) The isalnum() Method :
It returns True if the characters in the string are alphanumeric (alphabets or numbers) and there is at least one character, False otherwise. Please note that the space(' ') is not treated as alphanumeric. It is used as per the syntax -
<string>.isalnum()
>>> string = 'abc123'
>>> string2 = 'hello'
>>> string3 = '12345'
>>> string4 = ' '
>>> string.isalnum()
True
>>> string2.isalnum()
True
>>> string3.isalnum()
True
>>> string4.isalnum()
False
7) The isalpha() Method :
It returns True if all characters in the string are alphabetic and there is at least one character, False otherwise. It is used as per the syntax -
<string>.isalpha()
(considering the same string values as used in example of previous function - isalnum)
>>> string.isalpha()
False
>>> string2.isalpha()
True
>>> string3.isalpha()
False
>>> string4.isalpha()
False
8) The isdigit() Method :
It returns True if all the characters in the string are digits. There must be at least one character, otherwise it returns False. It is used as per the syntax -
<string>.isdigit()
(considering the same string values as used in the example of previous function - isalnum)
>>> string.isdigit()
False
>>> string2.isdigit()
False
>>> string3.isdigit()
True
>>> string4.isdigit()
False
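The three tests isalnum(), isalpha() and isdigit() can be combined to classify a piece of text. The following is a minimal sketch under that assumption; describe is a hypothetical helper name, and the order of the checks matters because digits-only and letters-only strings are also alphanumeric.

```python
def describe(text):
    """Classify text using isdigit(), isalpha() and isalnum().

    describe is a hypothetical helper used only for illustration.
    """
    if text.isdigit():
        return "digits only"
    if text.isalpha():
        return "letters only"
    if text.isalnum():
        # Reached only when the text mixes letters and digits
        return "letters and digits"
    return "other"


for sample in ['abc123', 'hello', '12345', ' ', '']:
    print(repr(sample), '->', describe(sample))
```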
9) The islower() Method :
It returns True if all cased characters in the string are lowercase. There must be at least one cased character. It returns False otherwise. It is used as per the syntax -
<string>.islower()
>>> string = 'hello'
>>> string2 = 'THERE'
>>> string3 = 'Goldy'
>>> string.islower()
True
>>> string2.islower()
False
>>> string3.islower()
False
10) The isspace() Method :
It returns True if there are only whitespace characters in the string. There must be at least one character. It returns False otherwise. It is used as per the syntax -
<string>.isspace()
>>> string = " "
>>> string2 = ""
>>> string.isspace()
True
>>> string2.isspace()
False
11) The isupper() Method :
It tests whether all cased characters in the string are uppercase and requires that there be at least one cased character. Returns True if so, False otherwise. It is used as per the syntax -
<string>.isupper()
>>> string = "HELLO"
>>> string2 = "There"
>>> string3 = "goldy"
>>> string4 = "U123"
>>> string5 = "123f"
>>> string.isupper()
True
>>> string2.isupper()
False
>>> string3.isupper()
False
>>> string4.isupper()
True
>>> string5.isupper()
False
12) The lower() Method :
It returns a copy of the string converted to lowercase. It is used as per the syntax -
<string>.lower()
(considering the same string values as used in the example of previous function - isupper)
>>> string.lower()
'hello'
>>> string2.lower()
'there'
>>> string3.lower()
'goldy'
>>> string4.lower()
'u123'
>>> string5.lower()
'123f'
13) The upper() Method :
It returns a copy of the string converted to uppercase. It is used as per the syntax -
<string>.upper()
(considering the same string values as used in the example of previous function - isupper)
>>> string.upper()
'HELLO'
>>> string2.upper()
'THERE'
>>> string3.upper()
'GOLDY'
>>> string4.upper()
'U123'
>>> string5.upper()
'123F'
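The case-testing and case-converting methods also work on single characters, which makes it possible to count letters by case or compare strings without regard to case. This is just an illustrative sketch; case_counts is a hypothetical helper name.

```python
def case_counts(text):
    """Return (number of uppercase letters, number of lowercase letters).

    case_counts is a hypothetical helper used only for illustration.
    """
    upper = 0
    lower = 0
    for ch in text:
        if ch.isupper():
            upper += 1
        elif ch.islower():
            lower += 1
        # digits, spaces and punctuation are not cased, so they are skipped
    return upper, lower


print(case_counts("Hello World 123"))

# Converting both sides to one case gives a case-insensitive comparison
same = "Goldy".lower() == "GOLDY".lower()
print(same)
```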
14) The lstrip(), rstrip(), strip() Methods :
lstrip( ) - Returns a copy of the string with leading whitespaces removed, i.e., whitespaces from the leftmost end are removed. It is used as per the syntax -
<string>.lstrip()
rstrip( ) - Returns a copy of the string with trailing whitespaces removed, i.e., whitespaces from the rightmost end are removed. It is used as per the syntax -
<string>.rstrip()
strip( ) - Returns a copy of the string with leading and trailing whitespaces removed, i.e., whitespaces from the leftmost and the rightmost ends are removed. It is used as per the syntax -
<string>.strip()
>>> " Sipo ".lstrip()
'Sipo '
>>> " Sipo ".rstrip()
' Sipo'
>>> " Sipo ".strip()
'Sipo'
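Whitespace here includes tabs and newlines as well as spaces, which is why these methods are often used to clean up text that was typed in or read from a file. The sketch below assumes a made-up raw value and prints each result with repr() so that the remaining whitespace is visible.

```python
# A made-up value with leading spaces, a trailing space and a newline
raw = "   Sipo \n"

left_cleaned = raw.lstrip()     # leading spaces removed
right_cleaned = raw.rstrip()    # trailing space and newline removed
cleaned = raw.strip()           # both ends cleaned

print(repr(left_cleaned))
print(repr(right_cleaned))
print(repr(cleaned))

# The original string is unchanged, as strings are immutable
print(repr(raw))
```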
15) The startswith(), endswith() Methods :
startswith( ) - Returns True if the string starts with the substring sub, otherwise returns False. It is used as per the syntax -
<string>.startswith(sub)
endswith( ) - Returns True if the string ends with the substring sub, otherwise returns False. It is used as per the syntax -
<string>.endswith(sub)
>>> "abcd".startswith("cd")
False
>>> "abcd".startswith("ab")
True
>>> "abcd".endswith("b")
False
>>> "abcd".endswith("cd")
True
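A typical use of these two methods is picking out names that follow a pattern. The following is a minimal sketch with a made-up list of file names; none of these names refer to real files.

```python
# Hypothetical file names, used only for illustration
files = ["report.txt", "notes.md", "report_final.txt", "image.png"]

text_reports = []
for name in files:
    # Keep names that begin with "report" and end with ".txt"
    if name.startswith("report") and name.endswith(".txt"):
        text_reports.append(name)

print(text_reports)
```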
16) The title() Method :
It returns a title-cased version of the string where all words start with uppercase characters and all remaining letters are in lowercase. It is used as per the syntax -
<string>.title()
>>> "the sipo app".title()
'The Sipo App'
>>> "COMPUTER SCIENCE".title()
'Computer Science'
17) The istitle() Method :
It returns True if the string is in title case (i.e., the first letter of each word is in uppercase, while the rest of the letters are in lowercase), False otherwise. It is used as per the syntax -
<string>.istitle()
>>> 'Computer Science'.istitle()
True
>>> "COMPUTER SCIENCE".istitle()
False
18) The replace() Method :
It returns a copy of the string with all occurrences of substring old replaced by the new string. It is used as per the syntax -
<string>.replace(old, new)
>>> 'abracadabra'.replace('ab', 'sp')
'spracadspra'
>>> 'I work for you'.replace('work', 'care')
'I care for you'
>>> 'you and i work for you'.replace('you', 'U')
'U and i work for U'
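Since strings are immutable, replace() never changes the string it is called on; the result has to be stored or used. The sketch below illustrates this, and also shows that replacing with an empty string removes the substring. The variable names are illustrative only.

```python
original = "abracadabra"

# replace() returns a new string; original is left as it was
changed = original.replace("ab", "sp")
print(original)
print(changed)

# Replacing with an empty string removes every occurrence
without_a = original.replace("a", "")
print(without_a)
```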
19) The join() Method :
It joins the members of a string based iterable (i.e., a string based sequence), placing the given string or character (i.e., <string>) between each pair of members. It is used as per the syntax -
<string>.join(<string iterable>)
(i) If the string based iterable is a string, then the given string (i.e., <string>) is inserted between every two characters of that string, e.g.,
>>> "*".join("Hello")
'H*e*l*l*o'
>>> "***".join("TRIAL")
'T***R***I***A***L'
(ii) If the string based iterable is a list or tuple of strings, then the given string/character is placed between the members of the list or tuple, BUT the tuple or list must have all members as strings, otherwise Python will raise an error.
>>> "$$".join(["trial", "hello"])
'trial$$hello'
>>> "###".join(("trial", "hello", "new"))
'trial###hello###new'
>>> "###".join((123, "hello", "new"))
Traceback (most recent call last):
    "###".join((123, "hello", "new"))
TypeError: sequence item 0: expected str instance, int found
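One way around this error is to convert every member to a string with str() before joining. This is only a sketch of that approach; the variable names are illustrative.

```python
items = (123, "hello", "new")

# Build a list in which every member is a string
as_strings = []
for item in items:
    as_strings.append(str(item))

joined = "###".join(as_strings)
print(joined)
```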
20) The split() Method :
It splits a string (i.e., <string>) based on a given separator string or character (i.e., <string/char>) and returns a list containing the split strings as members. It is used as per the syntax -
<string>.split(<string/char>)
(i) If you do not provide any argument to split then by default it will split the given string considering whitespace as a separator, e.g.,
>>> "I Love Python".split()
['I', 'Love', 'Python']
>>> "I Love Python".split(" ")
['I', 'Love', 'Python']
(ii) If you provide a string or a character as an argument to split(), then the given string is divided into parts considering the given string/character as separator and separator character is not included in the split strings e.g.,
>>> "I Love Python".split("o")
['I L', 've Pyth', 'n']
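The results of split() and split(" ") agree in the example above because the words are separated by single spaces. With repeated spaces they differ, as the sketch below illustrates: split() with no argument treats a run of whitespace as one separator, while split(" ") splits at every single space. The sentence is built with * so that the number of spaces is explicit.

```python
# Two spaces after "I" and three spaces after "Love"
sentence = "I" + " " * 2 + "Love" + " " * 3 + "Python"

by_whitespace = sentence.split()       # runs of whitespace count as one separator
by_single_space = sentence.split(" ")  # every single space is a separator

print(by_whitespace)
print(by_single_space)
```

The empty strings in the second list are the "gaps" between consecutive spaces.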
21) The partition() Method :
The partition() method splits the string at the first occurrence of the separator, and returns a tuple containing three items.
=> The part before the separator
=> The separator itself
=> The part after the separator
It is used as per the syntax -
<string>.partition(<separator/string>)
>>> txt = 'I enjoy working in python'
>>> x = txt.partition("working")
>>> print(x)
('I enjoy ', 'working', ' in python')
- Difference Between Partition() Vs Split() :
The split()'s Functioning
=> split() will split the string at any occurrence of the given argument.
=> It will return a list type containing the split substring.
=> The length of the list is equal to the number of words, if split on whitespaces.
The partition()'s Functioning
=> partition() will only split the string at the first occurrence of the given argument.
=> It will return a tuple type containing the split substring
=> It will always return a tuple of length 3, with the given separator as the middle value of the tuple.
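The differences listed above can be seen side by side on one string. The following is a minimal sketch using a made-up value that contains the separator twice.

```python
# A made-up string in which the separator "=" occurs twice
line = "name=sipo=app"

parts = line.split("=")         # splits at every "="; result is a list
pieces = line.partition("=")    # splits at the first "=" only; result is a tuple

print(parts)
print(pieces)

# When the separator is absent, partition() still returns three items:
# the whole string followed by two empty strings
missing = "no separator".partition("=")
print(missing)
```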