CE151 PROGRAMMING LAB 5 - USING LISTS

Learning Objectives

      to gain practice in the use of lists

      to gain practice in the simple use of for loops

Make a new folder called Lab 5 in your Labs folder. All the example programs that you develop during this lab should be kept in this folder.

Manipulating Lists of Numbers

Simple Experimentation

Use New Window from the File menu of the Python shell to obtain an editor window. Paste the following code into the window.

"""
list1.py
simple list manipulation
created by sands 7/11/10
"""

myList = [1,2,3,4,5,6,7,8,9,10]

# print the list using print
print(myList)

# print the items one at a time using a while loop
i = 0
while i<len(myList) :
    print("myList["+str(i)+"] is", myList[i])
    i = i+1

print("The last two items in the list are", myList[8], "and", myList[9])

# examine some slices
#print("myList[:4] is ", myList[:4])
#print("myList[0:5] is ", myList[0:5])
#print("myList[5:8] is ", myList[5:8])
#print("myList[8:] is ", myList[8:])

Save the program as list1.py in your Lab 5 folder.

Run the program.

Observe that we can print the entire list using a single call to the print function or we can iterate through the items of the list outputting them one per line.

The last four lines have been commented out. Before removing the hash signs from the beginning of these lines try to predict the output. Now remove the hash signs and run the program to check that your predictions were correct

The contents of a list can be changed using assignment. Beneath the first call to print add a few statements such as

  myList[7] = 77

and add another call to print to output the modified list. Run the program to check the behaviour.

Items can be added to the end of a list using the append method. Beneath the assignment statements that you have just added, but above the new call to print, add a few more statements of the form

  myList.append(23)

Run the program to check the behaviour. Observe that the while loop still prints the entire contents of the list since the value used in determining whether to continue looping is len(myList). Although I knew that the list in the initial program had length 10, I did not explicitly use 10. Since the length of a list may change as a program is being run you should always use the len function if the length is needed; you should not make assumptions that you know the length.

However, the last items are not now displayed correctly. I should have used negative subscripts that do not depend on the length of the list, i.e. myList[-2] and myList[-1]. Replace the subscripts 8 and 9 with -2 and -1 respectively and run the program to check that it does now generate correct output.

Items can be deleted from the list using a del statement.

Remove the assignment statements that you added, and after the calls to append add the lines

  del myList[2]
  del myList[5]

Run the program. Note that the values that were deleted are 3 and 7; after deleting 3, the value 6, which was originally in location 5, is in location 4 and the value in location 5 is 7. Also observe that the slices now have different values.

Close the editor window.

Using For Loops to Iterate Over Sequences

We have seen that it is possible to access and process all of the elements of a sequence using a while loop, using a variable to keep track of the current position in the sequence and incrementing this for each iteration.

There is a more concise way of processing the elements (as long as we do not need to know their positions as they are being processed). This uses a for loop. In Python a for loop has the general form

for e in s :
    ......

In such a loop the variable e takes in turn each of the values specified by the expression-list s. There are various kinds of expression-list that can be used; the simplest of these is a sequence.

Printing List Contents

Make a copy of your list1.py file; call this new copy list2.py and open it in an IDLE editor window. Observe that four lines of code were used to generate the one-at-a-time output. Delete these four lines and replace them for the following line

  for e in myList : print(e)

Run the program and observe the output.

This printing loop is more concise since we do not have to initialise and increment the variable i.

It is recommended that you complete the first 6 exercises of assignment 1 before attempting the exercises below.

Using Lists as Stacks

Stacks

A stack is a last-in-first-out store. Items may be added to the top and removed from the top. (It is somewhat similar to an in-tray in an office - documents are added to the top of the pile and when a secretary is available he or she will remove and process the top document.)

The operation of adding an item to the top of the stack is usually referred to as push, and the operation of removing the top item is usually referred to as pop.

Stacks are used for many purposes in computer science. In most programming languages it is necessary for the user to implement his own stacks, or use a library package, but in Python we can simply use a list as a stack. The top of the stack is simply the end of the list, so the append method can be used for pushing. It would of course be possible to use myStack[-1] to access the top item and del myStack[-1] to delete it, but the designers of the language decided to provide a pop method that returns and removes the item. The assignment

  i = myStack.pop()

will remove the last item from the list myStack and make the variable i refer to it.

Obtain a new editor window and paste into it the following code

"""
stack1.py
interactive demonstration of stacks of strings
created by sands 7/11/10
"""

myStack = []

running = True

while running :
    print("\nStack contents:", myStack, "\n")  # generate extra blank lines in output
    line = input("Select an option: push, pop or quit: ")
    if line=="quit" :
        running = False
    elif line=="pop" :
        print("Popped", myStack.pop())
    elif line=="push" :
        myString = input("Enter name to push onto stack: ")
        myStack.append(myString)
    else :
        print("Invalid selection")

Save the program as stack1.py in your Lab 5 folder.

This program simply allows us to experiment interactively with a stack by pushing items onto it and popping items from it.

Run it and try various different sequences of pushes and pops.

If you select the pop option when there is nothing on the stack the program will terminate with an error message. We should check whether the list is empty before trying to call the pop method. Modify the program to do this (to check if the list is empty just check whether its length is 0).

Run and test again, then close the editor window.

Checking Nesting of Parentheses, Brackets and Braces

Since in Python lists may contain any data items they may contain items such as sets (e.g. { red, green, blue }, or tuples (e.g.("yes", "no", 7)). Additionally we can have lists and sets as elements of tuples, so it is possible to write expressions with multiple levels of nesting of parentheses, brackets, and braces, e.g. format([(3,[4,5]),{1,2,4},(3+4)*5][2], "3d")

To avoid repetitive typing I shall use the "word" brack to mean parentheses, bracket or brace.

We can check that the bracks in an expression are correctly nested using a stack. Each time we encounter an opening brack we should push it onto the stack.  It will remain there while the expression inside the pair of bracks is checked. During this checking more bracks will probably be pushed onto the stack, but when it is complete they should all have been popped again so when we encounter the closing brack the corresponding opening brack should be at the top of the stack. Hence whenever we encounter a closing brack we need to pop the top item from the stack and check that it matches the closing brack. If it does not match, the nesting is incorrect, and if the stack is empty there was no matching opening brack.

Having processed the entire expression the stack should be empty; it if is not, there are some missing closing bracks.

Here is an outline of a function to check an expression; I have chosen to use a for loop to iterate through the string.

def checkNesting(s) :
    """
    checks if brackets, braces and parentheses in expression s
                           are correctly nested
    argument: s: str
    returns: boolean: True if nesting is correct, False otherwise
    outputs error message if incorrect
    outline written by sands 7/11/10
    """

    myStack = []

    i = 0
    for ch in s :
        if ch in "{[(" :
            # push ch onto stack
        elif ch in "}])" :
            # if stack is empty :
                # output error message - no matching opening brack for s[i]
                return False
            else :
                # pop top item from stack
                # if it doesn't match s[i]
                    # output error message - expected match for top item, got s[i]
                     return False
        i = i+1

    # if stack is not empty :
        # output error message - match for top item on stack missing
        return False
    else :
        return True

Open a new editor window, save the file as nest1.py in your Lab 5 folder, add an appropriate documentation string and beneath this paste a copy of the incomplete function. Complete the function by replacing all the comments by code - the error messages should use the actual values to generate output such as
  Error - no matching [ for ]

Note that matching does not mean that characters should be equal - for example if ch is "]" the top character on the stack should be "[".

Add to the end of your program code that will input a line of text containing an expression, call the checkNesting function with that line as an argument and print a message stating that the expression is valid if the function returns the value True. (If the function returns the value False it will have already printed an error message so there is no need to output anything else.)

Run and test thoroughly using examples such as the following

{((3+4)*a[5*(b-c)])/5, 99}   (valid)
((3+4)*a[5*(b-c)]   (invalid - missing closing parenthesis)
((3+4)*a[5*(b-c))]   (invalid - incorrect nesting; should report no matching opening parenthesis for last closing parenthesis)
{((3+4)*a[5*(b-c)])/5, 99}]  (invalid - extra closing brace with no matching opening brace)

A Challenging Exercise

Validating XHML Tag Nesting

In HTML it is legitimate to write mark-up with incorrectly-nested tags, e.g. <b>Hello <i> CE151 </b> students </i>. This however, is not allowed in XHTML. The aim of this exercise is to write a program that will read an HTML document and check whether its tags are nested correctly according to XHTML standards.

The first step is to open the file - to do this you need to perform an assignment

  f = open(fileName)

where fileName is a variable containing the name of the file. (The program will terminate with an error if the file cannot be opened - to prevent this we would need to know how to handle exceptions, so we will just accept it for now.)

You can then read one line at a time from the file using a loop containing a statement of the form

  line = f.readline()

The string returned will include the newline character at the end of the line so for a blank line it has length 1; hence if a string of length 0 is returned we know that the end of the file has been reached and the input loop should be terminated.

We wish to extract any tag from the line and add them to a list. To do this search for the first occurrences of < and > in the line and obtain the relevant slice. There may be more tags on the line so you will need to perform an assignment of the form line = line[n:] (where n is the location of the next character after the >) and perform the tag extraction in a loop.

This will not find tags that are spread over more than one line - it's probably best not to worry about this at the moment, but fix it later.

The list containing all the tags should be passed as an argument to a checkNesting function similar to that seen earlier. Within this function, for each tag you will need to check whether the second character is / to determine whether it's an opening or closing tag. If it's an opening tag you should also check whether the penultimate character is / (e.g. <br/>) in which case the tag is self-closing and can be ignored. The tag name (e.g. p, a, br, table) should then be extracted (it starts after the < or </ and finishes before the first space, or before the > if there is no space. If the tag is an opening tag its name should be pushed onto the stack, if it's a closing tag the top stack item should be popped - it should match the tag name.

Here are four simple HTML files for testing the program:
page1.html is valid
page2.html contains incorrectly nested <b> and <i> tags as in the example seen earlier
page3.html has a missing closing tag
page4.html has a closing tag with no corresponding opening tag

Having tested that your program gives the correct results try it with some "real" HTML files, such as the CE151 home page.

Handling Multiple-Line Tags

The next step is to modify the program so that it can handle tags that occupy more than one line. If an input line contains a < but no > you should save the line and concatenate it with the next line. (You may have to do this more than once if the tag extends over several lines.)

There will still be problems if the tag name is the last item on the first line, the line to be concatenated should be the first line without the newline character, but with a space appended (e.g. savedLine[:-1]+' ' ).

The program should now detect a deliberate error on the CE151 home page! (There's an img tag that extends over 2 lines and doesn't have a matching closing tag.)

Better Error-Reporting

Since an HTML file could be rather long it would be useful for the program to inform the user of the line number(s) at which the error occurred. To be able to do this you'll need to store a line number along with each tag in the tag list and each tag-name in the stack. Hence the items in the lists need to be pairs of values. In Python we can uses tuples for this purpose - a tuple is used to gather together two or more data items and allow them to be treated as a single item. You will need to use tuples such as (17,"<a href ='xxx'>") and (17,"a"). To create a tuple you simply use something like myTuple = (lineNum, tagString). You can retrieve the values from a tuple either using subscripting (e.g. lineNum = myTuple[0]) or a multiple assignment (e.g. lineNum, tagString = myTuple - note that no parentheses are used here).