Learning objectives
By the end of this section you should be able to
- Demonstrate how to access files within a file system.
- Demonstrate how to process a CSV file.
Opening a file at any location
When only the filename is used as the argument to the open() function, Python looks for the file in the current working directory. When a program is run from the folder that contains it, that is the same folder as the executing Python file. Ex: For fileobj = open("file1.txt") in files.py to execute successfully, the file1.txt file should be in the same folder as files.py.
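The sketch below makes this lookup visible. It creates a temporary folder (only for the demo), makes that folder the current working directory with os.chdir(), writes file1.txt there, and then opens the file by its bare name. os.path.abspath() shows the full path that the bare name resolves to.

```python
import os
import tempfile

# Remember the starting folder so it can be restored afterwards
original_dir = os.getcwd()

with tempfile.TemporaryDirectory() as demo_dir:
    # Make the temporary folder the current working directory
    os.chdir(demo_dir)
    try:
        # Create file1.txt in the current working directory
        with open("file1.txt", "w") as f:
            f.write("hello from file1\n")
        # A bare filename is resolved against the current working directory
        print(os.path.abspath("file1.txt"))
        with open("file1.txt") as fileobj:
            contents = fileobj.read()
        print(contents, end="")
    finally:
        # Leave the folder before it is deleted (required on Windows)
        os.chdir(original_dir)
```

Changing the working directory affects every later relative path in the program, which is why the sketch restores it in a finally clause.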
Often a programmer needs to open files from folders other than the one in which the Python file exists. A path uniquely identifies a folder location on a computer. The path can be used along with the filename to open a file in any folder location. Ex: To open a file named logfile.log located in /users/turtle/desktop, the following can be used:
fileobj = open("/users/turtle/desktop/logfile.log")
Operating System | File location
---|---
Mac |
Linux |
Windows |
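Path separators differ between operating systems: Mac and Linux use /, while Windows conventionally uses \ and a drive letter such as C:. Python's standard pathlib module can join folder names using the separator for the current system. A minimal sketch, assuming a throwaway temporary folder stands in for a real home directory; the users/turtle/desktop names only mirror the earlier example:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # The / operator joins path parts with the current OS's separator
    desktop = Path(root) / "users" / "turtle" / "desktop"
    desktop.mkdir(parents=True)

    log_path = desktop / "logfile.log"
    log_path.write_text("started\n")

    # open() accepts a Path object as well as a string
    with open(log_path) as fileobj:
        first_line = fileobj.readline()

    print(log_path)
    print(first_line, end="")
```

Printing log_path shows / separators on Mac and Linux and \ separators on Windows, even though the code is identical.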
Concepts in Practice
Opening files at different locations
For each question, assume that the Python file executing the open() function is not in the same folder as the out.txt file.
Each question indicates the location of out.txt, the type of computer, and the desired mode for opening the file. Choose which option is best for opening out.txt.
Working with CSV files
By default, Python opens files in text mode, so data is read and written as Unicode strings. Many common file formats store Unicode text, such as text files (.txt), Python code files (.py), and other source code files (.c, .java).
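In text mode, open() decodes the file's bytes into str objects using an encoding. The default encoding depends on the platform, so passing encoding="utf-8" explicitly is a common way to get the same behavior everywhere. A minimal sketch that round-trips non-ASCII text through a temporary file (the name notes.txt is hypothetical):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "notes.txt")  # hypothetical file name

    # Text mode encodes the str into UTF-8 bytes on write
    with open(path, "w", encoding="utf-8") as f:
        f.write("Brontë, café, π\n")

    # Text mode decodes the bytes back into a str on read
    with open(path, encoding="utf-8") as f:
        text = f.read()

    # Binary mode ("rb") returns the raw bytes without decoding
    with open(path, "rb") as f:
        raw = f.read()

print(type(text).__name__, repr(text))
print(type(raw).__name__, raw)
```

The raw bytes are longer than the text because characters such as ë and π take two bytes each in UTF-8.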
Comma-separated values (CSV, .csv) files are often used for storing tabular data. Each line of a CSV file holds one row, and the cells within a row are separated by commas. Because CSV files are plain Unicode text, they can be read using the methods learned thus far, as seen in the example below.
Raw text of the file books.csv, where each \n is a newline character:
Title, Author, Pages\n1984, George Orwell, 268\nJane Eyre, Charlotte Bronte, 532\nWalden, Henry David Thoreau, 156\nMoby Dick, Herman Melville, 538
Example 14.3
Processing a CSV file
"""Processing a CSV file."""
# Open the CSV file for reading; the with statement closes it automatically
with open("books.csv") as file_obj:
    # Rows are separated by newline \n characters, so readlines() reads all rows into a list of strings
    csv_rows = file_obj.readlines()
list_csv = []
# Remove \n characters from each row, split by comma, and save into a 2D list
for row in csv_rows:
    # Remove the trailing \n character
    row = row.strip("\n")
    # Split using commas
    cells = row.split(",")
    list_csv.append(cells)
# Print result
print(list_csv)
The code's output is:
[['Title', ' Author', ' Pages'], ['1984', ' George Orwell', ' 268'], ['Jane Eyre', ' Charlotte Bronte', ' 532'], ['Walden', ' Henry David Thoreau', ' 156'], ['Moby Dick', ' Herman Melville', ' 538']]
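Notice that every cell after the first keeps the space that followed its comma (' Author', ' 268'), and the page counts are still strings. A possible refinement, sketched below, strips each cell and converts the Pages column to int. To stay self-contained, the sketch parses the raw text shown above from a string instead of reading books.csv:

```python
raw_text = (
    "Title, Author, Pages\n1984, George Orwell, 268\n"
    "Jane Eyre, Charlotte Bronte, 532\n"
    "Walden, Henry David Thoreau, 156\nMoby Dick, Herman Melville, 538"
)

# splitlines() drops the \n characters, like strip("\n") in the example
rows = [line.split(",") for line in raw_text.splitlines()]

# Remove the space left after each comma
rows = [[cell.strip() for cell in row] for row in rows]

header, data = rows[0], rows[1:]
# Convert the Pages column from str to int so it can be used in arithmetic
books = [[title, author, int(pages)] for title, author, pages in data]

print(header)
print(books)
total_pages = sum(pages for _, _, pages in books)
print(total_pages)  # 1494
```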
Exploring further
Files such as Word documents (.docx), PDF documents (.pdf), and images in formats such as Portable Network Graphics (PNG, .png) and Joint Photographic Experts Group (JPEG, .jpeg or .jpg), along with many other file types, are not stored as plain Unicode text; they use their own binary formats.
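One way to see the difference: every PNG file begins with the same 8-byte signature, and those bytes are not valid UTF-8 text. The sketch below writes only that signature to a temporary file (it is not a complete image) and compares reading it in text mode with reading it in binary mode ("rb"):

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "signature_only.png")
    with open(path, "wb") as f:  # "wb" writes raw bytes
        f.write(PNG_SIGNATURE)

    # Text mode tries to decode the bytes as UTF-8 and fails on 0x89
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        text_mode_ok = True
    except UnicodeDecodeError as err:
        text_mode_ok = False
        print("Text mode failed:", err.reason)

    # Binary mode returns the bytes unchanged
    with open(path, "rb") as f:
        data = f.read()

print(data[:4])
```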
Many of these non-Unicode files can be read and written using specialized libraries that support those file types.
PyPDF is a popular library that can be used to extract information from PDF files.
BeautifulSoup can be used to extract information from XML and HTML files. XML and HTML files usually contain Unicode text, with structure provided by tags written in angle brackets (<>).
python-docx can be used to read and write DOCX files.
Additionally, csv is a module in Python's standard library that can be used to read and write CSV files.
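With the standard-library csv module, the hand-written parsing in Example 14.3 can be delegated to csv.reader, and its skipinitialspace=True option also drops the space after each comma. A minimal sketch that first writes a copy of the books data to a temporary books.csv so it runs anywhere:

```python
import csv
import os
import tempfile

BOOKS = (
    "Title, Author, Pages\n1984, George Orwell, 268\n"
    "Jane Eyre, Charlotte Bronte, 532\n"
    "Walden, Henry David Thoreau, 156\nMoby Dick, Herman Melville, 538\n"
)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "books.csv")
    with open(path, "w", newline="") as f:
        f.write(BOOKS)

    # The csv documentation recommends newline="" when opening CSV files
    with open(path, newline="") as f:
        # skipinitialspace=True ignores the space that follows each comma
        list_csv = list(csv.reader(f, skipinitialspace=True))

print(list_csv)
```

Unlike row.split(","), csv.reader also handles quoted cells that contain commas, such as "Orwell, George".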
Try It
Processing a CSV file
The file fe.csv contains scores for a group of students on a final exam. Write a program to display the average score.
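One possible approach, sketched under an assumed layout: the contents of fe.csv are not shown here, so this sketch hypothetically assumes a header row followed by rows of the form name, score, and builds a small sample file so it runs. The helper average_score and its score_column parameter are illustrative names, not part of any library; adjust the column index to match the real file.

```python
import csv
import os
import tempfile


def average_score(path, score_column=1):
    """Return the mean of one numeric column, skipping the header row.

    score_column=1 is an assumption about the layout of fe.csv.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f, skipinitialspace=True)
        next(reader)  # skip the header row (assumed to exist)
        # csv.reader yields [] for blank lines, so skip empty rows
        scores = [float(row[score_column]) for row in reader if row]
    return sum(scores) / len(scores)


# Hypothetical sample data standing in for the real fe.csv
SAMPLE = "Name, Score\nAda, 90\nGrace, 85\nAlan, 77\n"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "fe.csv")
    with open(path, "w", newline="") as f:
        f.write(SAMPLE)
    avg = average_score(path)

print(f"Average score: {avg:.2f}")  # Average score: 84.00
```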