Telling Python to Read a String Within a Cell
Reading and Writing Text Files
Overview
Teaching: 60 min
Exercises: 30 min
Questions
How can I read in data that is stored in a file or write data out to a file?
Objectives
Be able to open a file and read in the data stored in that file
Understand the difference between the file name, the opened file object, and the data read in from the file
Be able to write output to a text file with simple formatting
Why do we want to read and write files?
Being able to open and read in files allows us to work with larger data sets, where it wouldn't be possible to type in each and every value and store them one at a time as variables. Writing files allows us to process our data and then save the output to a file so we can look at it later.
Right now, we will practice working with a comma-delimited text file (.csv) that contains several columns of data. However, what you learn in this lesson can be applied to any general text file. In the next lesson, you will learn another way to read and process .csv data.
Paths to files
In order to open a file, we need to tell Python exactly where the file is located, relative to where Python is currently working (the working directory). In Spyder, we can do this by setting our current working directory to the folder where the file is located. Or, when we provide the file name, we can give a complete path to the file.
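As a minimal sketch of the two options described above, the snippet below prints the current working directory and then builds a complete path from it. It assumes only the standard os module; whether the file is actually present depends on your own setup, so the last line merely reports what it finds.

```python
import os

# The working directory is where Python looks for a file given by a bare name.
working_dir = os.getcwd()
print(working_dir)

# A bare file name is interpreted relative to the working directory.
filename = 'Plates_output_simple.csv'

# A complete path can be built by joining a folder with the file name.
# Here we reuse the working directory; substitute any other folder you need.
full_path = os.path.join(working_dir, filename)
print(full_path)

# os.path.exists reports whether a file is really present at that path.
print(os.path.exists(full_path))
```

If the last line prints False, the file is not in the working directory, and opening it by its bare name would fail.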
Lesson Setup
We will work with the practice file Plates_output_simple.csv.
- Locate the file Plates_output_simple.csv in the directory home/Desktop/workshops/bash-git-python.
- Copy the file to your working directory, home/Desktop/workshops/YourName.
- Make sure that your working directory is also set to the folder home/Desktop/workshops/YourName.
- As you are working, make sure that you save your file-opening script(s) to this directory.
The File Setup
Let's open and examine the structure of the file Plates_output_simple.csv. If you open the file in a text editor, you will see that the file contains several lines of text.
However, this is fairly difficult to read. If you open the file in a spreadsheet program such as LibreOffice Calc or Excel, you can see that the file is organized into columns, with each column separated by the commas in the raw text (hence the file extension .csv, which stands for comma-separated values).
The file contains one header row, followed by eight rows of data. Each row represents a single plate image. If we look at the column headings, we can see that we have collected data for each plate:
- The name of the image from which the data was collected
- The plate number (there were 4 plates, with each plate imaged at two different time points)
- The growth condition (either control or experimental)
- The observation timepoint (either 24 or 48 hours)
- Colony count for the plate
- The average colony size for the plate
- The percentage of the plate covered by bacterial colonies
We will read in this data file and then work to analyze the data.
Opening and reading files is a three-step process
We will open and read the file in three steps.
- We will create a variable to hold the name of the file that we want to open.
- We will call a function to open the file.
- We will call a function to actually read the data in the file and store it in a variable so that we can process it.
And then, there's one more step to do!
- When we are done, we should remember to close the file!
You can think of these three steps as being similar to checking out a book from the library. First, you have to go to the catalog or database to find out which book you need (the filename). Then, you have to go and get it off the shelf and open the book up (the open function). Finally, to gain any information from the book, you have to read the words (the read function)!
Here is an example of opening, reading, and closing a file.
    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'  #This is just a string of text

    #Open the file
    #'r' says we are opening the file to read;
    #infile is the opened file object that we will read from
    infile = open(filename, 'r')

    #Store the data from the file in a variable
    data = infile.read()

    #Print the data in the file
    print(data)

    #Close the file
    infile.close()
Once we have read the data in the file into our variable data, we can treat it like any other variable in our code.
Use consistent names to make your code clearer
It is a good idea to develop some consistent habits about the way you open and read files. Using the same (or similar!) variable names each time will make it easier for you to keep track of which variable is the name of the file, which variable is the opened file object, and which variable contains the read-in data.
In these examples, we will use
- filename for the text string containing the file name,
- infile for the open file object from which we can read in data, and
- data for the variable holding the contents of the file.
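To make the distinction between these three names concrete, here is a small sketch that prints the type of each one. So that it runs anywhere, it first writes a tiny stand-in file to a temporary folder; the file name sample.csv and its header names are placeholders chosen for illustration, not the contents of the real practice file.

```python
import os
import tempfile

# Write a tiny stand-in file so the sketch does not depend on the practice file.
# The header names below are placeholders, not the real file's headers.
filename = os.path.join(tempfile.mkdtemp(), 'sample.csv')
outfile = open(filename, 'w')
outfile.write('image,plate,condition\n')
outfile.write('colonies02.tif,2,exp\n')
outfile.close()

# Open the file and read its contents, using the three consistent names.
infile = open(filename, 'r')
data = infile.read()
infile.close()

print(type(filename))  # the file name is an ordinary string
print(type(infile))    # the opened file is a file object
print(type(data))      # the contents read from the file are another string
```

Both filename and data are strings, but they hold very different things: one is the location of the file, the other is its text. Only infile has the read commands.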
Commands for reading in files
There are a variety of commands that allow us to read in data from files.
- infile.read() will read in the entire file as a single string of text.
- infile.readline() will read in one line at a time (each time you call this command, it reads in the next line).
- infile.readlines() will read all of the lines into a list, where each line of the file is an item in the list.
Mixing these commands can have some unexpected results.
    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'

    #Open the file
    infile = open(filename, 'r')

    #Print the first two lines of the file
    print(infile.readline())
    print(infile.readline())

    #Call infile.read()
    print(infile.read())

    #Close the file
    infile.close()
Notice that the infile.read() command started at the third line of the file, where the first two infile.readline() commands left off.
Think of it like this: when the file is opened, a pointer is placed at the top left corner of the file at the beginning of the first line. Any time a read function is called, the cursor or pointer advances from where it already is. The first infile.readline() started at the beginning of the file and advanced to the end of the first line. Now, the pointer is positioned at the beginning of the second line. The second infile.readline() advanced to the end of the second line of the file, and left the pointer positioned at the beginning of the third line. infile.read() began from this position, and advanced through to the end of the file.
In general, if you want to switch between the different kinds of read commands, you should close the file and then open it again to start over.
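The pointer behaviour described above can be checked on a small file. The sketch below writes a three-line stand-in file (the name three_lines.txt is made up for illustration) and then mixes read commands. It also shows infile.seek(0), a standard file method that moves the pointer back to the start; this is an alternative that the lesson itself does not use, and closing and reopening the file works just as well.

```python
import os
import tempfile

# Create a three-line stand-in file in a temporary folder.
filename = os.path.join(tempfile.mkdtemp(), 'three_lines.txt')
outfile = open(filename, 'w')
outfile.write('first\nsecond\nthird\n')
outfile.close()

infile = open(filename, 'r')
first = infile.readline()     # reads line 1; pointer now sits at the start of line 2
rest = infile.read()          # reads from line 2 through the end of the file
nothing_left = infile.read()  # pointer is at the end, so this is an empty string

infile.seek(0)                # move the pointer back to the start of the file
everything = infile.read()    # now the whole file is read again
infile.close()

print(repr(first))
print(repr(rest))
print(repr(nothing_left))
print(repr(everything))
```

Using repr() when printing makes the newline characters visible, which helps when checking exactly where each read command stopped.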
Reading all of the lines of a file into a list
infile.readlines() will read all of the lines into a list, where each line of the file is an item in the list. This is extremely useful, because once we have read the file in this way, we can loop through each line of the file and process it. This approach works well on data files where the data is organized into columns similar to a spreadsheet, because it is likely that we will want to handle each line in the same way.
The example below demonstrates this approach:
    #Create a variable for the file name
    filename = "Plates_output_simple.csv"

    #Open the file
    infile = open(filename, 'r')
    lines = infile.readlines()

    for line in lines:  #lines is a list with each item representing a line of the file
        if 'control' in line:
            print(line)  #print lines for control condition

    infile.close()  #Close the file when you're done!
Using .split() to separate "columns"
Since our data is in a .csv file, we can use the split command to split each line of the file into a list. This can be useful if we want to access specific columns of the file.
    #Create a variable for the file name
    filename = "Plates_output_simple.csv"

    #Open the file
    infile = open(filename, 'r')
    lines = infile.readlines()

    for line in lines:
        sline = line.split(',')  #separates line into a list of items. ',' tells it to split the lines at the commas
        print(sline)  #each line is now a list

    infile.close()  #Always close the file!
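To see what split produces without opening the file, here is a short sketch that works on a single line of text. The line is typed in by hand to look like one row of the practice file as readlines() would return it, trailing newline included.

```python
# One row of data as a text string, including the newline at the end of the line.
line = 'colonies02.tif,2,exp,24,84,3.2,22\n'

# Split the line at the commas to get a list of "columns".
sline = line.split(',')

print(sline)       # the whole list of items
print(len(sline))  # how many columns were found
print(sline[0])    # the first column (index 0)
print(sline[4])    # the fifth column (index 4)
```

Every item in the list is still a text string, and the last item keeps the newline character that ended the line.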
Consistent names, again
At first glance, the variable name sline in the example above may not make much sense. In fact, we chose it to be an abbreviation for "split line", which exactly describes the contents of the variable.
You don't have to use this naming convention if you don't want to, but you should work to use consistent variable names across your code for common operations like this. It will make it much easier to open an old script and quickly understand exactly what it is doing.
Converting text to numbers
When we called the readlines() command in the previous code, Python read in the contents of the file as text strings. If we want our code to recognize something in the file as a number, we need to tell it this!
For example, float('5.0') will tell Python to treat the text string '5.0' as the number 5.0. int(sline[4]) will tell our code to treat the text string stored in the 5th position of the list sline as an integer (non-decimal) number.
For each line in the file, the ColonyCount is stored in the 5th column (index 4 with our 0-based counting).
Change the code above to print the line only if the ColonyCount is greater than 30.
Solution
    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'

    #Open the file
    infile = open(filename, 'r')
    lines = infile.readlines()

    for line in lines[1:]:  #skip the first line, which is the header
        sline = line.split(',')  #separates line into a list of items. ',' tells it to split the lines at the commas
        colonyCount = int(sline[4])  #store the colony count for the line as an integer
        if colonyCount > 30:
            print(sline)

    #Close the file
    infile.close()
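The conversions used in this section can be tried on a hand-typed list, without reading the file. The sketch below assumes one split row with the column order described earlier (colony count at index 4, average colony size at index 5, percentage at index 6); the variable names other than sline are chosen here for illustration.

```python
# A split row typed in by hand; every item is a text string.
sline = ['colonies02.tif', '2', 'exp', '24', '84', '3.2', '22\n']

colony_count = int(sline[4])   # '84' becomes the integer 84
avg_size = float(sline[5])     # '3.2' becomes the decimal number 3.2
percent_area = int(sline[6])   # int() ignores the trailing newline in '22\n'

print(sline[4] + '1')    # strings are glued together: 841
print(colony_count + 1)  # numbers are added: 85

# int() refuses text that contains a decimal point, so use float() for those.
try:
    int(sline[5])
except ValueError as error:
    print('Could not convert:', error)
```

The try/except block is only there so the sketch keeps running after the failed conversion; in your own scripts the error message is a useful hint that a column holds decimals or text.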
Writing data out to a file
Often, we will want to write data to a new file. This is especially useful if we have done a lot of computations or data processing and we want to be able to save it and come back to it later.
Writing a file is the same multi-step process
Just like reading a file, we will open and write the file in multiple steps.
- Create a variable to hold the name of the file that we want to open. Often, this will be a new file that doesn't yet exist.
- Call a function to open the file. This time, we will specify that we are opening the file to write into it!
- Write the data into the file. This requires some careful attention to formatting.
- When we are done, we should remember to close the file!
The code below gives an example of writing to a file:
    filename = "output.txt"

    #'w' tells Python we are opening the file to write into it
    outfile = open(filename, 'w')

    outfile.write("This is the first line of the file")
    outfile.write("This is the second line of the file")

    outfile.close()  #Close the file when we're done!
Where did my file end up?
Any time you open a new file and write to it, the file will be saved in your current working directory, unless you specified a different path in the variable filename.
Newline characters
When you examine the file you just wrote, you will see that all of the text is on the same line! This is because we must tell Python when to start on a new line by using the special string character '\n'. This newline character will tell Python exactly where to start each new line.
The example below demonstrates how to use newline characters:
    filename = 'output_newlines.txt'

    #'w' tells Python we are opening the file to write into it
    outfile = open(filename, 'w')

    outfile.write("This is the first line of the file\n")
    outfile.write("This is the second line of the file\n")

    outfile.close()  #Close the file when we're done!
Go open the file you just wrote and check that the lines are spaced correctly.
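The same check can also be done from Python by reading the file back in. The sketch below does this using the with statement, a standard Python feature that closes the file automatically when the indented block ends. It is shown here as an optional alternative; the lesson itself uses explicit close() calls. The file is written to a temporary folder so nothing in your working directory is touched.

```python
import os
import tempfile

# Write to a temporary folder so this sketch does not overwrite your own files.
filename = os.path.join(tempfile.mkdtemp(), 'output_newlines.txt')

# The file is closed automatically at the end of each with block.
with open(filename, 'w') as outfile:
    outfile.write('This is the first line of the file\n')
    outfile.write('This is the second line of the file\n')

# Read the file back in to confirm that it contains two separate lines.
with open(filename, 'r') as infile:
    lines = infile.readlines()

print(len(lines))
print(lines)
print(outfile.closed, infile.closed)
```

Without the '\n' at the end of each string, readlines() would return a single item containing both sentences run together.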
Dealing with newline characters when you read a file
You may have noticed in the last file reading example that the printed output included newline characters at the end of each line of the file:
    ['colonies02.tif', '2', 'exp', '24', '84', '3.2', '22\n']
    ['colonies03.tif', '3', 'exp', '24', '792', '3', '78\n']
    ['colonies06.tif', '2', 'exp', '48', '85', '5.2', '46\n']
We can get rid of these newlines by using the .strip() function, which removes whitespace, including newline characters, from both ends of a string:
    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'

    #Open the file
    infile = open(filename, 'r')
    lines = infile.readlines()

    for line in lines[1:]:  #skip the first line, which is the header
        sline = line.strip()  #get rid of trailing newline characters at the end of the line
        sline = sline.split(',')  #separates line into a list of items. ',' tells it to split the lines at the commas
        colonyCount = int(sline[4])  #store the colony count for the line as an integer
        if colonyCount > 30:
            print(sline)

    #Close the file
    infile.close()
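Here is a short sketch of what strip does to a single string. It also shows rstrip('\n'), a related standard string method that removes only newline characters from the right-hand end; the lesson does not use it, but it is handy when leading or trailing spaces should be kept. The strings are typed in by hand for illustration.

```python
# A row of data as readlines() would return it, with a newline at the end.
line = 'colonies02.tif,2,exp,24,84,3.2,22\n'

# Strip first, then split, so the last item no longer carries the newline.
sline = line.strip().split(',')
print(sline)

# strip() removes spaces as well as newlines, from both ends of the string.
padded = '  control \n'
stripped_all = padded.strip()
print(repr(stripped_all))

# rstrip('\n') removes only newline characters, and only at the right-hand end.
stripped_newline = padded.rstrip('\n')
print(repr(stripped_newline))
```

Neither method changes the original string; each returns a new one, which is why the result has to be stored in a variable.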
Writing numbers to files
Just like Python automatically reads files in as strings, the write() function expects to only write strings. If we want to write numbers to a file, we will need to "cast" them as strings using the function str().
The code below shows an example of this:
    numbers = range(0, 10)
    filename = "output_numbers.txt"

    #'w' tells Python we are opening the file to write into it
    outfile = open(filename, 'w')

    for number in numbers:
        outfile.write(str(number))

    outfile.close()  #Close the file when we're done!
Writing new lines and numbers
Go open and examine the file you just wrote. You will see that all of the numbers are written on the same line.
Modify the code to write each number on its own line.
Solution
    numbers = range(0, 10)  #Create the range of numbers
    filename = "output_numbers.txt"  #provide the file name

    #Open the file in 'write' mode
    outfile = open(filename, 'w')

    for number in numbers:
        outfile.write(str(number) + '\n')

    outfile.close()  #Close the file when we're done!
The file you just wrote should be saved in your Working Directory. Open up the file and check that the output is correctly formatted with one number on each line.
Opening files in different 'modes'
When we have opened files to read or write data, we have used the function parameter 'r' or 'w' to specify which "mode" to open the file.
- 'r' indicates we are opening the file to read data from it.
- 'w' indicates we are opening the file to write data into it.
Be very, very careful when opening an existing file in 'w' mode. 'w' will over-write any data that is already in the file! The overwritten data will be lost!
If you want to add on to what is already in the file (instead of erasing and over-writing it), you can open the file in append mode by using the 'a' parameter instead.
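The difference between the 'w' and 'a' modes can be seen in a small experiment. The sketch below writes to a scratch file in a temporary folder (the name modes_demo.txt is made up for illustration), appends to it, and then opens it again in 'w' mode to show that the earlier contents are lost.

```python
import os
import tempfile

# Use a scratch file in a temporary folder so no real data is at risk.
filename = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

# 'w' creates the file (or empties it if it already exists).
outfile = open(filename, 'w')
outfile.write('original line\n')
outfile.close()

# 'a' keeps what is already there and adds to the end.
outfile = open(filename, 'a')
outfile.write('appended line\n')
outfile.close()

infile = open(filename, 'r')
after_append = infile.read()
infile.close()
print(repr(after_append))

# Opening in 'w' mode again erases both earlier lines.
outfile = open(filename, 'w')
outfile.write('replacement line\n')
outfile.close()

infile = open(filename, 'r')
after_overwrite = infile.read()
infile.close()
print(repr(after_overwrite))
```

Note that the old contents are erased as soon as the file is opened in 'w' mode, even before anything is written.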
Pulling it all together
Read in the data from the file Plates_output_simple.csv that we have been working with. Write a new csv-formatted file that contains only the rows for control plates.
You will need to do the following steps:
- Open the file.
- Use .readlines() to create a list of lines in the file. Then close the file!
- Open a file to write your output into.
- Write the header line of the output file.
- Use a for loop to allow you to loop through each line in the list of lines from the input file.
- For each line, check if the growth condition was experimental or control.
- For the control lines, write the line of data to the output file.
- Close the output file when you're done!
Solution
Here's one way to do it:
    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'

    #Open the file
    infile = open(filename, 'r')
    lines = infile.readlines()  #We will process the lines of the file later

    #Close the input file
    infile.close()

    #Create the file we will write to
    filename = 'ControlPlatesData.txt'
    outfile = open(filename, 'w')
    outfile.write(lines[0])  #This will write the header line of the file

    for line in lines[1:]:  #skip the first line, which is the header
        sline = line.split(',')  #separates line into a list of items. ',' tells it to split the lines at the commas
        condition = sline[2]  #store the condition for the line as a string
        if condition == "control":
            outfile.write(line)  #The variable line is already formatted correctly!

    outfile.close()  #Close the file when we're done!
Challenge Problem
Open and read in the data from Plates_output_simple.csv. Write a new csv-formatted file that contains only the rows for the control condition and includes only the columns for Time, colonyCount, avgColonySize, and percentColonyArea. Hint: you can use the .join() function to join a list of items into a string.
    names = ['Erin', 'Mark', 'Tessa']
    nameString = ', '.join(names)  #the ', ' tells Python to join the list with each item separated by a comma + space
    print(nameString)

    Erin, Mark, Tessa
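One detail worth knowing before using .join() on data is that it only accepts text strings. The sketch below, using hand-typed values for illustration, joins a list of strings into one comma-separated row and then does the same for a list of numbers by converting each one with str() first.

```python
# Items that are already strings can be joined directly.
fields = ['24', '84', '3.2', '22']
row = ','.join(fields) + '\n'  # add the newline that ends the row
print(repr(row))

# Numbers must be converted to strings before they can be joined.
numbers = [24, 84, 3.2, 22]
number_row = ','.join(str(n) for n in numbers) + '\n'
print(repr(number_row))

# Splitting the row again gives back the original list of strings.
round_trip = row.strip().split(',')
print(round_trip)
```

In this sense join is the reverse of split: one turns a list into a line of text, the other turns a line of text into a list.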
Solution
    #Create a variable for the input file name
    filename = 'Plates_output_simple.csv'

    #Open the file
    infile = open(filename, 'r')
    lines = infile.readlines()  #We will process the lines of the file later

    #Close the file
    infile.close()

    #Create the file we will write to
    filename = 'ControlPlatesData_Reduced.txt'
    outfile = open(filename, 'w')

    #Write the header line
    headerList = lines[0].split(',')[3:]  #This will return the list of column headers from 'time' on
    headerString = ','.join(headerList)  #join the items in the list with commas
    outfile.write(headerString)  #There is already a newline at the end, so no need to add one

    #Write the remaining lines
    for line in lines[1:]:  #skip the first line, which is the header
        sline = line.split(',')  #separates line into a list of items. ',' tells it to split the lines at the commas
        condition = sline[2]  #store the condition for the line as a string
        if condition == "control":
            dataList = sline[3:]
            dataString = ','.join(dataList)
            outfile.write(dataString)  #The last item already ends with a newline

    outfile.close()  #Close the file when we're done!
Key Points
Opening and reading a file is a multistep process: defining the filename, opening the file, and reading the data
Data stored in files can be read in using a variety of commands
Writing data to a file requires attention to data types and formatting that isn't necessary with a print() statement
Source: https://eldoyle.github.io/PythonIntro/08-ReadingandWritingTextFiles/