Telling Python to Read a String Within a Cell

Reading and Writing Text Files

Overview

Teaching: 60 min
Exercises: 30 min

Questions

  • How can I read in data that is stored in a file or write information out to a file?

Objectives

  • Be able to open a file and read in the data stored in that file

  • Understand the difference between the file name, the opened file object, and the data read in from the file

  • Be able to write output to a text file with simple formatting

Why do we want to read and write files?

Being able to open and read in files allows us to work with larger data sets, where it wouldn't be possible to type in each and every value and store them one at a time as variables. Writing files allows us to process our data and then save the output to a file so we can look at it later.

Right now, we will practice working with a comma-delimited text file (.csv) that contains several columns of data. However, what you learn in this lesson can be applied to any general text file. In the next lesson, you will learn another way to read and process .csv data.

Paths to files

In order to open a file, we need to tell Python exactly where the file is located, relative to where Python is currently working (the working directory). In Spyder, we can do this by setting our current working directory to the folder where the file is located. Or, when we provide the file name, we can give a complete path to the file.
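
As a rough sketch of the two options described above, the standard os module can report the current working directory and build a complete path. The folder home/Desktop/workshops/YourName is the one named in this lesson's setup and may well differ on your machine; the variable names workdir, folder and fullpath are only illustrative.

```python
import os

# Where is Python currently working?
workdir = os.getcwd()
print(workdir)

# Option 1: give only the file name (Python looks in the working directory)
filename = 'Plates_output_simple.csv'

# Option 2: build a complete path to the file.
# ASSUMPTION: this folder is the one used in the lesson setup;
# replace it with wherever your copy of the file actually lives.
folder = os.path.join('home', 'Desktop', 'workshops', 'YourName')
fullpath = os.path.join(folder, filename)
print(fullpath)

# os.path.exists() reports whether a path points at something real,
# which is a quick check to make before calling open()
print(os.path.exists(fullpath))
```

If the last line prints False, Python cannot see the file at that location, and open() would fail with an error.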

Lesson Setup

We will work with the practice file Plates_output_simple.csv.

  1. Locate the file Plates_output_simple.csv in the directory home/Desktop/workshops/bash-git-python.
  2. Copy the file to your working directory, home/Desktop/workshops/YourName.
  3. Make sure that your working directory is also set to the folder home/Desktop/workshops/YourName.
  4. As you are working, make sure that you save your file opening script(s) to this directory.

The File Setup

Let's open and examine the structure of the file Plates_output_simple.csv. If you open the file in a text editor, you will see that the file contains several lines of text.

[Image: DataFileRaw]

However, this is fairly difficult to read. If you open the file in a spreadsheet program such as LibreOffice Calc or Excel, you can see that the file is organized into columns, with each column separated by the commas in the image above (hence the file extension .csv, which stands for comma-separated values).

[Image: DataFileColumns]

The file contains one header row, followed by eight rows of data. Each row represents a single plate image. If we look at the column headings, we can see that we have collected data for each plate:

  • The name of the image from which the data was collected
  • The plate number (there were 4 plates, with each plate imaged at two different time points)
  • The growth condition (either control or experimental)
  • The observation timepoint (either 24 or 48 hours)
  • Colony count for the plate
  • The average colony size for the plate
  • The percent of the plate covered by bacterial colonies

We will read in this data file and then work to analyze the data.

Opening and reading files is a three-step process

We will open and read the file in three steps.

  1. We will create a variable to hold the name of the file that we want to open.
  2. We will call the open function to open the file.
  3. We will call a function to actually read the data in the file and store it in a variable so that we can process it.

And then, there's one more step to do!

  • When we are done, we should remember to close the file!

You can think of these three steps as being similar to checking out a book from the library. First, you have to go to the catalog or database to find out which book you need (the filename). Then, you have to go and get it off the shelf and open the book up (the open function). Finally, to gain any information from the book, you have to read the words (the read function)!

Here is an example of opening, reading, and closing a file.

    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'  #This is simply a string of text

    #Open the file
    infile = open(filename, 'r')  # 'r' says we are opening the file to read; infile is the opened file object that we will read from

    #Store the data from the file in a variable
    data = infile.read()

    #Print the data in the file
    print(data)

    #close the file
    infile.close()

Once we have read the data in the file into our variable data, we can treat it like any other variable in our code.
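
For instance, because data is an ordinary string, the usual string tools apply to it. The sketch below first writes a small stand-in file so that it can run on its own (writing files is covered later in this lesson). The file name sample_plates.csv, its location in the system temp folder, and the header names are assumptions made for illustration; the three data rows are the ones printed later in this lesson.

```python
import os
import tempfile

# ASSUMPTION: a tiny stand-in for the lesson's data file, saved in the
# system temp folder. The header names are made up for illustration.
filename = os.path.join(tempfile.gettempdir(), 'sample_plates.csv')
outfile = open(filename, 'w')
outfile.write('image,plate,condition,time,colonyCount,avgColonySize,percentColonyArea\n')
outfile.write('colonies02.tif,2,exp,24,84,3.2,22\n')
outfile.write('colonies03.tif,3,exp,24,792,3,78\n')
outfile.write('colonies06.tif,2,exp,48,85,5.2,46\n')
outfile.close()

# Open, read, and close, following the same three steps as above
infile = open(filename, 'r')
data = infile.read()
infile.close()

# data is now an ordinary string, so string operations work on it
print(type(data))                  # <class 'str'>
print(len(data))                   # number of characters in the file
print(data.count('\n'))            # number of newline characters (one per line here)
print('colonies03.tif' in data)    # True if that text appears anywhere in the file
print(data.upper()[:20])           # first 20 characters, upper-cased
```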

Use consistent names to make your code clearer

It is a good idea to develop some consistent habits about the way you open and read files. Using the same (or similar!) variable names each time will make it easier for you to keep track of which variable is the name of the file, which variable is the opened file object, and which variable contains the read-in data.

In these examples, we will use filename for the text string containing the file name, infile for the open file object from which we can read in data, and data for the variable holding the contents of the file.

Commands for reading in files

There are a variety of commands that allow us to read in data from files.
infile.read() will read in the entire file as a single string of text.
infile.readline() will read in one line at a time (each time you call this command, it reads in the next line).
infile.readlines() will read all of the lines into a list, where each line of the file is an item in the list.
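
To make the difference between these three commands concrete, here is a minimal sketch that applies each one to the same three-line file. The scratch file read_commands_demo.txt in the system temp folder is an assumption, created only so the sketch can run by itself; the file is reopened before each command so that every read starts from the top.

```python
import os
import tempfile

# ASSUMPTION: a three-line scratch file in the system temp folder,
# used only so this sketch can run by itself
filename = os.path.join(tempfile.gettempdir(), 'read_commands_demo.txt')
outfile = open(filename, 'w')
outfile.write('line one\nline two\nline three\n')
outfile.close()

# read(): the whole file as one string
infile = open(filename, 'r')
whole = infile.read()
infile.close()
print(repr(whole))

# readline(): one line per call
infile = open(filename, 'r')
first = infile.readline()
second = infile.readline()
infile.close()
print(repr(first), repr(second))

# readlines(): a list with one item per line
infile = open(filename, 'r')
lines = infile.readlines()
infile.close()
print(lines)
```

Using repr() when printing makes the newline characters at the end of each line visible as \n.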

Mixing these commands can have some unexpected results.

    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'

    #Open the file
    infile = open(filename, 'r')

    #Print the first two lines of the file
    print(infile.readline())
    print(infile.readline())

    #call infile.read()
    print(infile.read())

    #close the file
    infile.close()

Notice that the infile.read() command started at the third line of the file, where the first two infile.readline() commands left off.

Think of it like this: when the file is opened, a pointer is placed at the top left corner of the file at the beginning of the first line. Any time a read function is called, the cursor or pointer advances from where it already is. The first infile.readline() started at the beginning of the file and advanced to the end of the first line. Now, the pointer is positioned at the beginning of the second line. The second infile.readline() advanced to the end of the second line of the file, and left the pointer positioned at the beginning of the third line. infile.read() began from this position, and advanced through to the end of the file.

In general, if you want to switch between the different kinds of read commands, you should close the file and then open it again to start over.
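
The pointer behaviour can be checked directly. The sketch below is one possible illustration, using a four-line scratch file (pointer_demo.txt in the system temp folder, an assumed name): once the pointer has reached the end of the file, another read returns an empty string, and closing and reopening the file puts the pointer back at the top.

```python
import os
import tempfile

# ASSUMPTION: a four-line scratch file, created only for this sketch
filename = os.path.join(tempfile.gettempdir(), 'pointer_demo.txt')
outfile = open(filename, 'w')
outfile.write('first\nsecond\nthird\nfourth\n')
outfile.close()

infile = open(filename, 'r')
line1 = infile.readline()   # pointer moves to the start of line 2
line2 = infile.readline()   # pointer moves to the start of line 3
rest = infile.read()        # reads from line 3 to the end of the file
leftover = infile.read()    # nothing is left, so this is an empty string
infile.close()

print(repr(rest))
print(repr(leftover))

# Close and reopen to start again from the top of the file
infile = open(filename, 'r')
lines = infile.readlines()
infile.close()
print(len(lines))           # all four lines are available again
```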

Reading all of the lines of a file into a list

infile.readlines() will read all of the lines into a list, where each line of the file is an item in the list. This is extremely useful, because once we have read the file in this way, we can loop through each line of the file and process it. This approach works well on data files where the data is organized into columns similar to a spreadsheet, because it is likely that we will want to handle each line in the same way.

The example below demonstrates this approach:

    #Create a variable for the file name
    filename = "Plates_output_simple.csv"

    #Open the file
    infile = open(filename, 'r')

    lines = infile.readlines()

    for line in lines:  #lines is a list with each item representing a line of the file
        if 'control' in line:
            print(line)  #print lines for control condition

    infile.close()  #close the file when you're done!

Using .split() to separate "columns"

Since our data is in a .csv file, we can use the split command to split each line of the file into a list. This can be useful if we want to access specific columns of the file.

    #Create a variable for the file name
    filename = "Plates_output_simple.csv"

    #Open the file
    infile = open(filename, 'r')

    lines = infile.readlines()

    for line in lines:
        sline = line.split(',')  # separates line into a list of items.  ',' tells it to split the lines at the commas
        print(sline)  #each line is now a list

    infile.close()  #Always close the file!
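
To illustrate accessing specific columns, the sketch below splits a single line by hand rather than reading the whole file. The line is one of the rows from this lesson's data file; the variable names imagename, condition and count are illustrative choices, not part of the lesson.

```python
# One line of text, as it would come out of infile.readlines()
line = 'colonies02.tif,2,exp,24,84,3.2,22\n'

sline = line.split(',')   # split the line at the commas
print(sline)
print(len(sline))         # 7 items, one per column

# Index into the list to pull out individual columns (counting from 0)
imagename = sline[0]
condition = sline[2]
count = sline[4]
print(imagename, condition, count)

# Every item is still a string, including the ones that look like numbers
print(type(count))
```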

Consistent names, again

At first glance, the variable name sline in the example above may not make much sense. In fact, we chose it to be an abbreviation for "split line", which exactly describes the contents of the variable.

You don't have to use this naming convention if you don't want to, but you should work to use consistent variable names across your code for common operations like this. It will make it much easier to open an old script and quickly understand exactly what it is doing.

Converting text to numbers

When we called the readlines() command in the previous code, Python read in the contents of the file as strings. If we want our code to recognize something in the file as a number, we need to tell it this!

For example, float('5.0') will tell Python to treat the text string '5.0' as the number 5.0. int(sline[4]) will tell our code to treat the text string stored in the 5th position of the list sline as an integer (non-decimal) number.
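
A short sketch of these conversions, using one split row from the data file. The name avgSize is an illustrative choice; the two try/except blocks are there only to show the error messages without stopping the script.

```python
# One row of the file after .split(','); every item is a string
sline = ['colonies02.tif', '2', 'exp', '24', '84', '3.2', '22\n']

colonyCount = int(sline[4])     # '84'  becomes the integer 84
avgSize = float(sline[5])       # '3.2' becomes the decimal number 3.2
print(colonyCount + 1)          # arithmetic now works
print(colonyCount > 30)         # and so do numeric comparisons

# Without converting, Python refuses to compare text with a number
try:
    print(sline[4] > 30)
except TypeError as err:
    print('TypeError:', err)

# int() only accepts text that looks like a whole number
try:
    int(sline[5])               # '3.2' is not a whole number
except ValueError as err:
    print('ValueError:', err)
```

For text such as '3.2', use float() instead of int().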

For each line in the file, the ColonyCount is stored in the fifth column (index 4 with our 0-based counting).
Change the .split() example above to print the line only if the ColonyCount is greater than 30.

Solution

    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'

    #Open the file
    infile = open(filename, 'r')

    lines = infile.readlines()

    for line in lines[1:]:  #skip the first line, which is the header
        sline = line.split(',')  # separates line into a list of items.  ',' tells it to split the lines at the commas
        colonyCount = int(sline[4])  #store the colony count for the line as an integer
        if colonyCount > 30:
            print(sline)

    #close the file
    infile.close()

Writing data out to a file

Often, we will want to write data to a new file. This is especially useful if we have done a lot of computations or data processing and we want to be able to save it and come back to it later.

Writing a file is the same multi-step process

Just like reading a file, we will open and write the file in multiple steps.

  1. Create a variable to hold the name of the file that we want to open. Often, this will be a new file that doesn't yet exist.
  2. Call a function to open the file. This time, we will specify that we are opening the file to write into it!
  3. Write the data into the file. This requires some careful attention to formatting.
  4. When we are done, we should remember to close the file!

The code below gives an example of writing to a file:

    filename = "output.txt"

    #w tells python we are opening the file to write into it
    outfile = open(filename, 'w')

    outfile.write("This is the first line of the file")
    outfile.write("This is the second line of the file")

    outfile.close()  #Close the file when we're done!

Where did my file end up?

Any time you open a new file and write to it, the file will be saved in your current working directory, unless you specified a different path in the variable filename.

Newline characters

When you examine the file you just wrote, you will see that all of the text is on the same line! This is because we must tell Python when to start on a new line by using the special string character '\n'. This newline character will tell Python exactly where to start each new line.

The example below demonstrates how to use newline characters:

    filename = 'output_newlines.txt'

    #w tells python we are opening the file to write into it
    outfile = open(filename, 'w')

    outfile.write("This is the first line of the file\n")
    outfile.write("This is the second line of the file\n")

    outfile.close()  #Close the file when we're done!

Go open the file you just wrote and check that the lines are spaced correctly.

Dealing with newline characters when you read a file

You may have noticed in the last file reading example that the printed output included newline characters at the end of each line of the file:

['colonies02.tif', '2', 'exp', '24', '84', '3.2', '22\n']
['colonies03.tif', '3', 'exp', '24', '792', '3', '78\n']
['colonies06.tif', '2', 'exp', '48', '85', '5.2', '46\n']

We can get rid of these newlines by using the .strip() function, which removes newline characters (and any other whitespace) from the beginning and end of a string:

    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'

    #Open the file
    infile = open(filename, 'r')

    lines = infile.readlines()

    for line in lines[1:]:  #skip the first line, which is the header
        sline = line.strip()  #get rid of trailing newline characters at the end of the line
        sline = sline.split(',')  # separates line into a list of items.  ',' tells it to split the lines at the commas
        colonyCount = int(sline[4])  #store the colony count for the line as an integer
        if colonyCount > 30:
            print(sline)

    #close the file
    infile.close()
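
The effect of .strip() is easiest to see on a single line. The sketch below reuses one row from the data file and also shows, as an aside, that .strip() removes every kind of whitespace at both ends of a string, whereas .rstrip('\n') removes only trailing newlines. The padded example string is made up for illustration.

```python
line = 'colonies02.tif,2,exp,24,84,3.2,22\n'

# Without strip(), the newline stays attached to the last item
print(line.split(','))

# strip() removes whitespace (spaces, tabs, newlines) from both ends of a string
clean = line.strip()
print(repr(clean))
print(clean.split(','))

# Compare strip() with rstrip('\n') on a string padded with spaces
padded = '  padded text \n'
print(repr(padded.strip()))        # spaces and newline all removed
print(repr(padded.rstrip('\n')))   # only the trailing newline removed
```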

Writing numbers to files

Just like Python automatically reads files in as strings, the write() function expects to only write strings. If we want to write numbers to a file, we will need to "cast" them as strings using the function str().

The code below shows an example of this:

    numbers = range(0, 10)

    filename = "output_numbers.txt"

    #w tells python we are opening the file to write into it
    outfile = open(filename, 'w')

    for number in numbers:
        outfile.write(str(number))

    outfile.close()  #Close the file when we're done!

Writing new lines and numbers

Go open and examine the file you just wrote. You will see that all of the numbers are written on the same line.

Modify the code to write each number on its own line.

Solution

    numbers = range(0, 10)  #Create the range of numbers

    filename = "output_numbers.txt"  #provide the file name

    #open the file in 'write' mode
    outfile = open(filename, 'w')

    for number in numbers:
        outfile.write(str(number) + '\n')

    outfile.close()  #Close the file when we're done!

The file you just wrote should be saved in your working directory. Open the file and check that the output is correctly formatted with one number on each line.
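
The same ideas extend to writing several values per line. The sketch below is one possible way to do it, not part of the original exercise: it shows the error raised when write() is given a number directly, then writes a small comma-separated table. The file name output_squares.csv, its location in the system temp folder, and the column names are all assumptions made for illustration.

```python
import os
import tempfile

# ASSUMPTION: a scratch output file in the system temp folder
filename = os.path.join(tempfile.gettempdir(), 'output_squares.csv')
outfile = open(filename, 'w')

# write() refuses anything that is not a string
try:
    outfile.write(5)
except TypeError as err:
    print('TypeError:', err)

outfile.write('number,square\n')   # header line
for number in range(0, 4):
    # cast each number to a string, separate with a comma, end with a newline
    outfile.write(str(number) + ',' + str(number ** 2) + '\n')
outfile.close()

# Read the file back in to check what was written
infile = open(filename, 'r')
written = infile.read()
infile.close()
print(written)
```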

Opening files in different 'modes'

When we have opened files to read or write data, we have used the function parameter 'r' or 'w' to specify which "mode" to open the file.
'r' indicates we are opening the file to read data from it.
'w' indicates we are opening the file to write data into it.

Be very, very careful when opening an existing file in 'w' mode.
'w' will over-write any data that is already in the file! The overwritten data will be lost!

If you want to add on to what is already in the file (instead of erasing and over-writing it), you can open the file in append mode by using the 'a' parameter instead.
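
The difference between 'w' and 'a' can be seen with a quick experiment. This is a minimal sketch using a scratch file (modes_demo.txt in the system temp folder, an assumed name), so that no real data is put at risk.

```python
import os
import tempfile

# ASSUMPTION: a scratch file, so that no real data is overwritten
filename = os.path.join(tempfile.gettempdir(), 'modes_demo.txt')

# 'w' creates the file (or erases it if it already exists)
outfile = open(filename, 'w')
outfile.write('original line\n')
outfile.close()

# 'a' keeps what is already there and adds to the end
outfile = open(filename, 'a')
outfile.write('appended line\n')
outfile.close()

infile = open(filename, 'r')
after_append = infile.read()
infile.close()
print(repr(after_append))

# Opening in 'w' again wipes out both lines
outfile = open(filename, 'w')
outfile.write('replacement line\n')
outfile.close()

infile = open(filename, 'r')
after_overwrite = infile.read()
infile.close()
print(repr(after_overwrite))
```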

Pulling it all together

Read in the data from the file Plates_output_simple.csv that we have been working with. Write a new csv-formatted file that contains only the rows for control plates.
You will need to do the following steps:

  1. Open the file.
  2. Use .readlines() to create a list of lines in the file. Then close the file!
  3. Open a file to write your output into.
  4. Write the header line of the output file.
  5. Use a for loop to loop through each line in the list of lines from the input file.
  6. For each line, check if the growth condition was experimental or control.
  7. For the control lines, write the line of data to the output file.
  8. Close the output file when you're done!

Solution

Here's one way to do it:

    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'

    #Open the file
    infile = open(filename, 'r')

    lines = infile.readlines()  #We will process the lines of the file later

    #close the input file
    infile.close()

    #Create the file we will write to
    filename = 'ControlPlatesData.txt'
    outfile = open(filename, 'w')

    outfile.write(lines[0])  #This will write the header line of the file

    for line in lines[1:]:  #skip the first line, which is the header
        sline = line.split(',')  # separates line into a list of items.  ',' tells it to split the lines at the commas
        condition = sline[2]  #store the condition for the line as a string
        if condition == "control":
            outfile.write(line)  #The variable line is already formatted correctly!

    outfile.close()  #Close the file when we're done!

Challenge Problem

Open and read in the data from Plates_output_simple.csv. Write a new csv-formatted file that contains only the rows for the control condition and includes only the columns for Time, colonyCount, avgColonySize, and percentColonyArea. Hint: you can use the .join() function to join a list of items into a string.

    names = ['Erin', 'Mark', 'Tessa']
    nameString = ', '.join(names)  #the ', ' tells Python to join the list with each item separated by a comma + space
    print(nameString)

Erin, Mark, Tessa

Solution

    #Create a variable for the input file name
    filename = 'Plates_output_simple.csv'

    #Open the file
    infile = open(filename, 'r')

    lines = infile.readlines()  #We will process the lines of the file later

    #close the file
    infile.close()

    #Create the file we will write to
    filename = 'ControlPlatesData_Reduced.txt'
    outfile = open(filename, 'w')

    #Write the header line
    headerList = lines[0].split(',')[3:]  #This will return the list of column headers from 'time' on
    headerString = ','.join(headerList)  #join the items in the list with commas
    outfile.write(headerString)  #There is already a newline at the end, so no need to add one

    #Write the remaining lines
    for line in lines[1:]:  #skip the first line, which is the header
        sline = line.split(',')  # separates line into a list of items.  ',' tells it to split the lines at the commas
        condition = sline[2]  #store the condition for the line as a string
        if condition == "control":
            dataList = sline[3:]
            dataString = ','.join(dataList)
            outfile.write(dataString)  #The last item still ends with a newline, so no need to add one

    outfile.close()  #Close the file when we're done!

Key Points

  • Opening and reading a file is a multistep process: Defining the filename, opening the file, and reading the data

  • Data stored in files can be read in using a variety of commands

  • Writing data to a file requires attention to data types and formatting that isn't necessary with a print() statement


Source: https://eldoyle.github.io/PythonIntro/08-ReadingandWritingTextFiles/
