Python Read File Line by Line (with Examples)

Python provides several ways to read a text file and process its lines sequentially or randomly. The right method depends on the size of the file (large or small) and how readable you want the syntax to be.

In this Python tutorial, we will discuss different approaches based on syntax clarity, suitability for large files, and memory efficiency.

For extremely large files, consider using generators or memory-mapped files to optimize memory usage.

1. Methods to Read a Small File

The following methods are suited for reading a small text file in Python because they emphasize simple syntax and, in most cases, read the entire file’s content into memory in a single statement. That can have an adverse effect with large files, depending on how much physical memory is available on the machine.

Method Name | Benefit/Limitation | Syntax
for loop | Familiar syntax. Iterates over the file object one line at a time. Suitable for small files. | for line in file:
readlines() | Reads the entire file into memory at once. Ideal for small files; not suitable for large files. | lines = file.readlines()
List comprehension | Concise but harder-to-read syntax. Reads the entire file into memory. Suitable for small files. | lines = [line.strip() for line in file]

1.1. Using For Loop

The for loop is the most straightforward and efficient method for reading a text file line by line.

In the following example:

  • The open() function opens the file named 'data.txt' in read mode.
  • The for loop iterates over each line in the file, and in each iteration, the current line is assigned to the variable line.
  • When dealing with files in a specific encoding (e.g., UTF-8), we can specify it using the encoding parameter of the open() function.

Inside the for loop, we can perform any desired operations on the current line.

Do not forget to use line.strip() to remove the trailing newline and any surrounding whitespace from the line before using it.

try:
    with open('data.txt', 'r', encoding='utf-8') as file:
        for line in file:
            print(line.strip())
except FileNotFoundError:
    print("File not found.")

1.2. Using readlines()

The readlines() method reads all the lines of the file at once and returns them as a list of strings. Be careful: this reads the entire file into memory in one go. Having all lines in a list allows us to perform various operations on them, such as filtering, sorting, or modifying the data, as shown after the basic example below.

In the following example:

  • The open() function opens the file named data.txt in read mode ('r').
  • The readlines() method reads all the lines of the file and returns them as a list of strings.
  • Next, we can use a loop to iterate over the lines and process them sequentially.

try:
    with open('data.txt', 'r', encoding='utf-8') as file:
        lines = file.readlines()
        for line in lines:
            print(line.strip())
except FileNotFoundError:
    print("File not found.")

1.3. List Comprehension

List comprehension is a concise technique for creating lists in Python. We can use it to read a file line by line and apply an operation to each line while building the list. It offers no real benefit over the previous two methods and should be chosen only as a matter of syntax preference.

In the following example, the list comprehension [line.strip() for line in file] reads each line from the file and builds a list of lines with the surrounding whitespace (including the newline) stripped.

try:
    with open('data.txt', 'r', encoding='utf-8') as file:
        lines = [line.strip() for line in file]
        for line in lines:
            print(line)
except FileNotFoundError:
    print("File not found.")

2. Methods to Read a Large File

Memory consumption is a significant concern when dealing with large files. Approaches that read the entire file into memory can degrade performance or even exhaust the available memory.

To address these challenges, Python offers the following strategies for reading large files efficiently:

Method Name | Benefit/Limitation | Syntax
readline() | Reads lines sequentially without loading the entire file into memory. | line = file.readline()
Generator Function | Comparatively complex syntax. Memory-efficient for large files. | def read_lines(file_path):
Memory-Mapped Files | Maps a file directly into memory and allows random access. | with open('large_file.txt', 'r+b') as file:

2.1. Using readline() Method

The readline() method reads a single line from a file object and returns it as a string, including any trailing newline character. Generally, we use a while loop to iterate over the file content; the loop continues as long as line is not an empty string (an empty string indicates the end of the file).

Internally, a file pointer is created when we open a file. When we call readline(), the file pointer moves to the next newline character in the file. The text from the beginning of the file to that point is read and returned as a string.

Subsequent calls to readline() will continue reading from the position where the previous read left off. This makes readline() a suitable method for reading large files line by line without loading the entire file into memory at once.

try:
    with open('data.txt', 'r') as file:
        line = file.readline()
        while line:
            print(line.strip())
            line = file.readline()
except FileNotFoundError:
    print("Error: File not found.")
except IOError:
    print("Error: An I/O error occurred.")

2.2. Generator Function

In Python, generators are functions that produce a sequence of values lazily instead of returning a single value. Calling a generator function returns a generator object; when the generator is iterated and execution reaches the yield keyword, it pauses, hands back the current value, and saves its state.

The next time the generator is called, it resumes execution from its saved state, continues processing, and yields the next value. This avoids loading the entire file into memory upfront.

The yield keyword inside the generator function, combined with the file object's own lazy, line-by-line iteration, is what avoids loading the entire file into memory.

In the following example, the read_lines() function is a generator function that reads the file line by line using a for loop and yields each line.

def read_lines(file_path):
    with open(file_path, 'r') as file:
        for line in file:
            yield line.strip()

try:
    for line in read_lines('data.txt'):
        print(line)
except FileNotFoundError:
    print("Error: File not found.")
except IOError:
    print("Error: An I/O error occurred.")

2.3. Memory Mapped Files

Memory-mapped files allow us to map a file directly into the program’s memory space, so the file’s contents can be treated as a large array of bytes from within the program. This is handy for large files that need to be accessed frequently or randomly.

The ability to randomly access any part of the file makes this approach more efficient than traditional file I/O operations for large files. But remember to close the memory-mapped file to release system resources.

The following example iterates over the memory-mapped file. It searches for newline characters to identify the end of each line. It then extracts the line and prints it.

import mmap

with open('large_file.txt', 'r+b') as file:
    # Length 0 maps the whole file into memory
    mapped_file = mmap.mmap(file.fileno(), 0)

    start = 0
    while True:
        # Locate the next newline; -1 means there are no more
        end = mapped_file.find(b'\n', start)
        if end == -1:
            # Print any final line that has no trailing newline
            tail = mapped_file[start:].decode('utf-8')
            if tail:
                print(tail)
            break
        line = mapped_file[start:end].decode('utf-8')
        print(line)
        start = end + 1

    # Release the mapping and its system resources
    mapped_file.close()
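
The random-access benefit mentioned above means we can jump straight to any byte offset without reading what comes before it. A minimal sketch, assuming the same 'large_file.txt' and an arbitrary offset chosen purely for illustration:

import mmap

with open('large_file.txt', 'rb') as file:
    with mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as mapped_file:
        # Slice 80 bytes starting at byte offset 1024 (illustrative values)
        chunk = mapped_file[1024:1104]
        print(chunk.decode('utf-8', errors='replace'))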

3. Summary

As discussed in this tutorial, we should use one of the following methods:

  • readlines(): Read the entire file into memory (suitable for small files).
  • for loop: Iterate over lines (suitable for most use cases).
  • List comprehensions: Concise syntax for creating lists (suitable for simple operations).
  • readline(): Read one line at a time (suitable for sequential processing).
  • Generator-based reading: Yield lines one at a time (suitable for large files).
  • Memory-mapped files: Map files directly into memory (suitable for random access).

Happy Learning !!
