Jan-Philip Gehrcke, PhD

Reading files line by line in C++ using ifstream: dealing correctly with badbit, failbit, eofbit, and perror()

June 25, 2011

Motivated by this POV-Ray issue I tried to find a reliable way to read a file line by line in C++ using std::ifstream in combination with std::getline(). While doing so, the goal was to handle all underlying stream errors as well as file opening errors, and to emit as precise error messages as possible. In a high-level programming language such as Python this level of reliability and usability is not difficult to obtain. However, in C++ this turned out to be a rather complex topic.

Proper handling of the stream error bits eofbit, failbit, and badbit requires a tremendous amount of care, as discussed for example here, here, and here, and finally at cplusplus.com. It is worth mentioning that although cplusplus.com is a convenient reference, it does not provide us with a rock-solid solution for the above-stated problem and also does not mention all the important details.

When it comes to the idea of providing meaningful error messages, things become quite complicated. Proper evaluation of errno (or use of perror()) in response to the stream error bits is not a trivial task, as can be inferred from discussions like this and this. From these discussions we learn that most of the related uncertainty comes from a lack of centralized documentation or even missing documentation. The exact behavior of C++ code with respect to file handling and stream manipulation is defined by an intertwining of language specification (C++ in this case), operating system interface (e.g. POSIX) and low-level APIs (provided by e.g. libc). They are all documented in different places and to a different extent. We expect, for example, that when fopen() returns NULL, errno is set to something meaningful. But where is this actually documented?

In order to understand the relation between the language and operating system constructs involved, I performed quite some research and testing. Of course there are many obvious and non-obvious ways to write unreliable code. As expected, for writing reliable code, there are also best practices or “recipes” to follow. To name, explain, and share those with the community is the goal of this article. We all know that re-using established recipes saves time and improves software quality in the long term.

Update (January 18th, 2015): In the meantime, this article has made it into the top Google search results for “c++ read file ifstream”. It is one of the most-visited articles on my website. Thanks for commenting and sharing!

Update (July 7th, 2011): I revised the article after an important insight provided by Alexandre Duret-Lutz (confer comments).

If you just want to have a look at the results of this small investigation, I recommend scrolling down to the ideal solutions section. Otherwise, before continuing, you should make yourself briefly familiar with eofbit, failbit, badbit of the ios class.

Note: all code shown in this post is contained in this HG repository, and can also be downloaded in a tarball.

Obey the two rules of ifstream iteration

This is the task: iteratively process the lines read from a file by means of an ifstream (why ifstream?). Therefore, we first try to open a file by invoking ifstream s ("file"). For attempting to get a line from the file, we use std::getline(s, line), where line is a std::string to store the data to. The goal is to process the data read from the file, line by line, via the fictitious call to process(line). Of course, we want to call process(line) only if the preceding getline() was able to extract meaningful data and store it in line. When reaching the end of the file, it usually is the goal to treat the trailing data as a healthy line even if it is not terminated by a newline character (there is no standard saying whether a line is defined by special character separation or termination).

After the investigation described below, I am pretty sure that the simplest rock-solid language construct for the above-specified task is:

string line;
ifstream f ("file");
while(getline(f, line)) {
    process(&line);
    }

This is so simple and yet reliable, because it is the shortest approach that obeys the two basic rules we must follow when applying an I/O operation, such as std::getline(), on a stream:

Before processing data obtained from the stream, check for errors reported by getline() (this holds true for any other IO operation on streams).

If getline() (or any other IO operation on a stream) has set the stream’s failbit or badbit, do not process the data. eofbit is not required to be checked in the loop and does not necessarily have to prevent data processing.

The origin of these rules will become clearer while reading the rest of the article.

Only two rules to follow. Isn’t that easy? Still, they are often not followed, as you can infer from the links in the introduction. In fact, not following these rules led to the bug in POV-Ray linked to in the very first sentence of this article.

How does the simple code snippet above follow these rules? The loop first tries to obtain data from the stream via the IO operation getline(). It is totally okay to try this even on a bad/empty/non-existing file, because it just tries and afterwards sets the stream’s error bits correctly, as defined here. After getline(), failbit and badbit are checked via the ifstream’s bool operator: getline() actually returns the stream object, which is evaluated in a bool expression in the loop header. Only if neither bit is set can one be sure that there is meaningful data in line. In this case the loop body is evaluated. It processes the data obtained from the stream. Then, in the next loop iteration, the code attempts to read the next line, followed by the error check, and so on.

The point is: in each iteration, the chronological order of

IO operation,
error check,
and data processing

is preserved.

Do you wonder why we do not need to check the eofbit within the loop? This is answered further below.

Now, how does the code snippet above behave if

the file path is invalid, e.g.
- if the file does not exist or
- if it is a directory or
- if the executing process is not allowed to access the file?
or if the file is empty?

The answer for all of these scenarios: the code just does not enter the loop body. It does not attempt to process data. The code snippet above cannot be surprised. It deals with all types of errors transparently.

Transparent error handling is good. Sometimes, however, meaningful error messages must be emitted. How to do that? According to my findings, the following snippet is the best that can be done:

string line;
ifstream f ("file");
if (!f.is_open())
    perror("error while opening file");
while(getline(f, line)) {
    process(&line);
    }
if (f.bad())
    perror("error while reading file");

Why? Discussed in the next part.

How to catch errors specifically? Testing ifstream’s behavior

Let me start with

Two important things to know:

Consider a call to std::getline() detecting the end of file. It then sets eofbit. But: “Notice that some eofbit cases will also set failbit.” (reference). This will be very important, and we will figure out in exactly which cases only eofbit is set and in which cases both eofbit and failbit are set.

perror() evaluates the current setting of errno and prints a meaningful error message. errno is a global error variable which is set by low-level functions of your current operating system. An errno setting is sticky: it stays until the next error occurs, which overwrites the state of the last error. Therefore, perror() must only be called in a context that for sure has updated errno right before. Otherwise, the printed error message may not make any sense at all in the current context.
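
One way to cope with the stickiness of errno is to reset it right before the operation of interest and to copy it into a local variable right afterwards, before any other call has a chance to change it. The following is only a sketch: the helper name open_error_message is hypothetical, and it relies on the assumption that the ifstream constructor leaves a meaningful errno behind when opening fails. The C++ standard does not promise this; it is merely what the tests further below show on my system.

```cpp
#include <cerrno>
#include <cstring>
#include <fstream>
#include <string>

// Try to open `path`. Returns an empty string on success, otherwise a
// message built from the errno value captured right after the failed open.
std::string open_error_message(const std::string& path) {
    errno = 0;                      // discard stale state from earlier calls
    std::ifstream f(path.c_str());
    const int saved_errno = errno;  // capture before anything else can change it
    if (f.is_open()) {
        return std::string();
    }
    if (saved_errno == 0) {
        // Assumption did not hold: the open failed without setting errno.
        return "could not open " + path;
    }
    return "could not open " + path + ": " + std::strerror(saved_errno);
}
```

Working with a saved copy instead of calling perror() later means that intermediate calls, such as writing to a log, cannot silently replace the error reason.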

As you already can imagine, for providing meaningful error messages, it is required to understand when exactly the eofbit, failbit and badbit are set. Also, one has to know when exactly it is safe to call perror() in the context of stream methods. Unfortunately, at this point we enter system-dependency and proper documentation is difficult to find or even missing. In order to understand the behavior of my system (a 2.6.27 Linux at the time of writing this article), I went down the empirical path and implemented test cases. All source files are provided in a tarball and it will be very easy for you to run these tests on your system.

The test suite:

The test suite can be summarized as follows:
It starts off with ifstream s ("file") and then checks the state of the stream via

s.is_open()
s.fail() (same as !s: check for failbit and badbit)
s.bad() (check for only badbit)
s.eof() (check for only eofbit)
errno (evaluated via perror())

while opening/reading

a non-existing file
an empty file
an existing file with content
an existing file with the last line not terminated by a newline character (could be considered being an invalid file format, since lines mostly are considered to be newline-terminated, not newline-separated).
a file with content that is opened by another process for reading
a file with content that is opened by another process for writing
a file that the test program has no access to
a directory

Basically, the test evaluates the named quantities at all interesting points and especially after calls to std::getline().

Technically, the test consists of:

The C++ source of a test program with debug output. It expects an input filename as first command line argument.

A bash script that compiles the C++ source code of the test program and sets up the test files for the test. It runs the compiled test program against various input filenames.

This is the shell script (readfile_tests.sh) (note that this has been written in a quick & dirty fashion):

#!/bin/bash
 
COMPILATION_SOURCE=$1
NE_FILE="na"
EMPTY_FILE="empty_file"
ONE_LINE_FILE="one_line_file"
INVALID_LINE_FILE="invalid_line_file"
FILE_READ="file_read"
FILE_WRITTEN="file_written"
FILE_DENIED="/root/.bashrc"
DIR="dir"
 
# Compile test program, resulting in a.out executable.
g++ $COMPILATION_SOURCE
 
# Create test files / directories and put them in the desired state.
touch $EMPTY_FILE
if [[ ! -d $DIR ]]; then
    mkdir $DIR
fi
echo "rofl" > $ONE_LINE_FILE
echo -ne "validline\ninvalidline" > $INVALID_LINE_FILE
echo "i am opened to read from" > $FILE_READ
python -c 'import time; f = open("'$FILE_READ'"); time.sleep(4)' &
echo "i am opened to write to" > $FILE_WRITTEN
python -c 'import time; f = open("'$FILE_WRITTEN'", "a"); time.sleep(4)' & 
 
# Execute test cases.
echo "******** testing on non-existent file.."
./a.out $NE_FILE
echo
echo "******** testing on empty file.."
./a.out $EMPTY_FILE
echo
echo "******** testing on valid file with one line content"
./a.out $ONE_LINE_FILE
echo
echo "******** testing on a file with one valid and one invalid line"
./a.out $INVALID_LINE_FILE
echo
echo "******** testing on a file that is read by another process"
./a.out $FILE_READ
echo
echo "******** testing on a file that is written to by another process"
./a.out $FILE_WRITTEN
echo
echo "******** testing on a /root/.bashrc (access should be denied)"
./a.out $FILE_DENIED
echo
echo "******** testing on a directory"
./a.out $DIR

This is the source of the C++ program readfile_debug.cpp:

#include <cstdio>    // perror()
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
 
 
int check_error_bits(ifstream* f) {
    int stop = 0;
    if (f->eof()) {
        perror("stream eofbit. error state");
        // EOF after std::getline() is not the criterion to stop processing
        // data: In case there is data between the last delimiter and EOF,
        // getline() extracts it and sets the eofbit.
        stop = 0;
        }
    if (f->fail()) {
        perror("stream failbit (or badbit). error state");
        stop = 1;
        }
    if (f->bad()) {
        perror("stream badbit. error state");
        stop = 1;
        }
    return stop;
    }
 
 
int main(int argc, char* argv[]) {
    string line;
    int getlinecount = 1;
    if(argc != 2) {
        cerr << "provide one argument" << endl;
        return 1;
        }
    cout << "* trying to open and read: " << argv[1] << endl;
    ifstream f (argv[1]);
    perror("error state after ifstream constructor");
    if (!f.is_open())
        perror("is_open() returned false. error state");
    else
        cout << "is_open() returned true." << endl;
    cout << "* checking error bits once before first getline" << endl;
    check_error_bits(&f);
    while(1) {
        cout << "* perform getline() # " << getlinecount << endl;
        getline(f, line);
        cout << "* checking error bits after getline" << endl;
        if (check_error_bits(&f)) {
            cout << "* skip operation on data, break loop" << endl;
            break;
            }
        // This is the actual operation on the data obtained and we want to
        // protect it from errors during the last IO operation on the stream
        cout << "data line " << getlinecount << ": " << line << endl;
        getlinecount++;
        }          
    f.close();
    return 0;
    }

Let’s run it:

$ ./readfile_tests.sh readfile_debug.cpp

The output:

******** testing on non-existent file..
* trying to open and read: na
error state after ifstream constructor: No such file or directory
is_open() returned false. error state: No such file or directory
* checking error bits once before first getline
stream failbit (or badbit). error state: No such file or directory
* perform getline() # 1
* checking error bits after getline
stream failbit (or badbit). error state: No such file or directory
* skip operation on data, break loop
 
******** testing on empty file..
* trying to open and read: empty_file
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
stream eofbit. error state: Success
stream failbit (or badbit). error state: Success
* skip operation on data, break loop
 
******** testing on valid file with one line content
* trying to open and read: one_line_file
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
data line 1: rofl
* perform getline() # 2
* checking error bits after getline
stream eofbit. error state: Success
stream failbit (or badbit). error state: Success
* skip operation on data, break loop
 
******** testing on a file with one valid and one invalid line
* trying to open and read: invalid_line_file
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
data line 1: validline
* perform getline() # 2
* checking error bits after getline
stream eofbit. error state: Success
data line 2: invalidline
* perform getline() # 3
* checking error bits after getline
stream eofbit. error state: Success
stream failbit (or badbit). error state: Success
* skip operation on data, break loop
 
******** testing on a file that is read by another process
* trying to open and read: file_read
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
data line 1: i am opened to read from
* perform getline() # 2
* checking error bits after getline
stream eofbit. error state: Success
stream failbit (or badbit). error state: Success
* skip operation on data, break loop
 
******** testing on a file that is written to by another process
* trying to open and read: file_written
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
data line 1: i am opened to write to
* perform getline() # 2
* checking error bits after getline
stream eofbit. error state: Success
stream failbit (or badbit). error state: Success
* skip operation on data, break loop
 
******** testing on a /root/.bashrc (access should be denied)
* trying to open and read: /root/.bashrc
error state after ifstream constructor: Permission denied
is_open() returned false. error state: Permission denied
* checking error bits once before first getline
stream failbit (or badbit). error state: Permission denied
* perform getline() # 1
* checking error bits after getline
stream failbit (or badbit). error state: Permission denied
* skip operation on data, break loop
 
******** testing on a directory
* trying to open and read: dir
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
stream failbit (or badbit). error state: Is a directory
stream badbit. error state: Is a directory
* skip operation on data, break loop

The test results (important things to know: part 2):

There are many things to learn from this output. The following conclusions are only a subset. All this makes a lot of sense:

The ifstream s ("file") constructor sets errno if the file cannot be opened (non-existing file, permission denied).

is_open() does not set errno.

is_open() does not catch the case when trying to open a directory.

is_open() catches the non-existing-file case and the permission-denied case.

Conclusion: perror() right after is_open() right after ifstream construction is safe. According to the test, two problems can be identified via this method: a non-existing file and a file the process is not allowed to access. In both cases errno carries the specific reason, so the error message can be made precise.

Other observations:

In almost all test cases, the eofbit has been set at the same time as the failbit (verifying “Notice that some eofbit cases will also set failbit.” as stated above). A closer look reveals that the failbit is only set by getline() if it did not manage to extract any data at all. Note that this is a regular scenario, when the last character in a file is a line delimiter. The eofbit on the other hand means that getline() reached EOF while searching for the next line delimiter: If there is data between the last delimiter and EOF, getline() extracts this data and sets eofbit.
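
This interplay of eofbit and failbit can be reproduced without any file by reading from a std::istringstream. The sketch below records the state bits after each call to getline(). The helper names state_bits and bits_after_each_getline are invented for this example.

```cpp
#include <ios>
#include <sstream>
#include <string>
#include <vector>

// Render the error bits of a stream as text, e.g. "eof fail".
std::string state_bits(const std::ios& s) {
    const std::ios::iostate st = s.rdstate();
    std::string out;
    if ((st & std::ios::eofbit) != 0)  out += "eof ";
    if ((st & std::ios::failbit) != 0) out += "fail ";
    if ((st & std::ios::badbit) != 0)  out += "bad ";
    if (out.empty()) return "good";
    out.erase(out.size() - 1);  // drop the trailing space
    return out;
}

// Call getline() `calls` times on `content` and record the bits after each call.
std::vector<std::string> bits_after_each_getline(const std::string& content,
                                                 int calls) {
    std::istringstream in(content);
    std::string line;
    std::vector<std::string> states;
    for (int i = 0; i < calls; ++i) {
        std::getline(in, line);
        states.push_back(state_bits(in));
    }
    return states;
}
```

For the content "validline\ninvalidline" and three calls this yields "good", "eof", and "eof fail", which matches the test output above: the second call extracts the unterminated last line and sets only eofbit, and only the third call, which extracts nothing, sets failbit as well.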

The badbit is only set in case of trying to get a line from a directory.

getline() only changes errno in case of trying to get a line from a directory. In all other error cases it does not change errno.

Conclusion 1: When getline() on stream s has evaluated to False, i.e. !s and s.fail() are True, do not blindly use perror() to print an error message, because it is likely to be wrong in the current context. This is because the bool evaluation of the stream is sensitive to both badbit and failbit. Since failbit may occur in common cases, it is not qualified for detecting an exceptional state (although its name suggests so). Only a set badbit identifies an exception. Therefore, perror() right after an I/O operation on a stream must be preceded by a positive s.bad() evaluation.

Conclusion 2: In order to process residual data between the last line delimiter and EOF, a positive eofbit must not prevent data processing.

Ideal solutions

With the knowledge from above, ideal code solutions in form of ready-to-compile-examples can be proposed for two cases:

one, in which error messages are not important

one, in which we do the best we can to extract error messages

Ideal solution including meaningful error messages

It was shown that with C++’s standard means it is difficult to catch specific errors. The following readfile_stable_errors.cpp tries to provide as precise error messages as possible:

#include <cstdio>    // perror()
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
 
 
void process(string* line) {
    cout << "line read: " << *line << endl;
    }
 
 
int main(int argc, char* argv[]) {
    string line;
    if(argc != 2) {
        cerr << "One argument is required." << endl;
        return 1;
        }
    string filename(argv[1]);
    cout << "* trying to open and read: " << filename << endl;
    ifstream f (argv[1]);
    // After this attempt to open a file, we can safely use perror() only  
    // in case f.is_open() returns False.
    if (!f.is_open())
        perror(("error while opening file " + filename).c_str());
    // Read the file via std::getline(). Rules obeyed:
    //   - first the I/O operation, then error check, then data processing
    //   - failbit and badbit prevent data processing, eofbit does not
    while(getline(f, line)) {
        process(&line);
        }
    // Only in case of set badbit we are sure that errno has been set in
    // the current context. Use perror() to print error details.
    if (f.bad())
        perror(("error while reading file " + filename).c_str());
    f.close();
    return 0;
    }

Of course this can be run against the test shellscript from above:

./readfile_tests.sh readfile_stable_errors.cpp

The output:

******** testing on non-existent file..
* trying to open and read: na
error while opening file na: No such file or directory
 
******** testing on empty file..
* trying to open and read: empty_file
 
******** testing on valid file with one line content
* trying to open and read: one_line_file
line read: rofl
 
******** testing on a file with one valid and one invalid line
* trying to open and read: invalid_line_file
line read: validline
line read: invalidline
 
******** testing on a file that is read by another process
* trying to open and read: file_read
line read: i am opened to read from
 
******** testing on a file that is written to by another process
* trying to open and read: file_written
line read: i am opened to write to
 
******** testing on a /root/.bashrc (access should be denied)
* trying to open and read: /root/.bashrc
error while opening file /root/.bashrc: Permission denied
 
******** testing on a directory
* trying to open and read: dir
error while reading file dir: Is a directory

Congratulations, “no such file or directory”, “is a directory”, and “permission denied” are caught. Also, the data in the “invalid” line was read.

Ideal solution without printing error messages:

The following source of readfile_stable_no_errors.cpp deals with all errors transparently and extracts residual data from an “invalid” last line:

#include <iostream>
#include <fstream>
#include <string>
using namespace std;
 
 
void process(string* line) {
    cout << "line read: " << *line << endl;
    }
 
 
int main(int argc, char* argv[]) {
    string line;
    if(argc != 2) {
        cerr << "provide one argument" << endl;
        return 1;
        }
    cout << "* trying to open and read: " << argv[1] << endl;
    ifstream f (argv[1]);
    // Note that we can omit checking for f.is_open(), because
    // all errors will be caught correctly by f.fail() (!f) and
    // we do not want to print error messages here.
    // Also note that during the loop, the following rules are obeyed:
    //   - first the IO operation, then error check, then data processing
    //   - failbit and badbit prevent data processing, eofbit does not
    while(getline(f, line)) {
        process(&line);
        }
    f.close();
    return 0;
    }

The test:

./readfile_tests.sh readfile_stable_no_errors.cpp

Output:

******** testing on non-existent file..
* trying to open and read: na
 
******** testing on empty file..
* trying to open and read: empty_file
 
******** testing on valid file with one line content
* trying to open and read: one_line_file
line read: rofl
 
******** testing on a file with one valid and one invalid line
* trying to open and read: invalid_line_file
line read: validline
line read: invalidline
 
******** testing on a file that is read by another process
* trying to open and read: file_read
line read: i am opened to read from
 
******** testing on a file that is written to by another process
* trying to open and read: file_written
line read: i am opened to write to
 
******** testing on a /root/.bashrc (access should be denied)
* trying to open and read: /root/.bashrc
 
******** testing on a directory
* trying to open and read: dir

The intention of this last piece of code shown is to transparently handle file opening and stream I/O errors. This succeeds: whenever there is data to extract, it is extracted. All error test cases result in no data being processed.

Final words

Remember, all code shown here can be downloaded or cloned from Bitbucket.

Please let me know if I have to correct certain points or if we can do better than with the presented solutions (thanks again to Alexandre at this point).