Reading files in C++ using ifstream: dealing correctly with badbit, failbit, eofbit, and perror()

Because of this POV-Ray issue, I wanted to figure out the most secure and stable way to read a file line by line in C++ using a std::ifstream in combination with std::getline(). A further aim was to provide error messages that are as precise as possible. It turned out that dealing with the error bits eofbit, failbit, and badbit is already challenging, as discussed e.g. here, here, and here, and finally at cplusplus.com. Even the latter does not provide all details or the optimal solution. Error messages complicate things further: evaluating errno (or calling perror()) alongside the error bits is not trivial, as can be inferred from discussions like this and this. But after some testing and gathering all available information, it turns out that there are good recipes to follow.

Update (July 7th, 2011): I revised the whole article due to an important insight provided by Alexandre Duret-Lutz (see the comments).

If you just want to use/see the results of the investigation, scroll down to the ideal solutions. Otherwise, you should briefly familiarize yourself with eofbit, failbit, and badbit of the ios class.

All the code shown in this post can be downloaded.

What we want and have to do: obeying the two rules of ifstream iteration (ideal solution snippets)

We want to iteratively process the lines read from a file by means of an ifstream (reasons here). Therefore, we try to open a file by invoking ifstream s ("file"). To try to get a line from the file, we use std::getline(s, line), where line is a std::string to store the data to. In a loop we want to process the data read from the file, line by line, via process(line). Of course, we only want to call process(line), if the last getline() stored meaningful data in line. We also want to get the last line of the file, even if it is not terminated by a newline character.

After the investigation described below, I am pretty sure that the simplest rock-solid language construct for this task is:

string line;
ifstream f ("file");
while(getline(f, line)) {
    process(&line);
    }

This is so simple and at the same time good, because it is the shortest approach following the two basic rules we must follow when using std::getline() (or any other IO operation):

  • Before processing data obtained from the stream, check for errors reported by getline() (or other IO operations).
  • If getline() (or …) has set failbit or badbit, do not process the data. eofbit is not required to be checked in the loop and does not necessarily have to prevent data processing.

The origin of these rules will become clearer while reading the rest of the article.

Only two rules to follow. Isn’t that easy? Still, they are often not followed, as you can infer from the links in the introduction. In fact, not following these rules led to the POV-Ray bug mentioned in the introduction.

How does the simple code snippet above follow these rules? The loop first tries to obtain data from the stream via the IO operation getline(). It is totally okay to try this even on a bad, empty, or non-existent file, because getline() just tries and afterwards sets the stream’s error bits correctly, as defined here. After getline(), failbit and badbit are checked via the ifstream’s bool operator (this works because getline() returns the stream object). Only if neither of these bits is set can you be sure that there is data in line. In this case the loop body is evaluated and processes the data obtained from the stream. Then the loop tries to read the next line, and so on. The point is that in each iteration the chronological order of IO operation, error check, and data processing is correct. If you wonder why we do not have to check the eofbit within the loop: this is answered in the discussion below.
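The effect of the ordering can be made visible without touching the file system. The following is only a minimal sketch and not part of the test suite of this article: the helper names count_lines_checked and count_lines_eof_loop are made up for illustration, and a std::istringstream stands in for the file. The first function follows the two rules, the second one checks the stream state before the IO operation instead of after it.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Correct order: IO operation, then error check, then data processing.
// The loop body only runs if getline() set neither failbit nor badbit.
std::size_t count_lines_checked(std::istream& in) {
    std::string line;
    std::size_t n = 0;
    while (std::getline(in, line)) {
        ++n;  // stands in for process(line)
    }
    return n;
}

// Wrong order: the stream state is checked *before* the IO operation, so
// the body also runs after a getline() call that extracted nothing.
std::size_t count_lines_eof_loop(std::istream& in) {
    std::string line;
    std::size_t n = 0;
    while (!in.eof()) {
        std::getline(in, line);  // the result of this call is never checked
        ++n;                     // may "process" a line that was never read
    }
    return n;
}
```

For the input "a\nb\n" the first function counts two lines, while the second one counts three, because its body runs once more after the getline() call that hit EOF without extracting anything. Also note that on a stream that could not be opened, eofbit is never set, so a loop of the second kind would not terminate at all.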

Now, what happens if the file is empty? Or if it does not even exist? If it is a directory? If we are not allowed to access it? The code snippet above can deal with all types of errors transparently. That is very good so far. But what to do if we want to additionally get some meaningful error messages? This is the best we can do:

string line;
ifstream f ("file");
if (!f.is_open())
    perror("error while opening file");
while(getline(f, line)) {
    process(&line);
    }
if (f.bad())
    perror("error while reading file");

Why? This is discussed in the next part.

How to catch errors specifically? Testing ifstream’s behavior

Let me start with two important things to know:

  • Consider a call to std::getline() detecting the end of file. It then sets eofbit. But: “Notice that some eofbit cases will also set failbit.” (reference). This will be very important and we will figure out in which cases exactly we have either only eofbit or both, eofbit and failbit set.
  • perror() evaluates the current value of errno and prints a corresponding error message. errno is a global error variable which is set by low-level functions of your operating system. Its value is sticky: it stays until the next error occurs and overwrites it. Therefore, perror() must only be called in a context in which errno is certain to have been updated right before. Otherwise, the printed error message may not make any sense at all in the current context.
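One way to cope with the stickiness of errno is to reset it before the operation of interest and to copy it immediately afterwards. The following is a hedged sketch of that idea, not a definitive implementation: the function name describe_open_failure is hypothetical, and the code assumes (as observed on my test system, but not promised by the C++ standard) that a failed open leaves a meaningful value in errno.

```cpp
#include <cerrno>
#include <cstring>
#include <fstream>
#include <string>

// Try to open `path` and return a human-readable description of the
// failure, or an empty string if the stream was opened successfully.
std::string describe_open_failure(const std::string& path) {
    errno = 0;  // discard any stale value left over from earlier calls
    std::ifstream f(path.c_str());
    const int saved = errno;  // copy right away, before anything overwrites it
    if (f.is_open()) {
        return std::string();
    }
    if (saved == 0) {
        // Assumption check: the standard does not require errno to be set.
        return "could not open " + path + " (no errno information)";
    }
    return "could not open " + path + ": " + std::strerror(saved);
}
```

Working on the saved copy instead of calling perror() later means that library calls made in between (logging, string handling, ...) cannot change the message that is finally reported.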

As you can already imagine, providing meaningful error messages requires understanding when exactly eofbit, failbit, and badbit are set. Also, one has to know when exactly it is safe to call perror() in the context of stream methods. Unfortunately, at this point we enter system-dependent territory and the documentation is poor. Hence, I implemented a test to see how my test system, a 2.6.27 Linux, behaves. I provide all source files, so it is very easy to reproduce this test on your system.

The test:

The test can be summarized as follows:
Consider ifstream s ("file"). The test checks the state of

  • s.is_open()
  • s.fail() (same as !s: check for failbit and badbit)
  • s.bad() (check for only badbit)
  • s.eof() (check for only eofbit)
  • errno (evaluated via perror())

while opening/reading

  • a non-existent file
  • an empty file
  • an existing file with content
  • an existing file with the last line not ended by a newline character (can be considered as invalid file format, since lines are mostly considered to be newline-terminated, not newline-separated).
  • a file with content that is opened by another process for reading
  • a file with content that is opened by another process for writing
  • a file that the test program has no access to
  • a directory

Basically, the test evaluates the named quantities at all interesting points and especially after calls to std::getline().

Technically, the test consists of:

  • The C++ source of a test program that produces a lot of debug output and expects an input filename as its first command-line argument.
  • A shell script that compiles the C++ source code of the test program and sets up the test files. It then runs the test program for all the different input filenames.

This is the source of the shellscript readfile_tests.sh:

#!/bin/bash
 
COMPILATION_SOURCE=$1
NE_FILE="na"
EMPTY_FILE="empty_file"
ONE_LINE_FILE="one_line_file"
INVALID_LINE_FILE="invalid_line_file"
FILE_READ="file_read"
FILE_WRITTEN="file_written"
FILE_DENIED="/root/.bashrc"
DIR="dir"
 
# compile test program, resulting in a.out executable
g++ $COMPILATION_SOURCE
 
# create test files / directories and put them in the desired state
touch $EMPTY_FILE
if [[ ! -d $DIR ]]; then
    mkdir $DIR
fi
echo "rofl" > $ONE_LINE_FILE
echo -ne "validline\ninvalidline" > $INVALID_LINE_FILE
echo "i am opened to read from" > $FILE_READ
python -c 'import time; f = open("'$FILE_READ'"); time.sleep(4)' &
echo "i am opened to write to" > $FILE_WRITTEN
python -c 'import time; f = open("'$FILE_WRITTEN'", "a"); time.sleep(4)' &
 
# execute test cases
echo "******** testing on non-existent file.."
./a.out $NE_FILE
echo
echo "******** testing on empty file.."
./a.out $EMPTY_FILE
echo
echo "******** testing on valid file with one line content"
./a.out $ONE_LINE_FILE
echo
echo "******** testing on a file with one valid and one invalid line"
./a.out $INVALID_LINE_FILE
echo
echo "******** testing on a file that is read by another process"
./a.out $FILE_READ
echo
echo "******** testing on a file that is written to by another process"
./a.out $FILE_WRITTEN
echo
echo "******** testing on a /root/.bashrc (access should be denied)"
./a.out $FILE_DENIED
echo
echo "******** testing on a directory"
./a.out $DIR

This is the source of the C++ program readfile_debug.cpp:

#include <cstdio>    // perror()
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
 
 
int check_error_bits(ifstream* f) {
    int stop = 0;
    if (f->eof()) {
        perror("stream eofbit. error state");
        // EOF after std::getline() is not the criterion to stop processing
        // data: In case there is data between the last delimiter and EOF,
        // getline() extracts it and sets the eofbit.
        stop = 0;
        }
    if (f->fail()) {
        perror("stream failbit (or badbit). error state");
        stop = 1;
        }
    if (f->bad()) {
        perror("stream badbit. error state");
        stop = 1;
        }
    return stop;
    }
 
 
int main(int argc, char* argv[]) {
    string line;
    int getlinecount = 1;
    if(argc != 2) {
        cerr << "provide one argument" << endl;
        return 1;
        }
    cout << "* trying to open and read: " << argv[1] << endl;
    ifstream f (argv[1]);
    perror("error state after ifstream constructor");
    if (!f.is_open())
        perror("is_open() returned false. error state");
    else
        cout << "is_open() returned true." << endl;
    cout << "* checking error bits once before first getline" << endl;
    check_error_bits(&f);
    while(1) {
        cout << "* perform getline() # " << getlinecount << endl;
        getline(f, line);
        cout << "* checking error bits after getline" << endl;
        if (check_error_bits(&f)) {
            cout << "* skip operation on data, break loop" << endl;
            break;
            }
        // This is the actual operation on the data obtained and we want to
        // protect it from errors during the last IO operation on the stream
        cout << "data line " << getlinecount << ": " << line << endl;
        getlinecount++;
        }
    f.close();
    return 0;
    }

Let’s run it:

$ ./readfile_tests.sh readfile_debug.cpp

The output:

******** testing on non-existent file..
* trying to open and read: na
error state after ifstream constructor: No such file or directory
is_open() returned false. error state: No such file or directory
* checking error bits once before first getline
stream failbit (or badbit). error state: No such file or directory
* perform getline() # 1
* checking error bits after getline
stream failbit (or badbit). error state: No such file or directory
* skip operation on data, break loop
 
******** testing on empty file..
* trying to open and read: empty_file
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
stream eofbit. error state: Success
stream failbit (or badbit). error state: Success
* skip operation on data, break loop
 
******** testing on valid file with one line content
* trying to open and read: one_line_file
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
data line 1: rofl
* perform getline() # 2
* checking error bits after getline
stream eofbit. error state: Success
stream failbit (or badbit). error state: Success
* skip operation on data, break loop
 
******** testing on a file with one valid and one invalid line
* trying to open and read: invalid_line_file
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
data line 1: validline
* perform getline() # 2
* checking error bits after getline
stream eofbit. error state: Success
data line 2: invalidline
* perform getline() # 3
* checking error bits after getline
stream eofbit. error state: Success
stream failbit (or badbit). error state: Success
* skip operation on data, break loop
 
******** testing on a file that is read by another process
* trying to open and read: file_read
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
data line 1: i am opened to read from
* perform getline() # 2
* checking error bits after getline
stream eofbit. error state: Success
stream failbit (or badbit). error state: Success
* skip operation on data, break loop
 
******** testing on a file that is written to by another process
* trying to open and read: file_written
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
data line 1: i am opened to write to
* perform getline() # 2
* checking error bits after getline
stream eofbit. error state: Success
stream failbit (or badbit). error state: Success
* skip operation on data, break loop
 
******** testing on a /root/.bashrc (access should be denied)
* trying to open and read: /root/.bashrc
error state after ifstream constructor: Permission denied
is_open() returned false. error state: Permission denied
* checking error bits once before first getline
stream failbit (or badbit). error state: Permission denied
* perform getline() # 1
* checking error bits after getline
stream failbit (or badbit). error state: Permission denied
* skip operation on data, break loop
 
******** testing on a directory
* trying to open and read: dir
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
stream failbit (or badbit). error state: Is a directory
stream badbit. error state: Is a directory
* skip operation on data, break loop

The test results (important things to know 2):

There are many things to learn from this output. The following conclusions are only a subset. All of this makes a lot of sense:

  • The ifstream s ("file") constructor sets errno if the file does not exist or if access to it is denied.
  • is_open() does not set errno.
  • is_open() does not catch the case when trying to open a directory.
  • is_open() only catches the cases in which opening itself fails: non-existent file and permission denied.

-> is_open() after the ifstream constructor in combination with perror() works, but only for catching failures to open the file (non-existent file, permission denied).

  • getline() changes errno only when trying to read from a directory. In all other test cases it does not change errno.
  • In almost every case, the eofbit has been set at the same time as the failbit (which may happen, as stated above). A closer look reveals that the failbit is only set by getline() if it did not manage to extract any data at all. The eofbit, on the other hand, means that getline() reached EOF while searching for the next line delimiter: if there is data between the last delimiter and EOF, getline() extracts this data and sets eofbit.
  • The badbit was only set for the directory case.

-> !s or s.fail() (equivalent, both are sensitive to badbit and failbit) must not be used in combination with perror(), because perror() may then print error messages which are wrong in the current context.

-> Only a badbit check and therefore s.bad() seems to be safe to use together with perror().

-> In order to process residual data between the last line delimiter and EOF, a positive eofbit must not prevent data processing.
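The interplay of eofbit and failbit described above can be reproduced with an in-memory stream. The sketch below is an illustration under assumptions, not the test program of this article: the function name trace_getline_bits and the helper append_flag are made up, and a std::istringstream replaces the file. It records which state bits are set after every getline() call, including the final failing one.

```cpp
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Append a flag name to `out`, separated by a single space.
static void append_flag(std::string& out, const char* name) {
    if (!out.empty()) out += ' ';
    out += name;
}

// Call getline() until it fails and record, for every call (including the
// failing one), which state bits are set afterwards, e.g. "eof fail".
std::vector<std::string> trace_getline_bits(std::istream& in) {
    std::vector<std::string> trace;
    std::string line;
    do {
        std::getline(in, line);
        const std::ios_base::iostate st = in.rdstate();
        std::string bits;
        if ((st & std::ios_base::eofbit) != 0)  append_flag(bits, "eof");
        if ((st & std::ios_base::failbit) != 0) append_flag(bits, "fail");
        if ((st & std::ios_base::badbit) != 0)  append_flag(bits, "bad");
        trace.push_back(bits.empty() ? std::string("none") : bits);
    } while (!in.fail());
    return trace;
}
```

For the content of invalid_line_file ("validline\ninvalidline") the trace is "none", "eof", "eof fail": the second call extracts the unterminated last line and sets only eofbit, and only the third call, which extracts nothing, adds failbit. As a side effect, a successful getline() that leaves eofbit set tells you that the line just read was not newline-terminated.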

Ideal solutions

With the knowledge from above, ideal code solutions in the form of ready-to-compile examples can be proposed for two cases:

  • one, in which error messages are not important
  • one, in which we do the best we can to extract error messages

Ideal solution including meaningful error messages

Basically, the result of the test above is that with C++’s standard means there is not much we can do to catch specific errors. The following source of readfile_stable_errors.cpp seems to be the best we can do:

#include <cstdio>    // perror()
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
 
 
void process(string* line) {
    cout << "line read: " << *line << endl;
    }
 
 
int main(int argc, char* argv[]) {
    string line;
    if(argc != 2) {
        cerr << "provide one argument" << endl;
        return 1;
        }
    string filename(argv[1]);
    cout << "* trying to open and read: " << filename << endl;
    ifstream f (argv[1]);
    // After this opening attempt, we can safely use perror() in case 
    // f.is_open() returns false.
    if (!f.is_open())
        perror(("error while opening file " + filename).c_str());
    // Now read the file via std::getline, while trying to provide as
    // meaningful error messages as possible. Rules obeyed:
    //   - first the IO operation, then error check, then data processing
    //   - failbit and badbit prevent data processing, eofbit does not
    while(getline(f, line)) {
        process(&line);
        }
    // Only in case of the badbit, we can assume that some lower
    // level system API function has set errno. Then, perror() can be used.
    if (f.bad())
        perror(("error while reading file " + filename).c_str());
    f.close();
    return 0;
    }

Of course this can be run against the test shellscript from above:

./readfile_tests.sh readfile_stable_errors.cpp

The output:

******** testing on non-existent file..
* trying to open and read: na
error while opening file na: No such file or directory
 
******** testing on empty file..
* trying to open and read: empty_file
 
******** testing on valid file with one line content
* trying to open and read: one_line_file
line read: rofl
 
******** testing on a file with one valid and one invalid line
* trying to open and read: invalid_line_file
line read: validline
line read: invalidline
 
******** testing on a file that is read by another process
* trying to open and read: file_read
line read: i am opened to read from
 
******** testing on a file that is written to by another process
* trying to open and read: file_written
line read: i am opened to write to
 
******** testing on a /root/.bashrc (access should be denied)
* trying to open and read: /root/.bashrc
error while opening file /root/.bashrc: Permission denied
 
******** testing on a directory
* trying to open and read: dir
error while reading file dir: Is a directory

Congratulations, “no such file or directory”, “is a directory”, and “permission denied” are caught. Also, the data in the “invalid” line was read.

Ideal solution without printing error messages:

The following source of readfile_stable_no_errors.cpp deals with all errors transparently and extracts residual data from an “invalid” last line:

#include <iostream>
#include <fstream>
#include <string>
using namespace std;
 
 
void process(string* line) {
    cout << "line read: " << *line << endl;
    }
 
 
int main(int argc, char* argv[]) {
    string line;
    if(argc != 2) {
        cerr << "provide one argument" << endl;
        return 1;
        }
    cout << "* trying to open and read: " << argv[1] << endl;
    ifstream f (argv[1]);
    // Note that we can omit checking for f.is_open(), because
    // all errors will be caught correctly by f.fail() (!f) and
    // we do not want to print error messages here.
    // Also note that during the loop, the following rules are obeyed:
    //   - first the IO operation, then error check, then data processing
    //   - failbit and badbit prevent data processing, eofbit does not
    while(getline(f, line)) {
        process(&line);
        }
    f.close();
    return 0;
    }

The test:

./readfile_tests.sh readfile_stable_no_errors.cpp

Output:

******** testing on non-existent file..
* trying to open and read: na
 
******** testing on empty file..
* trying to open and read: empty_file
 
******** testing on valid file with one line content
* trying to open and read: one_line_file
line read: rofl
 
******** testing on a file with one valid and one invalid line
* trying to open and read: invalid_line_file
line read: validline
line read: invalidline
 
******** testing on a file that is read by another process
* trying to open and read: file_read
line read: i am opened to read from
 
******** testing on a file that is written to by another process
* trying to open and read: file_written
line read: i am opened to write to
 
******** testing on a /root/.bashrc (access should be denied)
* trying to open and read: /root/.bashrc
 
******** testing on a directory
* trying to open and read: dir

The intention of this program is to never fail due to stream opening/IO errors. This succeeds: whenever there is data to extract, it extracts data. All other cases are transparently handled correctly.
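If the caller should decide what to do with an error instead of having it printed via perror(), both solutions can be folded into one function that returns a status. This is only a sketch under assumptions: the names ReadStatus, errno_text, and read_lines are hypothetical and do not appear in the downloadable sources, and the errno handling relies on the behavior observed on my test system rather than on guarantees of the C++ standard.

```cpp
#include <cerrno>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical status codes, not part of the original programs.
enum ReadStatus { READ_OK, READ_OPEN_FAILED, READ_IO_ERROR };

// Translate an errno value into text. errno may legitimately be 0 here,
// because the C++ standard does not require streams to set it.
static std::string errno_text(int err) {
    return err != 0 ? std::string(std::strerror(err))
                    : std::string("unknown error");
}

// Append all lines of `path` to `lines`. On failure, `error` holds a message.
ReadStatus read_lines(const std::string& path,
                      std::vector<std::string>& lines,
                      std::string& error) {
    errno = 0;
    std::ifstream f(path.c_str());
    const int open_errno = errno;  // copy before any other call can change it
    if (!f.is_open()) {
        error = "error while opening file " + path + ": " + errno_text(open_errno);
        return READ_OPEN_FAILED;
    }
    std::string line;
    // IO operation, then error check, then data processing.
    while (std::getline(f, line)) {
        lines.push_back(line);
    }
    const int read_errno = errno;
    if (f.bad()) {  // only badbit is assumed to come with a meaningful errno
        error = "error while reading file " + path + ": " + errno_text(read_errno);
        return READ_IO_ERROR;
    }
    return READ_OK;
}
```

A caller that does not care about messages can simply ignore the return value and the error string, which gives the behavior of the second solution. A caller that does care gets the open and read failures separated, as in the first solution.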

Remember, all code shown here can be downloaded.

I hope that this article helps someone. Please let me know if I have to correct some things and if we can do better (thanks again to Alexandre at this point).

13 comments to Reading files in C++ using ifstream: dealing correctly with badbit, failbit, eofbit, and perror()

  • Jim Holsenback

    Excellent write up! Thanks for consolidating the discussions and references. As it turns out I was asked to learn about this process for a non POV-Ray project as well, so I’ve personally benefited from this as well.

  • The problem with the

    while(getline(f, line).good()) process(&line);

    approach is that it silently ignores the last line of a text file that is missing the final newline.

    Many tools (e.g., g++, diff) emit a “missing new line at EOF” warning, or process the line as if the newline was present. I’d consider silently ignoring such a line as a bug.

    • Dear Alexandre,

      first of all, thank you for your very valuable comment. I can confirm this behavior. Good text editors always append a trailing new line. Users that create files programmatically know that they should append a trailing new line. But I agree with you — one must not rely on that.

      Now, what can we do? The problem, again: std::getline() extracts data from a stream up to the next delimiter, which is '\n' by default. If there is no trailing \n at the end of the file, getline() does not extract the data between the last '\n' and EOF. On my system, the behavior with respect to a valid file is as follows: assume file content "foo\n". First getline() extracts "foo" and does not set any error bit. Second getline() extracts nothing and sets failbit and eofbit. Now assume an invalid file content "foo\nbar". Exactly the same behavior (and "bar" is lost).

      We have to find a way to know if there was data between the last delimiter and EOF. Then, we could a) print a warning or b) try to get this data. In my test cases above, I have shown that, at least for my system, eofbit always also goes with failbit while reading a file using std::getline(). Hence, I see no chance to use the error bits of the stream to see if there was additional data after the last delimiter. Furthermore, if there was a way to detect the existence of residual data, how should we extract it after getline() has already invalidated the stream?

      The conclusion is, that when using std::getline(), there is no way to 1) detect and 2) extract residual data. Is that correct so far?

      In case the conclusion above is correct: we have to evaluate the ifstream on a character basis and implement our own getline() version in order to be able to detect and extract data between the last \n and EOF. Right?

      I hope to get a comment from you again :-)

  • std::getline(f, line) extracts data up to either ‘\n’, or EOF, or failure. So it will read a last line missing a new line. In this case this will set the eofbit, but not failbit: this is not a failure. If you do not want to ignore that line, do not consider eofbit to be a failure (i.e., do not check the stream’s state with good()).

    The failbit will be set when getline() extracts no characters. This occurs for instance when you try to read another line while you are already at EOF.

    Therefore simply writing while(std::getline(f, line)) process(line); should process all lines, including a last line with a missing new line. Personally I’m not really interested in diagnosing these (I don’t see why I’d bother my users), I just want to process these lines as if the newline was there.

    I would check for badbit right after the loop to diagnose low-level errors.

    ifstream f("file");
    string line;
    while(getline(f, line))
    {
      process(line);
    }
    if (f.bad())
    {
      // report low-level error
    }
    f.close();
    
    • Thanks Alexandre for clarifying this.

      Can we rely on the failbit behavior you described on all OSs? The writeups on cplusplus.com are not good, they do not explain the case you described explicitly and they suggest using .good() in the while’s conditional statement. Furthermore, I must have missed something while evaluating the test results on a file like “foo\nbar”… I was quite sure that I always noticed the failbit together with eofbit. Now, after a quick test I can confirm the behavior you described.

      I will update the blog post when I find the time.

  • Haoqing

    Thanks so much for the comprehensive guidance. It saved me from losing confidence in iostream this afternoon.

  • Excellent summary with links!
    I put the URL of your page into a comment in my code. ☺

  • PowerGamer

    Don’t know about Linux, but on Windows getline() (or, to be more precise, some other internal function that is called by getline()) sets failbit (rather than badbit) on any read error reported by the Win32 API ReadFile() call (based on MSVC 2010 C/C++ RTL source code). So here is how to check for errors in ALL cases (i.e. not only when the badbit flag is set). Theoretically it should work equally well on Linux too, since it relies only on what the C/C++ language standard and library documentation say.

    errno = 0;
    string line;
    ifstream f("file");
    while(getline(f, line))
        process(&line);
    if(errno)
        perror("Encountered the following error");

    The code above takes care of both stream opening and reading errors.

    • I verified that your code works on Linux, after including <cerrno>.

      As I understand, you are saying that f.bad() is never true after getline() on Windows. Do I get you correctly? Did you test that?

      • PowerGamer

        getline() sets badbit if it catches any exception during its execution. No such exception is thrown if ReadFile() fails (for any reason). I primarily checked what happens when ReadFile() fails but have not studied all other possible code paths and atm cannot guarantee that badbit will never happen (I might study the sources in more detail later). And yes, I tested it by locking (in another program) a region of bytes in the file that getline() was reading. Also, if you try to open a directory or a non-existent file on Windows you get only failbit (out of three possible bits) right after the f.open() call.

        So just to be safe it might be a good idea to check for badbit also:
        if(errno)
            perror("Encountered the following error");
        else if(f.bad())
            cerr << "Encountered unknown error";

  • Thank you for this write up. Extremely helpful, and Very easy to follow and understand.

  • Santosh

    In my C++ code I use is_open(). When I run this code on a Solaris 10 multi-core system, I am facing an issue: with 14 cores, is_open() returns true, and on a 15-core machine it returns false. Just increasing the number of cores by one changes the behavior of is_open().
