Reading files line by line in C++ using ifstream: dealing correctly with badbit, failbit, eofbit, and perror()

Motivated by this POV-Ray issue, I tried to find a reliable way to read a file line by line in C++ using std::ifstream in combination with std::getline(). The goal was to handle all underlying stream errors as well as file opening errors, and to emit error messages that are as precise as possible. In a high-level programming language such as Python, this level of reliability and usability is not difficult to obtain. In C++, however, it turned out to be a rather complex topic.

Proper handling of the stream error bits eofbit, failbit, and badbit requires a tremendous amount of care, as discussed for example here, here, and here, and finally at cplusplus.com. It is worth mentioning that although cplusplus.com is a convenient reference, it does not provide a rock-solid solution for the problem stated above and does not mention all the important details.

When it comes to providing meaningful error messages, things become quite complicated. Properly evaluating errno (and therefore the output of perror()) in response to the stream error bits is not a trivial task, as can be inferred from discussions like this and this. From these discussions we learn that most of the related uncertainty stems from documentation that is scattered or missing altogether. The exact behavior of C++ code with respect to file handling and stream manipulation is defined by an intertwining of the language specification (C++ in this case), the operating system interface (e.g. POSIX), and low-level APIs (provided by e.g. libc). These are all documented in different places and to different extents. For example, we expect that when fopen() returns NULL, errno is set to something meaningful. But where is this actually documented?

In order to understand the relation between the language and operating system constructs involved, I did a fair amount of research and testing. Of course, there are many obvious and non-obvious ways to write unreliable code. As expected, there are also best practices or “recipes” to follow for writing reliable code. Naming, explaining, and sharing them with the community is the goal of this article. We all know that re-using established recipes saves time and improves software quality in the long term.

Update (January 18th, 2015): In the meantime, this article has made it into the top Google search results for “c++ read file ifstream”. It is one of the most-visited articles on my website. Thanks for commenting and sharing!

Update (July 7th, 2011): I revised the article after an important insight provided by Alexandre Duret-Lutz (see the comments).

If you just want to see the results of this small investigation, I recommend scrolling down to the ideal solutions section. Otherwise, before continuing, you should briefly familiarize yourself with the eofbit, failbit, and badbit of the ios class.

Note: all code shown in this post is contained in this HG repository, and can also be downloaded in a tarball.

Obey the two rules of ifstream iteration

This is the task: iteratively process the lines read from a file by means of an ifstream (why ifstream?). To do so, we first try to open a file by invoking ifstream s ("file"). To attempt to get a line from the file, we use std::getline(s, line), where line is a std::string in which to store the data. The goal is to process the data read from the file, line by line, via the fictitious call process(line). Of course, we want to call process(line) only if the preceding getline() was able to extract meaningful data and store it in line. When reaching the end of the file, the goal usually is to treat trailing data as a healthy line even if it is not terminated by a newline character (there is no standard saying whether lines are separated or terminated by a special character).

After the investigation described below, I am pretty sure that the simplest rock-solid language construct for the task specified above is:

string line;
ifstream f ("file");
while(getline(f, line)) {
    process(&line);
    }

This is so simple and yet reliable because it is the shortest approach that follows the two basic rules for applying an I/O operation (such as std::getline()) to a stream:

  • Before processing data obtained from the stream, check for errors reported by getline() (this holds true for any other I/O operation on streams).
  • If getline() (or any other I/O operation on a stream) has set the stream’s failbit or badbit, do not process the data. eofbit does not need to be checked in the loop and does not necessarily have to prevent data processing.

The origin of these rules will become clearer while reading the rest of the article.

Only two rules to follow: isn’t that easy? Still, they are often not followed, as you can infer from the links in the introduction. In fact, not following these rules led to the bug in POV-Ray linked in the very first sentence of this article.

How does the simple code snippet above follow these rules? In each iteration, the loop first tries to obtain data from the stream via the I/O operation getline(). It is totally okay to try this even on a bad, empty, or non-existing file, because getline() just tries and afterwards sets the stream’s error bits correctly, as defined here. After getline(), failbit and badbit are checked via the stream’s conversion to bool: getline() returns the stream object itself, which is evaluated as a bool expression in the loop header. Only if neither bit is set can one be sure that there is meaningful data in line. In this case the loop body is executed and processes the data obtained from the stream. Then, in the next iteration, the code attempts to read the next line, followed by an error check, and so on.

The point is: in each iteration, the chronological order of

  1. IO operation,
  2. error check,
  3. and data processing

is preserved.
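The three steps can be made explicit. The following sketch is meant to be equivalent to the while(getline(...)) loop; it uses std::istringstream instead of std::ifstream so that it runs without a file (assuming getline() sets the error bits the same way on any std::istream), and the hypothetical functions read_lines_idiom() and read_lines_explicit() collect lines into a vector instead of calling process():

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// The idiom from above: getline() returns the stream, and the stream's
// conversion to bool is equivalent to !fail(), i.e. "neither failbit
// nor badbit is set".
std::vector<std::string> read_lines_idiom(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);  // stands in for process(&line)
    }
    return lines;
}

// The same loop with the three steps written out one by one.
std::vector<std::string> read_lines_explicit(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    for (;;) {
        std::getline(in, line);  // 1. I/O operation
        if (in.fail()) {         // 2. error check (failbit or badbit)
            break;
        }
        lines.push_back(line);   // 3. data processing
    }
    return lines;
}
```

For the input "validline\ninvalidline", both functions should return the two lines: the unterminated last line is kept because getline() sets only eofbit for it, and the loop stops one call later, when failbit is set as well.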

Do you wonder why we do not need to check the eofbit within the loop? This is answered further below.
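A common way to break the order described above is to test eof() in the loop condition, before the I/O operation, and then process whatever getline() left in line without checking. The sketch below contrasts that pattern with the correct loop. It again assumes std::istringstream as a stand-in for a file, and the hypothetical counting functions merely record how often process() would be called:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Anti-pattern: eofbit is checked *before* the I/O operation, and the
// result of getline() is never checked before the data is processed.
int count_processed_eof_loop(std::istream& in) {
    int processed = 0;
    std::string line;
    while (!in.eof()) {
        std::getline(in, line);
        ++processed;  // runs even if getline() extracted nothing
    }
    return processed;
}

// Correct order: I/O operation, then error check, then processing.
int count_processed_getline_loop(std::istream& in) {
    int processed = 0;
    std::string line;
    while (std::getline(in, line)) {
        ++processed;
    }
    return processed;
}
```

For the input "rofl\n", the eof() loop should process two lines (the second one an empty string that is not in the data), while the getline() loop processes one; for empty input the counts are one and zero. Worse, for a file that could not be opened, the eof() loop would never terminate: the constructor sets failbit, and getline() keeps failing without ever setting eofbit.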

Now, how does the code snippet above behave if

  • the file path is invalid, e.g.
    • if the file does not exist or
    • if it is a directory or
    • if the executing process is not allowed to access the file?
  • or if the file is empty?

The answer for all of these scenarios: the code just does not enter the loop body and does not attempt to process data. None of these cases can take the snippet above by surprise; it deals with all types of errors transparently.
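This can be checked with a small sketch. The hypothetical count_lines() runs exactly the loop from above on a given path, and the hypothetical fails_without_eof() shows why the body is skipped for a file that could not be opened: the constructor has already set failbit, so getline() fails immediately without ever reaching EOF. The usage assumes a POSIX system, where /dev/null serves as an empty file, and a made-up path that does not exist:

```cpp
#include <fstream>
#include <string>

// Counts how often the loop body from above would run for `path`.
int count_lines(const std::string& path) {
    std::ifstream f(path.c_str());
    std::string line;
    int count = 0;
    while (std::getline(f, line)) {
        ++count;  // stands in for process(&line)
    }
    return count;
}

// Reports whether the first getline() fails without setting eofbit,
// which is what happens when opening the file already failed.
bool fails_without_eof(const std::string& path) {
    std::ifstream f(path.c_str());
    std::string line;
    std::getline(f, line);
    return f.fail() && !f.eof();
}
```

Both count_lines("/dev/null") and count_lines() on a non-existent path should return 0; only for the non-existent path should fails_without_eof() return true, since reading /dev/null reaches EOF at once.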

Transparent error handling is good. Sometimes, however, meaningful error messages must be emitted. How can this be done? According to my findings, the following snippet is the best that can be done:

string line;
ifstream f ("file");
if (!f.is_open())
    perror("error while opening file");
while(getline(f, line)) {
    process(&line);
    }
if (f.bad())
    perror("error while reading file");

Why? This is discussed in the next part.

How to catch errors specifically? Testing ifstream’s behavior

Let me start with two important things to know:

  • Consider a call to std::getline() that detects the end of the file. It then sets eofbit. But: “Notice that some eofbit cases will also set failbit.” (reference). This will be very important, and we will figure out exactly in which cases only eofbit is set and in which cases both eofbit and failbit are set.
  • perror() evaluates the current value of errno and prints a corresponding error message. errno is an error variable (global, or thread-local on modern systems) that is set by low-level library and operating system functions. Its value is sticky: no library function ever resets it to zero, so it may still hold the value from an error that happened long ago. Therefore, perror() must only be called in a context that has definitely updated errno right before. Otherwise, the printed error message may not make any sense at all in the current context.

As you can already imagine, providing meaningful error messages requires understanding when exactly eofbit, failbit, and badbit are set. One also has to know when exactly it is safe to call perror() in the context of stream methods. Unfortunately, at this point we enter system dependency, and proper documentation is difficult to find or even missing. In order to understand the behavior of my system (a 2.6.27 Linux at the time of writing this article), I went down the empirical path and implemented test cases. All source files are provided in a tarball, so it is easy to run these tests on your system.

The test suite can be summarized as follows: it starts off with ifstream s ("file") and then checks the state of the stream via

  • s.is_open()
  • s.fail() (same as !s: check for failbit and badbit)
  • s.bad() (check for only badbit)
  • s.eof() (check for only eofbit)
  • errno (evaluated via perror())

while opening/reading

  • a non-existing file
  • an empty file
  • an existing file with content
  • an existing file whose last line is not terminated by a newline character (this could be considered an invalid file format, since lines are mostly considered to be newline-terminated, not newline-separated)
  • a file with content that is opened by another process for reading
  • a file with content that is opened by another process for writing
  • a file that the test program has no access to
  • a directory

Basically, the test evaluates the named quantities at all interesting points and especially after calls to std::getline().

Technically, the test consists of:

  • The C++ source of a test program with debug output. It expects an input filename as its first command line argument.
  • A bash script that compiles the C++ source code of the test program and sets up the test files. It then runs the compiled test program against various input filenames.

This is the shell script readfile_tests.sh (note that it has been written in a quick & dirty fashion):

#!/bin/bash
 
COMPILATION_SOURCE=$1
NE_FILE="na"
EMPTY_FILE="empty_file"
ONE_LINE_FILE="one_line_file"
INVALID_LINE_FILE="invalid_line_file"
FILE_READ="file_read"
FILE_WRITTEN="file_written"
FILE_DENIED="/root/.bashrc"
DIR="dir"
 
# Compile test program, resulting in a.out executable.
g++ $COMPILATION_SOURCE
 
# Create test files / directories and put them in the desired state.
touch $EMPTY_FILE
if [[ ! -d $DIR ]]; then
    mkdir $DIR
fi
echo "rofl" > $ONE_LINE_FILE
echo -ne "validline\ninvalidline" > $INVALID_LINE_FILE
echo "i am opened to read from" > $FILE_READ
python -c 'import time; f = open("'$FILE_READ'"); time.sleep(4)' &
echo "i am opened to write to" > $FILE_WRITTEN
python -c 'import time; f = open("'$FILE_WRITTEN'", "a"); time.sleep(4)' & 
 
# Execute test cases.
echo "******** testing on non-existent file.."
./a.out $NE_FILE
echo
echo "******** testing on empty file.."
./a.out $EMPTY_FILE
echo
echo "******** testing on valid file with one line content"
./a.out $ONE_LINE_FILE
echo
echo "******** testing on a file with one valid and one invalid line"
./a.out $INVALID_LINE_FILE
echo
echo "******** testing on a file that is read by another process"
./a.out $FILE_READ
echo
echo "******** testing on a file that is written to by another process"
./a.out $FILE_WRITTEN
echo
echo "******** testing on a /root/.bashrc (access should be denied)"
./a.out $FILE_DENIED
echo
echo "******** testing on a directory"
./a.out $DIR

This is the source of the C++ program readfile_debug.cpp:

#include <cstdio>    // perror()
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
 
 
int check_error_bits(ifstream* f) {
    int stop = 0;
    if (f->eof()) {
        perror("stream eofbit. error state");
        // EOF after std::getline() is not the criterion to stop processing
        // data: In case there is data between the last delimiter and EOF,
        // getline() extracts it and sets the eofbit.
        stop = 0;
        }
    if (f->fail()) {
        perror("stream failbit (or badbit). error state");
        stop = 1;
        }
    if (f->bad()) {
        perror("stream badbit. error state");
        stop = 1;
        }
    return stop;
    }
 
 
int main(int argc, char* argv[]) {
    string line;
    int getlinecount = 1;
    if(argc != 2) {
        cerr << "provide one argument" << endl;
        return 1;
        }
    cout << "* trying to open and read: " << argv[1] << endl;
    ifstream f (argv[1]);
    perror("error state after ifstream constructor");
    if (!f.is_open())
        perror("is_open() returned false. error state");
    else
        cout << "is_open() returned true." << endl;
    cout << "* checking error bits once before first getline" << endl;
    check_error_bits(&f);
    while(1) {
        cout << "* perform getline() # " << getlinecount << endl;
        getline(f, line);
        cout << "* checking error bits after getline" << endl;
        if (check_error_bits(&f)) {
            cout << "* skip operation on data, break loop" << endl;
            break;
            }
        // This is the actual operation on the data obtained and we want to
        // protect it from errors during the last IO operation on the stream
        cout << "data line " << getlinecount << ": " << line << endl;
        getlinecount++;
        }          
    f.close();
    return 0;
    }

Let’s run it:

$ ./readfile_tests.sh readfile_debug.cpp

The output:

******** testing on non-existent file..
* trying to open and read: na
error state after ifstream constructor: No such file or directory
is_open() returned false. error state: No such file or directory
* checking error bits once before first getline
stream failbit (or badbit). error state: No such file or directory
* perform getline() # 1
* checking error bits after getline
stream failbit (or badbit). error state: No such file or directory
* skip operation on data, break loop
 
******** testing on empty file..
* trying to open and read: empty_file
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
stream eofbit. error state: Success
stream failbit (or badbit). error state: Success
* skip operation on data, break loop
 
******** testing on valid file with one line content
* trying to open and read: one_line_file
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
data line 1: rofl
* perform getline() # 2
* checking error bits after getline
stream eofbit. error state: Success
stream failbit (or badbit). error state: Success
* skip operation on data, break loop
 
******** testing on a file with one valid and one invalid line
* trying to open and read: invalid_line_file
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
data line 1: validline
* perform getline() # 2
* checking error bits after getline
stream eofbit. error state: Success
data line 2: invalidline
* perform getline() # 3
* checking error bits after getline
stream eofbit. error state: Success
stream failbit (or badbit). error state: Success
* skip operation on data, break loop
 
******** testing on a file that is read by another process
* trying to open and read: file_read
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
data line 1: i am opened to read from
* perform getline() # 2
* checking error bits after getline
stream eofbit. error state: Success
stream failbit (or badbit). error state: Success
* skip operation on data, break loop
 
******** testing on a file that is written to by another process
* trying to open and read: file_written
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
data line 1: i am opened to write to
* perform getline() # 2
* checking error bits after getline
stream eofbit. error state: Success
stream failbit (or badbit). error state: Success
* skip operation on data, break loop
 
******** testing on a /root/.bashrc (access should be denied)
* trying to open and read: /root/.bashrc
error state after ifstream constructor: Permission denied
is_open() returned false. error state: Permission denied
* checking error bits once before first getline
stream failbit (or badbit). error state: Permission denied
* perform getline() # 1
* checking error bits after getline
stream failbit (or badbit). error state: Permission denied
* skip operation on data, break loop
 
******** testing on a directory
* trying to open and read: dir
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
stream failbit (or badbit). error state: Is a directory
stream badbit. error state: Is a directory
* skip operation on data, break loop

The test results (important things to know: part 2):

There are many things to learn from this output. The following conclusions are only a subset, and all of them make a lot of sense:

  • The ifstream s ("file") constructor sets errno when the file cannot be opened (non-existing file, permission denied).
  • is_open() does not set errno.
  • is_open() does not catch the case of trying to open a directory.
  • is_open() catches the non-existing-file case and the permission-denied case.

Conclusion: on the tested system, perror() right after is_open() right after ifstream construction is safe. According to the test, two problems can be identified via this method: a non-existing file and missing access permissions. Hence, the error message can be made precise.

Other observations:

  • In almost all test cases, eofbit was set at the same time as failbit (confirming “Notice that some eofbit cases will also set failbit.” as quoted above). A closer look reveals that getline() sets failbit only if it did not manage to extract any data at all. This is a regular scenario when the last character in a file is a line delimiter: the next getline() call finds nothing left to extract. eofbit, on the other hand, means that getline() reached EOF while searching for the next line delimiter: if there is data between the last delimiter and EOF, getline() extracts this data and sets only eofbit.
  • badbit is only set when trying to get a line from a directory.
  • getline() changes errno only when trying to get a line from a directory. In all other error cases it leaves errno unchanged.
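The eofbit/failbit observations can be reproduced in memory. The following sketch assumes std::istringstream behaves like the file streams tested above with respect to these two bits (it does not cover the directory case). The hypothetical helper describe_bits() names the bits that are set, and trace_getline() records the state after every getline() call, including the final, failing one:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Describes the error bits currently set on a stream, e.g. "eof+fail".
std::string describe_bits(const std::istream& in) {
    std::string s;
    if (in.eof()) s += "eof+";
    // rdstate() is used so that failbit is reported separately from
    // badbit; fail() would be true for either of them.
    if ((in.rdstate() & std::ios_base::failbit) != 0) s += "fail+";
    if (in.bad()) s += "bad+";
    if (s.empty()) return "good";
    s.erase(s.size() - 1);  // drop the trailing '+'
    return s;
}

// Calls getline() until failbit or badbit is set and records the state
// after each call.
std::vector<std::string> trace_getline(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> states;
    std::string line;
    do {
        std::getline(in, line);
        states.push_back(describe_bits(in));
    } while (!in.fail());
    return states;
}
```

For "validline\ninvalidline" this should yield good, eof, eof+fail, which matches the invalid_line_file output above; for "rofl\n" it yields good, eof+fail, and for empty input only eof+fail.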

Conclusion 1: When the stream s has evaluated to false after getline(), i.e. !s and s.fail() are true, do not blindly use perror() to print an error message, because the message is likely to be wrong in the current context. This is because the bool evaluation of the stream is sensitive to both badbit and failbit. Since failbit is set in common, non-exceptional cases, it is not suited for detecting an exceptional state (although its name suggests otherwise). Only a set badbit identifies an exceptional state. Therefore, a perror() call right after an I/O operation on a stream must be preceded by a positive s.bad() check.
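Reproducing a genuine read error portably is awkward, so the following sketch simulates one. The hypothetical FailingBuf is a std::streambuf whose underflow() throws. Since getline() behaves as an unformatted input function, it catches the exception and sets badbit instead of propagating it (as long as exceptions() does not include badbit). Note that nothing here touches errno: a set badbit says that the underlying stream buffer failed, and whether errno holds anything useful depends on what that buffer did, as in the directory case above.

```cpp
#include <istream>
#include <stdexcept>
#include <streambuf>
#include <string>

// Hypothetical stream buffer simulating a low-level read error:
// underflow() is called when getline() needs more characters, and
// here it throws instead of returning data or EOF.
class FailingBuf : public std::streambuf {
protected:
    int_type underflow() override {
        throw std::runtime_error("simulated read error");
    }
};

struct BitState {
    bool eof;
    bool fail;
    bool bad;
};

// Attempts one getline() on a stream backed by FailingBuf and reports
// the resulting error bits.
BitState state_after_failed_read() {
    FailingBuf buf;
    std::istream in(&buf);   // exceptions() is goodbit by default
    std::string line;
    std::getline(in, line);  // the exception is caught, badbit is set
    return BitState{in.eof(), in.fail(), in.bad()};
}
```

The expected result is bad() and fail() true (fail() also reports badbit) with eof() false: the loop from above would stop, and only the s.bad() check identifies the exceptional state.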

Conclusion 2: In order to process residual data between the last line delimiter and EOF, a positive eofbit must not prevent data processing.

Ideal solutions

With the knowledge from above, ideal code solutions in the form of ready-to-compile examples can be proposed for two cases:

  • one, in which error messages are not important
  • one, in which we do the best we can to extract error messages

Ideal solution including meaningful error messages

It was shown that with C++’s standard means it is difficult to catch specific errors. The following readfile_stable_errors.cpp tries to provide error messages that are as precise as possible:

#include <cstdio>    // perror()
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
 
 
void process(string* line) {
    cout << "line read: " << *line << endl;
    }
 
 
int main(int argc, char* argv[]) {
    string line;
    if(argc != 2) {
        cerr << "One argument is required." << endl;
        return 1;
        }
    string filename(argv[1]);
    cout << "* trying to open and read: " << filename << endl;
    ifstream f (argv[1]);
    // After this attempt to open a file, we can safely use perror() only
    // in case f.is_open() returns false.
    if (!f.is_open())
        perror(("error while opening file " + filename).c_str());
    // Read the file via std::getline(). Rules obeyed:
    //   - first the I/O operation, then error check, then data processing
    //   - failbit and badbit prevent data processing, eofbit does not
    while(getline(f, line)) {
        process(&line);
        }
    // Only in case of set badbit we are sure that errno has been set in
    // the current context. Use perror() to print error details.
    if (f.bad())
        perror(("error while reading file " + filename).c_str());
    f.close();
    return 0;
    }
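One refinement worth considering, offered as a sketch under stated assumptions rather than as a replacement: the C standard allows library functions to set errno even when they succeed, so building the message string (which allocates memory) before perror() reads errno is, strictly speaking, not guaranteed to preserve it. The hypothetical read_file_lines() below copies errno into a local variable immediately after the failing check and formats it with std::strerror(). It collects lines into a vector instead of calling process() and returns an empty string on success:

```cpp
#include <cerrno>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Reads all lines of `filename` into `lines`. Returns an empty string
// on success, or an error message in the style of the program above.
std::string read_file_lines(const std::string& filename,
                            std::vector<std::string>& lines) {
    std::ifstream f(filename.c_str());
    if (!f.is_open()) {
        int err = errno;  // copy errno before any other library call
        return "error while opening file " + filename + ": " +
               std::strerror(err);
    }
    std::string line;
    while (std::getline(f, line)) {
        lines.push_back(line);
    }
    if (f.bad()) {
        int err = errno;  // consulted only because badbit is set
        return "error while reading file " + filename + ": " +
               std::strerror(err);
    }
    return std::string();
}
```

Returning the message instead of printing it leaves the decision of where to report it to the caller; on the tested system, a non-existent path should produce the same “No such file or directory” text as above.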

Of course, this can be run against the test shell script from above:

./readfile_tests.sh readfile_stable_errors.cpp

The output:

******** testing on non-existent file..
* trying to open and read: na
error while opening file na: No such file or directory
 
******** testing on empty file..
* trying to open and read: empty_file
 
******** testing on valid file with one line content
* trying to open and read: one_line_file
line read: rofl
 
******** testing on a file with one valid and one invalid line
* trying to open and read: invalid_line_file
line read: validline
line read: invalidline
 
******** testing on a file that is read by another process
* trying to open and read: file_read
line read: i am opened to read from
 
******** testing on a file that is written to by another process
* trying to open and read: file_written
line read: i am opened to write to
 
******** testing on a /root/.bashrc (access should be denied)
* trying to open and read: /root/.bashrc
error while opening file /root/.bashrc: Permission denied
 
******** testing on a directory
* trying to open and read: dir
error while reading file dir: Is a directory

Congratulations: “no such file or directory”, “is a directory”, and “permission denied” are caught. Also, the data in the “invalid” line was read.

Ideal solution without printing error messages

The following source of readfile_stable_no_errors.cpp deals with all errors transparently and extracts residual data from an “invalid” last line:

#include <iostream>
#include <fstream>
#include <string>
using namespace std;
 
 
void process(string* line) {
    cout << "line read: " << *line << endl;
    }
 
 
int main(int argc, char* argv[]) {
    string line;
    if(argc != 2) {
        cerr << "provide one argument" << endl;
        return 1;
        }
    cout << "* trying to open and read: " << argv[1] << endl;
    ifstream f (argv[1]);
    // Note that we can omit checking f.is_open(), because all errors
    // will be caught correctly by f.fail() (!f), and we do not want to
    // print error messages here.
    // Also note that during the loop, the following rules are obeyed:
    //   - first the IO operation, then error check, then data processing
    //   - failbit and badbit prevent data processing, eofbit does not
    while(getline(f, line)) {
        process(&line);
        }
    f.close();
    return 0;
    }

The test:

./readfile_tests.sh readfile_stable_no_errors.cpp

Output:

******** testing on non-existent file..
* trying to open and read: na
 
******** testing on empty file..
* trying to open and read: empty_file
 
******** testing on valid file with one line content
* trying to open and read: one_line_file
line read: rofl
 
******** testing on a file with one valid and one invalid line
* trying to open and read: invalid_line_file
line read: validline
line read: invalidline
 
******** testing on a file that is read by another process
* trying to open and read: file_read
line read: i am opened to read from
 
******** testing on a file that is written to by another process
* trying to open and read: file_written
line read: i am opened to write to
 
******** testing on a /root/.bashrc (access should be denied)
* trying to open and read: /root/.bashrc
 
******** testing on a directory
* trying to open and read: dir

The intention of this last piece of code is to transparently handle file opening and stream I/O errors. This succeeds: whenever there is data to extract, it is extracted. All error test cases result in no data being processed.

Final words

Remember, all code shown here can be downloaded or cloned from Bitbucket.

Please let me know if I have to correct certain points or if we can do better than with the presented solutions (thanks again to Alexandre at this point).


  1. […] Also read this answer 11085151 which references this article […]

  2. Stu M

    “We for example expect that when fopen() returns NULL, errno is set to something meaningful. But where is this actually documented?”

    man fopen

    RETURN VALUE
           Upon successful completion fopen(), fdopen() and
           freopen() return a FILE pointer.  Otherwise, NULL
           is returned and errno is set to indicate the error.
    
    1. Jan-Philip Gehrcke

      Thanks for adding this here. 10 years ago I didn’t know where to look.

  3. Santhi

    Thanks for posting the useful information indeed.

  4. Anonymous

    Hi, there are a number of issues here.
    First, a directory is a file (a binary file), so you can open it just fine – that’s not bizarre.
    To read it you’d need to use the OS’s primitives for reading directories – that’s what “ls” is doing after all (and maybe open it in binary mode? ios::binary).

    You haven’t tried to read lines from a binary file like an executable.

    Your test suite tries to read from an empty file by touching some file name – you should have used /dev/null, as if the file name exists and is not empty, then touching it will not make it empty, so your test will not test what you want.

    You try to create a directory. If the directory name exists and is a normal file, the test script will fail to make it a directory and again your test will not test what you want. Try using mktemp -d instead.

    You use perror to print out what went wrong with the “previous” statement. Perror reads the global variable errno to figure out the problem. This is thread unsafe code. If between failing to open your file and calling perror another thread tries to do something else that sets errno (e.g., write to a file you don’t have permission to write to), then the error message will be different.
    There’s no “previous” statement – in general you should avoid using global variables and functions like perror that use them.

    1. Jan-Philip Gehrcke

      Thanks for taking time for commenting, but you seem to have missed the point of this article. Also, some of your arguments are plain wrong.

      tl;dr: here is your fish, have at it.

      > Perror reads the global variable errno to figure out
      > the problem. This is thread unsafe code.

      This is ancient. Already in the Single Unix Specification 2 (1997) it was clarified that errno is thread-safe [1]:

      For each thread of a process, the value of errno shall not be affected by function calls or assignments to errno by other threads.

      In POSIX 2001 (Single UNIX Specification version 3) we find [2]:

      All functions defined by this volume of IEEE Std 1003.1-2001 shall be thread-safe, except that the following functions need not be thread-safe.

      whereas in the subsequent list perror() is not contained.

      That is, on POSIX-compliant systems errno/perror is thread-safe. Linux complies [3]:

      errno is thread-local; setting it in one thread does not affect its value in any other thread.

      Also on Windows usage of errno is thread-safe: http://stackoverflow.com/a/6413172

      > You haven’t tried to read lines from a binary file like an executable.

      Lines only have a meaning in “text mode”. For reading chunks of data from a binary stream separated by a certain byte separator you do not want to use getline(). This article is about getline(), as stated in the first paragraph.

      > a directory is a file (a binary file),
      > so you can open it just fine – that’s not bizarre.

      I am pretty sure that I did not classify reading a directory as bizarre. Did I? I agree, this is normal procedure. However, the question in this article is: if a user of a software *should* provide a path to a file but instead provides a path to a directory, can we provide a meaningful error message?

      > Your test suite tries to read from an empty file by
      > touching some file name – you should have used /dev/null,
      > as if the file name exists and is not empty, then touching
      > it will not make it empty, so your test will not test what you
      > want.

      You misunderstood the purpose of the shown test code (readfile_tests.sh) and ignored the resulting boundary conditions. The script obviously is example code, quick & dirty one. I show it here for disclosing the method by which I came to my conclusions. This is not a serious software project. This is not code meant to be used in production. Also, it should be obvious that the script is meant to be executed in an empty directory. Given these boundary conditions, your hypothetical scenario becomes irrelevant. In case you did execute the test script in one of your existing non-empty directories then I am sorry for your carelessness. If in addition your directory was containing a file called empty_file that actually had contents in it, then, indeed, the test script gave wrong results. Congratulations!

      > You try to create a directory. If the directory name exists
      > and is a normal file, the test script will fail to make it a
      > directory and again your test will not test what you want. Try
      > using mktemp -d instead.

      In line with how I addressed your earlier point: this is random code from the Internet. You do not execute this in /root or in /home/manfred, unless you want to potentially lose your data. I am sure you are wise enough to create a new directory and run this test script there, after having looked at its source where you made sure that this script does not attempt to execute rm -rf /.

      [1] ISO/IEC 9945:1-1996, §2.4
      [2] http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_09.html#tag_02_09_01
      [3] http://linux.die.net/man/3/errno

  5. PerthCharles

    It’s really helpful. Thanks for your excellent discussion about ifstream.

  6. Vinay

    Hi Jan-Philip Gehrcke,

    When I tried using the getline function as told by you, I am getting the following error.

    Error: Could not find a match for std::getline(std::ifstream, String)

    My code is as below:

    #include 
    #include 
    
    ifstream inActive;
    inActive.open(file);
    
    String finalLine;
    
    while (getline(inActive, finalLine))
    {
    	cout << finalLine << endl;
    }
    
    inActive.close();
    

    Any idea on this?

    1. Jan-Philip Gehrcke

      Why don’t you, for starters, try if any of my examples work with your compiler (I’ve been using GCC on Linux, what are you using?)?

      You should include the string header and then initialize finalLine as string (note the lower case ‘s’).

  7. Pradeep

    Does your explanation work for write operations also?

    1. Jan-Philip Gehrcke

      In principle, all arguments and conclusions should be valid/applicable when you replace getline() with any other I/O operation on streams of the ios class.

  8. Santosh

    In my C++ code I use is_open(). When I run this code on a Solaris 10 multi-core system, I am facing an issue: on a 14-core machine is_open() returns true, and on a 15-core machine it returns false. Just increasing the core count by one changes the behavior of is_open().

    1. Jan-Philip Gehrcke

      Most likely the number of cores in your machine is not the only parameter that differs between the two test cases.

  9. ackit

    Thank you for this write-up. Extremely helpful, and very easy to follow and understand.

  10. PowerGamer

    I don’t know about Linux, but on Windows getline() (or, more precisely, some other internal function called by getline()) sets failbit (rather than badbit) on any read error reported by the Win32 API ReadFile() call (based on the MSVC 2010 C/C++ RTL source code). So here is how to check for errors in ALL cases (i.e., not only when badbit is set). Theoretically it should work equally well on Linux, since it relies only on what the C/C++ language standard and library documentation say.

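    #include <cerrno>   // errno
    #include <cstdio>   // perror()
    #include <fstream>
    #include <string>
    using namespace std;

    errno = 0;  // clear any stale value before opening and reading
    string line;
    ifstream f("file");
    while(getline(f, line))
        process(&line);  // process() is your own per-line handler
    if(errno)  // typically set by a failed open or read underneath the stream
        perror("Encountered the following error");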
    

    The code above takes care of both stream opening and reading errors.

    1. Jan-Philip Gehrcke

      I verified that your code works on Linux after including the header that declares errno.

      As I understand, you are saying that f.bad() is never true after getline() on Windows. Do I get you correctly? Did you test that?

      1. PowerGamer

        getline() sets badbit if it catches any exception during its execution. No such exception is thrown if ReadFile() fails (for any reason). I primarily checked what happens when ReadFile() fails, but I have not studied all other possible code paths, and at the moment I cannot guarantee that badbit will never be set (I might study the sources in more detail later). And yes, I tested it by locking (from another program) a region of bytes in the file that getline() was reading. Also, if you try to open a directory or a non-existent file on Windows, you get only failbit (out of the three possible bits) right after the f.open() call.

        So just to be safe it might be a good idea to check for badbit also:

        if(errno)
         perror("Encountered the following error");
        else if(f.bad())
         cerr << "Encountered unknown error";
        
  11. Bernhard Bodenstorfer

    Excellent summary with links!
    I put the URL of your page into a comment in my code. ☺

  12. Haoqing

    Thanks so much for the comprehensive guidance. It saved me from losing confidence in iostream this afternoon.

  13. Alexandre Duret-Lutz

    std::getline(f, line) extracts data up to either ‘\n’, EOF, or a failure. So it will also read a last line that lacks a trailing newline. In this case it sets eofbit, but not failbit: this is not a failure. If you do not want to ignore that line, do not treat eofbit as a failure (i.e., do not check the stream’s state with good()).

    The failbit will be set when getline() extracts no characters. This occurs, for instance, when you try to read another line while you are already at EOF.

    Therefore simply writing while(std::getline(f, line)) process(line); should process all lines, including a last line with a missing new line. Personally I’m not really interested in diagnosing these (I don’t see why I’d bother my users), I just want to process these lines as if the newline was there.

    I would check for badbit right after the loop to diagnose low-level errors.

    ifstream f("file");
    string line;
    while(getline(f, line))
    {
      process(line);
    }
    if (f.bad())
    {
      // report low-level error
    }
    f.close();
    
    1. Jan-Philip Gehrcke

      Thanks, Alexandre, for clarifying this.

      Can we rely on the failbit behavior you described on all OSs? The write-ups on cplusplus.com are not good: they do not explain the case you described explicitly, and they suggest using .good() in the while loop’s condition. Furthermore, I must have missed something while evaluating the test results on a file like “foo\nbar”… I was quite sure that I always saw failbit together with eofbit. Now, after a quick test, I can confirm the behavior you described.

      I will update the blog post when I find the time.

  14. Alexandre Duret-Lutz

    The problem with the

    while(getline(f, line).good()) process(&line);

    approach is that it silently ignores the last line of a text file that is missing the final newline.

    Many tools (e.g., g++, diff) emit a “missing new line at EOF” warning, or process the line as if the newline was present. I’d consider silently ignoring such a line as a bug.

    1. Jan-Philip Gehrcke

      Dear Alexandre,

      first of all, thank you for your very valuable comment. I can confirm this behavior. Good text editors always append a trailing new line. Users that create files programmatically know that they should append a trailing new line. But I agree with you — one must not rely on that.

      Now, what can we do? The problem, again: std::getline() extracts data from a stream up to the next delimiter, which is '\n' by default. If there is no trailing \n at the end of the file, getline() does not extract the data between the last '\n' and EOF. On my system, the behavior with respect to a valid file is as follows: assume file content "foo\n". First getline() extracts "foo" and does not set any error bit. Second getline() extracts nothing and sets failbit and eofbit. Now assume an invalid file content "foo\nbar". Exactly the same behavior (and "bar" is lost).

      We have to find a way to know if there was data between the last delimiter and EOF. Then, we could a) print a warning or b) try to get this data. In my test cases above, I have shown that, at least for my system, eofbit always also goes with failbit while reading a file using std::getline(). Hence, I see no chance to use the error bits of the stream to see if there was additional data after the last delimiter. Furthermore, if there was a way to detect the existence of residual data, how should we extract it after getline() has already invalidated the stream?

      The conclusion is that when using std::getline(), there is no way to 1) detect and 2) extract residual data. Is that correct so far?

      In case the conclusion above is correct: we have to evaluate the ifstream on a character basis and implement our own getline() version in order to be able to detect and extract data between the last \n and EOF. Right?

      I hope to get a comment from you again :-)

      1. Corwin Joy

        This is a gotcha that bit me, and I think it is worth mentioning in your article. You often see code like
        while(getline(f, line).good()) process(&line);
        This fails when there is no newline at the end of the file whereas
        while(getline(f, line)) process(&line);
        works fine.
        The reason is subtle. The .good() method checks eofbit, failbit, and badbit, so this loop bails out without processing the last line because eofbit is set. In contrast, the built-in bool conversion operator checks only failbit and badbit, so it does process the last line. See
        http://www.cplusplus.com/reference/ios/ios/good/ and
        http://www.cplusplus.com/reference/ios/ios/operator_bool/

  15. Jim Holsenback

    Excellent write-up! Thanks for consolidating the discussions and references. As it turns out, I was also asked to learn about this process for a non-POV-Ray project, so I’ve personally benefited from this too.