Because of this POV-Ray issue, I wanted to figure out the most secure and stable way to read a file line by line in C++ using a std::ifstream in combination with std::getline(). Furthermore, the aim was to provide error messages that are as precise as possible. It turned out that dealing with the error bits eofbit, failbit, and badbit is already challenging, as discussed e.g. here, here, and here, and finally at cplusplus.com. Even in the latter case you are not provided with all details and the optimal solution. Regarding error messages, it gets more complicated: evaluating errno (or, respectively, calling perror()) together with the error bits is not trivial, as can already be inferred from discussions like this and this. But after some testing and gathering all available information, it turns out that there are good recipes to follow.
Update (July 7, 2011): I revised the whole article due to an important insight provided by Alexandre Duret-Lutz (see the comments).
If you just want to use or see the results of the investigation, scroll down to the ideal solutions. Otherwise, you should briefly familiarize yourself with the eofbit, failbit, and badbit flags of the ios class.
All the code shown in this post can be downloaded.
What we want and have to do: obeying the two rules of ifstream iteration (ideal solution snippets)
We want to iteratively process the lines read from a file by means of an ifstream (reasons here). Therefore, we try to open a file by invoking ifstream s ("file"). To try to get a line from the file, we use std::getline(s, line), where line is a std::string in which to store the data. In a loop we want to process the data read from the file, line by line, via process(line). Of course, we only want to call process(line) if the last getline() stored meaningful data in line. We also want to get the last line of the file, even if it is not terminated by a newline character.
After the investigation described below, I am pretty sure that the simplest rock-solid language construct for this task is:
```cpp
string line;
ifstream f ("file");
while(getline(f, line)) {
    process(&line);
}
```
This is so simple and at the same time so good because it is the shortest approach that follows the two basic rules we must obey when using std::getline() (or any other IO operation):
- Before processing data obtained from the stream, check for errors reported by getline() (or other IO operations).
- If getline() (or …) has set failbit or badbit, do not process the data. eofbit does not need to be checked in the loop and does not necessarily have to prevent data processing.
The origin of these rules will become clearer while reading the rest of the article.
Only two rules to follow: isn't that easy? Nevertheless, they are often not followed, as you can infer from the links in the introduction. In fact, not following these rules led to the bug in POV-Ray mentioned in the introduction.
How does the while(getline(f, line)) loop shown at the beginning of this section follow these rules? The loop first tries to obtain data from the stream via the IO operation getline(). It is totally okay to try this even on a bad/empty/non-existing file, because getline() just tries and afterwards sets the stream's error bits correctly, as defined here. After getline(), failbit and badbit are checked via the stream's conversion to bool (this works because getline() returns the stream object). Only if neither of these bits is set can you be sure that there is data in line. In this case the loop body is evaluated: it processes the data obtained from the stream. Then the next line is read, and so on. The point is that in each iteration the chronological order of IO operation, error check, and data processing is correct. If you wonder why we do not have to check eofbit within the loop: this will be answered during the discussion below.
Now, what happens if the file is empty? Or if it does not even exist? If it is a directory? If we are not allowed to access it? The simple getline() loop can deal with all types of errors transparently. That is very good so far. But what if we additionally want meaningful error messages? This is the best we can do:
```cpp
string line;
ifstream f ("file");
if (!f.is_open())
    perror("error while opening file");
while(getline(f, line)) {
    process(&line);
}
if (f.bad())
    perror("error while reading file");
```
Why? Discussed in the next part.
How to catch errors specifically? Testing ifstream’s behavior
Let me start with two important things to know:
- Consider a call to std::getline() that detects the end of file. It then sets eofbit. But: “Notice that some eofbit cases will also set failbit.” (reference). This will be very important, and we will figure out in which cases exactly only eofbit, or both eofbit and failbit, are set.
- perror() evaluates the current value of errno and prints a meaningful error message. errno is a global (in multithreaded programs, thread-local) error variable which is set by low-level functions of your operating system and C library. This errno setting is sticky: it persists until the next error overwrites it. Therefore, perror() must only be called in a context that has definitely updated errno right before. Otherwise, the printed error message may not make any sense at all in the current context.
As you can already imagine, providing meaningful error messages requires understanding when exactly eofbit, failbit, and badbit are set. Also, one has to know when exactly it is safe to call perror() in the context of stream methods. Unfortunately, at this point we enter system dependency, and the documentation is poor. Hence, I implemented a test to see how my test system, a Linux 2.6.27, behaves. I will provide all source files, and it will be very easy for you to reproduce this test on your system.
The test:
The test can be summarized as follows:
Consider ifstream s ("file"). The test checks the state of
- s.is_open()
- s.fail() (same as !s: checks for failbit and badbit)
- s.bad() (checks only for badbit)
- s.eof() (checks only for eofbit)
- errno (evaluated via perror())
while opening/reading
- a non-existent file
- an empty file
- an existing file with content
- an existing file with the last line not ended by a newline character (this can be considered an invalid file format, since lines are mostly considered to be newline-terminated, not newline-separated).
- a file with content that is opened by another process for reading
- a file with content that is opened by another process for writing
- a file that the test program has no access to
- a directory
Basically, the test evaluates the named quantities at all interesting points and especially after calls to std::getline().
Technically, the test consists of:
- The C++ source of a test program with a lot of debug output, expecting an input filename as the first command-line argument.
- A shellscript that compiles the C++ source code of the test program and sets up the test files. It then runs the test program for all the different input filenames.
This is the source of the shellscript readfile_tests.sh:
```bash
#!/bin/bash
COMPILATION_SOURCE=$1
NE_FILE="na"
EMPTY_FILE="empty_file"
ONE_LINE_FILE="one_line_file"
INVALID_LINE_FILE="invalid_line_file"
FILE_READ="file_read"
FILE_WRITTEN="file_written"
FILE_DENIED="/root/.bashrc"
DIR="dir"

# compile test program, resulting in a.out executable
g++ $COMPILATION_SOURCE

# create test files / directories and put them in the desired state
touch $EMPTY_FILE
if [[ ! -d $DIR ]]; then
    mkdir $DIR
fi
echo "rofl" > $ONE_LINE_FILE
echo -ne "validline\ninvalidline" > $INVALID_LINE_FILE
echo "i am opened to read from" > $FILE_READ
python -c 'import time; f = open("'$FILE_READ'"); time.sleep(4)' &
echo "i am opened to write to" > $FILE_WRITTEN
python -c 'import time; f = open("'$FILE_WRITTEN'", "a"); time.sleep(4)' &

# execute test cases
echo "******** testing on non-existent file.."
./a.out $NE_FILE
echo
echo "******** testing on empty file.."
./a.out $EMPTY_FILE
echo
echo "******** testing on valid file with one line content"
./a.out $ONE_LINE_FILE
echo
echo "******** testing on a file with one valid and one invalid line"
./a.out $INVALID_LINE_FILE
echo
echo "******** testing on a file that is read by another process"
./a.out $FILE_READ
echo
echo "******** testing on a file that is written to by another process"
./a.out $FILE_WRITTEN
echo
echo "******** testing on a /root/.bashrc (access should be denied)"
./a.out $FILE_DENIED
echo
echo "******** testing on a directory"
./a.out $DIR
```
This is the source of the C++ program readfile_debug.cpp:
```cpp
#include <cstdio>    // perror()
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int check_error_bits(ifstream* f) {
    int stop = 0;
    if (f->eof()) {
        perror("stream eofbit. error state");
        // EOF after std::getline() is not the criterion to stop processing
        // data: In case there is data between the last delimiter and EOF,
        // getline() extracts it and sets the eofbit.
        stop = 0;
    }
    if (f->fail()) {
        perror("stream failbit (or badbit). error state");
        stop = 1;
    }
    if (f->bad()) {
        perror("stream badbit. error state");
        stop = 1;
    }
    return stop;
}

int main(int argc, char* argv[]) {
    string line;
    int getlinecount = 1;

    if(argc != 2) {
        cerr << "provide one argument" << endl;
        return 1;
    }
    cout << "* trying to open and read: " << argv[1] << endl;
    ifstream f (argv[1]);
    perror("error state after ifstream constructor");
    if (!f.is_open())
        perror("is_open() returned false. error state");
    else
        cout << "is_open() returned true." << endl;
    cout << "* checking error bits once before first getline" << endl;
    check_error_bits(&f);
    while(1) {
        cout << "* perform getline() # " << getlinecount << endl;
        getline(f, line);
        cout << "* checking error bits after getline" << endl;
        if (check_error_bits(&f)) {
            cout << "* skip operation on data, break loop" << endl;
            break;
        }
        // This is the actual operation on the data obtained and we want to
        // protect it from errors during the last IO operation on the stream
        cout << "data line " << getlinecount << ": " << line << endl;
        getlinecount++;
    }
    f.close();
    return 0;
}
```
Let’s run it:
$ ./readfile_tests.sh readfile_debug.cpp
The output:
```
******** testing on non-existent file..
* trying to open and read: na
error state after ifstream constructor: No such file or directory
is_open() returned false. error state: No such file or directory
* checking error bits once before first getline
stream failbit (or badbit). error state: No such file or directory
* perform getline() # 1
* checking error bits after getline
stream failbit (or badbit). error state: No such file or directory
* skip operation on data, break loop

******** testing on empty file..
* trying to open and read: empty_file
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
stream eofbit. error state: Success
stream failbit (or badbit). error state: Success
* skip operation on data, break loop

******** testing on valid file with one line content
* trying to open and read: one_line_file
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
data line 1: rofl
* perform getline() # 2
* checking error bits after getline
stream eofbit. error state: Success
stream failbit (or badbit). error state: Success
* skip operation on data, break loop

******** testing on a file with one valid and one invalid line
* trying to open and read: invalid_line_file
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
data line 1: validline
* perform getline() # 2
* checking error bits after getline
stream eofbit. error state: Success
data line 2: invalidline
* perform getline() # 3
* checking error bits after getline
stream eofbit. error state: Success
stream failbit (or badbit). error state: Success
* skip operation on data, break loop

******** testing on a file that is read by another process
* trying to open and read: file_read
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
data line 1: i am opened to read from
* perform getline() # 2
* checking error bits after getline
stream eofbit. error state: Success
stream failbit (or badbit). error state: Success
* skip operation on data, break loop

******** testing on a file that is written to by another process
* trying to open and read: file_written
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
data line 1: i am opened to write to
* perform getline() # 2
* checking error bits after getline
stream eofbit. error state: Success
stream failbit (or badbit). error state: Success
* skip operation on data, break loop

******** testing on a /root/.bashrc (access should be denied)
* trying to open and read: /root/.bashrc
error state after ifstream constructor: Permission denied
is_open() returned false. error state: Permission denied
* checking error bits once before first getline
stream failbit (or badbit). error state: Permission denied
* perform getline() # 1
* checking error bits after getline
stream failbit (or badbit). error state: Permission denied
* skip operation on data, break loop

******** testing on a directory
* trying to open and read: dir
error state after ifstream constructor: Success
is_open() returned true.
* checking error bits once before first getline
* perform getline() # 1
* checking error bits after getline
stream failbit (or badbit). error state: Is a directory
stream badbit. error state: Is a directory
* skip operation on data, break loop
```
The test results (important things to know 2):
There are many things to learn from this output. The following conclusions are only a subset. All of this makes a lot of sense:
- The ifstream s ("file") constructor sets errno when the file cannot be opened (here: a non-existing file, and denied permission in the /root/.bashrc case).
- is_open() does not set errno.
- is_open() does not catch the case of trying to open a directory.
- is_open() only catches the cases in which the file could not be opened at all (here: non-existing file, permission denied).
-> Calling is_open() after the ifstream constructor in combination with perror() works, but only for catching these open errors, not for the directory case.
- getline() only changes errno when trying to read from a directory. In all other test cases it does not change errno.
- In almost every case, eofbit was set at the same time as failbit (which may happen, as stated above). A closer look reveals that getline() only sets failbit if it did not manage to extract any data at all. eofbit, on the other hand, means that getline() reached EOF while searching for the next line delimiter: if there is data between the last delimiter and EOF, getline() extracts this data and sets eofbit.
- badbit was only set in the directory case.
-> !s and s.fail() (equivalent, both are sensitive to badbit and failbit) must not be used in combination with perror(), because they may print error messages that are wrong in the current context.
-> Only a badbit check, and therefore s.bad(), seems to be safe to use together with perror().
-> In order to process residual data between the last line delimiter and EOF, a set eofbit must not prevent data processing.
Ideal solutions
With the knowledge from above, ideal code solutions in the form of ready-to-compile examples can be proposed for two cases:
- one, in which error messages are not important
- one, in which we do the best we can to extract error messages
Ideal solution including meaningful error messages
Basically, the result of the above test is that with C++'s standard means there is not much we can do to catch specific errors. The following source of readfile_stable_errors.cpp seems to be the best:
```cpp
#include <cstdio>    // perror()
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

void process(string* line) {
    cout << "line read: " << *line << endl;
}

int main(int argc, char* argv[]) {
    string line;

    if(argc != 2) {
        cerr << "provide one argument" << endl;
        return 1;
    }
    string filename(argv[1]);
    cout << "* trying to open and read: " << filename << endl;
    ifstream f (argv[1]);

    // After this opening attempt, we can safely use perror() in case
    // f.is_open() returns false.
    if (!f.is_open())
        perror(("error while opening file " + filename).c_str());

    // Now read the file via std::getline, while trying to provide as
    // meaningful error messages as possible. Rules obeyed:
    // - first the IO operation, then error check, then data processing
    // - failbit and badbit prevent data processing, eofbit does not
    while(getline(f, line)) {
        process(&line);
    }

    // Only in case of the badbit, we can assume that some lower
    // level system API function has set errno. Then, perror() can be used.
    if (f.bad())
        perror(("error while reading file " + filename).c_str());

    f.close();
    return 0;
}
```
Of course this can be run against the test shellscript from above:
$ ./readfile_tests.sh readfile_stable_errors.cpp

The output:
```
******** testing on non-existent file..
* trying to open and read: na
error while opening file na: No such file or directory

******** testing on empty file..
* trying to open and read: empty_file

******** testing on valid file with one line content
* trying to open and read: one_line_file
line read: rofl

******** testing on a file with one valid and one invalid line
* trying to open and read: invalid_line_file
line read: validline
line read: invalidline

******** testing on a file that is read by another process
* trying to open and read: file_read
line read: i am opened to read from

******** testing on a file that is written to by another process
* trying to open and read: file_written
line read: i am opened to write to

******** testing on a /root/.bashrc (access should be denied)
* trying to open and read: /root/.bashrc
error while opening file /root/.bashrc: Permission denied

******** testing on a directory
* trying to open and read: dir
error while reading file dir: Is a directory
```
Congratulations, “no such file or directory”, “is a directory”, and “permission denied” are caught. Also, the data in the “invalid” line was read.
Ideal solution without printing error messages:
The following source of readfile_stable_no_errors.cpp deals with all errors transparently and extracts residual data from an “invalid” last line:
```cpp
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

void process(string* line) {
    cout << "line read: " << *line << endl;
}

int main(int argc, char* argv[]) {
    string line;

    if(argc != 2) {
        cerr << "provide one argument" << endl;
        return 1;
    }
    cout << "* trying to open and read: " << argv[1] << endl;
    ifstream f (argv[1]);

    // Note that we can omit checking for f.is_open(), because
    // all errors will be caught correctly by f.fail() (!f) and
    // we do not want to print error messages here.
    // Also note that during the loop, the following rules are obeyed:
    // - first the IO operation, then error check, then data processing
    // - failbit and badbit prevent data processing, eofbit does not
    while(getline(f, line)) {
        process(&line);
    }
    f.close();
    return 0;
}
```
The test:
$ ./readfile_tests.sh readfile_stable_no_errors.cpp

Output:
```
******** testing on non-existent file..
* trying to open and read: na

******** testing on empty file..
* trying to open and read: empty_file

******** testing on valid file with one line content
* trying to open and read: one_line_file
line read: rofl

******** testing on a file with one valid and one invalid line
* trying to open and read: invalid_line_file
line read: validline
line read: invalidline

******** testing on a file that is read by another process
* trying to open and read: file_read
line read: i am opened to read from

******** testing on a file that is written to by another process
* trying to open and read: file_written
line read: i am opened to write to

******** testing on a /root/.bashrc (access should be denied)
* trying to open and read: /root/.bashrc

******** testing on a directory
* trying to open and read: dir
```
The intention of this program is to never fail due to stream opening/IO errors. This succeeds: whenever there is data to extract, it extracts data. All other cases are transparently handled correctly.
Remember, all code shown here can be downloaded.
I hope that this article helps someone. Please let me know if I have to correct some things and if we can do better (thanks again to Alexandre at this point).
Excellent write up! Thanks for consolidating the discussions and references. As it turns out I was asked to learn about this process for a non POV-Ray project as well, so I’ve personally benefited from this as well.
The problem with the
approach is that it silently ignores the last line of a text file that is missing the final newline.
Many tools (e.g., g++, diff) emit a “missing new line at EOF” warning, or process the line as if the newline was present. I’d consider silently ignoring such a line as a bug.
Dear Alexandre,
first of all, thank you for your very valuable comment. I can confirm this behavior. Good text editors always append a trailing newline. Users who create files programmatically know that they should append a trailing newline. But I agree with you: one must not rely on that.
Now, what can we do? The problem, again:
std::getline() extracts data from a stream up to the next delimiter, which is '\n' by default. If there is no trailing \n at the end of the file, getline() does not extract the data between the last '\n' and EOF. On my system, the behavior with respect to a valid file is as follows: assume file content "foo\n". The first getline() extracts "foo" and does not set any error bit. The second getline() extracts nothing and sets failbit and eofbit. Now assume an invalid file content "foo\nbar". Exactly the same behavior (and "bar" is lost).

We have to find a way to know if there was data between the last delimiter and EOF. Then, we could a) print a warning or b) try to get this data. In my test cases above, I have shown that, at least for my system, eofbit always also goes with failbit while reading a file using std::getline(). Hence, I see no chance to use the error bits of the stream to see if there was additional data after the last delimiter. Furthermore, if there was a way to detect the existence of residual data, how should we extract it after getline() has already invalidated the stream?

The conclusion is that when using std::getline(), there is no way to 1) detect and 2) extract residual data. Is that correct so far?

In case the conclusion above is correct: we have to evaluate the ifstream on a character basis and implement our own getline() version in order to be able to detect and extract data between the last \n and EOF. Right?

I hope to get a comment from you again
std::getline(f, line) extracts data up to either '\n', or EOF, or failure. So it will read a last line missing a newline. In this case this will set the eofbit, but not failbit: this is not a failure. If you do not want to ignore that line, do not consider eofbit to be a failure (i.e., do not check the stream's state with good()).

The failbit will be set when getline() extracts no characters. This occurs for instance when you try to read another line while you are already at EOF.
Therefore simply writing

```cpp
while(std::getline(f, line)) process(line);
```

should process all lines, including a last line with a missing newline. Personally I'm not really interested in diagnosing these (I don't see why I'd bother my users), I just want to process these lines as if the newline was there.

I would check for badbit right after the loop to diagnose low-level errors.

```cpp
ifstream f("file");
string line;
while(getline(f, line)) {
    process(line);
}
if (f.bad()) {
    // report low-level error
}
f.close();
```

Thanks Alexandre for clarifying this.
Can we rely on the failbit behavior you described on all OSs? The writeups on cplusplus.com are not good, they do not explain the case you described explicitly and they suggest using .good() in the while’s conditional statement. Furthermore, I must have missed something while evaluating the test results on a file like “foo\nbar”… I was quite sure that I always noticed the failbit together with eofbit. Now, after a quick test I can confirm the behavior you described.
I will update the blog post when I find the time.
Done
Thanks so much for the comprehensive guidance. It saved me from losing confidence in iostream this afternoon.
Excellent summary with links!
I put the URL of your page into a comment in my code. ☺
Don’t know about Linux, but on Windows getline() (or, to be more precise, some other internal function that is called by getline()) sets failbit (rather than badbit) on any read error reported by the Win32 API ReadFile() call (based on the MSVC 2010 C/C++ RTL source code). So here is how to check for errors in ALL cases (i.e. not only when the badbit flag is set). Theoretically it should work equally well on Linux too, since it relies only on what the C/C++ language standard and library documentation say.
```cpp
errno = 0;
string line;
ifstream f("file");
while(getline(f, line))
    process(&line);
if(errno)
    perror("Encountered the following error");
```
The code above takes care of both stream opening and reading errors.
I verified that your code works on Linux, after including <cerrno>. As I understand, you are saying that f.bad() is never true after getline() on Windows. Do I get you correctly? Did you test that?

getline() sets badbit if it catches any exception during its execution. No such exception is thrown if ReadFile() fails (for any reason). I primarily checked what happens when ReadFile() fails but have not studied all other possible code paths and atm cannot guarantee that badbit will never happen (I might study the sources in more detail later). And yes, I tested it by locking (in another program) a region of bytes in the file that getline() was reading. Also, if you try to open a directory or a non-existent file on Windows, you get only failbit (out of the three possible bits) right after the f.open() call.
So just to be safe it might be a good idea to check for badbit also:
```cpp
if(errno)
    perror("Encountered the following error");
else if(f.bad())
    cerr << "Encountered unknown error";
```
Thank you for this write-up. Extremely helpful, and very easy to follow and understand.