Assignment #7
This assignment requires you to use several components of the Standard Library
to implement a spell-checker. The amount of code that you will write is
not large.
Specifications
Prompt the user for the name of a file that stores a dictionary of words.
Then prompt for the name of the file that you want to spell-check. Any word
that is not in the dictionary is considered to be misspelled. Output, in
sorted order, each misspelled word and the line number(s) on which it occurs. If
a word is misspelled more than once, it is listed once, but with each of the
line numbers on which it occurs. Of course, you should verify that both files open correctly.
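One way to handle the prompts and the check that each file opened is sketched below. The helper names promptForName and openChecked are hypothetical, not part of the assignment, and the sketch assumes the user types each file name on a line by itself.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical helper: prints a prompt and reads one whole line (the file
// name) from standard input, so names containing spaces are accepted.
std::string promptForName(const std::string& prompt) {
    std::cout << prompt;
    std::string name;
    std::getline(std::cin, name);
    return name;
}

// Hypothetical helper: opens `name` into `file` and reports a failure on
// `err`. Returns true only if the file is ready for reading.
bool openChecked(std::ifstream& file, const std::string& name,
                 std::ostream& err) {
    file.open(name);
    if (!file) {
        err << "Cannot open \"" << name << "\"\n";
        return false;
    }
    return true;
}
```

In main, a call such as openChecked(dictFile, promptForName("Dictionary file: "), std::cerr) can be tested for each file, returning from main with a nonzero status when it fails.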
What's A Word?
For the purposes of this assignment, you will determine words as follows:
The input is considered to be a sequence of tokens separated by
whitespace. Any token that ends with a single period, question mark, comma,
semicolon, or colon should have the punctuation removed. After doing this,
any token that contains letters only is considered a word. Convert this
word to lower case.
Example: For the following line:
This is a test, one-half of four is 2.
The tokens are:
This
is
a
test,
one-half
of
four
is
2.
Among these, the words are:
this
is
a
test
of
four
is
The token one-half fails the rule of consisting entirely of letters, and so
does the token 2. once its period is removed. The token This is converted to
lower case, and the token test, has the punctuation at the end stripped.
These are the rules, even if I've missed a few cases (apostrophes, for
example).
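The rules above can be sketched as a single function that maps a token to a lower-case word, or to an empty string when the token is not a word. The name toWord is hypothetical. The sketch reads "ends with a single period, question mark, comma, semicolon, or colon" as meaning that exactly one trailing mark is removed, so a token such as end... is rejected; that reading is an assumption.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper implementing the word rules: strip one trailing
// . ? , ; or : and then accept the token only if what remains is all
// letters. Returns the lower-case word, or "" if the token is not a word.
std::string toWord(std::string token) {
    const std::string trailing = ".?,;:";
    if (!token.empty() && trailing.find(token.back()) != std::string::npos) {
        token.pop_back();                 // remove a single trailing mark
    }
    if (token.empty()) {
        return "";                        // e.g. a lone "," is not a word
    }
    for (char& c : token) {
        unsigned char uc = static_cast<unsigned char>(c);
        if (!std::isalpha(uc)) {
            return "";                    // digit, hyphen, leftover mark...
        }
        c = static_cast<char>(std::tolower(uc));
    }
    return token;
}
```

Applied to the example tokens, toWord("test,") gives "test", toWord("This") gives "this", and both toWord("one-half") and toWord("2.") give an empty string. Casting to unsigned char before calling isalpha and tolower avoids undefined behavior for characters with negative values.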
The Dictionary
The dictionary contains one word per line. A large dictionary (~800 KB)
is available (it may take a little time to download). This dictionary was
obtained from the Internet and may contain inappropriate words. I apologize
in advance if this is the case.
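Because the dictionary has one word per line, it can be loaded with the extraction operator. The sketch below is one possibility: loadDictionary is a hypothetical name, and lower-casing each entry is an added assumption so that capitalized dictionary entries (proper nouns, for instance) still match the lower-case words produced from the input.

```cpp
#include <cctype>
#include <fstream>
#include <set>
#include <string>

// Hypothetical helper: reads the dictionary file into `dict`.
// Returns false if the file cannot be opened.
bool loadDictionary(const std::string& fileName, std::set<std::string>& dict) {
    std::ifstream in(fileName);
    if (!in) {
        return false;
    }
    std::string word;
    // >> skips all whitespace, so blank lines and a stray '\r' from
    // Windows line endings do not end up inside the stored words.
    while (in >> word) {
        // Assumption: entries are lower-cased to match the checked words.
        for (char& c : word) {
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        dict.insert(word);  // a set silently ignores duplicate entries
    }
    return true;
}
```

A set<string> keeps the words sorted and answers each lookup in logarithmic time, which stays fast even for a dictionary of this size.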
The Algorithm
Read the dictionary file and store its contents in a set<string>.
Then read the input data file one line at a time. Break each line into tokens
using an istringstream object, and write some functions to
convert the tokens to words (or to an empty string if a token is not a word). Once
you have a word, check whether it is in the set<string> that
stores the dictionary. If it is not, add it to a map<string, list<int>>
that stores the misspelled words and the line numbers on which they
occur. (This implies that you keep track of the current line number.) Once everything
is read, step through the map and print its contents in an
orderly way. Because a map keeps its keys in sorted order, stepping through
it produces the sorted output that the specification asks for.
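The steps above can be sketched as two functions, one that builds the map and one that prints it. The names findMisspelled and printMisspelled, and the output format (a word, a colon, then its line numbers), are assumptions. The word rules from "What's A Word?" are repeated as a compact toWord so the sketch compiles by itself, and a line number is recorded only once when a word is misspelled twice on the same line, which is also an assumption about the intended output.

```cpp
#include <cctype>
#include <istream>
#include <list>
#include <map>
#include <ostream>
#include <set>
#include <sstream>
#include <string>

// Word rules from "What's A Word?" (repeated so this sketch compiles alone).
std::string toWord(std::string t) {
    if (!t.empty() && std::string(".?,;:").find(t.back()) != std::string::npos)
        t.pop_back();
    for (char& c : t) {
        unsigned char uc = static_cast<unsigned char>(c);
        if (!std::isalpha(uc)) return "";
        c = static_cast<char>(std::tolower(uc));
    }
    return t;  // empty if the token was only a punctuation mark
}

// Reads `text` one line at a time, keeping a line counter, and records each
// word that is not in `dict` together with the lines on which it appears.
std::map<std::string, std::list<int>>
findMisspelled(std::istream& text, const std::set<std::string>& dict) {
    std::map<std::string, std::list<int>> misspelled;
    std::string line;
    int lineNumber = 0;
    while (std::getline(text, line)) {
        ++lineNumber;
        std::istringstream tokens(line);
        std::string token;
        while (tokens >> token) {
            std::string word = toWord(token);
            if (word.empty() || dict.count(word)) continue;
            std::list<int>& lines = misspelled[word];  // empty list if new
            // Assumption: list a line only once even if the word repeats on it.
            if (lines.empty() || lines.back() != lineNumber)
                lines.push_back(lineNumber);
        }
    }
    return misspelled;
}

// Prints the map in its sorted key order: each word, then its line numbers.
void printMisspelled(const std::map<std::string, std::list<int>>& m,
                     std::ostream& out) {
    for (const auto& entry : m) {
        out << entry.first << ':';
        for (int n : entry.second) out << ' ' << n;
        out << '\n';
    }
}
```

For example, with a dictionary containing only this, is, a, test, of and four, the two input lines "This is a tset, one-half of four is 2." and "Tset again tset." produce the output lines "again: 2" and "tset: 1 2".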
Header Files
You'll need:
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <set>
#include <map>
#include <list>
#include <cctype>   // isalpha( ) to check for letters, tolower( ) to convert case
using namespace std;
What to Submit
Submit your complete source code and the output of running it on the data file
ch3.txt.