Project 1 | CS 2113 Software Engineering - Spring 2021

Project 1

There are two parts to this project, with two different deadlines. In part A, you will implement a hashtable (hashmap) and a linked list in C, which are used in a spellchecker and a boggle solver application. In part B, you will implement the same data structures and applications in Java.

Preliminaries

Test Script

To help you complete this project, each part is provided with a test script. The script is not designed to be comprehensive, and you will be graded on a larger set of tests. To execute the test script, run it from anywhere within the lab directory.

./test.sh

Note that when you execute the test script on replit, it may take upwards of 5 minutes to complete because it uses valgrind! So you are better served developing some tests of your own rather than relying on the test script.

Compiling your code:

Part A

For part A (C implementation), we have provided you with a Makefile. You can compile the spellchecker or boggle solver by typing:

make spellcheck
make onePlayerBoggle

If you want to compile everything at once, simply type make. This will also produce a number of .o files (object files), which are compiled C files that have not yet been linked into an executable. Do not add these to your repository, as they are overwritten on every compilation.

To clean up your repository, you can use the make clean command.

Part B

For part B (Java implementation), you should compile using javac.

javac SpellChecker.java
javac OnePlayerBoggle.java

You should not add your class files (.class) to the repository.

Development Environments

You can complete both parts using replit, if you so desire. For part A you will find that valgrind will run much faster using a local Ubuntu installation, either via WSL or in a virtual machine — this is our recommended development choice.

For part B, you may use replit, or you could develop your code in IntelliJ. One benefit of IntelliJ is its built-in debugger, which is extremely helpful.


Data Structure Implementations

In both parts of the project, you will be implementing two basic data structures: a Linked List and a Hash Table (or Hashmap).

Linked List

The Linked List you will complete only requires forward pointers on its nodes and only a push() operation, which puts a new node on the front of the list. Each node in the list stores a string value; there is no need for the list to be generic.

Hashmap

The Hashmap data structure is simply a membership hash table: unlike a truly generic Hashmap that stores key-value pairs, this table returns true if an item is stored in the data structure and false otherwise. Put another way, it’s a Hashmap that maps a value to true. The Hashmap you will implement only needs to store strings. It has the following member functions:

The Hashmap should be implemented as a hash table with separate chaining. As you may recall from your data structures class, this means that when two elements collide at the same index, both are stored at that index, for example in a Linked List.

Following that model, your Hashmap should have an array of buckets, each holding a Linked List. After computing the hash value for a given string (modulo the number of buckets), you push that string onto the Linked List at that index. Critically, the performance of the Hashmap depends on the length of the list in each bucket: if the lists get too long, the lookup operation could become O(n)!
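The handout does not say which hash function to use. Here is a minimal sketch of turning a string into a bucket index. It assumes the well-known djb2 string hash, which is an illustrative choice and not necessarily what the starter code expects. `hash_str` and `bucket_index` are hypothetical helper names.

```c
/* djb2 string hash (Dan Bernstein). This is an assumed choice, not a course requirement. */
unsigned long hash_str(const char *s) {
  unsigned long h = 5381;
  for (; *s != '\0'; s++) {
    h = h * 33 + (unsigned char)*s; /* unsigned overflow wraps, which is well defined */
  }
  return h;
}

/* Map a string to a bucket in [0, num_buckets).
 * When num_buckets is a power of two, this equals hash_str(s) & (num_buckets - 1). */
int bucket_index(const char *s, int num_buckets) {
  return (int)(hash_str(s) % (unsigned long)num_buckets);
}
```

The table starts at a power of two and only ever doubles, so the bucket count stays a power of two. That means the modulo could equivalently be written as a bit mask.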

The load on a hash table is defined as the number of items stored in the table divided by the number of buckets. A higher load means longer lists at each bucket and worse performance. To keep performance steady, once the load reaches 0.75, you have to resize the hash table by doubling the number of buckets and reinserting all the items into their new hash locations. YOU MUST IMPLEMENT A RESIZE ROUTINE – YOU CANNOT SIMPLY SET YOUR NUMBER OF BUCKETS TO A LARGE VALUE!!
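A worked example makes the resize rule concrete. Starting from 16 buckets, 12 items give a load of exactly 12/16 = 0.75. The 13th item pushes the load to 13/16 = 0.8125, so the table doubles to 32 buckets (load 13/32 ≈ 0.41).

The sketch below simulates this. It assumes a resize happens when the load after an insert is strictly greater than 0.75, matching the Part A description of _resize. If you instead resize as soon as the load reaches 0.75, each doubling happens one insert earlier. `buckets_after` is a hypothetical helper for illustration only.

```c
#define INIT_BUCKETS 16 /* same value as HM_INIT_NUM_BUCKETS in Part A */
#define MAX_LOAD 0.75   /* same value as HM_MAX_LOAD in Part A */

/* Hypothetical helper: number of buckets after inserting n distinct items,
 * doubling whenever an insert makes size / buckets exceed MAX_LOAD. */
int buckets_after(int n) {
  int buckets = INIT_BUCKETS;
  for (int size = 1; size <= n; size++) {
    if ((double)size / buckets > MAX_LOAD) {
      buckets *= 2; /* one doubling always brings the load back well below 0.75 */
    }
  }
  return buckets;
}
```

Under this rule, buckets_after(12) is 16, buckets_after(13) is 32, and the next doubling (to 64) happens on the 25th item.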


Part (A) (C Implementation) (70 Points)

Github

Data Structure Implementation

The crucial part of this project is the data structure implementations. In C, you typically divide a data structure between a header file (a .h file) and a source file (a .c file). The header file contains the structure definitions and function declarations, while the source file contains the function implementations. You will primarily work with the source files (.c).

Linked List

As you can see in llist.h, the Linked List defines two structures:

//node type stored in lists
typedef struct ll_node{
  struct ll_node * next; //next node in list
  char * val; //string value stored in list
} ll_node_t;



//llist_t struct to store a list
typedef struct{
  ll_node_t * head; //pointer to the node at the head of the list
  int size; //the number of nodes in the list
} llist_t;

The ll_node_t is a node within the linked list, storing the value (a char * string) and a pointer to the next node. The llist_t is a structure representing the list, storing a pointer to the head of the list and its current size (number of nodes).

There are three functions that operate on lists, described below. You will implement them in llist.c.

// Return a newly initialized, empty linked list
llist_t * ll_init();

//delete/deallocate a linked list
void ll_delete(llist_t * ll);

//insert the string s (duplicated via strdup!) onto the front of the list
void ll_push(llist_t * ll, char * s);
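Here is a minimal sketch of these three functions. It assumes the struct definitions shown above, which are repeated so the block compiles on its own. For brevity it omits malloc failure checks, and your llist.c may differ in details.

```c
#define _POSIX_C_SOURCE 200809L /* expose strdup() on strict C compilers */
#include <stdlib.h>
#include <string.h>

typedef struct ll_node {
  struct ll_node * next; //next node in list
  char * val;            //string value stored in list
} ll_node_t;

typedef struct {
  ll_node_t * head; //pointer to the node at the head of the list
  int size;         //the number of nodes in the list
} llist_t;

// Return a newly initialized, empty linked list
llist_t * ll_init() {
  llist_t * ll = malloc(sizeof(llist_t));
  ll->head = NULL;
  ll->size = 0;
  return ll;
}

//delete/deallocate a linked list, including every node and its string
void ll_delete(llist_t * ll) {
  ll_node_t * cur = ll->head;
  while (cur != NULL) {
    ll_node_t * next = cur->next; //save the link before freeing the node
    free(cur->val);               //the string was allocated by strdup in ll_push
    free(cur);
    cur = next;
  }
  free(ll);
}

//insert a copy of the string s onto the front of the list
void ll_push(llist_t * ll, char * s) {
  ll_node_t * node = malloc(sizeof(ll_node_t));
  node->val = strdup(s); //copy, so later changes to the caller's buffer do not affect the list
  node->next = ll->head;
  ll->head = node;
  ll->size++;
}
```

Because ll_push stores a strdup copy, ll_delete has to free each node's string as well as the node itself.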

Hash Map

The hashmap data structure is defined in hashmap.h, and you will implement it in hashmap.c. The header file's structure and function declarations are shown below (with comments).

#define HM_INIT_NUM_BUCKETS 16
#define HM_MAX_LOAD 0.75

typedef struct{
  llist_t ** buckets; //array of `buckets`, each pointing to an llist_t (see llist.h)
  int num_buckets; //how many buckets, or length of the bucket array (should always be a power of 2)
  int size; //how many items stored
} hashmap_t;


//initialize a hashmap with HM_INIT_NUM_BUCKETS buckets
hashmap_t * hm_init();

//delete/deallocate the hashmap
void hm_delete(hashmap_t * hm);

//add a string value to the hashmap
void hm_add(hashmap_t * hm, char * v);

//see if a string value is in the hashmap
bool hm_check(hashmap_t * hm, char * v);

In the C file, you will also implement a non-public helper function (that is, one not declared in the header file), _resize():

void _resize(hashmap_t * hm)

which is called when the load is greater than HM_MAX_LOAD (0.75).
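Below is a compact sketch of the hashmap functions together with _resize. It rests on two assumptions:

- it uses the djb2 hash, since the handout does not prescribe a hash function;
- hm_add ignores a string that is already present, which is an assumption and not a stated requirement.

The list functions are repeated so the block compiles alone. `hm_hash_str` and `alloc_buckets` are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L /* expose strdup() on strict C compilers */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define HM_INIT_NUM_BUCKETS 16
#define HM_MAX_LOAD 0.75

/* Types from llist.h and hashmap.h, repeated so this sketch compiles on its own. */
typedef struct ll_node { struct ll_node * next; char * val; } ll_node_t;
typedef struct { ll_node_t * head; int size; } llist_t;
typedef struct { llist_t ** buckets; int num_buckets; int size; } hashmap_t;

llist_t * ll_init() {
  llist_t * ll = malloc(sizeof(llist_t));
  ll->head = NULL;
  ll->size = 0;
  return ll;
}

void ll_push(llist_t * ll, char * s) {
  ll_node_t * node = malloc(sizeof(ll_node_t));
  node->val = strdup(s);
  node->next = ll->head;
  ll->head = node;
  ll->size++;
}

void ll_delete(llist_t * ll) {
  ll_node_t * next;
  for (ll_node_t * cur = ll->head; cur != NULL; cur = next) {
    next = cur->next; /* save the link before freeing */
    free(cur->val);
    free(cur);
  }
  free(ll);
}

/* Hypothetical helper: djb2 string hash (assumed; the handout does not fix one). */
static unsigned long hm_hash_str(const char * s) {
  unsigned long h = 5381;
  for (; *s != '\0'; s++) h = h * 33 + (unsigned char)*s;
  return h;
}

/* Hypothetical helper: allocate an array of n empty buckets. */
static llist_t ** alloc_buckets(int n) {
  llist_t ** b = malloc(n * sizeof(llist_t *));
  for (int i = 0; i < n; i++) b[i] = ll_init();
  return b;
}

hashmap_t * hm_init() {
  hashmap_t * hm = malloc(sizeof(hashmap_t));
  hm->num_buckets = HM_INIT_NUM_BUCKETS;
  hm->size = 0;
  hm->buckets = alloc_buckets(hm->num_buckets);
  return hm;
}

void hm_delete(hashmap_t * hm) {
  for (int i = 0; i < hm->num_buckets; i++) ll_delete(hm->buckets[i]);
  free(hm->buckets);
  free(hm);
}

bool hm_check(hashmap_t * hm, char * v) {
  llist_t * ll = hm->buckets[hm_hash_str(v) % hm->num_buckets];
  for (ll_node_t * n = ll->head; n != NULL; n = n->next)
    if (strcmp(n->val, v) == 0) return true;
  return false;
}

/* Double the bucket array and rehash every stored string into it. */
void _resize(hashmap_t * hm) {
  int new_num = hm->num_buckets * 2;
  llist_t ** new_buckets = alloc_buckets(new_num);
  for (int i = 0; i < hm->num_buckets; i++) {
    for (ll_node_t * n = hm->buckets[i]->head; n != NULL; n = n->next)
      ll_push(new_buckets[hm_hash_str(n->val) % new_num], n->val); /* pushes a copy */
    ll_delete(hm->buckets[i]); /* frees the old nodes and their strings */
  }
  free(hm->buckets);
  hm->buckets = new_buckets;
  hm->num_buckets = new_num;
}

void hm_add(hashmap_t * hm, char * v) {
  if (hm_check(hm, v)) return; /* assumption: duplicates are ignored */
  ll_push(hm->buckets[hm_hash_str(v) % hm->num_buckets], v);
  hm->size++;
  if ((double)hm->size / hm->num_buckets > HM_MAX_LOAD) _resize(hm);
}
```

Rehashing by pushing copies and then deleting the old lists is the simplest approach when only ll_push is available. Relinking the existing nodes into the new buckets would avoid the extra strdup/free calls.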

Spellchecker

To help test your Hashmap and Linked List implementations, we’ve provided a simple interactive spellchecker program that lets the user type phrases (without punctuation) and spellchecks them. Here are some sample inputs and outputs, along with the compilation.

$ make
gcc -Wall -Wno-unused-variable -g -c -o hashmap.o hashmap.c
gcc -Wall -Wno-unused-variable -g -c -o llist.o llist.c 
gcc -Wall -Wno-unused-variable -g -o spellcheck spellcheck.c hashmap.o llist.o -lreadline -lm
gcc -Wall -Wno-unused-variable -g -c -o boggle.o boggle.c
gcc -Wall -Wno-unused-variable -g -o onePlayerBoggle onePlayerBoggle.c boggle.o hashmap.o llist.o -lreadline -lm
$ ./spellcheck 
ERROR: require dictionary file
$ ./spellcheck dictionary.txt 
spellcheck > spellcheck all these words at once
SPELLCHECK -> not a word
ALL -> WORD
THESE -> WORD
WORDS -> WORD
AT -> WORD
ONCE -> WORD
spellcheck > or
OR -> WORD
spellcheck > one
ONE -> WORD
spellcheck > at
AT -> WORD
spellcheck > a
A -> WORD
spellcheck > time
TIME -> WORD
spellcheck > this adfasdfasdf is not a word
THIS -> WORD
ADFASDFASDF -> not a word
IS -> WORD
NOT -> WORD
A -> WORD
WORD -> WORD
spellcheck > nor !!! 
NOR -> WORD
!!! -> not a word
spellcheck > 
$ # type ^D to insert EOF to exit (or ^C)

Boggle Solver

Now that your Hash Map and Linked List are working, let’s use them to do something a bit more interesting: finding all the words on a boggle board!

The boggle game structure and functions are defined in boggle.h, and you will do most of your work in boggle.c. A boggle instance is defined as a 5x5 grid of dice, where each die shows a single letter (the Q die shows Qu).

#define BOGGLE_DIMENSION 5

typedef struct {
  char board[BOGGLE_DIMENSION][BOGGLE_DIMENSION]; //the boggle board
  hashmap_t * dict; //dictionary mapping
} boggle_t;

When printed, the board looks like this:

.-----------.
| S N T A Y |
| W N T E I |
| N QuI H I |
| N F O S U |
| E E H N L |
'-----------'

The goal is to find as many words as possible (at least three letters long) by moving from one die to an adjacent one in any direction (left, right, up, down, and diagonally), without using a die more than once. For example, QUIT is a word found on the board above, and so is QUITE. (You get a free ‘u’ with your ‘Q’.)
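Adjacency is the only geometric rule the solver needs. Below is a minimal sketch of a hypothetical `neighbors` helper that lists the in-bounds neighbors of a tile. The starter boggle.c may organize this differently.

```c
#define BOGGLE_DIMENSION 5

/* Fill out_r/out_c with the in-bounds neighbors of tile (r, c): the up to 8 tiles
 * reachable horizontally, vertically, or diagonally. Returns how many were found. */
int neighbors(int r, int c, int out_r[8], int out_c[8]) {
  int count = 0;
  for (int dr = -1; dr <= 1; dr++) {
    for (int dc = -1; dc <= 1; dc++) {
      if (dr == 0 && dc == 0) continue; /* skip the tile itself */
      int nr = r + dr, nc = c + dc;
      if (nr >= 0 && nr < BOGGLE_DIMENSION && nc >= 0 && nc < BOGGLE_DIMENSION) {
        out_r[count] = nr;
        out_c[count] = nc;
        count++;
      }
    }
  }
  return count;
}
```

A corner tile has 3 neighbors, an edge tile has 5, and an interior tile (such as the Qu at row 2, column 1 of the board above) has 8.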

A number of functions are already implemented for you in boggle.c. Your main work will be completing the bg_all_words() function, which searches the boggle board for all words 3 to 8 letters in length.

This is a recursive function that explores outward from a letter tile using depth-first search. The idea is that you start at a tile, like Qu, and then try all of its neighbors (via recursive calls), adding letters as you go and checking whether you have found a word. At some point you either step off the board or descend too far (for example, building a 9-letter word), and the recursion returns to explore another path. An algorithmic description is provided in a comment within boggle.c; see there for more details.
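Here is a minimal, self-contained sketch of the depth-first search. It makes several assumptions:

- a Qu die is stored as the character 'Q' and contributes "QU" to the word;
- word length is counted in characters of the built string;
- a tiny linear-search word list (`in_dict`) stands in for the dictionary hashmap.

The names `dfs`, `all_words`, `in_dict`, and `record` are hypothetical. Follow the algorithm described in boggle.c for your actual solution.

```c
#include <stdbool.h>
#include <string.h>

#define BOGGLE_DIMENSION 5
#define MIN_LEN 3
#define MAX_LEN 8
#define MAX_FOUND 256

/* Hypothetical stand-in for the dictionary hashmap: a small word list. */
static const char * DICT[] = {"QUIT", "QUITE", "TIE", "NIT"};
static const int DICT_SIZE = (int)(sizeof(DICT) / sizeof(DICT[0]));

static bool in_dict(const char * w) {
  for (int i = 0; i < DICT_SIZE; i++)
    if (strcmp(DICT[i], w) == 0) return true;
  return false;
}

/* Found words, deduplicated (the same word may be reachable along several paths). */
static char found[MAX_FOUND][MAX_LEN + 1];
static int num_found = 0;

static void record(const char * w) {
  for (int i = 0; i < num_found; i++)
    if (strcmp(found[i], w) == 0) return;
  if (num_found < MAX_FOUND) strcpy(found[num_found++], w);
}

/* Depth-first search from tile (r, c). `word` holds the letters so far, `len` is
 * its length, and `visited` marks the dice already used on the current path. */
static void dfs(char board[BOGGLE_DIMENSION][BOGGLE_DIMENSION], int r, int c,
                bool visited[BOGGLE_DIMENSION][BOGGLE_DIMENSION], char * word, int len) {
  if (r < 0 || r >= BOGGLE_DIMENSION || c < 0 || c >= BOGGLE_DIMENSION) return; /* off the board */
  if (visited[r][c]) return;       /* die already used on this path */
  char ch = board[r][c];
  int add = (ch == 'Q') ? 2 : 1;   /* assumption: a 'Q' tile contributes "QU" */
  if (len + add > MAX_LEN) return; /* descended too far */

  word[len] = ch;
  if (ch == 'Q') word[len + 1] = 'U';
  len += add;
  word[len] = '\0';
  if (len >= MIN_LEN && in_dict(word)) record(word);

  visited[r][c] = true;
  for (int dr = -1; dr <= 1; dr++)
    for (int dc = -1; dc <= 1; dc++)
      if (dr != 0 || dc != 0) dfs(board, r + dr, c + dc, visited, word, len);
  visited[r][c] = false;           /* backtrack: free this die for other paths */
}

/* Start a search from every tile of the board. */
void all_words(char board[BOGGLE_DIMENSION][BOGGLE_DIMENSION]) {
  bool visited[BOGGLE_DIMENSION][BOGGLE_DIMENSION] = {{false}};
  char word[MAX_LEN + 1];
  num_found = 0;
  for (int r = 0; r < BOGGLE_DIMENSION; r++)
    for (int c = 0; c < BOGGLE_DIMENSION; c++)
      dfs(board, r, c, visited, word, 0);
}
```

Run on the example board shown earlier (with the Qu die stored as 'Q'), this finds QUIT, QUITE, TIE, and NIT. In a real solver, in_dict would presumably be a call to hm_check on the dictionary. Found words could be recorded in a second hashmap, which would also remove duplicates.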

Once you are done, you can run the onePlayerBoggle program with a given random seed, as below:

aaviv@cs2113-vm:~/project-1a-inst$ ./onePlayerBoggle dictionary.txt 100
.-----------.
| R E E M G |
| N I E D T |
| O T O W A |
| K T S H I |
| C S I I A |
'-----------'
SHOW
WEER
RIOTED
AHA
SHOT
WHIST
NIT
DEW
TEE
NOTE
WHISK
DEER
WEED
TINE
MEETS
SHOE
SHOD
DEEM
RITE
ION
HOED
TIER
TOWED
SOTS
TEEN
TEEM
SODA
TIED
MEOWS
STEW
TEED
AWE
WETS
WADE
STEM
WHAT
MEW
SIT
SIS
WHIT
MET
WAD
SHOED
STONIER
NOT
STEER
HOSTS
HOST
TWOS
TONIER
STEED
TOTEM
WHO
HAWED
OHS
ONTO
SHOTS
TOTED
REIN
RENTED
TOWS
SWAT
NEED
HAWS
NOTED
ADO
ITS
OWED
MEOW
HAW
WOT
SOW
HAT
NITS
TIN
SOT
WEIR
RIOTS
TAD
MEWS
HIT
TONE
HIS
TIE
WOE
HAD
STOWS
TIRE
SOD
STOWED
TWEE
SOIRE
INTO
OWE
RIOT
SHOWED
TOED
SOWED
AWED
SITS
EERIE
DOTS
STIR
SHAT
MEET
HOSTED
WET
ONE
STONE
TWEED
SHIT
HITS
TOW
IRE
DOTE
TOTS
RENTS
WOST
TOT
RENT
SHAD
WEE
WED
TON
HOW
HOT
TOTE
TWO
HOS
TOE
DOT
STEIN
DOS
HOWS
SHITTIER
HISS
TONER
SHADOW
HOE
HOD
TOST
TOSS
DOE
WHITS
ITEM
SHITS
REED
ODE
SHADE
STOW

Total Points: 205

Note that the words are not alphabetical because hash tables are not ordered data structures.


Part (B) (Java Implementation) (30 Points)

In the second part of this project, you will implement your Hash Map and Linked List in Java using object-oriented principles. Here’s a quick guide to the source files found in this part. There are comments throughout, and TODOs mark where you should add your implementation. The same functions/methods on each of the objects described in part A still apply, but now in Java.

Here’s some sample output from running the boggle solver with seed 100. Note that Java uses a different random number generator, so the board differs from the one above.

$ java OnePlayerBoggle dictionary.txt 100
.-----------.
| O O D I O |
| Y E I A S |
| N I R P S |
| M R T B T |
| R D E F T |
'-----------'

NIT
PART
DEN
SAID
OSPREY
RETIRE
PARE
TIER
TIRED
DENIM
SAP
BERM
SAD
ASPS
PARRED
SPIRITED
DISS
SPARRED
SOAR
SOAP
IRED
DIRT
SPA
BRINY
DART
DIRE
BRIM
FER
APTER
DARE
PIE
ASPIRIN
TIRE
DRIER
FED
SIDE
SPAS
SPAR
PRIDE
DIPS
ERR
SPRITE
FERN
FTE
AIR
ASPIRE
AID
SPIDER
DOE
ASIDE
ODE
DEFT
ARID
SARI
SPADE
TINY
RIP
RIDE
RIM
TSAR
TRIAD
RID
RITE
PAIR
SAPS
TRIPS
PAID
YEN
OAR
SPIRE
SPARE
DISPIRIT
SOAPIER
AIDE
PRIM
ERRED
REIN
ADIS
ADO
PAS
PAR
RIPS
TIN
DOER
REF
PAD
RED
BET
TIE
DIS
TERN
TERM
DIP
DAIS
PIER
NITER
ASS
BED
DIE
DARTED
ASP
DOYEN
ART
IRE
BRIDE
PREY
ARE
PARTED
SOAPS
TRIP
TRIO
TRIM
PAST
PASS
SPIRIT
DENY
APT
BRR
Total Points: 185

Bonus (part B) (up to +25 points)

Create a new branch in your repository called optimized and work within that branch. Do not make these changes on your main branch, otherwise they may affect the grading of part B. Once complete, push this branch to GitHub and open an issue titled “BONUS Submission Optimized”.

Modify your HMap and LList implementations from part B (or implement/use other data structures from the Java standard library), as well as the boggle routines, to optimize performance as much as you can. The top 5 fastest boggle solvers in the class will win recognition and bonus points:

Some ideas/hints for optimizing your performance:

You can test the speed of your Java solver by running it with

time java OnePlayerBoggle dictionary.txt 

and looking at the real time reported in the output.

Bonus (part B) (10 points)

Create a new branch in your repository called ordered and work within that branch. Do not make these changes on your main branch, otherwise they may affect the grading of part B. Once complete, push this branch to GitHub and open an issue titled “BONUS Submission Ordered”.

Use the Java standard library to replace or modify your Hash Map and Linked List implementations with data structures it provides, such that the words output by OnePlayerBoggle appear in sorted order.