C Basics | CS 2113 Software Engineering - Spring 2021

C Basics and I/O #

C Programming and Unix #

C is an incredibly important programming language, particularly for systems level programming that’s because C is a low level language, much closer to the hardware and Operating System then say Java, which we’ll discuss in more detail later in the semester. In fact, almost all modern operating systems are written in C including Unix, Linux, Mac OSX, and even Windows.

The C programming language, itself, was developed for the purpose of writing the original Unix operating system (the precursor to Linux and OSX). It shouldn’t be that surprising then, that if you want to learn how to write programs that interact with the Unix system directly, then those programs must be written in C. And, in so doing, the act of learning to program in C will illuminate key parts of the Unix system.

In many ways, you can view C as the lingua franc of programming; it’s the language that every competent programmer should be able to program, even a little bit in. The syntax and programming constructs of C are present in all modern programming languages, and the concepts behind managing memory, data structures, and the like, underpin modern programming. Humorously, while nearly all programmers know how to program in C, most try to avoid doing so because the same power that C provides as a “low level” language is what makes it finicky and difficult to deal with.

But it’s really, really, really important for you to know C because the programming concepts are so fundamental to how you’ll think about programming going forward, and if you desire to be a professional software developer or engineer —- you’ve got to know C! More so, you’ll need it in your later class ;-)

C Programming Preliminaries #

First: YOU kinda ALREADY KNOW C! That’s because you’ve been programming in Java and the Java syntax and C syntax are super similar. That’s because Java “stole” C style syntax. But Java is a more advanced program language (in more than just the object oriented sense), so there are more than a few things that Java does that C does not; however many are the same. For example:

The big differences between C and Java:

While clearly, the two programming languages, Java and C, are different, they are actually have a lot of similarities. But, as you’ll see later, when we transition to Java, the object oriented model is a true advancement on C. I hope, though, you’ll see the bones of C within the Java object model.

Hello World #

When learning any programming language, you always start with the “Hello World” program. The way in which your program “speaks” says a lot about the syntax and structure of programming in that language. Below is the “Hello World” program for Java and C, for comparison.

/*HelloWorld.java*/
public class HelloWorld{
    public static void main(String args[]){
        System.out.println("Hello World");
    }
}
/*helloworld.c*/
#include <stdio.h>

// Hello World in C
int main(int argc, char * argv[]){
  printf("Hello World\n");
  return 0;
}

While in Java (since everything is an Object), the main function is wrapped within the HelloWorld class. But focusing on the main functions function declarations, they both take in a list of arguments. The C main function, though, returns an integer. This is the exit value of the program. On success, a C program returns 0. On error, it returns anything other than 0.

Additionally, in C you have a compiler directive #include which functions much like the import statement in Java. In the Java program, we get access to System and its member stream out implicitly (although, we could import java.util.System explicitly). In this case the #include <stdio.h> tells the compiler to include the Standard Input and Output library, which provides the programmer access to the printf function for printing “Hello World”. The .h in stdio.h refers to a header file, which functions like a package in Java to group function and data declarations together for ease of use.

Compiling a C program #

The compilation process for C is very similar to that of Java as both are compiled programs. The gold-standard C compiler is gcc, the gnu C compiler. For example, to compile helloworld.c, we do the following in our shell.

#> gcc helloworld.c

Which will produce an executable, binary file called a.out (by tradition), which we can run by including a ./ in front.

#> ./a.out
Hello World

(Aside … The reason you need to include ./ is to tell the shell not to look up the program named a.out on the path, but rather run the one here in this directory. The path is where the shell looks for programs to run, like ls or cat—which are also written in c—and are typically stored in /bin directory. You can type which cat or which ls to see where they exist on the path, if your interested. You can also echo $PATH in your terminal to see where binary files are looked up.)

Having a program named a.out is not very specific, so instead, you can specify an output file using the -o option.

#> gcc helloworld.c -o helloworld
#> ./helloworld
Hello World

There are more advanced compilation techniques that we will cover late, such as including multiple files, compiling to object files, and using pre-compiler directives.

Note that on repl.it and other engines, the clang compiler is use. This operates just like gcc and on some systems is even symlinked to clang (that means they are the same thing!). While I’ll use gcc in the demos, when you run your code on repl.it, it will actually run clang.

No JVM #

Note that unlike Java, we can run the compiled program directly. That is, you don’t have to do something like

#> javac HelloWorld.java
#> java HelloWorld

That’s because there is no JVM. The output of the compiler is a binary, executable program that runs directly on the CPU without an intermediary. This means that C runs much, much faster than Java.

This also means that for every new computer with a different architecture, you have to recompile your code. In Java, you can compile your file to a class, in bytecode, and ship that out. Any JVM can run it.

But the JVM itself, is written in C, so it needs to be compiled for each computer—only once! And that’s the power of JVM. Interpreted languages like Python also use this model.

Includes #

The process of including libraries in your program has a lot of similarities to that Java imports. Note that all C libraries end in .h, unlike C++. Here are some common libraries you will may want to include in your C program:

When you put a #include <header.h> in your program, the compiler will search for that header in its header search path. The most common location is in /usr/include. However, if you place your header file in quotes:

#include "header.h"

The compiler will look in the local directory for the file and not the search path. This will become important when you develop larger, multi-file programs.

Where is the C documentation? The C “standard” and the man pages! #

From working with Java, you are probably familiar with oracle Java SE documentation (that’s the “Standard Edition”). For example, the latest vesion of the Java docs can be found here.

https://docs.oracle.com/en/java/javase/15/docs/api/index.html

While there are many places where the functionality of C is written down, unlike Java, it’s not “owned” by a single corporation so there are no unified documentation as it may be system or compiler dependent. There are instead standards for how a C program should be behave, so called defined behavior.

The C standards are defined by the year in which they were defined. For example, the latest standard is C18 (codified in 2018). The biggest advance in C is the C99 standard (codified in 1999) and is the primary version we will use in this class. (Prior to C99 the standard was called ANSI C, which you may of heard of.)

The document for C and its standard libraries are encoded into unix systems through the manual pages (or “man pages”). In particular, there are in section 3 of the man pages. On any unix system, when the man pages are installed, you can type man followed by the function in question, like

man 3 fopen

And it will open a manual for you to read (type q to quit). All the man page s are also available online for free (of course!). The website at die.net. I’d keep that page open and handy for all your needs.

Control Flow #

The same control flow you find in Java is present in C. This includes if/else statements.

if( condition1 ){
  //do something if condition1 is true
}else if (condition2){
  //do something if condition1 is false and condition2 is true
}else{
  //do this if both condition1 and condition2 is true
}

While loops:

while( condition ){
  //run this until the condition is not true
}

And, for loops:

//run init at the start
for ( init; condition; iteration){
  //run until condition is false preforming iteration on each loop.
}

In previous versions of C, you were not able to declare new variables within the for loop. However you can now do the following without error:

for(int i=0; i < 10; i++){
  printf("%d\n",i); //i scoped within this loop
}

In this format above, the declaration of the variable i exists within the scoping of the loop. Referring to i outside of the loop is an error, and if you declared a different i outside the loop, actions within the loop would not affect the outer declaration. For example:

int i=3;
for(int i=0;i<100;i++){
   printf("%d\n",i); //prints 0 -> 99
}
printf("%d\n",i); //prints 3 --- different i, different scope!

Essentially, it works the same as Java scoping. However, as old habits die hard, throughout these notes, the old C style may come up from time to time. Sorry.

As a final note, C also has case-switch statements, like Java, which is handy when dealing with large if/else blocks. Note that it only works for data types where equality works, that is the == operator.

switch(i){
    case 0 :
        //do something if i==0
        break; //<-- must have a break otherwise next case runs!
    case 1 :
        //do something if i==1
        break; //<-- must have a break otherwise next case runs!
    case 2 :
        //do something if i==2
        break; //<-- must have a break otherwise next case runs!
    //...
    default: //optional
        //do something if nothing else applies
        //don't need a break here because you fall out of the case statement
}

What is true and false in C #

Everything but 0 is true!

C does not have a boolean type, that is, a basic type that explicitly defines true and false. Instead, true and false are defined for each type where 0 or NULL is always false and everything else is true. All basic types can be used as a condition on its own. For example, this is a common form of writing an infinite loop:

while(1){
  //loop forever!
}

That means that the output of boolean operations returns an integer value, and you can write code like the below.

int a = (1 > 0);
if ( a == 1){
    print("The comparison was true!\n");
}else{
    printf("The comparison was false!\n");
}

The evaluation of the expression (1 > 0) returns a 1 as it is true, and thus, we compare can save that output to a variable, a.

Quick aside about NULL it’s just 0 of a different type #

NULL is 0 and 0 is NULL and NULL is logical false. You can use all of them interchangeable.

if ( 0 == NULL ){
  printf("They are the same?!?\n")
}

However, there are sort of, not quite exactly the same. The constant value 0 is of type int (or sometimes long or long long depending on context), while the constant value of NULL is of type void *. That is, NULL is used when zero-ing out a pointer value while 0 is used for zeroing out a number value. They are both actually zero, though.

Adding bools to C use #

While many C programmers (in which I include myself) prefer to use integer values for true/and false, there are others prefer to use the reserve words of true and false to refer to logical true and false. If you’re one of those programmers, there is an easy way to add boolean constants to the language using compiler directives.

Consider the following program:

#include <stdio.h>

#define true 1
#define false 0

int main(){

  if(true){
    printf("Look it's true!\n");
  }

  if(!false){
    printf("Look it's *not* false!\n");
  }
}

The #define tells the compiler to replace the word true with 1 and false with 0 anywhere it appears in the code. That means when your program, you get to write true and false, but when it compiles, it produces the code.

#include <stdio.h>

int main(){

  if(1){
    printf("Look it's true!\n");
  }

  if(!0){
    printf("Look it's *not* false!\n");
  }
}

Rather than have to add the compiler directives to programs throughout, there’s a library header stdbool.h that you can include to give you a the functionality of bool types.

#include <stdio.h>
#include <stdbool.h>

int main(){

  bool b1 = 1 > 2; //stores 0
  bool b2 = 1 < 2; //stores 1

  if(b1 || b2){
    printf("b1 OR b2 is true.\n");
  }

  if(b1 && b2){
    printf("b1 and b2 is true\n");
  }

  if(b1 == false){
    printf("b1 is false\n");
  }

  if(b2 == true){
    printf("b2 is true\n");
  }
}

But the secret is that bool is really just an integer value of 1 or 0. Feel free to use stdbool.h if you want, but I will likely not use it in these notes.

(Quick aside: I wasn’t entirely truthful above … in fact bool is just a short hand for data type of _Bool which is a special data type in the c99 standard that can _only_ be 0 or 1 and is stored in a 1-byte wide unsigned integer.)

Format Input and Output #

printf() and scanf() #

The way output is performed in Java in C also has parallels. In Java you can use a scanner to read from a input stream, and in C you can also do the same.

One big difference is that C has a notion of formatted output. You may have noticed that in Java you can just do something like System.out.println("the value of i is "+i) and it properly formats i into the output. You have to do a bit more work in C to get that output. More precisely, you have to format print/scan (i.e., printf, scanf) each variable into the output/input. It’s more like scanning in Java, where you have to specify the type of data your expecting.

Consider the two programs below.

/*EnterNumber.java*/
import java.util.Scanner;

public class EnterNumber{

    public static void main(String args[]){
        int num;
        Scanner sc = new Scanner(System.in);
        
        System.out.println("Enter a number:");
        num = sc.nextInt();
        System.out.println("You entered " + num);
    }
}
/*enternumber.c*/
#include <stdio.h>

int main(int argc, char * argv[]){
  int num;

  printf("Enter a number\n");
  scanf("%d", &num); //use &num to store 
                     //at the address of num

  printf("You entered %d\n", num);
}

The two programs above both ask the user to provide a number, and then print out that number. In Java, we can scan from input from terminal via System.in and write our output to the terminal via System.out. In the C parlance, we use the terms stdin (or standard in) and stdout (or standard out) to refer to the terminal output stream and input stream, respectively.

To access stdin and stdout for reading/writing In C, we useformat printing and format scanning to do the same basic I/O. The format tells C what kind of input or output to expect. In the above program enternumber.c, the scanf asks for a %d from stdin, which is the special format for a number, and similar, the printf has a %d format to indicate that num should be printed as a number to stdout.

There are other format options. For example, you can use %f to request a float or double.

/* getpi.c */

#include <stdio.h>
int main(int argc, char * argv[]){

  float pi;
  printf("Enter pi:\n");
  scanf("%f", &pi);

  printf("Mmmm, pi: %f\n", pi);
}

And you can use modifiers on the format to change the number of decimals to print. %0.2f says print a float with only 2 trailing decimals. You can also include multiple formats, and the order of the formats match the additional arguments

int a=10,b=12;
float f=3.14;
printf("An int:%d a float:%f and another int:%d", a, f, b);
//              |          |                  |   |  |  |
//              |          |                  `---|--|--'
//              |          `----------------------|--'    
//              `---------------------------------'

There are a number of different formats available, and you can read the manual pages for printf and scanf to get more detail.

man 3 printf
man 3 scanf

You have to use the 3 in the manual command because there exists other forms of these functions, namely for Bash programming, and you need to look in section 3 of the manual for C standard library manuals. You can also find these man pages on die.net — printf and scanf.

For this class, we will mostly use the following format characters frequently:

The FILE * and opening files #

The last part of the standard input output library we haven’t explored is reading/writing from files. Although, you’ve done this already in the form of the standard files, e.g., standard input and output, we have demonstrated how to open, read, write, and close other files that may exist on the file system.

All the file stream functions and types are defined in the header file stdio.h, so you have to include that. The man page for is here.

Open files in the standard C library are referred to as file streams, and have the type:

FILE * stream;

and we open a file using fopen() which has the following prototype:

FILE * fopen(const char *path, const char *mode);

The first argument path is a string storing the file system path of the file to open, and mode describes the settings of the file. For example:

FILE * stream = fopen("gobuffs.txt", "w");

will open a file in the current directory called “gobuffs.txt” with write mode. We’ll discuss the modes more shortly.

File streams, as pointers (that’s what the * stands for, more on this later), are actually dynamically allocated, and they must be deallocated or closed. The function that closes a file stream is fclose()

fclose( stream );

File Modes #

The mode of the file describes how to open and use the file. For example, you can open a file for reading, writing, append mode. You can start reading/writing from the start or end of the file. You can truncate the file when opening removing all the previous data. From the man page, here are the options:

The argument mode points to a string beginning with one of the following sequences  (possibly  followed
by additional characters, as described below):

r      Open text file for reading.  The stream is positioned at the beginning of the file.

r+     Open for reading and writing.  The stream is positioned at the beginning of the file.

w      Truncate  file  to zero length or create text file for writing.  The stream is positioned at the
       beginning of the file.

w+     Open for reading and writing.  The file is created if it does not exist, otherwise it  is  trun‐
       cated.  The stream is positioned at the beginning of the file.

a      Open  for  appending  (writing  at end of file).  The file is created if it does not exist.  The
       stream is positioned at the end of the file.

a+     Open for reading and appending (writing at end of file).  The file is created  if  it  does  not
       exist.   The  initial  file  position for reading is at the beginning of the file, but output is
       always appended to the end of the file.

One key thing to notice from the modes is that any mode string with a “+” is for both reading and writing, but using “r” vs. “w” has different consequences for if the file already exists. With “r” a file will never be truncated if it exists, which means its contents will be deleted, but “w” will always truncate if it exits. However, “r” mode will not create the file if it doesn’t exist while “w” will. Finally, append mode with “a” is a special case of “w” that doesn’t truncate with all writes occurring at the end of the file.

As you can also see in the man page description, the file stream is described as having a “position” which refers to where within the file we read/write from. When you read a byte, you move the position forward in the file. In later lessons, we will look into how to manipulate the stream more directly. For now

Format Output with fprintf() #

Let’s start where we always start with understanding a new input/output system, hello world!

/*hello_fopen.c*/
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[]){

  FILE * stream = fopen("helloworld.txt", "w");

  fprintf(stream, "Hello World!\n");

  fclose(stream);
}
aviv@saddleback: demo $ ./hello_fopen 
aviv@saddleback: demo $ cat helloworld.txt
Hello World!
aviv@saddleback: demo $ ./hello_fopen 
aviv@saddleback: demo $ cat helloworld.txt
Hello World!

The program opens a new stream with the write mode at the path “helloworld.txt” and prints to the stream, “Hello World!\n”. When we execute the program, the file helloworld.txt is created if it doesn’t exist, and if it does it is truncated. After printing to it, we can read it with cat, and we see that in fact “Hello World!” is in the file. If we run the program again, we still have “Hello World!” in the file, just one, and that’s because the second time we run the program, the file exists, so it is truncated. The previous “Hello World!” is removed and we write “Hello World!”.

However if we wanted to open the file in a different mode, say append, we get a different result:

/*hello_append.c*/
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[]){

  FILE * stream = fopen("helloworld.txt", "a");//<--

  fprintf(stream, "Hello World!\n");

  fclose(stream);
}
aviv@saddleback: demo $ ./hello_append 
aviv@saddleback: demo $ ./hello_append 
aviv@saddleback: demo $ ./hello_append 
aviv@saddleback: demo $ cat helloworld.txt 
Hello World!
Hello World!
Hello World!
Hello World!

The original “Hello World!” remains, and the additional “Hello World!”’s are append to the end of the file. Printing “Hello World!” does not require a format, but fprintf() can format just like printf(), but to a file stream.

Format Input with fscanf() #

Just as we can format print to files, we can format read, or scan, from a file. fscanf() is just like scanf(), except it takes a file stream as the first argument. For example, consider a data file with following entries:

aviv@saddleback: demo $ cat file.dat
Aviv Adam 10 20 50 3.141592 yes
Pepin Joni 15 21 53 2.781 no

We can write a format to read this data in with fscanf():

int main(int argc, char * argv){

  FILE * stream = fopen("file.dat", "r");

  char fname[1024],lname[1024],yesno[4];
  int a,b,c;
  float f;

  while ( fscanf(stream,
                 "%s %s %d %d %d %f %s",
                 fname, lname, &a, &b, &c, &f, yesno) != EOF){

    printf("First Name: %s\n",fname);
    printf( "Last Name: %s\n",lname);
    printf("         a: %d\n",a);
    printf("         b: %d\n",b);
    printf("         b: %d\n",c);
    printf("         f: %f\n",f);
    printf("     yesno: %s\n", yesno);
    printf("\n");
  }

  fclose(stream);
  return 0;
}

And when we run it, we see that we scan each line at a time:

aviv@saddleback: demo $ ./scan_file 
First Name: Aviv
Last Name: Adam
         a: 10
         b: 20
         b: 50
         f: 3.141592
     yesno: yes

First Name: Pepin
Last Name: Joni
         a: 15
         b: 21
         b: 53
         f: 2.781000
     yesno: no

One thing you should notice from the scanning loop is that we compare to EOF, which is special value for “End of File.” The end of the file is encoded in such a way that you can compare against it. When scanning and you reach end of the file, EOF is returned, which can be detected and used to break the loop.

Another item to note is that scanning with fscanf() is the same as that with scanf(), and is white space driven to separate different values to scan. Also, “%s” reads a word, as separated by white space, and does not read the whole line.

The standard file streams #

C uses the abstraction of file pointers to define the standard file streams, which are defined variables for you to use in your code. These include stdin, stdout, and stderr. We can do format input and output with these file streams just like any file stream we opened using fopen().

By default, printf() prints to stdout, but you can alternative write to any file stream. To do so, you use the fprintf() function, which acts just like printf(), except you explicitly state which file stream you wish to print. Similarly, there is a fscanf() function for format reading from files other than stdin.

printf("Hello World\n"); //prints implicitly to standard out
fprintf(stdout, "Hello World\n"); //print explicitly to standard out
fprintf(stderr, "ERROR: World coming to an endline!\n"); //print to standard error

Error Checking in C: errno and strerror() #

Nearly all functions in the standard library return a value, typically an integer value of some type that is either true of false. This is how you check to see if an error occurred.

For example, check out the man page for the fopen, and refer to the return value section:

RETURN VALUE
       Upon successful completion fopen(), fdopen() and freopen() return a FILE pointer.  Otherwise,  NULL  is
       returned and errno is set to indicate the error.

If the file was not opened succesfully, we could can check for that, like below.


if ( (stream = fopen("DOESNOTEXIST.txt", "r")) == NULL){
   fprintf(stderr, "ERROR ... \n");
}

An error could occur, like above, most likely for the file not existing, but it could also be because of something else — how can you tell?

In Java, the program would throw an exception, a special object that encodes the error. That exception could be caught, printed, and perhaps handled gracefully. For example, that could be notifying the user that you opened a file that does not exist — you bozo!

In C, there is no such thing as an exception (at least not in the same way). As a way to communicate about errors that occur nested within a stack of function calls, there is a special global integer that gets set errno (error number) that encapsulates the kind of error.

While you could just print out that error number, that’s not entirely human readable. There is a set of defined constants for each error number you could compare against (check out the man page for errno for them all!), that would be quite a lot of work. Instead, of course, there are library calls that do this for you and provide a helpful error message.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

int main(){
  FILE * stream;
  if((stream = fopen("DOESNOTEXIST.txt","r")) == NULL){
    fprintf(stderr,"fopen: %s\n",strerror(errno));
    //      ^       ^
    //      |       '- returns a string for the given error
    //      '--- %s says format print a string

    exit(1); //exit/return from this program with an error
  }
}

Running this, you’d get the following output if the file truly does not exist

$ ./a.out
fopen: No such file or directory

And if you just didn’t have permission to read the file, but it did exist, you’d get a different error message.

$ ./doesnotexist 
fopen: Permission denied

(Aside: Note that this program exits with status 1 to indicate something went wrong. A program exits with status 0 when it succeeds.)

stderr and perror #

The above example exiting when it encountered the error, but we don’t always need to nor want to exit the program when an error occurs. Sometimes it makes sense just to let the user know something bad happened and move on.

But then we have a problem. How does the user differentiate between the programs output and reports of errors?

This is particularly problematic in the unix environment where pipelines are common for passing the output of one program (written to stdout) as the input to another program (read from stdin).

To differentiate from error reporting and output, we use the third standard file stream: stderr or standard error. It works just like the other two standard file streams and can be used in the same way with fprintf.

Now the error checking on our fopen is as follows:

    fprintf(stderr,"fopen: %s\n",strerror(errno));

At this point, you are probably like. That’s way, way too much typing for error checking, esepcially having to write fprintf and stderr over and over again. And you’re right! So there’s a library function perror that does just the above in one call.

   perror("fopen")

You can still use fprintf(stderr,...) to add additional details to the error, but the reason for the error will be well formatted and printed for you in a straightforward call.

Final note on error checking! #

While I do not always show error checking below, it is important that your code does error checking.

Do as I say, not as I do ;)