In this course, you will use a variety of tools implemented in different programming languages to generate useful facts about a program’s behavior and analyze them. You will learn about the tools and their theoretical underpinnings in this course, however it is assumed that you are already familiar with basic concepts in computer science and mathematics, as well as the use of various tools/commands in a UNIX-like environment.

This assessment is designed to help you determine specific skills and concepts you should learn or refamiliarize yourself with prior to starting this course. You do not need to master each and every one of these skills - but you should have command of the majority of them. While it will be possible to pick up some of these skills along the way, it will prove difficult to cover them all (especially true in shortened summer semesters).

The instructors and teaching assistants have only a limited capability to assist students with learning these tasks, and they will not be explicitly covered in video lessons.

Using a Virtual Machine

In this course, assignments are completed on a Virtual Machine (VM) that is provided by the teaching staff. This VM is a common operating platform that is also used for grading, which ensures that the code you create on your VM will behave the same way when it is graded. You should be comfortable with using VirtualBox, including importing a VM from a .ova file, adjusting VM settings (network, memory allocation, etc,). If you are unfamiliar with virtualization, you can find more information here.

Navigating a UNIX-like Environment

All assignments in this class assume a UNIX-like environment. You should be comfortable browsing folders and executing basic commands from a UNIX command prompt.

Sample task: Using just a command prompt in a UNIX-like environment (such as Linux or macOS), create a new folder, create a text file inside that folder, and change the file permissions to allow execution of that file (such as would be needed if the file contained a shell script). Finally, create a ZIP archive of your folder and extract it to a different location to verify that it was correctly created.

Writing Shell Scripts

For some assignments, you’ll need to write a shell script that can execute a command. You may use any common UNIX shell scripting language such as bash, csh, or even python.

Sample task: Write a script that can be executed with the command ./myscript.sh input.txt output.txt that takes an input text file and produces a version of the file that is sorted using the UNIX sort program. Your program should produce an output.txt that contains just Error if the input causes sort to crash.

Evaluating Data Structures

You will be performing analysis on common data structures such as trees, heaps, queues, and stacks. You should be able to write code in Java and C++ that can use common data structures.

Sample task: Using a high level language such as Java or C++, write a class that builds a balanced binary tree out of an unsorted array of integer values. Then write a search function to determine if a specific value exists in the tree. Your search function should use an algorithm such as a BFS (Breadth First Search) or DFS (Depth First Search) on the tree values and return false if the desired value does not exist in the tree.

Sample input array of values: [26, 28, 76, 32, 4, 87, 42, 9, 12]

Debugging High Level Code

Some tools we explore in this class automatically generate test cases that will find bugs in a provided application. You should be able to use a compile error or stack trace to figure out why a source file doesn’t compile. Additionally, you should be able to find and fix issues such as null pointer exceptions or invalid cast errors in a code file. As many applications we will be analyzing are relatively simple, we recommend having a high level of comfort debugging simple issues without the help of an IDE.

Sample task: Run the following Java program. Make sure that all code in the main method executes to produce correct results.

		
public class test {
	public static void main(String[] args) {
		System.out.println("Add 1 + 2:");
		System.out.println("Result = " + add(1,2));
		System.out.println("Add 40 + 2:");
		System.out.println("Result = " + add(40,2));
		System.out.println("Multiply 3 x 4:");
		System.out.println("Result = " + multiply(3,4));
		System.out.println("Multiply 12 x 20:");
		System.out.println("Result = " + add(12,20));
		System.out.println("Done!")
	}

	public static int add(int a, int b) {
		Integer result = null;
		result += a;
		result += b;
		return result;
	}

	public static int multiply(int a, int b) {
		int result = -1;
		for (int i = 0; i < b; i++)
			result = add(result, a);
		return result;
	}
}

	 
Using Man Pages and Modifying Commands

Many software analysis tools require more knowledge of the tool than will be provided in assignment instructions to effectively use the tool. You should be comfortable reading the documentation that is provided with tools and utilities. In particular, one great resource available for many tools are man pages. Most UNIX tools provide a man page that gives basic usage instructions and how to modify the command to accomplish different tasks.

Sample task: Use the uniq UNIX tool to take the following text (copy it into a text file) and remove duplicate lines.

		
	The quick brown fox jumps over the lazy dog.
	THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG.
	The Quick Brown Fox Jumps Over The Lazy Dog.
	My quick brown fox jumps over the lazy dog.
	His quick brown fox jumps over the lazy dog.
	The quick, brown fox jumps over the lazy dog.
	The quick brown fox jumps over the lazy dog.

	 

Modify your command to do the following things:

  • Prefix each line by the number of occurrences of that line in the file
  • Skip comparing the first three characters on the line
  • Ignore the case of the characters in the comparison
  • Hint: use the man command to figure out the capabilities of uniq.

    Understanding of Random Distributions and Sampling

    Some of the analysis and testing techniques exploit the power of randomness. While a thorough understanding of probability and statistics is not necessary for success in this course, you should be familiar with the concepts of the Random Distribution and Random Sampling. You should be able to apply these concepts to randomized algorithms and reason about the likelihood of a particular program behavior.

    Sample task: Assume you are writing a program to randomly award a prize to contestants. Contestants enter a 32-bit unsigned integer at random. If the number entered is between 50 and 65 and divisible by 5, the contestant wins.

    1. What is the likelihood that any given contestant will win?
    2. Assuming the program handles one contestant per minute, how long should we expect the game to run before a contestant wins? Minutes, hours, days, weeks or months?
    3. If the contestant enters the string “WIN” instead of an integer, what is the likelihood that the contestant will win?

    Working Knowledge of Code Build Processes

    The primary focus of this course is Software Analysis. Software can be analyzed at various stages in its life cycle including in its source code written by the programmer, an intermediate representation generated by a compiler or language toolchain, and in the program’s binary representation.

    While extensive knowledge of compilers, toolchains, and runtimes is not necessary for this course, in general you should have a working knowledge of what happens to your code when you “build” it.

    Sample task: You should be able to answer the following questions regarding code build processes:

    1. What is the difference between high level code, intermediate representation, assembly code, and machine language?
    2. What is the difference between a compiled language and an interpreted language?
    3. What happens to C/C++ code when you click “Build and Run” in an IDE?
    4. What happens to Java code when you click “Build and Run” in an IDE?