CSCI 241 Labs: Lab 8
An Array of Problems


There are 5 checkpoints, including the clean-up checkpoint, in this lab. You and your partner should work together using just one of your accounts. CHANGE WHO IS CONTROLLING THE COMPUTER AFTER EACH CHECKPOINT! If you need help with any exercise, raise your hand.

Copy the lab materials to your account from /home/student/Classes/Cs241/Labs/Lab08

In this lab, you and your partner will write code to implement the linear regression calculations discussed in the prelab.

Loading the Data

Start running BlueJ and open project Lab08. Edit the Linear class. Linear's main() method is designed to hold one set of data for a linear regression. This includes its x-values, y-values and associated calculations. By the end of this lab, your Linear class will not only be able to calculate the values for linear regression, but also graph the associated function.

When we create a Java class, we need to make two kinds of decisions:

Deciding on which classes to develop is an advanced topic that you usually see in upper-level computer science courses. We do know enough at this time to choose variables and methods.

We start by deciding which values to save as variables in the main() method. The main() method is located at the bottom of the class.

The prelab contains the needed formulas. We could decide to declare a variable for each of the variables in the formulas. Here is the first formula:
y = mx + b
It contains these variables:

Just inside the main() method, you will see declarations for two arrays of doubles (one to hold the x-values and one to hold the y-values): xArray[] and yArray[].

Decisions:

The prelab contained this formula to calculate the slope (m):
     Sum(xi*yi) - n*xavg*yavg
m = -------------------------- ,
       Sum(xi^2) - n*xavg^2
where n is the number of points in the data set.

We'll save the slope (m) in a variable so we can draw a graph of the function later.

To help figure out this formula, you'll write a method that sums the products of each x and y pair:
Sum(xi*yi)
This value doesn't need to be saved in a variable because we can call that method again at any time to recalculate it. However, it will save processing time to keep it in a variable, since we also use it for r^2 (correlation). Check the formula in the prelab and see where it is used.

Here are your tasks for the first checkpoint:

1 Show us your declared variables and the code you wrote to read the data from the file. Run your main() method so we can see the results. Be ready to answer:

  1. Why are the arrays for the x's and y's instantiated when we start reading data from the file, rather than at time of declaration?

 

Methods to Calculate the Sums

We will need 3 kinds of sums to do our calculations: the sum of the entries in an array, the sum of the squares of the values in an array, and the sum of the products of x and y values. Go back to the prelab and review the formulas for m, b and r^2. The Linear class will contain 3 different methods to calculate these sums. Each of these methods will take either one or two arrays as parameters. Here is what each part means in the prelab formulas:
  1. Sum(xi) and Sum(yi) hold the sum of all the x-values and y-values, respectively. Once we get the values from these sums, we can easily calculate xavg and yavg.
  2. Sum(xi^2) holds the sum of the squares of the x-values. We also see Sum(yi^2), which holds the sum of the squares of the y-values. These are calculated using similar mechanisms, so the same method works for both.
  3. Sum(xi*yi) holds the sum of all the x*y values.
Your next task is to write 3 different static methods to calculate and return these values. The method prototypes are:

Each method should return a double which holds the result of the sum. Because we want these methods to work for different arrays, make sure you send the correct array (or arrays) to the methods as arguments.
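
As a rough illustration of the three sums described above, here is one possible sketch. The method names sum(), sumOfSquares() and sumOfProducts() come from the expected results listed in this lab; the class name SumSketch, the parameter names and the length check are assumptions made for this example, so the prototypes in your Linear class may look different.

```java
// Hypothetical sketch: SumSketch and the parameter names are assumptions;
// only the three method names come from the lab text.
final class SumSketch {

    // Sum of all entries in the array.
    static double sum(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    // Sum of the square of each entry; works for x-values or y-values.
    static double sumOfSquares(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v * v;
        }
        return total;
    }

    // Sum of xs[i] * ys[i]; assumes both arrays have the same length.
    static double sumOfProducts(double[] xs, double[] ys) {
        if (xs.length != ys.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double total = 0.0;
        for (int i = 0; i < xs.length; i++) {
            total += xs[i] * ys[i];
        }
        return total;
    }

    public static void main(String[] args) {
        double[] xs = {1, 2, 3, 4};
        double[] ys = {5, 6, 7, 8};
        System.out.println(sum(xs));               // 1 + 2 + 3 + 4
        System.out.println(sumOfSquares(xs));      // 1 + 4 + 9 + 16
        System.out.println(sumOfProducts(xs, ys)); // 5 + 12 + 21 + 32
    }
}
```

Because each method receives its array as a parameter, the same sum() and sumOfSquares() code can be reused for both the x-values and the y-values.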

To test your new methods, run them directly by right-clicking the Linear class in the BlueJ window. When the Method Call window pops up, you will need to provide arguments. When you need an array as an argument, you can type its content inside curly braces. For example, when running the sum method, you can type {1,2,3,4} in the box before clicking OK.

Here are the answers you should expect from each of the methods by using the indicated arrays:

sum() (using {1,2,3,4}): 10.0
sumOfSquares() (using {1,2,3,4}): 30.0
sumOfProducts() (using {1,2,3,4} and {5,6,7,8}): 70.0

2 Show us your code and output for the 3 summing methods.

 

Time to Calculate!

It's now time to put these pieces together and do the full calculations. Looking at the original formulas, you can see that both the slope and intercept formulas need to use the average of the x values and the average of the y values. That tells us that calculating both averages inside the main() method would save some processing time.
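
The idea of computing each average once and reusing it can be sketched as follows. This is only an illustration: the class name RegressionSketch and the helper names are hypothetical, the slope follows the formula given earlier in this lab, and the intercept uses the standard least-squares form b = yavg - m*xavg, which is assumed to match the prelab (check it against your copy).

```java
// Hypothetical sketch: class and method names are assumptions, not the
// lab's required design.
final class RegressionSketch {

    // Average of the entries in an array.
    static double average(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total / values.length;
    }

    // Slope: (Sum(xi*yi) - n*xavg*yavg) / (Sum(xi^2) - n*xavg^2).
    // The averages are passed in so they are computed only once.
    static double slope(double[] xs, double[] ys, double xAvg, double yAvg) {
        int n = xs.length;
        double sumXY = 0.0;
        double sumXX = 0.0;
        for (int i = 0; i < n; i++) {
            sumXY += xs[i] * ys[i];
            sumXX += xs[i] * xs[i];
        }
        return (sumXY - n * xAvg * yAvg) / (sumXX - n * xAvg * xAvg);
    }

    // Intercept, assuming the standard least-squares form b = yavg - m*xavg.
    static double intercept(double m, double xAvg, double yAvg) {
        return yAvg - m * xAvg;
    }

    public static void main(String[] args) {
        double[] xs = {1, 2, 3, 4};
        double[] ys = {5, 6, 7, 8};
        double xAvg = average(xs); // computed once, reused below
        double yAvg = average(ys);
        double m = slope(xs, ys, xAvg, yAvg);
        double b = intercept(m, xAvg, yAvg);
        System.out.println("m = " + m + ", b = " + b);
    }
}
```

The sample points in main() all lie on the line y = x + 4, so they make a convenient sanity check before you run the calculation on the lab's data file.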

After making these method calls, print the values that you have calculated.
When you run main(), you should see these values:

3 Show us your finished calculation methods and run main() so we can see your results.

 

Plotting the Curve

Back to reviewing the original equation:
y = mx + b
Since you have calculated values for m and b, we can use those values to plot the full set of points on a graph and draw the associated best-fit line through them.

For this checkpoint, you will finish the plot() method. When working with graphics, we draw images based on pixel coordinates. These have the origin (0,0) in the upper left and each pixel to the right or down increments x or y by 1. This doesn't work out very well for our graphing. We want the origin in the lower left and we want to stretch or shrink our x and y values to fit into the window. To do this we include methods to translate our x's and y's to pixel values.
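
The translation from data values to pixel values described above might look something like the sketch below. The names PixelSketch, toPixelX() and toPixelY(), and the choice to pass the data range and window size as parameters, are all assumptions for illustration; the translation methods supplied with the lab's plot() code may be organized differently.

```java
// Hypothetical sketch: names and parameters are assumptions, not the
// methods supplied with the lab.
final class PixelSketch {

    // Scale x from the data range [xMin, xMax] to a pixel offset in
    // [0, width], measured from the left edge of the window.
    static int toPixelX(double x, double xMin, double xMax, int width) {
        double fraction = (x - xMin) / (xMax - xMin);
        return (int) Math.round(fraction * width);
    }

    // Scale y from the data range [yMin, yMax] to a pixel offset in
    // [0, height]. Pixel y grows downward, so the result is subtracted
    // from height to put the smallest y at the bottom of the window.
    static int toPixelY(double y, double yMin, double yMax, int height) {
        double fraction = (y - yMin) / (yMax - yMin);
        return height - (int) Math.round(fraction * height);
    }

    public static void main(String[] args) {
        // Data range 0..10 on both axes, drawn in a 200 x 200 pixel area.
        System.out.println(toPixelX(5, 0, 10, 200));  // halfway across
        System.out.println(toPixelY(0, 0, 10, 200));  // bottom edge
        System.out.println(toPixelY(10, 0, 10, 200)); // top edge
    }
}
```

The subtraction in toPixelY() is what moves the origin from the upper left to the lower left; the division by the data range is what stretches or shrinks the values to fit the window.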

4 Show us the code for the methods, and run the program so we can see the graph. It should look like the figure given in your prelab.

Don't forget to exit Firefox before you log out.

5 Show us that you have logged out, cleaned up, turned off your monitor and pushed in your chairs for this last checkpoint.