CSCI 241 Labs: Lab 8
An Array of Problems
There are 5 checkpoints, including the clean-up checkpoint, in this lab.
You and your partner should work together using just one of your accounts.
CHANGE WHO IS CONTROLLING THE COMPUTER AFTER EACH CHECKPOINT!
If you need help with any exercise, raise your hand.
Copy the lab materials to your
account from /home/student/Classes/Cs241/Labs/Lab08
In this lab, you and your
partner will write code to implement the linear regression calculations
discussed in the prelab.
Loading the Data
Start BlueJ and open project Lab08. Edit the Linear class.
Linear's main() method is designed to hold one set of data for a linear regression.
This includes its x-values, y-values and associated calculations.
By the end of this lab, your Linear class will not only be able to
calculate the values for linear regression, but also graph the
associated function.
When we create a Java class, we need to make two kinds of decisions:
- Implementation Decisions: Every time you write code, you make
  implementation decisions. For example, we give you the signature
  and return type (remember what that is?) of a method, and you write the
  code to fill in the body of that method. The combination of the
  signature and return type is called the method prototype.
- Design Decisions: These are the types of decisions made by a more
  seasoned programmer. They include: What class(es) do I need to
  implement a system? How should I break up the tasks into separate
  methods? What variables and constants do I need?
Deciding on which classes to develop is an advanced topic that you
usually see in upper-level computer science courses.
We do know enough at this time to choose variables and methods.
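To make the idea of a method prototype concrete, here is a minimal sketch. The class PrototypeDemo and the method largest() are hypothetical names chosen only for illustration; they are not part of the Lab08 project. The first line of the method is the prototype, and everything between its braces is an implementation decision.

```java
public class PrototypeDemo {
    // Prototype: return type (double) plus signature (name and parameter list).
    public static double largest(double[] values) {
        // Body: any code that honors the prototype is acceptable.
        // This version assumes the array holds at least one value.
        double best = values[0];
        for (int i = 1; i < values.length; i++) {
            if (values[i] > best) {
                best = values[i];
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(largest(new double[] {2.0, 7.5, 4.0}));
    }
}
```

Two programmers could write different bodies for largest() and both would satisfy the same prototype.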
We start by deciding which values to save as variables in the main() method.
The main() method is located at the bottom of the class.
The prelab contains the needed formulas.
We could decide to declare a variable for each
of the variables in the formulas. Here is the first formula:
y = mx + b
It contains these variables:
- y
- m (the slope of the line)
- x
- b (the y-intercept of the line)
Just inside the main() method, you will see declarations for
two arrays of doubles
(one to hold the x-values and one to hold the y-values):
xArray[] and yArray[].
Decisions:
- When do we keep a value in a
variable? If a value is calculated and used more than once, we keep it
in a variable so we don't have to recalculate it.
- When do we rely on another static
method to calculate a value for us? One good time to do this is when we
have a complex formula, like that for the slope.
The prelab contained this formula to calculate the slope (m):
     Sum(xi*yi) - n*xavg*yavg
m = -------------------------- ,
       Sum(xi^2) - n*xavg^2
where n is the number of points in the data set.
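As a quick sanity check on the slope formula, here is a minimal sketch that evaluates it for a tiny, made-up data set. The class name SlopeDemo and the method name slope() are hypothetical (the lab's own method is findM(), and its parameters may differ). The three sample points lie exactly on y = 2x + 1, so the slope should come out as 2.

```java
public class SlopeDemo {
    // m = (Sum(xi*yi) - n*xavg*yavg) / (Sum(xi^2) - n*xavg^2)
    // Assumes both arrays have the same length and at least two distinct x-values.
    public static double slope(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0.0;
        double sumY = 0.0;
        double sumXY = 0.0;
        double sumXX = 0.0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
        }
        double xAvg = sumX / n;
        double yAvg = sumY / n;
        return (sumXY - n * xAvg * yAvg) / (sumXX - n * xAvg * xAvg);
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {3.0, 5.0, 7.0};
        System.out.println(slope(x, y)); // prints 2.0
    }
}
```

By hand: Sum(xi*yi) = 34, n*xavg*yavg = 30, Sum(xi^2) = 14 and n*xavg^2 = 12, so m = 4/2 = 2.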
We'll save the slope (m) in a variable so we can draw a graph of the function later.
To help evaluate this formula, you'll write a method that sums the
products of each x and y pair:
Sum(xi*yi)
Strictly speaking, this value doesn't need to be saved in a variable, because we can call
that method again at any time to recalculate it.
However, keeping it in a variable saves processing time, since
we also use it for r^2 (correlation).
Check the formula in the prelab and see where it is used.
Here are your tasks for the first checkpoint:
- Add 3 new variables to the main() method to hold calculated values for:
  - the slope (m), named m
  - the intercept (b), named b
  - the correlation squared (r^2), named r2
  Initialize each one to zero (0). We'll add the calculations later.
  Your code will read data from a file that contains real numbers.
  Note that there is already a variable declared named n.
  It will hold the number of points read in from the data file.
- Just past the declarations and initializations of variables in the main() method,
  you will see some code to read the data points from a file.
  It is currently incomplete. Your instructors have included a bit of "magical" code
  from Chapter 12 of the text: the try ... catch block.
  This code catches some of the errors likely to occur when trying to read
  from a file, including errors like "File not found",
  which can occur when a file name is mistyped.
  Our data input file contains several lines. The first
  line contains an integer which tells us how many lines of x and y
  values to expect in the lines of data that follow.
  Here is an example of how we can read from a file that is set up in
  this way. size has already been declared as an int,
  and myArray has been declared as an array of ints.
Scanner file = new Scanner(new File(prefix + fileName));
size = file.nextInt();          // first line: how many values follow
myArray = new int[size];
// while the file is not empty AND the array still has room
for (int i = 0; file.hasNext() && i < myArray.length; i++)
{
    myArray[i] = file.nextInt();
}
In this example, the data part of the file contained one integer per line.
Your file is a little different.
Go to a terminal window, and cd into your Lab08 directory.
Type
more lab08.dat
to look at the file's layout.
To read from this data file, change the prefix variable to contain the path to your
Lab08 directory.
(Remember, you can type pwd at the command line to see your current
directory location.)
Complete the code between the try braces for this checkpoint.
The comments tell you what to include.
Because there are two numbers per line, you will need to call the
Scanner's nextDouble() method twice to get both numbers.
- The main() method should also print the values held in each array after
  reading the data from the file. There is a method named printArrayContents
  at the beginning of the class. Complete this method, using a loop (or two)
  to print the data in the array that is passed in as the parameter.
Once you have finished all these parts for the checkpoint,
test your program by uncommenting the lines in the main()
method which call printArrayContents.
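The reading steps above can be sketched as follows. This is only an illustration under stated assumptions: the class ReadPairsDemo and the method readPairs() are hypothetical names, the Scanner reads from a String instead of lab08.dat so the example runs anywhere, and the sample numbers are made up. The layout mirrors the description above: a count on the first line, then one x and one y per line.

```java
import java.util.Locale;
import java.util.Scanner;

public class ReadPairsDemo {
    // Reads a count, then that many "x y" pairs.
    // Returns a two-row array: row 0 holds the x-values, row 1 the y-values.
    public static double[][] readPairs(Scanner in) {
        in.useLocale(Locale.US);          // parse "2.5" the same way on every machine
        int n = in.nextInt();             // first line: number of points
        double[] xs = new double[n];      // arrays are sized once n is known
        double[] ys = new double[n];
        // while there is more data AND the arrays still have room
        for (int i = 0; in.hasNextDouble() && i < n; i++) {
            xs[i] = in.nextDouble();      // first number on the line
            ys[i] = in.nextDouble();      // second number on the line
        }
        return new double[][] {xs, ys};
    }

    public static void main(String[] args) {
        // A String stands in for the data file in this sketch.
        Scanner in = new Scanner("3\n1.0 2.5\n2.0 4.5\n3.0 6.5\n");
        double[][] data = readPairs(in);
        for (int i = 0; i < data[0].length; i++) {
            System.out.println(data[0][i] + " " + data[1][i]);
        }
    }
}
```

In the lab itself the two arrays are separate variables in main(), so no two-row array is needed; it is used here only so that one method can hand back both arrays.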
Checkpoint 1
Show us your declared variables and the code you wrote to read the data
from the file. Run your main() method so we can see the results.
Be ready to answer:
- Why are the arrays for the
x's and y's instantiated when we start reading data from the file,
rather than at time of declaration?
Methods to Calculate the Sums
We will need 3 kinds of sums to do our calculations: the sum of the entries in
an array, the sum of the squares of the values in an array, and the sum of the
products of x and y values. Go back to the prelab and review the formulas for m,
b and r^2.
The Linear class will contain 3 different methods to calculate these
sums. Each of these methods will take either one or two arrays as
parameters.
Here is what each part means in the prelab formulas:
- Sum(xi) and Sum(yi)
  hold the sum of all the x-values and y-values, respectively. Once we
  get the values from these sums, we can easily calculate xavg and yavg.
- Sum(xi^2)
  holds the sum of the squares of the x-values. We also see Sum(yi^2),
  which holds the sum of the squares of the y-values. These are
  calculated using similar mechanisms, so the same method works for both.
- Sum(xi*yi)
  holds the sum of all the x*y values.
Your next task is to write 3 different static
methods to calculate and return these values.
The method prototypes are:
- public static double sum(double [] array),
- public static double sumOfSquares(double [] array)
and
- public static double sumOfProducts(double [] array1, double [] array2).
Each method should return a double which
holds the result of the sum. Because we want these methods to work for
different arrays, make sure you send
the correct array (or arrays) to the methods as arguments.
To test your new methods, run
them directly by right-clicking the Linear class
in the BlueJ window. When the Method Call window pops up, you will need
to provide arguments.
When you need an array as an argument, you can type its contents inside
curly braces.
For example, when running the sum method, you can type
{1,2,3,4}
in the box before clicking Ok.
Here are the answers you should expect from each of the methods when
using the indicated arrays:
sum() (using {1,2,3,4}): 10.0
sumOfSquares() (using {1,2,3,4}): 30.0
sumOfProducts() (using {1,2,3,4} and {5,6,7,8}): 70.0
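For comparison with your own work, here is one possible shape for the three methods, offered as a sketch and not as the required solution. The prototypes are the ones listed above; the wrapper class SumsDemo is a hypothetical stand-in for Linear, and sumOfProducts() assumes both arrays have the same length. Outside BlueJ's Method Call window, an array argument is written as new double[] {1, 2, 3, 4}.

```java
public class SumsDemo {
    // Sum of every entry in the array.
    public static double sum(double[] array) {
        double total = 0.0;
        for (int i = 0; i < array.length; i++) {
            total += array[i];
        }
        return total;
    }

    // Sum of the square of every entry in the array.
    public static double sumOfSquares(double[] array) {
        double total = 0.0;
        for (int i = 0; i < array.length; i++) {
            total += array[i] * array[i];
        }
        return total;
    }

    // Sum of array1[i] * array2[i]; assumes the arrays are the same length.
    public static double sumOfProducts(double[] array1, double[] array2) {
        double total = 0.0;
        for (int i = 0; i < array1.length; i++) {
            total += array1[i] * array2[i];
        }
        return total;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4};
        double[] b = {5, 6, 7, 8};
        System.out.println(sum(a));              // prints 10.0
        System.out.println(sumOfSquares(a));     // prints 30.0
        System.out.println(sumOfProducts(a, b)); // prints 70.0
    }
}
```

Because each method takes its array as a parameter, the same sumOfSquares() serves both Sum(xi^2) and Sum(yi^2).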
Checkpoint 2
Show us your code and output for the 3 summing methods.
Time to Calculate!
It's now time to put these pieces together and do the full calculations.
Looking at the original formulas, you can see that both the slope and
intercept formulas need the average of the x values and the average of the y
values.
That tells us that calculating both averages once, inside the main() method,
would save some processing time.
- Uncomment the lines that declare xAverage and yAverage,
  and finish their calculations by calling one of the methods you wrote
  for the last checkpoint. Is there more than one choice for a
  denominator?
- Earlier in the class, look for a method named findM().
  This method performs the calculation needed for the slope. Examine its
  parameters. Uncomment the line inside that can now run, since you've
  written your methods. Now, add a call to this method from your main()
  method and save the value it returns in your own variable that holds
  the slope.
- Using the findM() method as a model, write 2 more methods:
  public static double findB(double avgX, double avgY, double slope)
  and
  public static double findRSquared(double [] xValues, double [] yValues)
  Each of these uses the corresponding formula in the prelab to calculate
  and return the intercept and the correlation squared, respectively.
When you are ready to try them out, add calls in your main()
method and save the values in the variables you declared in checkpoint 1.
After making these method calls, print the values that you
have calculated.
When you run main(), you should see these values:
- m = 0.09496242563609694
- b = -0.3514937389023274
- r^2 = 0.9107185718154513
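The prelab formulas for b and r^2 are not reproduced on this page, so the sketch below assumes the standard least-squares forms: b = yavg - m*xavg, and r^2 = (Sum(xi*yi) - n*xavg*yavg)^2 / ((Sum(xi^2) - n*xavg^2) * (Sum(yi^2) - n*yavg^2)). Check them against your prelab before relying on this. The prototypes match the ones given above; the wrapper class FitDemo and the sample data are hypothetical.

```java
public class FitDemo {
    // Intercept, assuming the standard form b = yavg - m*xavg.
    public static double findB(double avgX, double avgY, double slope) {
        return avgY - slope * avgX;
    }

    // Correlation squared, assuming the standard least-squares form.
    // Assumes equal-length arrays whose x-values and y-values are not all identical.
    public static double findRSquared(double[] xValues, double[] yValues) {
        int n = xValues.length;
        double sumX = 0.0, sumY = 0.0, sumXY = 0.0, sumXX = 0.0, sumYY = 0.0;
        for (int i = 0; i < n; i++) {
            sumX += xValues[i];
            sumY += yValues[i];
            sumXY += xValues[i] * yValues[i];
            sumXX += xValues[i] * xValues[i];
            sumYY += yValues[i] * yValues[i];
        }
        double avgX = sumX / n;
        double avgY = sumY / n;
        double numerator = sumXY - n * avgX * avgY;
        double denominator = (sumXX - n * avgX * avgX) * (sumYY - n * avgY * avgY);
        return numerator * numerator / denominator;
    }

    public static void main(String[] args) {
        // Made-up points that lie exactly on y = 2x + 1.
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {3.0, 5.0, 7.0};
        System.out.println("b = " + findB(2.0, 5.0, 2.0));   // prints b = 1.0
        System.out.println("r^2 = " + findRSquared(x, y));   // prints r^2 = 1.0
    }
}
```

In your Linear class, findRSquared() can call the three summing methods from the last checkpoint instead of repeating the loop; the loop is inlined here only to keep the sketch self-contained.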
Checkpoint 3
Show us your finished calculation methods and run main()
so we can see your results.
Plotting the Curve
Let's go back to the original equation:
y = mx + b
Since you have calculated values for m and b,
we can use those values to plot the full set of points on a graph and
draw the associated best-fit line through them.
For this checkpoint, you will finish the plot()
method. When working with graphics, we draw images based on pixel
coordinates.
These have the origin (0,0) in the upper left, and each pixel to
the right or down increments x or y by 1. This doesn't work out very well for our
graphing. We want the origin in the lower left and we want to stretch or
shrink our x and y values to fit into the window. To do this we include
methods to translate our x's and y's to pixel values.
- public static int toPixelX(double x)
  takes a double representing the equation's x-value as a parameter and returns the
  equivalent pixel position. The pixel position was calculated using this
  formula: pixelX = 10 + x/4.
- public static int toPixelY(double y)
  takes a double representing the equation's y-value as a parameter and returns the
  equivalent pixel position. The pixel position was calculated using this
  formula: pixelY = 450 - 2y.
  toPixelX() and toPixelY() really depend on the data set.
  If the values in the arrays were different, these two methods would
  have to change. See Extras for Experts below, for example.
- public void plot()
  should draw the x- and y-axes, plot all points, and draw the best-fit
  line. This method is already stubbed in. Our class extends an ACM library
  class named GraphicsProgram.
  While inheritance is an advanced concept, what it means here is that the
  graphics "stuff", e.g. methods and colors, is already written. We just
  have to use it.
  Our method begins by calling this.start();.
  This opens the graphics window on the screen. The method also contains
  commented code for drawing the axes. Uncommenting those lines will draw
  the axes. Uncomment the line in the main() method that calls plot.
  Run the main()
  method to make certain the axes appear. You might have to expand the
  lower part of your window to see the x-axis at the bottom.
- Plotting the points:
  The code in plot()
  contains special lines which draw GLines
  (lines) and GOvals
  (circles, in this case) in the plotting window. There is a partially
  complete for loop in the code which contains 3 lines in its body: the first 2 are
  currently commented out and incomplete. Those lines calculate the
  positions for the circles that will be plotted. The last line adds the
  circle to the plotting window.
  Complete the first 2 lines in the loop body that call toPixelX()
  and toPixelY()
  to determine where to place the circle. Also, complete the missing parts of the
  for loop's control so that the loop goes through the entire length of the arrays.
- Draw the best-fit line:
  The last line in plot()
  draws a line in the plotting window that best fits the points. You will
  need to calculate the values for the variables which are currently
  commented out.
  You want to draw a line between x = 0 and x = 2000. Calculate each
  corresponding y value from the formula y = mx + b. Translate each x and
  y to its equivalent pixel position, and the last line of code will draw
  the line.
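The coordinate translation and the endpoint calculation described above can be sketched without the ACM graphics classes. The two conversion formulas are the ones quoted earlier (pixelX = 10 + x/4 and pixelY = 450 - 2y); rounding to the nearest pixel is an assumption, since the lab's own toPixelX() and toPixelY() may truncate instead. The class PixelDemo, the method lineEndpoints() and the sample m and b are hypothetical.

```java
public class PixelDemo {
    // pixelX = 10 + x/4; rounding to the nearest pixel is an assumption.
    public static int toPixelX(double x) {
        return (int) Math.round(10 + x / 4);
    }

    // pixelY = 450 - 2y; subtracting flips the axis so larger y is drawn higher up.
    public static int toPixelY(double y) {
        return (int) Math.round(450 - 2 * y);
    }

    // Pixel endpoints {x1, y1, x2, y2} of y = mx + b between x = 0 and x = 2000.
    public static int[] lineEndpoints(double m, double b) {
        double xStart = 0.0;
        double xEnd = 2000.0;
        double yStart = m * xStart + b;
        double yEnd = m * xEnd + b;
        return new int[] {
            toPixelX(xStart), toPixelY(yStart),
            toPixelX(xEnd), toPixelY(yEnd)
        };
    }

    public static void main(String[] args) {
        // Made-up slope and intercept, chosen so the arithmetic is exact.
        int[] p = lineEndpoints(0.09375, -0.5);
        System.out.println("(" + p[0] + ", " + p[1] + ") to (" + p[2] + ", " + p[3] + ")");
        // prints (10, 451) to (510, 76)
    }
}
```

The same two conversions apply inside the point-plotting loop: each xArray[i] and yArray[i] is translated to a pixel position before its circle is added to the window.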
Checkpoint 4
Show us the code for the methods, and run the program so we can see the
graph.
It should look like the figure given in your prelab.
Don't forget to exit Firefox before you log out.
Checkpoint 5
Show us that you have logged out, cleaned up, turned off your monitor
and pushed in your chairs for this last checkpoint.