Testing the Speed Difference
The goal of my project was to begin answering some of the questions asked above: How much faster is C++ than Java? Does the speed difference really matter? Are there specific areas where Java's performance is so much slower that it should not be used? Answering these questions involved writing programs that perform identical operations in both C++ and Java and measuring how long each takes to execute. Because the programs are as close to identical as possible, the difference in their execution times can be used to measure the performance difference between the languages.
For testing purposes, six different operations were selected. Four of the programs test calculation-intensive activities and the other two test input/output and memory-intensive activities. All six tests were programmed in both languages and implemented as identically as possible.
All four of the calculation-intensive programs perform matrix operations. Two operations, matrix addition and matrix multiplication, are each performed on floating point and integer numbers. All four programs store their values in an array, instead of a linked list, so that as much of the execution time as possible is spent on the actual calculations. The programs are fairly simple. Each accepts two command line arguments that specify the size of the matrix to test, automatically generates the test matrices, performs the calculation, and then exits.
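To make the array-based storage concrete, the following is a minimal sketch of a floating point matrix multiplication, not the code used in the tests. It assumes square matrices stored row by row in a single contiguous array; the function name `multiplySquare` and the use of `std::vector` are illustrative choices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: multiplies two n-by-n matrices stored row-major in
// contiguous arrays. Entry (i, j) lives at index i * n + j.
std::vector<double> multiplySquare(const std::vector<double>& a,
                                   const std::vector<double>& b,
                                   std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            // Dot product of row i of a with column j of b.
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}
```

Because every element is reached by index arithmetic rather than by following pointers, nearly all of the running time goes to the multiplications and additions themselves.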
The two input/output and memory-intensive programs create a singly linked list (SLL) of employee information. Each program stores three pieces of information about each employee: their name as a string, department as a string, and salary as a floating point number. The method each program uses to retrieve the information, however, is different. One of the programs is designed to accept all of the data from standard input until 'done' is entered in the name field of an employee. After all of the data has been entered, the program prints all of the data to standard output. The other program is designed to open a specific data file, read the information from it, and print the information it retrieved to a specific output file. Both programs test how C++ and Java handle memory allocation because they create the SLL of employees, but they differ in the type of input being tested.
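The list-building step can be sketched as follows. This is an illustration under stated assumptions rather than the tested program: the one-word-per-field input format, the `Employee` layout, and all function names are hypothetical. Reading from a generic `std::istream` lets the same routine serve both the standard input version and the data file version.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical node: the three fields named in the text plus a link.
struct Employee {
    std::string name;
    std::string department;
    double salary;
    Employee* next;
};

// Reads "name department salary" records until the name "done" (or the end
// of input) and appends each one to the tail of the list.
Employee* readEmployees(std::istream& in) {
    Employee* head = nullptr;
    Employee* tail = nullptr;
    std::string name;
    while (in >> name && name != "done") {
        Employee* e = new Employee{name, "", 0.0, nullptr};
        if (!(in >> e->department >> e->salary)) {
            delete e;  // incomplete record: stop reading
            break;
        }
        if (tail) tail->next = e; else head = e;
        tail = e;
    }
    return head;
}

// Prints one employee per line in list order.
void printEmployees(const Employee* head, std::ostream& out) {
    for (const Employee* e = head; e != nullptr; e = e->next) {
        out << e->name << ' ' << e->department << ' ' << e->salary << '\n';
    }
}

// Releases every node; C++ has no garbage collector to do this.
void freeEmployees(Employee* head) {
    while (head) {
        Employee* next = head->next;
        delete head;
        head = next;
    }
}

// Convenience wrapper for trying the sketch on an in-memory string.
std::string roundTrip(const std::string& input) {
    std::istringstream in(input);
    std::ostringstream out;
    Employee* head = readEmployees(in);
    printEmployees(head, out);
    freeEmployees(head);
    return out.str();
}
```

Each record costs one heap allocation, which is what makes this kind of program a test of memory handling as well as of input and output.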
After the programs were designed, data had to be created to test them. The data used, like the programs themselves, had to be identical between the two languages to make the tests valid. For the matrices, Unix scripts were created that ran and timed the four different calculations with increasingly large matrices. The larger the matrix becomes, the more time the operation is expected to take because there are more calculations to be done. For the SLLs, Unix scripts were created that either echoed employee information to the program's standard input or created an employee data file to be used by the programs. Each consecutive test entered more employees into the database. The specifics of the test data used and what specifically each program does are discussed, along with the results, later in the paper.
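The timing scripts are not reproduced here. As an alternative to timing a whole process from a script, a measurement can also be taken inside a program. The sketch below uses the standard `std::chrono::steady_clock`; the helper name `secondsToRun` is hypothetical, and this in-process approach is not the method used for the results in this paper.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in seconds. Unlike an external script, this excludes process startup.
template <typename Work>
double secondsToRun(Work work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Timing from a script includes program startup in the measurement, while timing in-process isolates the calculation, so the two approaches answer slightly different questions.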
Data is useless until it is used to make inferences. Inferences are also useless if the data does not represent an accurate sample of the population. To check whether the data gathered from testing the programs was an accurate sample, a test of the data's standardized residuals was performed. The standardized residual is a statistical measure used to determine how closely a data set corresponds to a normal distribution (also known as a bell curve). Data that corresponds to a normal distribution can be considered to be a representative sample and, therefore, inferences made from the data can be considered to be fairly certain. The standardized residual graphs for all of the programs indicated that they were representative samples, meaning we can use the data to draw conclusions about the performance of C++ versus Java and be fairly confident in them.
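The exact computation behind the residual graphs is not given in this section. As a rough illustration only, one simple form of standardized residual takes each observation's deviation from the sample mean and divides it by the sample standard deviation. The sketch below assumes that form; the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed definition: z_i = (x_i - mean) / s, where s is the sample standard
// deviation (n - 1 in the denominator). Requires at least two values that
// are not all equal, otherwise s is zero.
std::vector<double> standardizedResiduals(const std::vector<double>& x) {
    assert(x.size() >= 2);
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= static_cast<double>(x.size());

    double sumSquares = 0.0;
    for (double v : x) sumSquares += (v - mean) * (v - mean);
    double sd = std::sqrt(sumSquares / static_cast<double>(x.size() - 1));

    std::vector<double> z;
    z.reserve(x.size());
    for (double v : x) z.push_back((v - mean) / sd);
    return z;
}
```

For roughly normal data, most of these values fall between -2 and 2, which is the pattern a residual graph is inspected for.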
The following six sections discuss the specifics of each program, how it was tested, and the results of the test. The sections are followed by a summarizing section that draws conclusions based upon the gathered data.
Matrix Addition of Floating Point Values
The Code
The MatrixAddFloat Java and C++ programs add two square matrices together and store the result in a third matrix. To do so, each program uses a doubly nested loop that steps through every row and column, adds the corresponding entries of the two matrices, and stores the sum in the same position in the third matrix. The programs accept two values from standard input that determine the size of the matrices to add. The two matrices to be added are duplicates of one another and are created by storing the reciprocal of a value that is incremented across each column and then continues to be incremented across the next row down. For example, a 4 by 4 square test matrix would look like this:
<CENTER>
</CENTER>
Creating the matrix in this way reduces the amount of overhead generated by input because only the matrix dimensions need to be known. Before the add is performed, the upper left and lower right corners of both matrices are printed. After the add is performed, the upper left and lower right corners of the result matrix are printed to help verify that the operation was performed correctly.
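The generation and addition steps described above can be sketched as follows. This is not the MatrixAddFloat source. It assumes the incremented value starts at 1, so that a 4 by 4 matrix would run from 1/1 in the upper left corner to 1/16 in the lower right, and it assumes row-by-row storage in a single array. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fills an n-by-n matrix with 1/1, 1/2, 1/3, ... from left to right and top
// to bottom. Starting the counter at 1 is an assumption.
std::vector<float> makeReciprocalMatrix(std::size_t n) {
    std::vector<float> m(n * n);
    for (std::size_t i = 0; i < n * n; ++i) {
        m[i] = 1.0f / static_cast<float>(i + 1);
    }
    return m;
}

// Adds two n-by-n matrices entry by entry with a doubly nested loop.
std::vector<float> addSquare(const std::vector<float>& a,
                             const std::vector<float>& b,
                             std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<float> c(n * n);
    for (std::size_t row = 0; row < n; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            c[row * n + col] = a[row * n + col] + b[row * n + col];
        }
    }
    return c;
}
```

Since the two inputs are duplicates, every entry of the result should be exactly twice the corresponding input entry, which is what printing the corners makes easy to check by eye.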
The Data
One hundred tests were performed, five times each, on both programs. The tests used identical matrix dimensions and values to enable direct comparison of their performance. Originally, the tests started at 1 by 1 and ranged to 100 by 100. These sizes did not produce very interesting results because the matrices were too small and the additions were too fast. To compensate, the matrix size was increased. The square matrix sizes used for the tests started at 5 by 5, incremented by 5 in both width and height each time, and ranged to 500 by 500. The number of additions performed for a matrix depends on its size and is equal to the matrix's height times its width. For instance, for the largest matrix, the program performed 250,000 additions. Each of these matrices was tested five times to ensure a representative sample. After completing all of the tests, the data was reduced to one hundred distinct points by calculating the mean of the five individual trials.
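The arithmetic in this paragraph can be checked with a short sketch. The helper names are hypothetical, and no measured timings are included; the code only reproduces the size sequence, the addition count, and the averaging step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Square matrix sizes 5, 10, ..., 500: one hundred sizes in total.
std::vector<std::size_t> testSizes() {
    std::vector<std::size_t> sizes;
    for (std::size_t n = 5; n <= 500; n += 5) {
        sizes.push_back(n);
    }
    return sizes;
}

// One addition per entry, so the count is height times width.
std::size_t additionCount(std::size_t height, std::size_t width) {
    return height * width;
}

// Reduces the repeated trials for one matrix size to a single data point.
double meanOfTrials(const std::vector<double>& trials) {
    assert(!trials.empty());
    double sum = 0.0;
    for (double t : trials) sum += t;
    return sum / static_cast<double>(trials.size());
}
```

Calling `additionCount(500, 500)` gives the 250,000 additions quoted for the largest matrix, and `testSizes()` yields the one hundred sizes that become the one hundred data points.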
The Results
The following table summarizes the descriptive statistics of this test. The table is followed by explanations of what the statistics imply.