CSCI 136 |
So, what are the populations of every town in Quebec and the number of posts by various authors to a hockey bulletin board hiding from you? A good explanation of why your results might not be what you expect can be found at Benford's Rule.
In this assignment, you will write a program that determines the
distribution of initial digits in a set of data. That is, we want to know how many 0's, are in the leftmost position (position 0), how many 1's, how many 2's, and so on. In the end, we want
a program that takes two command line arguments - the position of the digit you are counting and the name of the file containing the numbers to use. The program reads in a number x
and a list of x
numbers nums
from that file and outputs a list of 10 values: the frequency with
which each digit 0–9 appears as the n
th
digit of one of the input numbers. However, we'll break that problem
down into easier steps. (Note: throughout this problem, you may
assume that the numbers processed are non-negative or you can use the
absolute value function to help you handle
negative numbers in a reasonable way.)
Name your class (program) Benford.java. Below is the API that you are to implement, and below that is a description of what each method should do. I strongly suggest you test your methods as you go write them. Note that since we are not really creating this as a class, we are just using the methods internally, all methods should be created as public static
, and there are no constructors.
class Benford ----------------------------------------------------------------------------------------- int countDigits(int num) // return the number of digits in a number int nthDigitBack(int n, int num) // returns the digit at the nth position from the right int nthDigit(int n, int num) // returns the digit at the nth position from the left int [] nthDigitTally1(int n, int num, int [] tally) // updates and returns the tally array with a new count int [] nthDigitTally(int n, int [] nums) // returns a tally array of all digits in the nth position int [] readMysteriousNumbers(String fName) // opens a file and reads all the numbers into an array
countDigits
.countDigits(int num)
calculates and returns the number of digits in the
integer num
. countDigits
should evaluate to
1
for 0–9, 2
for 10–99,
3
for 100–999, etc. Hint: use repeated division by
10 to calculate the number of digits in num
.
nthDigitBack
.
nthDigitBack(int n,int num)
finds and returns the n
th
lowest order digit in num
, i.e., the
n
th digit from the right. We take the
rightmost digit to be the 0
th digit.
nthDigitBack
should evaluate to 0
for digits
beyond the "start" of the number. Hint: use repeated division by 10
again, followed by the modulo operator to pick out just the rightmost
digit. (You could also use exponentiation to avoid the repeated
division!). For example:
nthDigitBack(0,123)
⇒ 3
nthDigitBack(1,123)
⇒ 2
nthDigitBack(2,123)
⇒ 1
nthDigitBack(3,123)
⇒ 0
nthDigitBack(0,0)
⇒ 0
nthDigitBack(3,18023)
⇒ 8
nthDigit
, using
nthDigitBack
and countDigits
.
nthDigit(int n,int num)
finds and returns the n
th
highest order digit of num
, i.e., the
n
th digit from the left. We take the
leftmost digit to be the 0
th.
nthDigit
should evaluate to 0
for digits
beyond the "end" of the number. For example:
nthDigit(0,123)
⇒ 1
nthDigit(1,123)
⇒ 2
nthDigit(2,123)
⇒ 3
nthDigit(3,123)
⇒ 0
nthDigit(0,0)
⇒ 0
nthDigit(3,18023)
⇒ 2
nthDigitTally1
, using
nthDigit
.nthDigitTally1(int n, int num, int [] tally)
assumes that tally
is a 10 element array tallying the
number of n
th digits seen so far. It updates and returns
tally
to reflect the n
th digit of
num
. In other words, if d
is the
n
th digit of num
, then increment
the d
th element of tally
.
Examples:
nthDigitTally1(2, 1072, [0,0,1,2,0,0,3,0,9,0])
⇒ [0,0,1,2,0,0,3,1,9,0]
nthDigitTally1(0, 2541, [0,0,1,2,0,0,3,1,9,0])
⇒ [0,0,2,2,0,0,3,1,9,0]
nthDigitTally
, using
nthDigitTally1
. nthDigitTally(int n, int [] nums)
returns the array, tally, of frequencies of 0–9 as the
n
th digits of all the numbers in
nums
.
Here's a sample test case. These are enrollments in Research Triangle Park colleges and universities in Fall 2000 (thanks to the "Research Triangle Regional Partnership" website: http://www.researchtriangle.org/data/enrollment.html).
Institution | Enrollment |
---|---|
Duke University | 12176 |
North Carolina Central University | 5476 |
Louisburg College (Junior College) | 543 |
Campbell University | 3490 |
University of North Carolina at Chapel Hill | 24892 |
North Carolina State University | 28619 |
Meredith College | 2595 |
Peace College | 603 |
Shaw University | 2527 |
St. Augustine's College | 1465 |
Southeastern Baptist Theological Seminary | 1858 |
enrollments
contains the enrollment
numbers from that table. Then:
nthDigitTally(0, enrollments)
⇒
[0,3,4,1,0,2,1,0,0,0]
readMysteriousNumbers
that takes a file name as an input parameter and reads
whitespace-separated integers from that file. It returns a list of the numbers as an array suitable as input to
nthDigitTally
. The first entry in the file will be a count of the number of entries in the list. Here's the university enrollment data
from above:
11 12176 5476 543 3490 24892 28619 2595 603 2527 1465 1858From this,
readMysteriousNumbers
should produce the array
[12176, 5476, 543, 3490, 24892, 28619, 2595, 603, 2527, 1465,
1858]
. You must use the file input approach that we used last semester, that is, setting up a Scanner to read a file. In all data files, the first number indicates how many entries to read, and then the following numbers are the data entries themselves, one per line. The TestData.txt file is also available online.
n
followed by the name of the file holding the data on the command line. The program should tally the
n
th digits of the numbers in the data set and
print out a table of the results. For example, given:
> java Benford 0 TestData.txtYour program should print:
0s: 0 1s: 3 2s: 4 3s: 1 4s: 0 5s: 2 6s: 1 7s: 0 8s: 0 9s: 0
Grading
Grade Item | Points Possible | Points Earned |
---|---|---|
Program Compiles | 1 | |
Program Runs | 1 | |
javadoc Header Comment | 2 | |
javadoc Comments on all Methods | 2 | |
Implemented countDigits | 1 | |
Implemented nthDigitBack | 1 | |
Implemented nthDigit | 1 | |
Implemented nthDigitTally1 | 1 | |
Implemented nthDigitTally | 1 | |
Implemented readMysteriousNumbers | 1 | |
Implemented main | 1 | |
countDigits Runs Correctly | 2 | |
nthDigitBack Runs Correctly | 2 | |
nthDigit Runs Correctly | 2 | |
nthDigitTally1 Runs Correctly | 2 | |
nthDigitTally Runs Correctly | 2 | |
readMysteriousNumbers Runs Correctly | 2 | |
main Runs Correctly | 2 | |
Program Produces Correct Answer | 2 | |
Write Up of Results on All 4 Files | 1 | |
Total | 30 |
Extra Credit (up to 5 points): If you want to find the patterns hidden in the numbers around you, try the following two-part bonus problem:
readMysteriousNumbers
. The data must all be separate
measurements of a single type of phenomenon. For example:
measurements of university/college enrollments across different
institutions (like above) or at the same institution across different
years; measurements of the flow rates of all the major rivers in
British Columbia; measurements of the height of 10000 randomly chosen
Vancouver residents; measurements of the number of hits per day on the
UBC computer science website over three years; measurements of the
length in characters of each article in the Wikipedia; measurements of
the population of the 1000 largest cities and townships in Canada;
etc. Furthermore, there must be at least 250 measurements in
the list (but more would be better!). You may need to change your program to use floating point data, or change floating point data to integers. Also, most data files will not have the first number as the count of entries in the data. You may need to either add this or modify your program so that it does not need that piece of data.