CSCI 136 |
So, what are the populations of every town in Quebec and the number of posts by various authors to a hockey bulletin board hiding from you? A good explanation of why your results might not be what you expect can be found at Benford's Rule.
In this assignment, you will write a program that determines the
distribution of initial digits in a set of data.
That is, we want to know how many 0's, are in the leftmost position (position 0), how many 1's, how many 2's, and so on.
In the end, we want
a program that takes two command line arguments - the position of the digit you are counting and the name of the file containing the numbers to use.
The program reads in a number x
and a list of x
numbers nums
from that file and outputs a list of 10 values: the frequency with
which each digit 0–9 appears as the n
^{th}
digit of one of the input numbers. However, we'll break that problem
down into easier steps. (Note: throughout this problem, you may
assume that the numbers processed are non-negative or you can use the
absolute value function to help you handle
negative numbers in a reasonable way.)
Name your program Benford.py. Below is the API that you are to implement, and below that is a more detailed description of what each method should do. It is important that your code implements the API as specified. I strongly suggest you test your methods as you go write them. You will also need a single constructor for the Benford class.
class Benford ----------------------------------------------------------------------------------------- __init__() # constructor - initializes any constants associated # with the class int countDigits(int num) # return the number of digits in a number int nthDigitBack(int n, int num) # returns the digit at the nth position from the right int nthDigit(int n, int num) # returns the digit at the nth position from the left int [] nthDigitTally1(int n, int num, int [] tally) # updates and returns the tally list with a new count int [] nthDigitTally(int n, int [] nums) # returns a tally list of all digits in the nth position int [] readMysteriousNumbers(String fName) # opens a file and reads all the numbers into a list
countDigits
.countDigits(int num)
calculates and returns the number of digits in the
integer num
. countDigits
should evaluate to
1
for numbers in the range of 0–9, 2
for numbers in the range of 10–99,
3
for 100–999, etc. Hint: you can use repeated division by
10 to calculate the number of digits in num
.
nthDigitBack
.
nthDigitBack(int n, int num)
finds and returns the n
^{th}
lowest order digit in num
, i.e., the
n
^{th} digit from the right. We take the
rightmost digit to be the 0
^{th} digit.
nthDigitBack
should evaluate to -1
for digits
beyond the "start" of the number. This is a flag so that we can use that value to determine that there was no digit in a particular position. Hint: use repeated division by 10
again, followed by the modulo operator to pick out just the rightmost
digit. (You could also use exponentiation to avoid the repeated
division!). For example:
nthDigitBack(0,123)
⇒ 3
nthDigitBack(1,123)
⇒ 2
nthDigitBack(2,123)
⇒ 1
nthDigitBack(3,123)
⇒ -1
nthDigitBack(0,0)
⇒ 0
nthDigitBack(3,18023)
⇒ 8
nthDigit
, using the methods
nthDigitBack
and countDigits
.
nthDigit(int n, int num)
finds and returns the n
^{th}
highest order digit of num
, i.e., the
n
^{th} digit from the left. We take the
leftmost digit to be the 0
^{th}.
nthDigit
should evaluate to -1
for digits
beyond the "end" of the number. For example:
nthDigit(0,123)
⇒ 1
nthDigit(1,123)
⇒ 2
nthDigit(2,123)
⇒ 3
nthDigit(3,123)
⇒ -1
nthDigit(0,0)
⇒ 0
nthDigit(3,18023)
⇒ 2
nthDigitTally1
, using
nthDigit
.nthDigitTally1(int n, int num, int [] tally)
assumes that tally
is a 10 element list tallying the
number of n
^{th} digits seen so far. It updates and returns
tally
to reflect the n
^{th} digit of
num
. In other words, if d
is the
n
^{th} digit of num
, then increment
the d
^{th} element of tally
.
Examples:
nthDigitTally1(2, 1072, [0,0,1,2,0,0,3,0,9,0])
⇒ [0,0,1,2,0,0,3,1,9,0]
nthDigitTally1(0, 2541, [0,0,1,2,0,0,3,1,9,0])
⇒ [0,0,2,2,0,0,3,1,9,0]
nthDigitTally
, using
nthDigitTally1
. nthDigitTally(int n, int [] nums)
returns the list, tally
, of frequencies of 0–9 as the
n
^{th} digits of all the numbers in
nums
.
Here's a sample test case. These are enrollments in Research Triangle Park colleges and universities in Fall 2000 (thanks to the "Research Triangle Regional Partnership" website: http://www.researchtriangle.org/data/enrollment.html).
Institution | Enrollment |
---|---|
Duke University | 12176 |
North Carolina Central University | 5476 |
Louisburg College (Junior College) | 543 |
Campbell University | 3490 |
University of North Carolina at Chapel Hill | 24892 |
North Carolina State University | 28619 |
Meredith College | 2595 |
Peace College | 603 |
Shaw University | 2527 |
St. Augustine's College | 1465 |
Southeastern Baptist Theological Seminary | 1858 |
enrollments
contains the enrollment
numbers from that table. Then:
nthDigitTally(0, enrollments)
⇒
[0,3,4,1,0,2,1,0,0,0]
readMysteriousNumbers
that takes a file name as an input parameter and reads
integers from that file. It returns a list of the numbers as an array suitable as input to
nthDigitTally
. The first entry in the file will be a count of the number of entries in the list. Here's the university enrollment data
from above:
11 12176 5476 543 3490 24892 28619 2595 603 2527 1465 1858From this,
readMysteriousNumbers
should produce the list
[12176, 5476, 543, 3490, 24892, 28619, 2595, 603, 2527, 1465,
1858]
.
You should use the file input approach that we used last semester
(see File I/O slides plus examples on the CSCI 135 website).
In all data files, the first number indicates how many entries to read, and then the following numbers are the data entries themselves, one per line.
The TestData.txt file is also available online, and contains the above numbers.
n
followed by the name of the file holding the data on the command line. The program should tally the
n
^{th} digits of the numbers in the data set and
print out a table of the results. For example, executing your program from the command line as:
> python Benford.py 0 TestData.txtYour program should print:
0s: 0 1s: 3 2s: 4 3s: 1 4s: 0 5s: 2 6s: 1 7s: 0 8s: 0 9s: 0
> python Autograder.pyin the command window. The results will tell you how well your code is doing and might point to areas where you need to do more work. The Autograder program assumes you have named your program and classes as specified, and that all methods are implemented according to the API.
Grade Item | Points Possible | Points Earned |
---|---|---|
Program Compiles and Runs | 2 | |
Header Comment | 2 | |
Comments on all Methods | 2 | |
Implemented countDigits | 1 | |
Implemented nthDigitBack | 1 | |
Implemented nthDigit | 1 | |
Implemented nthDigitTally1 | 1 | |
Implemented nthDigitTally | 1 | |
Implemented readMysteriousNumbers | 1 | |
Implemented main | 2 | |
countDigits Runs Correctly | 2 | |
nthDigitBack Runs Correctly | 2 | |
nthDigit Runs Correctly | 2 | |
nthDigitTally1 Runs Correctly | 2 | |
nthDigitTally Runs Correctly | 2 | |
readMysteriousNumbers Runs Correctly | 2 | |
Program Produces Correct Answer | 2 | |
Write Up of Results on All 4 Files | 2 | |
Total | 30 |
readMysteriousNumbers
. The data must all be separate
measurements of a single type of phenomenon. For example:
measurements of university/college enrollments across different
institutions (like above) or at the same institution across different
years; measurements of the flow rates of all the major rivers in
British Columbia; measurements of the height of 10000 randomly chosen
Vancouver residents; measurements of the number of hits per day on the
UBC computer science website over three years; measurements of the
length in characters of each article in the Wikipedia; measurements of
the population of the 1000 largest cities and townships in Canada;
etc. Furthermore, there must be at least 250 measurements in
the list (but more would be better!). You may need to change your program to use floating point data, or change floating point data to integers. Also, most data files will not have the first number as the count of entries in the data. You may need to either add this to the data file or modify your program so that it does not need that piece of data.
Page last updated: January 12, 2022