CSCI 136
Fundamentals of Computer Science II
Spring 2016

Montana Tech
Computer Science & Software Engineering



Looking Out for Number One sample plot of first digit data

Looking Out for Number One (or Zero if You're in CS)

The natural world is full of hidden and beautiful mathematics. The whorls of a conch shell hide the Fibonacci sequence and its Golden Ratio, plants grow in fractal patterns, and comets trace hyperbolic patterns through the solar system. All those beautiful patterns hide in the grungy data of human observation.

So, what are the populations of every town in Quebec and the number of posts by various authors to a hockey bulletin board hiding from you? A good explanation of why your results might not be what you expect can be found at Benford's Rule.

Assignment

In this assignment, you will write a program that determines the distribution of initial digits in a set of data. In the end, we want a program that reads in a number n and a list of numbers nums and outputs a list of 10 values: the frequency with which each digit 0–9 appears as the nth digit of one of the input numbers. However, we'll break that problem down into easier steps. (Note: throughout this problem, you may assume that the numbers processed are non-negative or you can use the absolute value function to help you handle negative numbers in a reasonable way.)

  1. Write a method countDigits. countDigits(num) calculates the number of digits in the integer num. countDigits should evaluate to 1 for 0–9, 2 for 10–99, 3 for 100–999, etc. Hint: use repeated division by 10 to calculate the number of digits in num.

  2. Write a method nthDigitBack. nthDigitBack(n,num) finds the nth lowest order digit in num, i.e., the nth digit from the right. We take the rightmost digit to be the 0th digit. nthDigitBack should evaluate to 0 for digits beyond the "start" of the number. Hint: use repeated division by 10 again, followed by the modulo operator to pick out just the rightmost digit. (You could also use exponentiation to avoid the repeated division!). For example:

  3. Write a method nthDigit, using nthDigitBack and countDigits. nthDigit(n,num) finds the nth highest order digit of num, i.e., the nth digit from the left. We take the leftmost digit to be the 0th. nthDigit should evaluate to 0 for digits beyond the "end" of the number. For example:

  4. Write a method nthDigitTally1, using nthDigit. nthDigitTally1(n, num, tally) assumes that tally is a 10 element list tallying the number of nth digits seen so far. It updates tally to reflect the nth digit of num. In other words, if d is the nth digit of num, then increment the dth element of tally. Examples:

  5. Write a method nthDigitTally, using nthDigitTally1. nthDigitTally(n, nums) returns a tally of frequencies of 0–9 as the nth digits of all the numbers in nums.

    Here's a sample test case. These are enrollments in Research Triangle Park colleges and universities in Fall 2000 (thanks to the "Research Triangle Regional Partnership" website: http://www.researchtriangle.org/data/enrollment.html).
    InstitutionEnrollment
    Duke University 12176
    North Carolina Central University 5476
    Louisburg College (Junior College) 543
    Campbell University 3490
    University of North Carolina at Chapel Hill 24892
    North Carolina State University 28619
    Meredith College 2595
    Peace College 603
    Shaw University 2527
    St. Augustine's College 1465
    Southeastern Baptist Theological Seminary 1858
    Assume the variable enrollments contains the enrollment numbers from that table. Then:

    nthDigitTally(0, enrollments)[0,3,4,1,0,2,1,0,0,0]

  6. Write a method readMysteriousNumbers that reads whitespace-separated integers from input (terminated by end-of-file) and returns a list of the numbers suitable as input to nthDigitTally. Here's the university enrollment data from above:
    12176
    5476
    543
    3490
    24892
    28619
    2595
    603
    2527
    1465
    1858
    
    From this, readMysteriousNumbers should produce the list [12176, 5476, 543, 3490, 24892, 28619, 2595, 603, 2527, 1465, 1858]. You may use the StdIn.java and StdOut.java code that we used last semester for your input and output if you like. In all data files, the first number indicates how many entries to read, and then the following numbers are the data entries themselves, one per line. The TestData.txt file is also available online.

  7. Finally, compose your main method to read a number n from input followed by a data set. The program should tally the nth digits of the numbers in the data set and print out a table of the results. For example, given:
    java Benford 0 < TestData.txt
    
    Your program should print:
    0s: 0
    1s: 3
    2s: 4
    3s: 1
    4s: 0
    5s: 2
    6s: 1
    7s: 0
    8s: 0
    9s: 0
    
Once your program has run correctly on TestData.txt, run it on the additional data files: LibraryBooks.txt, LiveJournal.txt, and SunSpots.txt. Write up your results on these three data files. Do your results appear consistent with the Benford Rule?

Grading
Grade ItemPoints PossiblePoints Earned
Program Compiles
1
Program Runs
1
Header Comment
2
Comments on all Methods
2
Implemented countDigits
1
Implemented nthDigitBack
1
Implemented nthDigit
1
Implemented nthDigitTally1
1
Implemented nthDigitTally
1
Implemented readMysteriousNumbers
1
Implemented main
1
countDigits Runs Correctly
2
nthDigitBack Runs Correctly
2
nthDigit Runs Correctly
2
nthDigitTally1 Runs Correctly
2
nthDigitTally Runs Correctly
2
readMysteriousNumbers Runs Correctly
2
main Runs Correctly
2
Program Produces Correct Answer
2
Write Up of Results on All 4 Files
1
Total
30

Submission. Submit your program and write up of results via Moodle. Be sure each submitted source file has the required header with your name, username, and a description of the program. Also be sure to submit the .java source files and not the .class files!

Extra Credit (up to 5 points): If you want to find the patterns hidden in the numbers around you, try the following two-part bonus problem:

  1. Find a data source on the web and transform it into a format suitable for input to readMysteriousNumbers. The data must all be separate measurements of a single type of phenomenon. For example: measurements of university/college enrollments across different institutions (like above) or at the same institution across different years; measurements of the flow rates of all the major rivers in British Columbia; measurements of the height of 10000 randomly chosen Vancouver residents; measurements of the number of hits per day on the UBC computer science website over three years; measurements of the length in characters of each article in the Wikipedia; measurements of the population of the 1000 largest cities and townships in Canada; etc. Furthermore, there must be at least 250 measurements in the list (but more would be better!).

  2. Submit with your extra credit assignment and the data you used, along with the url of where you found the data and a write up of what results you got to the Extra Credit dropbox at the top of the Moodle page. Be sure your name is on all files you submit.


Page last updated: December 28, 2016