CSCI 136
Fundamentals of Computer Science II
Spring 2022

Montana Tech of The University of Montana
Computer Science & Software Engineering

Looking Out for Number One sample plot of first digit data

Looking Out for Number One (or Zero if You're in CS)

The natural world is full of hidden and beautiful mathematics. The whorls of a conch shell hide the Fibonacci sequence and its Golden Ratio, plants grow in fractal patterns, and comets trace hyperbolic patterns through the solar system. All those beautiful patterns hide in the grungy data of human observation.

So, what are the populations of every town in Quebec and the number of posts by various authors to a hockey bulletin board hiding from you? A good explanation of why your results might not be what you expect can be found at Benford's Rule.


In this assignment, you will write a program that determines the distribution of initial digits in a set of data. That is, we want to know how many 0's, are in the leftmost position (position 0), how many 1's, how many 2's, and so on. In the end, we want a program that takes two command line arguments - the position of the digit you are counting and the name of the file containing the numbers to use. The program reads in a number x and a list of x numbers nums from that file and outputs a list of 10 values: the frequency with which each digit 0–9 appears as the nth digit of one of the input numbers. However, we'll break that problem down into easier steps. (Note: throughout this problem, you may assume that the numbers processed are non-negative or you can use the absolute value function to help you handle negative numbers in a reasonable way.)

Name your program Below is the API that you are to implement, and below that is a more detailed description of what each method should do. It is important that your code implements the API as specified. I strongly suggest you test your methods as you go write them. You will also need a single constructor for the Benford class.

class Benford
       __init__()                   # constructor - initializes any constants associated
                                    #     with the class
int    countDigits(int num)         # return the number of digits in a number
int    nthDigitBack(int n, int num) # returns the digit at the nth position from the right
int    nthDigit(int n, int num)     # returns the digit at the nth position from the left
int [] nthDigitTally1(int n, int num, int [] tally)  
                                    # updates and returns the tally list with a new count
int [] nthDigitTally(int n, int [] nums)
                                    # returns a tally list of all digits in the nth position
int [] readMysteriousNumbers(String fName)
                                    # opens a file and reads all the numbers into a list
  1. Write a method countDigits.
    countDigits(int num) calculates and returns the number of digits in the integer num. countDigits should evaluate to 1 for numbers in the range of 0–9, 2 for numbers in the range of 10–99, 3 for 100–999, etc. Hint: you can use repeated division by 10 to calculate the number of digits in num.

  2. Write a method nthDigitBack.
    nthDigitBack(int n, int num) finds and returns the nth lowest order digit in num, i.e., the nth digit from the right. We take the rightmost digit to be the 0th digit. nthDigitBack should evaluate to -1 for digits beyond the "start" of the number. This is a flag so that we can use that value to determine that there was no digit in a particular position. Hint: use repeated division by 10 again, followed by the modulo operator to pick out just the rightmost digit. (You could also use exponentiation to avoid the repeated division!). For example:

  3. Write a method nthDigit, using the methods nthDigitBack and countDigits. nthDigit(int n, int num) finds and returns the nth highest order digit of num, i.e., the nth digit from the left. We take the leftmost digit to be the 0th. nthDigit should evaluate to -1 for digits beyond the "end" of the number. For example:

  4. Write a method nthDigitTally1, using nthDigit.
    nthDigitTally1(int n, int num, int [] tally) assumes that tally is a 10 element list tallying the number of nth digits seen so far. It updates and returns tally to reflect the nth digit of num. In other words, if d is the nth digit of num, then increment the dth element of tally. Examples:

  5. Write a method nthDigitTally, using nthDigitTally1. nthDigitTally(int n, int [] nums) returns the list, tally, of frequencies of 0–9 as the nth digits of all the numbers in nums.

    Here's a sample test case. These are enrollments in Research Triangle Park colleges and universities in Fall 2000 (thanks to the "Research Triangle Regional Partnership" website:
    Duke University 12176
    North Carolina Central University 5476
    Louisburg College (Junior College) 543
    Campbell University 3490
    University of North Carolina at Chapel Hill 24892
    North Carolina State University 28619
    Meredith College 2595
    Peace College 603
    Shaw University 2527
    St. Augustine's College 1465
    Southeastern Baptist Theological Seminary 1858
    Assume the variable enrollments contains the enrollment numbers from that table. Then:

    nthDigitTally(0, enrollments)[0,3,4,1,0,2,1,0,0,0]

  6. Write a method readMysteriousNumbers that takes a file name as an input parameter and reads integers from that file. It returns a list of the numbers as an array suitable as input to nthDigitTally. The first entry in the file will be a count of the number of entries in the list. Here's the university enrollment data from above:
    From this, readMysteriousNumbers should produce the list [12176, 5476, 543, 3490, 24892, 28619, 2595, 603, 2527, 1465, 1858]. You should use the file input approach that we used last semester (see File I/O slides plus examples on the CSCI 135 website). In all data files, the first number indicates how many entries to read, and then the following numbers are the data entries themselves, one per line. The TestData.txt file is also available online, and contains the above numbers.

  7. Finally, compose a test main to read a number n followed by the name of the file holding the data on the command line. The program should tally the nth digits of the numbers in the data set and print out a table of the results. For example, executing your program from the command line as:
    > python 0 TestData.txt
    Your program should print:
    0s: 0
    1s: 3
    2s: 4
    3s: 1
    4s: 0
    5s: 2
    6s: 1
    7s: 0
    8s: 0
    9s: 0
Once your program has run correctly on TestData.txt, run it on the additional data files: LibraryBooks.txt, LiveJournal.txt, and SunSpots.txt. Write up your results on these three data files. You should include these results in your header comment on your python code rather than putting it in a separate file. Do your results appear consistent with the Benford Rule?


You can use the program to test your code and see how many points you've earned. Download this file, and to run it, type:
> python
in the command window. The results will tell you how well your code is doing and might point to areas where you need to do more work. The Autograder program assumes you have named your program and classes as specified, and that all methods are implemented according to the API.

Grade ItemPoints PossiblePoints Earned
Program Compiles and Runs
Header Comment
Comments on all Methods
Implemented countDigits
Implemented nthDigitBack
Implemented nthDigit
Implemented nthDigitTally1
Implemented nthDigitTally
Implemented readMysteriousNumbers
Implemented main
countDigits Runs Correctly
nthDigitBack Runs Correctly
nthDigit Runs Correctly
nthDigitTally1 Runs Correctly
nthDigitTally Runs Correctly
readMysteriousNumbers Runs Correctly
Program Produces Correct Answer
Write Up of Results on All 4 Files

Submission. Submit your program, with a write up of results in the header comment, via Moodle. Be sure each submitted source file has the required header with your name, username, and a description of the program. All your methods must be commented as discussed at the begining of lab. Be sure to submit the .py source files!

Extra Credit (up to 5 points): If you want to find the patterns hidden in the numbers around you, try the following two-part bonus problem:
  1. Find a data source on the web and transform it into a format suitable for input to readMysteriousNumbers. The data must all be separate measurements of a single type of phenomenon. For example: measurements of university/college enrollments across different institutions (like above) or at the same institution across different years; measurements of the flow rates of all the major rivers in British Columbia; measurements of the height of 10000 randomly chosen Vancouver residents; measurements of the number of hits per day on the UBC computer science website over three years; measurements of the length in characters of each article in the Wikipedia; measurements of the population of the 1000 largest cities and townships in Canada; etc. Furthermore, there must be at least 250 measurements in the list (but more would be better!). You may need to change your program to use floating point data, or change floating point data to integers. Also, most data files will not have the first number as the count of entries in the data. You may need to either add this to the data file or modify your program so that it does not need that piece of data.

  2. Submit with your extra credit assignment and the data you used, along with the url of where you found the data and a write up of what results you got to the Extra Credit dropbox at the top of the Moodle page. Any files you submit should be zipped into a single zip archive file. Be sure your name is on all files you submit.

Page last updated: January 12, 2022