Page Rank Estimator
The script page_rank.py contains some boilerplate code. In particular, it provides a command-line interface for a PageRank calculator. The calculator should support the following command-line interface:
    usage: page_rank.py [-h] [-m {stochastic,distribution}] [-r REPEATS] [-s STEPS] [-n NUMBER] [datafile]

    Estimates page ranks from link information

    positional arguments:
      datafile              Textfile of links among web pages as URL tuples

    optional arguments:
      -h, --help            show this help message and exit
      -m {stochastic,distribution}, --method {stochastic,distribution}
                            selected page rank algorithm
      -r REPEATS, --repeats REPEATS
                            number of repetitions
      -s STEPS, --steps STEPS
                            number of steps a walker takes
      -n NUMBER, --number NUMBER
                            number of results shown
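For reference, a parser that produces roughly this interface can be built with the standard argparse module, as sketched below. The usage message does not state any default values, so the defaults here (including the hypothetical default file name links.txt) are assumptions. The square brackets around datafile mark it as optional, which argparse expresses with nargs="?".

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the usage message above (all defaults are assumptions)."""
    parser = argparse.ArgumentParser(
        prog="page_rank.py",
        description="Estimates page ranks from link information",
    )
    # [datafile] is optional, hence nargs="?"; "links.txt" is a hypothetical default.
    parser.add_argument("datafile", nargs="?", default="links.txt",
                        help="Textfile of links among web pages as URL tuples")
    parser.add_argument("-m", "--method", choices=("stochastic", "distribution"),
                        default="stochastic", help="selected page rank algorithm")
    parser.add_argument("-r", "--repeats", type=int, default=1_000_000,
                        help="number of repetitions")
    parser.add_argument("-s", "--steps", type=int, default=100,
                        help="number of steps a walker takes")
    parser.add_argument("-n", "--number", type=int, default=20,
                        help="number of results shown")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-m", "distribution", "-s", "50", "links.txt"])
    print(args.method, args.steps, args.datafile)
```

Recent Python versions print "options:" instead of "optional arguments:" in the help text. The arguments themselves behave the same.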
The boilerplate code in the __main__ block first parses the provided arguments and sets the estimation algorithm to either distribution_page_rank or stochastic_page_rank. It then calls the functions read_graph and print_stats before running the selected algorithm, whose execution is timed. Finally, the script displays the top-ranked pages together with their PageRanks.
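The control flow described above (select an estimator, time it, print the top pages) can be sketched as follows. This is not the actual boilerplate. The helper name run, the stand-in estimator uniform_rank, the (graph, args) estimator signature, the dictionary-based method lookup and the output format are all hypothetical.

```python
import time
from argparse import Namespace
from typing import Callable

Graph = dict[str, list[str]]   # assumed: page -> pages it links to
Ranking = dict[str, float]     # page -> estimated PageRank


def uniform_rank(graph: Graph, args: Namespace) -> Ranking:
    """Hypothetical stand-in estimator: gives every page the same rank."""
    return {page: 1 / len(graph) for page in graph}


def run(graph: Graph, args: Namespace,
        algorithms: dict[str, Callable[[Graph, Namespace], Ranking]]) -> list[tuple[str, float]]:
    """Select the estimator named by args.method, time it and print the top pages."""
    algorithm = algorithms[args.method]
    start = time.perf_counter()  # time only the estimation step
    ranking = algorithm(graph, args)
    elapsed = time.perf_counter() - start
    # Highest rank first; keep only the number of results requested with -n.
    top = sorted(ranking.items(), key=lambda item: item[1], reverse=True)[: args.number]
    for page, rank in top:
        print(f"{100 * rank:.2f}\t{page}")  # rank shown as a percentage
    print(f"Calculation took {elapsed:.2f} seconds.")
    return top


if __name__ == "__main__":
    web = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
    options = Namespace(method="stochastic", number=2)
    run(web, options, {"stochastic": uniform_rank, "distribution": uniform_rank})
```

A dictionary keyed by method name is one way to pick the estimator. A plain if/else on args.method works equally well.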
These four functions exist in the code, but currently they only raise a RuntimeError when called.
Your job is to implement these functions so that read_graph returns a Python object containing the graph data, stochastic_page_rank implements the first method explained above to estimate PageRanks via random walkers, and distribution_page_rank implements the second method to estimate PageRanks via probability distributions. The functions that estimate PageRanks must follow exactly the behaviour specified by the pseudocode definitions above.
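The pseudocode referred to above is not reproduced in this section. The following is therefore only a sketch of the common formulations of the two methods, not a substitute for that pseudocode. It assumes the graph is a dict that maps each page to the list of pages it links to, with every link target also present as a key. The explicit parameter lists are assumptions (the real functions may receive the parsed args instead), and so is the handling of pages without out-links. Check both against the specification before relying on them.

```python
import random

# Assumed representation: page -> list of pages it links to; every link target is also a key.
Graph = dict[str, list[str]]


def stochastic_page_rank(graph: Graph, n_repetitions: int, n_steps: int) -> dict[str, float]:
    """Estimate PageRank as the fraction of random walkers that end on each page."""
    nodes = list(graph)
    hit_count = {node: 0.0 for node in nodes}
    for _ in range(n_repetitions):
        current = random.choice(nodes)  # each walker starts on a random page
        for _ in range(n_steps):
            out_edges = graph[current]
            # Assumption: a walker on a page without out-links jumps to a random page.
            current = random.choice(out_edges) if out_edges else random.choice(nodes)
        hit_count[current] += 1 / n_repetitions
    return hit_count


def distribution_page_rank(graph: Graph, n_steps: int) -> dict[str, float]:
    """Estimate PageRank by repeatedly pushing probability mass along links."""
    n = len(graph)
    node_prob = {node: 1 / n for node in graph}  # start from the uniform distribution
    for _ in range(n_steps):
        next_prob = {node: 0.0 for node in graph}
        for node, out_edges in graph.items():
            if out_edges:
                share = node_prob[node] / len(out_edges)  # split mass evenly over links
                for target in out_edges:
                    next_prob[target] += share
            else:
                # Assumption: mirror the walker's random jump by spreading mass over all pages.
                share = node_prob[node] / n
                for target in graph:
                    next_prob[target] += share
        node_prob = next_prob
    return node_prob


if __name__ == "__main__":
    web = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
    random.seed(0)
    print(stochastic_page_rank(web, n_repetitions=5000, n_steps=30))
    print(distribution_page_rank(web, n_steps=100))
```

For this three-page graph, the distribution method settles at a = b = 0.4 and c = 0.2. These values solve a = b/2 + c, b = a, c = b/2 with a + b + c = 1. The stochastic estimate should land close to the same values, with sampling noise that shrinks as the number of repetitions grows.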
Code optimization
After implementing both algorithms, try to improve your solution to increase its execution speed. You may alter your code by any means, as long as your solution still adheres to the specification above. Use timeit or other Python modules to measure execution times.
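The sketch below illustrates how a change can be evidenced with timeit. It times two hypothetical ways of turning walker end positions into PageRank estimates: adding 1/n on every hit, or counting integer hits with collections.Counter and dividing once at the end. Both function names and the synthetic sample are illustrative only, and no measured numbers are claimed here. The two variants agree up to floating-point rounding, and whether the candidate is faster should be checked on your own machine and data.

```python
import random
import timeit
from collections import Counter


def hits_incremental(end_pages: list[str]) -> dict[str, float]:
    """Baseline: add 1/n to a page's score each time a walker ends there."""
    n = len(end_pages)
    hits: dict[str, float] = {}
    for page in end_pages:
        hits[page] = hits.get(page, 0.0) + 1 / n
    return hits


def hits_counter(end_pages: list[str]) -> dict[str, float]:
    """Candidate: count integer hits with Counter, then divide once per page."""
    n = len(end_pages)
    return {page: count / n for page, count in Counter(end_pages).items()}


def best_time(func, *args, repeat: int = 5, number: int = 10) -> float:
    """Fastest of `repeat` runs, each timing `number` calls of func(*args), in seconds."""
    return min(timeit.repeat(lambda: func(*args), repeat=repeat, number=number))


if __name__ == "__main__":
    random.seed(1)
    sample = [random.choice(["a", "b", "c"]) for _ in range(100_000)]
    for func in (hits_incremental, hits_counter):
        print(f"{func.__name__}: {best_time(func, sample):.4f} s")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes when comparing timings.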
A short report (written in Markdown and placed in the root directory of your repository) should summarize the code optimization strategies that you have applied or evaluated. Changes should be clearly described and supported by measurements. The report should be no longer than 500 words, with a 10% leeway.
Hints
It is up to you to decide how you want to store the graph data in Python. You can use one of the ways presented in the lectures (e.g. implementing a Graph class, using a third-party graph class, or using built-in Python types such as dictionaries) or explore your own ways. You are allowed to add more functions or modules to the project, as long as you do not change the signatures of the four functions called in main.
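As one example of the built-in-types option, the sketch below stores the graph as a dictionary of adjacency lists. The datafile format is not specified in this section, so the sketch assumes each non-blank line holds a source URL and a target URL separated by whitespace. The helper names build_graph and read_graph_file are hypothetical and are not the required read_graph signature.

```python
from typing import Iterable

Graph = dict[str, list[str]]  # page -> pages it links to


def build_graph(lines: Iterable[str]) -> Graph:
    """Build an adjacency list from lines of 'source target' pairs (assumed format)."""
    graph: Graph = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        source, target = parts
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])  # pages that are only linked to still become nodes
    return graph


def read_graph_file(path: str) -> Graph:
    """Hypothetical file wrapper: parse the datafile line by line."""
    with open(path, encoding="utf-8") as handle:
        return build_graph(handle)


if __name__ == "__main__":
    print(build_graph(["a b", "b a", "b c", "", "c a"]))
```

Lists keep duplicate links, so a page that links twice to the same target makes that target twice as likely to be chosen by random.choice. Use a set or deduplicate first if the specification treats repeated links as one.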