
In the first part, you will complete a partially implemented web crawler program


In the first part, you will complete a partially implemented web crawler program. In the second, you will use Java classes from the package java.util.concurrent to divide the workload among multiple threads of control.

Download the starting code in this zip file. Inside the src directory are two subdirectories: spider contains the (partially complete) sequential web crawler for use with Part 1, and concurrentSpider contains the threaded web crawler code for use with Part 2. This code uses classes from the Java class library for parsing HTML files and pulling out links (java.net.URL among others).

Part 1 - Sequential Web Crawler.

1. In this part, you will complete a program that is able to "crawl the web" starting from a given URL. A detailed problem specification and descriptions of the provided classes can be found in the first section of Concurrent Data Structures in Java. Complete the tasks given in the To Do sections and the Try This section.

2. The finished program should output the URLs it processes as it crawls the website. Capture this output in a file.

3. There will be a check-in date to make sure nobody is waiting until the last minute to complete this part. I will ask for a copy of your Spider.java file to ensure everyone is making progress.
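The crawl loop described in Part 1 can be sketched roughly as follows. This is only an illustration, not the provided Spider code: the class name SequentialCrawlSketch, the method names, the maxPages limit, and the in-memory fakeWeb map (which stands in for downloading a page and parsing its links) are all assumptions, and the standard java.util.Queue is used in place of the course's QueueInterface.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

class SequentialCrawlSketch {

    // Hypothetical stand-in for fetching a page and extracting its links.
    // A real crawler would download the page and parse its HTML instead.
    static List<String> linksOn(String url, Map<String, List<String>> fakeWeb) {
        return fakeWeb.getOrDefault(url, List.of());
    }

    // Breadth-first crawl; returns the URLs in the order they were processed.
    static List<String> crawl(String start, Map<String, List<String>> fakeWeb, int maxPages) {
        Queue<String> work = new ArrayDeque<>();   // URLs waiting to be processed
        Set<String> seen = new HashSet<>();        // URLs already queued at some point
        List<String> processed = new ArrayList<>();

        work.add(start);
        seen.add(start);
        while (!work.isEmpty() && processed.size() < maxPages) {
            String url = work.remove();
            processed.add(url);
            System.out.println(url);               // output each URL as it is processed
            for (String link : linksOn(url, fakeWeb)) {
                if (seen.add(link)) {              // add() returns false for a repeat
                    work.add(link);
                }
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        Map<String, List<String>> fakeWeb = Map.of(
            "http://example.test/", List.of("http://example.test/a", "http://example.test/b"),
            "http://example.test/a", List.of("http://example.test/b", "http://example.test/"),
            "http://example.test/b", List.of("http://example.test/c"));
        crawl("http://example.test/", fakeWeb, 100);
    }
}
```

Because the sketch prints to standard output, the output can be captured in a file with ordinary shell redirection, for example java SequentialCrawlSketch > output.txt after compiling.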

Part 2 - Multi-threaded Web Crawler.

1. Production-quality web crawlers divide the work of following links from multiple web pages among multiple versions of themselves, using a Java (and operating system) concept called threads (you used threads in CSci 157 whenever you programmed ActiveObjects). Each thread must share the data structures being used, which can cause problems if (for example) two instances try to remove from the work queue at the same time. Because our Queue implementations are not thread-safe, they cannot be used in this version. Instead, we can use implementations provided by the Java package java.util.concurrent. Unfortunately, the interface these classes implement is different from the QueueInterface. For example, the method add is used instead of enqueue, and poll (which removes and returns the head of the queue) or peek (which returns the head without removing it) instead of dequeue.

2. A detailed problem specification can be found in the second section of Concurrent Data Structures in Java. An outline of the concurrent solution and the problems that concurrent data structures solve can be found in the third section.

3. Complete the tasks given in the To Do sections and the Try This section of section two.
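As a rough illustration of the ideas in Part 2, the sketch below shares one work queue among several threads using java.util.concurrent.ConcurrentLinkedQueue, with add in place of enqueue and poll in place of dequeue. It is not the provided concurrentSpider code: the class name ConcurrentCrawlSketch, the fakeWeb map standing in for real page fetching, and the pending counter used to decide when the workers may stop are all assumptions made for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ConcurrentCrawlSketch {

    // Crawls the hypothetical in-memory fakeWeb with several threads and
    // returns the set of URLs that were discovered.
    static Set<String> crawl(String start, Map<String, List<String>> fakeWeb, int threads) {
        Queue<String> work = new ConcurrentLinkedQueue<>();  // thread-safe work queue
        Set<String> seen = ConcurrentHashMap.newKeySet();    // thread-safe "already queued" set
        AtomicInteger pending = new AtomicInteger(0);        // URLs queued but not yet finished

        seen.add(start);
        pending.incrementAndGet();
        work.add(start);                                     // add() plays the role of enqueue

        Runnable worker = () -> {
            while (pending.get() > 0) {
                String url = work.poll();                    // poll() removes the head, or returns null
                if (url == null) {
                    Thread.yield();                          // queue empty for now; another thread may add more
                    continue;
                }
                System.out.println(Thread.currentThread().getName() + " " + url);
                for (String link : fakeWeb.getOrDefault(url, List.of())) {
                    if (seen.add(link)) {                    // atomic check-and-insert, so no URL is queued twice
                        pending.incrementAndGet();           // count the URL before it becomes visible
                        work.add(link);
                    }
                }
                pending.decrementAndGet();                   // this URL is completely processed
            }
        };

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(worker);
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();              // preserve the interrupt for the caller
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> fakeWeb = Map.of(
            "http://example.test/", List.of("http://example.test/a", "http://example.test/b"),
            "http://example.test/a", List.of("http://example.test/b", "http://example.test/"),
            "http://example.test/b", List.of("http://example.test/c"));
        crawl("http://example.test/", fakeWeb, 4);
    }
}
```

The order of the printed URLs can differ from run to run because the threads are scheduled independently. The yield-and-retry loop keeps the sketch short; a blocking queue such as java.util.concurrent.LinkedBlockingQueue is a common alternative when idle workers should wait rather than spin.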

Deliverables

For this project you will turn in:

• a README file for each part, containing SHORT instructions on how to compile and run each program and a description of any issues or known bugs. You do not need to describe the program itself.

• all Java files required to build and run the program(s), using the same file structure as the provided zip file

These files are to be turned in as follows: zipped into a single archive file, emailed to me.

Do not include binary files (such as .class files); sometimes Gmail flags those as malware and won't send them.
