TABLE OF CONTENTS
CHAPTER ONE
1.1 Introduction
1.2 Problem of the study
1.3 Aim and Objectives of the study
1.4 Scope of the study
1.5 Significance of the study
1.6 Limitation of the study
1.7 Definition of terms
CHAPTER TWO
Literature Review
2.1 Search Engines
2.2 Building Blocks of a Search Engine
2.3 Search Engine Components
2.3.1 Text Acquisition
2.3.2 Text Transformation
2.3.5 Ranking
2.4 Issues in Search Engine Research
CHAPTER THREE
System Analysis and Design
3.0 System Analysis
3.1 System Overview
3.2 System Features
3.3 Methods of Data Entry
CHAPTER FOUR
4.1 Choice and Justification of the Programming Language Used
4.2 Implementation Plan
4.3 Program flowchart
4.4 Procedure chart
CHAPTER FIVE
Summary, Recommendation and Conclusion
5.1 Summary
5.2 Recommendation
5.3 Conclusion
References
Source Code
CHAPTER ONE
1.1 INTRODUCTION
This project deals with the design and implementation of a content-based search engine. Content-based means that the system uses the information available in web documents holistically to determine what might interest the user. We focus on textual content written in a natural language, as opposed to, say, images included in the documents. We call the presented system a search engine because it contains components to retrieve and index web documents, and it provides a mechanism to return a ranked subset of the documents according to the user's requests. The system should be able to process millions of documents in a reasonable time and respond to queries with low average latency.
The starting point is a web crawler (or spider) that retrieves web pages: it traverses the entire Web, or a certain subset of it, downloads the pages or files it encounters, and saves them for other components to use. The actual traversal algorithm varies depending on the implementation; depth-first, breadth-first, and random traversal are all used to meet different design goals.
The parser takes the downloaded raw results, analyzes them, and tries to make sense of them. In the case of a text search engine, this is done by extracting keywords and recording their locations and/or frequencies. Hidden HTML tags, such as KEYWORDS and DESCRIPTION, are also considered. Usually a scoring system is involved that assigns a final score to each keyword on each page.
Simple or complicated, a search engine must have a way to determine which pages are more important than others, and present them to users in that order. This is called the ranking system. The most famous example is the PageRank algorithm published by Google's founders [Brin 1998].
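The breadth-first traversal mentioned above can be sketched as follows. This is a minimal illustration of the crawl order only: the link graph here is a hypothetical in-memory stand-in, and a real crawler would replace `fetch_links` with code that downloads each URL and extracts its outgoing links.

```python
from collections import deque

# Hypothetical link graph standing in for real HTTP fetches;
# keys are pages, values are the links found on each page.
LINK_GRAPH = {
    "http://example.com/":  ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/c"],
    "http://example.com/b": ["http://example.com/a"],
    "http://example.com/c": [],
}

def crawl_breadth_first(seed, fetch_links, max_pages=100):
    """Visit pages level by level, skipping URLs already seen."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

pages = crawl_breadth_first("http://example.com/", LINK_GRAPH.get)
print(pages)
```

Swapping the deque for a stack would give depth-first traversal instead; the `seen` set is what prevents the crawler from looping forever on inter-linked pages.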
A reliable repository system is critical for any application of this kind. A search engine requires everything to be stored as efficiently as possible to ensure maximum performance. The choice of database vendor and the schema design can make a big difference in performance for metadata such as URL descriptions, crawl dates, keywords, etc. A more challenging part is the huge volume of downloaded files that must be saved before they are picked up by other modules.
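A minimal sketch of the metadata store described above, using SQLite. The table and column names are illustrative assumptions, not a fixed schema; a production system would also need tables for the raw documents and the index.

```python
import sqlite3

# In-memory database for illustration; a real deployment would use a file
# or a dedicated database server.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pages (
        url         TEXT PRIMARY KEY,   -- page address
        description TEXT,               -- short summary of the page
        crawled_at  TEXT,               -- date the crawler fetched it
        keywords    TEXT                -- comma-separated extracted keywords
    )
""")
conn.execute(
    "INSERT INTO pages VALUES (?, ?, ?, ?)",
    ("http://example.com/", "Example page", "2010-01-01", "example,demo"),
)
row = conn.execute(
    "SELECT keywords FROM pages WHERE url = ?", ("http://example.com/",)
).fetchone()
print(row[0])
```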
Finally, a front-end interface for users: this is the face and presentation of the search engine. When a user submits a query, usually in the form of a list of textual terms, an internal scoring function is applied to each web page in the repository [Pandey 2005], and the list of results is presented, usually in order of relevance and importance. Google has been known for its simple and straightforward interface, while some more recent competitors, such as Ask.com, provide a much richer user experience by adding features like previews or hierarchical display. This project will focus on building a search engine system that can gather information from all corners of the Web, index and rank it, and maintain a simple yet rich interface through which users query for information.
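The scoring-and-ranking step can be illustrated with a deliberately simple scoring function: raw term frequency summed over the query terms. The repository contents here are hypothetical stand-ins for parsed web documents; real systems use far more sophisticated functions (e.g. TF-IDF combined with link-based scores such as PageRank).

```python
from collections import Counter

# Toy repository; in the real system these would be parsed web documents.
REPOSITORY = {
    "http://example.com/physics": "force mass acceleration force energy",
    "http://example.com/law":     "contract tort statute contract",
    "http://example.com/mixed":   "force statute energy",
}

def score(query_terms, text):
    """Sum of raw term frequencies - a deliberately simple scoring function."""
    counts = Counter(text.split())
    return sum(counts[t] for t in query_terms)

def search(query):
    """Score every page and return matching URLs in descending relevance order."""
    terms = query.lower().split()
    ranked = sorted(REPOSITORY, key=lambda u: score(terms, REPOSITORY[u]),
                    reverse=True)
    return [u for u in ranked if score(terms, REPOSITORY[u]) > 0]

print(search("force energy"))
```

The physics page scores 3 (two occurrences of "force" plus one of "energy") and so is listed before the mixed page, which scores 2; the law page matches nothing and is filtered out.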
1.2 PROBLEM OF THE STUDY
The problem is to design a system for retrieving relevant information from the Internet based on a user-supplied query. The Internet carries an extensive range of information, such as inter-linked hypertext documents. The manual method of retrieving information requires the user to already know the domain name, the address that lets users retrieve information from a website. The amount of time users spend searching for information is very high, and the probability of finding it is low. This search engine system will automatically crawl hypertext documents from various websites and store them on disk. When a user searches for information using keywords, the system will take the keywords, compare them with the hypertext documents residing in its database, and return the URL addresses where the information can be found.
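The keyword-to-URL lookup described above is typically served by an inverted index: a map from each keyword to the set of stored documents containing it, so a query need not scan every document. The documents below are hypothetical stand-ins for crawled pages.

```python
from collections import defaultdict

# Hypothetical crawled documents; the crawler would supply these in practice.
DOCUMENTS = {
    "http://example.com/math":    "algebra calculus geometry algebra",
    "http://example.com/physics": "mechanics calculus optics",
}

# Build the inverted index: keyword -> set of URLs containing it.
index = defaultdict(set)
for url, text in DOCUMENTS.items():
    for word in text.split():
        index[word].add(url)

def lookup(keyword):
    """Return the URLs whose stored text contains the keyword."""
    return sorted(index.get(keyword.lower(), set()))

print(lookup("calculus"))
```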
1.3 AIM AND OBJECTIVES OF THE STUDY
The aim of this project work is to create a search engine system that retrieves information from the Internet based on keywords supplied by the user, in an environment that is very simple for user interaction. The method for retrieving information will be easy for non-technical users.
The objectives of this research work are to:
1. Embed the system in an environment the user is already familiar with.
2. Enable users to retrieve information with a simple application such as a web browser.
3. Reduce the amount of time Internet users spend searching for information.
1.4 SCOPE OF THE STUDY
Users will be able to retrieve general information relating to Mathematics, Physics, Government, Physiology, Programming, Law, Health, Science, etc. The system will cover any textual information that can be found on the Internet.
1.5 SIGNIFICANCE OF THE STUDY
1. It examines the practical method by which the system will be implemented.
2. Internet users will be able to search for information easily.
3. The amount of time the average user spends reaching information will be reduced.
4. Users will be able to filter information based on their needs.
5. This project will also serve as solid ground for researchers who want to improve on the system.
1.6 LIMITATION OF THE STUDY
We focused on providing information to the user as fast as the system can manage. There is one major restriction that will be hard to handle during the implementation of the system: the system will crawl and scrape as many hypertext documents as it can handle, but the storage available for holding this information is very small compared to the amount of information the system is expected to crawl.