DEVELOPMENT OF AN ENHANCED CHECKPOINTING TECHNIQUE IN GRID COMPUTING USING PROGRAMMER LEVEL CONTROLS

TABLE OF CONTENTS

Abstract
Table of Contents
List of Abbreviations
Definition of Terms

CHAPTER ONE
INTRODUCTION
1.1       Background of the Study
1.2       Motivation
1.3       Research Problem
1.4       Research Aim and Objectives
1.5       Research Methodology
1.6       Limitations/Challenges

CHAPTER TWO
LITERATURE REVIEW
2.1       Introduction
2.2       Fault Tolerance
2.2.1    Fault Detection
2.2.2    Fault Rectification
2.2.3    Checkpointing
2.2.4    Full Checkpoint or Incremental Checkpoint
2.2.5    Uncoordinated or Coordinated Checkpointing
2.3       Checkpointing Levels
2.3.1    System Level (SLC)
2.3.2    Kernel (Operating System) Level (SLC-K)
2.3.3    Hardware Level (SLC-H)
2.3.4    Application-Level (ALC)
2.3.5    Programmer Level (ALC-P)
2.3.6    User Level (ALC-U)
2.3.7    Library Checkpointing
2.3.8    Pre-Compiler Checkpointing
2.3.9    Mixed Level Checkpointing (MLC)
2.4       SLC versus ALC Checkpointing: Problems and Solutions
2.5       Programmer Effort (Transparency)
2.6       Portability
2.7       Checkpoint Size
2.8       Flexibility
2.9       Efficiency
2.10     Restart Ability
2.11     Forced Checkpointing Generation
2.12     Correctness
2.13     Comparison between ALC and SLC
2.14     Related Work
2.15     Gap in the Literature

CHAPTER THREE
MODIFIED ARCHITECTURE OF FAULT TOLERANCE IN GRID COMPUTING
3.1       Introduction
3.2       Architecture of the Proposed Checkpointing Technique
3.3       Checkpointing Control Implementation
3.4       Job Rollback Recovery System Analysis
3.5       System Model
3.6       Application Model
3.7       Performance Evaluation Criteria

CHAPTER FOUR
RESULT ANALYSIS
4.1       Introduction
4.2       Results and Discussion

CHAPTER FIVE
SUMMARY, CONCLUSION AND RECOMMENDATIONS
5.1       Summary
5.2       Conclusion
5.3       Recommendations
References 

Abstract
Grid computing is a collection of computer resources from multiple locations assembled to provide computational services, storage, data or application services. Grid computing users gain access to computing resources with little or no knowledge of where those resources are located or of the underlying technologies, hardware, operating system, and so on. Reliability and performance are among the key challenges in grid computing environments. Accordingly, grid scheduling algorithms have been proposed to reduce both the likelihood of resource failure and the overhead of recovering from it. Checkpointing is one of the fault-tolerance techniques used when resources fail. It reduces the work lost due to resource faults but can introduce significant runtime overhead. This research presents an enhanced checkpointing technique that extends recent work and aims to lower the runtime overhead of checkpoints. Simulation results obtained with GridSim showed that, with the number of resources held constant and the number of gridlets varied, improvements of up to 9%, 11%, and 11% in throughput, makespan and turnaround time, respectively, were achieved. With the number of gridlets held constant and the number of resources varied, improvements of up to 8%, 11%, and 9% in throughput, makespan and turnaround time, respectively, were achieved. These results indicate the potential usefulness of our research contribution to applications in practical grid computing environments.

CHAPTER ONE
INTRODUCTION
1.1 Background of the Study
Grid computing uses a computer network in which each computer's resources are shared with every other computer in the system. Computing thus becomes pervasive, and individual users (or client applications) gain access to computing resources (processors, storage, data, applications, and so on) as needed, with little or no knowledge of where those resources are located or of the underlying technologies, hardware, operating system, and so on. The main objective in grid scheduling is to finish a job or application as soon as possible (Harshadkumar and Vipul, 2014). Fault tolerance is an important property for large-scale computational grid systems, where geographically distributed nodes cooperate to execute a task, because it is needed to achieve a high level of reliability and availability. A common approach to guaranteeing an acceptable level of fault tolerance in scientific computing is to use checkpointing. When a task fails, it can be restarted from its most recently checkpointed state rather than from the beginning, which reduces the work lost and improves reliability (Bakhta and Ghalem, 2014).
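The restart-from-checkpoint idea described above can be illustrated with a minimal Python sketch. It is not the technique developed in this research and does not use GridSim; the job (summing integers), the checkpoint interval, the file layout, and the names `save_checkpoint`, `load_checkpoint` and `run_job` are all assumptions chosen for illustration.

```python
import os
import pickle
import tempfile

CHECKPOINT_EVERY = 10  # hypothetical interval: save state every 10 steps


def save_checkpoint(path, state):
    """Write the state to a temp file, then rename it over the old checkpoint."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename: no half-written checkpoint


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"next_step": 0, "total": 0}


def run_job(path, n_steps, fail_at=None):
    """Sum 0..n_steps-1, checkpointing periodically.

    If fail_at is given, a resource failure is simulated at that step.
    Returns (total, number of steps executed in this run).
    """
    state = load_checkpoint(path)
    executed = 0
    for step in range(state["next_step"], n_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError(f"simulated resource failure at step {step}")
        state["total"] += step
        state["next_step"] = step + 1
        executed += 1
        if state["next_step"] % CHECKPOINT_EVERY == 0:
            save_checkpoint(path, state)
    return state["total"], executed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        ckpt = os.path.join(workdir, "job.ckpt")
        try:
            run_job(ckpt, 100, fail_at=57)  # first attempt fails part-way
        except RuntimeError as err:
            print(err)
        total, executed = run_job(ckpt, 100)  # restart resumes at step 50
        print(total, executed)
```

In this sketch the first attempt fails at step 57, after the last checkpoint was taken at step 50. The restart therefore repeats only steps 50 to 56 instead of all 57 completed steps, and finishes the job in 50 steps rather than 100.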

1.2 Motivation

The ability to checkpoint a running application and restart it later provides many useful benefits, such as fault recovery, advanced resource sharing, dynamic load balancing and improved service availability. A fault-tolerant service is essential to satisfy QoS requirements in grid computing. However, excessive checkpointing results in performance degradation. Thus there is a need to improve performance by reducing the number of times that checkpointing is invoked. Research on grid computing can be helpful and applicable to some industries that have successfully adopted grid computing technology such....

================================================================
Item Type: Postgraduate Material  |  Attribute: 65 pages  |  Chapters: 1-5
================================================================
