Posts

Showing posts with the label mapreduce

Queueing theory based modeling and optimal scheduling in Map-Reduce-like Frameworks

Image
FORK-JOIN QUEUE MODELING AND OPTIMAL SCHEDULING IN PARALLEL PROGRAMMING FRAMEWORKS ABSTRACT MapReduce framework is widely used to parallelize batch jobs since it exploits a high degree of multi-tasking to process them. However, it has been observed that when the number of servers increases, the map phase can take much longer than expected. This thesis analytically shows that the stochastic behavior of the servers has a negative effect on the completion time of a MapReduce job, and continuously increasing the number of servers without accurate scheduling can degrade the overall performance. We analytically model the map phase in terms of hardware, system, and application parameters to capture the effects of stragglers on the performance. Mean sojourn time (MST), the time needed to sync the completed tasks at a reducer, is introduced as a performance metric and mathematically formulated. Following that, we stochastically investigate the optimal task scheduling which leads to an equil...

FORK-JOIN QUEUE MODELING AND OPTIMAL SCHEDULING IN PARALLEL PROGRAMMING FRAMEWORKS

Image
FORK-JOIN QUEUE MODELING AND OPTIMAL SCHEDULING IN PARALLEL PROGRAMMING FRAMEWORKS ABSTRACT MapReduce framework is widely used to parallelize batch jobs since it exploits a high degree of multi-tasking to process them. However, it has been observed that when the number of servers increases, the map phase can take much longer than expected. This thesis analytically shows that the stochastic behavior of the servers has a negative effect on the completion time of a MapReduce job, and continuously increasing the number of servers without accurate scheduling can degrade the overall performance. We analytically model the map phase in terms of hardware, system, and application parameters to capture the effects of stragglers on the performance. Mean sojourn time (MST), the time needed to sync the completed tasks at a reducer, is introduced as a performance metric and mathematically formulated. Following that, we stochastically investigate the optimal task scheduling which leads to an equ...

MapReduce Framework Optimization

Modeling and Optimization of Straggling Mappers ABSTRACT MapReduce framework is widely used to parallelize batch jobs since it exploits a high degree of multi-tasking to process them. However, it has been observed that when the number of mappers increases, the map phase can take much longer than expected. This paper analytically shows that stochastic behavior of mapper nodes has a negative effect on the completion time of a MapReduce job, and continuously increasing the number of mappers without accurate scheduling can degrade the overall performance. We analytically capture the effects of stragglers (delayed mappers) on the performance. Based on an observed delayed exponential distribution (DED) of the response time of mappers, we then model the map phase by means of hardware, system, and application parameters. Mean sojourn time (MST), the time needed to sync the completed map tasks at one reducer, is mathematically formulated. Following that, we optimize MST by finding the t...