The purpose of the project is to improve the efficiency of an existing application, CocaPhase, which is used to analyze a subset of chromosomes within a large genotype dataset. In the first phase, the project will take the existing CocaPhase program and reduce its runtime by applying parallel programming techniques. In the second phase, it will enable the program to run in a distributed environment over a Grid network (NGS and/or ECDF).
The current application takes a long time to process its input data. Even for a small dataset (haplotypes of size 5 for 4000 cases), the user currently has to run the application for about 6 hours on a single processor to get results. The project aims to reduce this runtime by making efficient use of the Grid network.