Final Report

Summary

Since the project checkpoint, my project goal has diverged slightly. For the final project, I implemented a framework for neural network training that uses OpenMP to exploit parallelism. The training process has good potential for parallelism when we update weights and compute neuron values: neurons at the same layer can be treated as independent of each other. I used this observation to parallelize the training process. Overall, my implementation achieves a 35% speedup when training a network with 200 hidden neurons, measured on the GHC machines.

Background:

A neural network uses the feed-forward algorithm to compute the value of each neuron, and back propagation to update the weights between neurons. Feed forward computes each neuron's value by multiplying the previous layer's neuron values by the connecting weights and then applying a transfer function. The weights are usually randomized at the beginning of training. One can think of this process as a series of matrix multiplications between neuron values and weights. At the end of the feed-forward pass, the network's output is checked against the actual data, and based on this error the weights are updated accordingly, which improves the accuracy of the network. This weight-update process is called back propagation.
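In symbols (a standard textbook formulation, not copied verbatim from my implementation), each neuron j computes

    a_j = f( sum_i w_ij * a_i )

where the a_i are the previous layer's neuron values, w_ij the connecting weights, and f the transfer function (e.g., a sigmoid). Back propagation then adjusts each weight in proportion to its contribution to the output error, roughly

    w_ij <- w_ij + eta * delta_j * a_i

where eta is the learning rate and delta_j the error term propagated back to neuron j.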

Approach:

Overall, the framework contains a neuralNetworkTrainer class that captures the essence of a neural network. Users configure the class by constructing an object of it, passing parameters such as the number of iterations, momentum, and so on. Once the object is constructed, calling its train() method initiates the training process. Inside the neuralNetworkTrainer class, I used 2-D arrays to represent the network; this representation made it easy to apply OpenMP. The train() method invokes feed forward and back propagation to repeatedly update the network until it can predict results with high enough accuracy. Inside the feed-forward and back-propagation methods, I used OpenMP with static scheduling and a fixed thread-pool size, tuned for the GHC machines, roughly as sketched below.
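The following is a minimal sketch of the per-layer parallelization, not the exact implementation; the names and the POOL_SIZE value are illustrative, since the tuned value is not given above:

    #include <cmath>
    #include <omp.h>

    const int POOL_SIZE = 8;  // assumption: stands in for the tuned GHC pool size

    // Sketch of one parallel feed-forward layer: neurons within a layer
    // are independent, so the loop over j can be split statically across
    // the fixed thread pool.
    void feedForwardLayer(int curSize, int prevSize,
                          const double* prev, double* const* weights,
                          double* cur) {
        #pragma omp parallel for schedule(static) num_threads(POOL_SIZE)
        for (int j = 0; j < curSize; ++j) {
            double sum = 0.0;
            for (int i = 0; i < prevSize; ++i)
                sum += weights[j][i] * prev[i];      // weighted sum of inputs
            cur[j] = 1.0 / (1.0 + std::exp(-sum));   // sigmoid transfer function
        }
    }

Static scheduling fits here because every neuron in a layer does the same amount of work, so there is no load imbalance for a dynamic scheduler to fix.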

Results:

Compared to the sequential version of the same program, my framework achieved roughly 35% speedup with 200 hidden neurons. As the test results show, the speedup improves as the number of hidden units increases, because my framework mainly parallelizes the computation of neurons located at the same layer, and larger layers expose more parallel work. Note that this result applies to the GHC machines only.

[Result graph: speedup vs. number of hidden neurons on the GHC machines]

References:

Papers:

  1. Abhishek, K., A. Khairwa, T. Pratap, and S. Prakash. "A Stock Market Prediction Model Using Artificial Neural Network." 2012 Third International Conference on Computing, Communication and Networking Technologies (ICCCNT'12) (2012): n. pag. Web.

  2. Dahl, George, Alan McAvinney, and Tia Newhall. "Parallelizing neural network training for cluster systems." Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks. ACTA Press, 2008.

Neural Network Implementation: https://takinginitiative.wordpress.com/2008/04/23/basic-neural-network-tutorial-c-implementation-and-source-code/

Project Checkpoint

Checkpoint Report

As of Apr 19th, I have implemented a neural network that can take in data and make predictions. It does not have any parallelism at the moment, and it cannot yet continuously pull new data and make predictions. Overall, I think I am still on schedule, or maybe just a little behind. Nonetheless, after more research and discussion with people, I decided to scale down my project goal a little. I now want to focus on single-node training using OpenMP and possibly SIMD. I put the idea of multi-node training on hold because it seems to involve a lot of extra work, and I am not sure I would be able to implement a multi-node training framework on time.

On the day of the presentation, I intend to have graphs that display the speedup obtained by parallelized training. In addition, I want to present graphs that show the predicted stock price versus the actual stock price. If possible, I hope to demo a real-time predicted-versus-actual chart.

Lastly, here is a more detailed schedule for the rest of the project:

Project Proposal

Summary

I am going to implement an application that predicts stock prices. It will use a neural network, and I will parallelize the training process on multi-core CPU platforms/clusters.

Background

The training process for a neural network has good potential for speedup. A neural network is usually composed of a number of neurons separated into multiple layers, and calculating the value of each neuron essentially involves matrix multiplication. In addition, when using the back-propagation algorithm, neurons at each layer are usually independent of each other, so we can compute the weight updates for these independent neurons in parallel, as sketched below. With a large amount of data, we should therefore see a significant speedup after parallelizing the training process.
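For example, the weight-update step of back propagation can be parallelized within a layer; this is a hedged sketch under my own naming, not a committed design:

    #include <omp.h>

    // Sketch: back-propagation weight update for one layer. Each neuron j
    // owns its error term delta[j] and its incoming weights, so the outer
    // loop is safe to parallelize. eta (learning rate) is illustrative.
    void updateLayerWeights(int curSize, int prevSize, double eta,
                            const double* prev, const double* delta,
                            double* const* weights) {
        #pragma omp parallel for schedule(static)
        for (int j = 0; j < curSize; ++j)
            for (int i = 0; i < prevSize; ++i)
                weights[j][i] += eta * delta[j] * prev[i];
    }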

The Challenge

There are a few challenges when trying to parallelize the training process.

Resources

For this project, I will use the Latedays cluster to carry out the training process. The Yahoo Finance API will help me obtain real-time stock quotes. In terms of code, I will start the project from scratch. Lastly, there are a few papers that discuss either stock prediction using neural networks or parallelizing neural network training:

  1. Abhishek, K., A. Khairwa, T. Pratap, and S. Prakash. "A Stock Market Prediction Model Using Artificial Neural Network." 2012 Third International Conference on Computing, Communication and Networking Technologies (ICCCNT'12) (2012): n. pag. Web.

  2. Dahl, George, Alan McAvinney, and Tia Newhall. "Parallelizing neural network training for cluster systems." Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks. ACTA Press, 2008.

Goals and Deliverables

Platform Choice

I will use the Latedays machines to speed up the training process, and I will use C++ as the language because I plan to experiment with SIMD and OpenMP. Latedays is a powerful cluster that allows me to split tasks across nodes, and SIMD and OpenMP should be helpful when performing matrix multiplication, roughly as sketched below.
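As an illustration of how the two might combine, here is a small matrix-vector product with OpenMP threads over rows and SIMD vectorization of the inner loop; the names and row-major layout are my own assumptions:

    // Illustrative matrix-vector product: threads split the rows,
    // and each thread's dot product is vectorized with OpenMP SIMD.
    void matVec(int rows, int cols, const double* A,
                const double* x, double* y) {
        #pragma omp parallel for schedule(static)
        for (int r = 0; r < rows; ++r) {
            double sum = 0.0;
            #pragma omp simd reduction(+:sum)   // vectorize the dot product
            for (int c = 0; c < cols; ++c)
                sum += A[r * cols + c] * x[c];
            y[r] = sum;
        }
    }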

Schedule