
2 ways to optimize your AWS machine operations

Introduction:

As a data scientist, you will often work on cloud machines, whether on AWS, GCP, or another provider. These machines are costly, so every hour your code runs adds to the project's spending. During my last session on AWS, I had a moderately large program to run, which by my estimate would take 48-72 hours. To cut the run time and optimize the operations, I took 2 steps. This post briefly describes them.

(1) Cythonizing my code:

Let's face it: Python is SLOW. Yes, Python is slow, and that's why most of the standard computation libraries are written in C, C++, or Cython under the hood. But being a Python- and pandas-dependent data scientist, I write most of my code in pure Python. That code is very slow to run compared to an equivalent in C++ or Cython. So the easiest way to reduce operation time is to build Cython extensions from your existing Python modules. I will not describe exactly how to do that here, as paperspace has an awesome guide on how to turn Python scripts into Cython.
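For reference, the basic recipe generally looks like a small setup.py that compiles your module; here is a minimal sketch, assuming your helper module is named helpers.py (the filename is illustrative):

    # setup.py -- minimal Cython build script; helpers.py is a hypothetical module name
    from setuptools import setup
    from Cython.Build import cythonize

    setup(
        ext_modules=cythonize("helpers.py", compiler_directives={"language_level": "3"}),
    )

Running python setup.py build_ext --inplace then produces a compiled extension that you import exactly like the original module.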

While following the guide, though, I noticed one error that can occur if you don't follow coding style standards. Python code may run with tabs and spaces mixed, even though that is a clear violation of the PEP 8 style guide; but when you turn that Python code into a Cython module, it raises an error. The obvious solution is to go to the offending line manually and fix it. But if you have a bigger file and lots of errors like this keep coming up, you may want to use the autopep8 module to clean them out.
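autopep8 can be run from the shell as autopep8 --in-place your_file.py, or scripted; here is a minimal sketch using its Python API (the filename helpers.py is again a hypothetical stand-in):

    # clean_style.py -- normalize mixed tabs/spaces and other PEP 8 issues
    import autopep8

    with open("helpers.py") as f:          # helpers.py is an illustrative file name
        source = f.read()

    # fix_code() returns a PEP 8-conformant version of the source string
    fixed = autopep8.fix_code(source, options={"aggressive": 1})

    with open("helpers.py", "w") as f:
        f.write(fixed)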

Once you cythonize your helper modules and libraries, you can expect around a 40-50% speed increase. Now, if you don't have a developer-time constraint, i.e. your project timeline is not very tight and you can spend an extra day or two, you should also consider writing proper Cython syntax into your code, since static type declarations are where the big gains come from. Again, I will not cover writing Cython syntax or Cython augmentation files in this post, but you can get started with the official Cython documentation.
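As a taste of what that looks like, here is an illustrative .pyx sketch (the function is hypothetical, not from my project); cdef type declarations let Cython generate a tight C loop instead of going through the Python interpreter:

    # helpers.pyx -- illustrative example of Cython static typing
    def column_sum(double[:] values):
        # typed index and accumulator compile down to a plain C loop
        cdef Py_ssize_t i
        cdef double total = 0.0
        for i in range(values.shape[0]):
            total += values[i]
        return total

Compile it with the same setup.py recipe shown above; a pandas column can be passed in as df["col"].to_numpy().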

With this, there was what I would call a significant improvement in the code's performance across multiple sample files: around a 40% decrease in total operation time in the top 3 cases. But there was one more problem left.

If you have worked with cloud machines, you must have seen a MemoryError in the middle of your code at least once. I used very simple tactics to solve this issue, and that is the second point I am going to discuss.

(2) Zipping bigger files and cleaning your "virtual home":

You run lots of code in the cloud, and one day your code stops; running df in the terminal shows there is barely any disk space left (a quick in-code equivalent is sketched just after the list below). If you are a beginner, chances are that:

(1) you didn't put checkpoints in your code to save intermediate outputs, or

(2) if you are an amateur like me, you did save intermediate outputs, but they clogged up even more disk space and stopped your code anyway!
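As promised, here is a tiny sketch of checking free disk space from inside a script, an in-code stand-in for running df; shutil is in the standard library:

    # disk_check.py -- report free disk space before a heavy write, like `df` does
    import shutil

    total, used, free = shutil.disk_usage("/")   # "/" is the mount point to inspect
    print(f"free: {free / 2**30:.1f} GiB of {total / 2**30:.1f} GiB")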

Whatever the reason may be, you now have to clean a hell of a lot of things out of the machine's permanent storage. The steps to follow here are:

(1) Check whether there are any leftover resources from your code that you don't need for the current operations. Save these files to your relevant cloud storage in zipped form, both to speed up any later re-download and to reduce storage bucket costs.

(2) Delete the intermediate outputs you don't need if you have to rerun the process. It may be possible in your code flow to break it up and restart from the point closest to where it stopped; that's what intermediate outputs are for. So judge, delete, and keep only the most important intermediate outputs. Also, modify your code to make the most of the machine time you have already spent (see the checkpointing sketch after this list).

(3) Zip the moderate (>100 MB) to big (>1 GB) files. Consider the fact that if you are reading a CSV file, for example with pd.read_csv, it also accepts .gz versions of the same file. The advantage is that the compressed file occupies far less storage, so the chance of your code stopping decreases dramatically. Change your code to store the intermediate outputs in zipped form too.
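Here is a minimal checkpointing sketch combining points (2) and (3); expensive_step() and the filename are hypothetical stand-ins for your own pipeline:

    # checkpoint.py -- resume from a gzipped intermediate output instead of recomputing
    import os
    import pandas as pd

    CHECKPOINT = "stage1_output.csv.gz"      # illustrative file name

    def expensive_step():
        # stand-in for a long-running computation
        return pd.DataFrame({"x": range(1000)})

    if os.path.exists(CHECKPOINT):
        df = pd.read_csv(CHECKPOINT)         # pd.read_csv decompresses .gz transparently
    else:
        df = expensive_step()
        df.to_csv(CHECKPOINT, index=False, compression="gzip")

From there, something like boto3.client("s3").upload_file(CHECKPOINT, bucket, key) can push the compressed file to your storage bucket, per point (1).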

Now, even if you do all that, let's assume the worst possible scenario: your code still gets a MemoryError in one or more worker processes. [Obviously you have multiprocessed your code if you are using multiple cores.] In this scenario, the issue is that your code exceeds the memory it is being assigned. That means your hand is forced, and you are bound to select a machine with both more cores and more memory.
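One thing worth noting before resizing: with multiprocessing, the per-worker memory is roughly the machine's RAM divided by the number of workers, so a smaller pool trades speed for headroom. A hedged sketch of that relationship (process_file and the file list are hypothetical):

    # pool_demo.py -- fewer workers means more RAM available per worker
    import multiprocessing as mp

    def process_file(path):
        # stand-in for a memory-hungry per-file computation
        return len(path)

    if __name__ == "__main__":
        files = ["part1.csv.gz", "part2.csv.gz", "part3.csv.gz"]  # illustrative inputs
        with mp.Pool(processes=2) as pool:   # 2 workers instead of one per core
            results = pool.map(process_file, files)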

For EC2 instances, check the instance types page and decide which instance type you want to change your machine to. To change the machine:

from the machine name > (right click) > select "Instance Settings" > select "Change Instance Type"

which opens a dialog box with a drop-down of different instance types to choose from. Choose the most suitable one and go with it.

Moving to a machine in the same family with, say, 4x the cores usually brings 4x the memory too; for example, going from an m5.xlarge (4 vCPUs, 16 GiB) to an m5.4xlarge (16 vCPUs, 64 GiB). You can spend that headroom either on 4x the workers at the same per-worker memory, or on the same number of workers with 4x the memory each, whichever your bottleneck demands. So calculate like that and change accordingly.

Unfortunately, I don't have Google Cloud or other environment experience to provide a similarly detailed guide for this instance issue.

Anyway, those are my 2 steps to optimize my AWS operations. Share if you liked these suggestions; comment to suggest changes or point out unintended mistakes, if any; and/or add anything more you would like to read along these lines.

Thanks for reading!
