

Showing posts from 2021

Machine learning with PySpark

Introduction: In the last post, we discussed the basics of PySpark. In this post, we will discuss how to do machine learning in PySpark: what the main elements of a machine learning pipeline are, and how to use them. The content is taken directly from DataCamp, and sole credit for the material goes to DataCamp's PySpark course. I am merely compiling it together so you can go through the completed exercises and the full flow quickly.

Machine Learning Pipelines: You'll step through every stage of the machine learning pipeline, from data intake to model evaluation. Let's get to it! At the core of the pyspark.ml module are the Transformer and Estimator classes. Almost every other class in the module behaves similarly to these two basic classes. Transformer classes have a .transform() method that takes a DataFrame and returns a new DataFrame, usually the original one with a new column appended. For example, you might use the …
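
To make the Transformer and Estimator idea concrete, here is a minimal sketch of my own (not from the DataCamp material; the column names and toy data are made up) in which a VectorAssembler acts as a Transformer and a LogisticRegression acts as an Estimator whose .fit() returns a fitted model:

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("ml-pipeline-sketch").getOrCreate()

# Toy DataFrame with two numeric features and a binary label (made up for illustration)
df = spark.createDataFrame(
    [(1.0, 2.0, 0), (2.0, 3.0, 1), (3.0, 1.0, 0), (4.0, 5.0, 1)],
    ["feat1", "feat2", "label"],
)

# Transformer: .transform() returns a new DataFrame with a "features" column appended
assembler = VectorAssembler(inputCols=["feat1", "feat2"], outputCol="features")
assembled = assembler.transform(df)

# Estimator: .fit() returns a fitted model, which is itself a Transformer
lr = LogisticRegression(featuresCol="features", labelCol="label")
model = lr.fit(assembled)
model.transform(assembled).select("label", "prediction").show()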

Introduction to PySpark

Introduction: PySpark is one of the most widely used big data tools, and one of the fastest too. In this article, we will cover the introductory part of PySpark and share a lot of learning inspired by DataCamp's course.

The first step: The first step in using Spark is connecting to a cluster. In practice, the cluster will be hosted on a remote machine that's connected to all the other nodes. There will be one computer, called the master, that manages splitting up the data and the computations. The master is connected to the rest of the computers in the cluster, which are called workers. The master sends the workers data and calculations to run, and they send their results back to the master.

Creating a connection to Spark: Creating the connection is as simple as creating an instance of the SparkContext class. The class constructor takes a few optional arguments that allow you to specify the attributes of the cluster you're connecting to. An object holding all these attributes …
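
As a minimal sketch of what that connection looks like in code (the master URL and app name here are placeholders of mine, not from the original post), you can build a SparkConf and pass it to SparkContext:

from pyspark import SparkConf, SparkContext

# SparkConf holds the cluster attributes: the master URL, the app name, and so on
conf = SparkConf().setMaster("local[*]").setAppName("intro-to-pyspark")

# SparkContext is the actual connection to the cluster
sc = SparkContext(conf=conf)

# Quick sanity check that the connection works
print(sc.version)

When running locally, "local[*]" simply tells Spark to use all available cores on your own machine instead of a remote cluster.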

Tinder bio generation with OpenAI GPT-3 API

Introduction: Recently I got access to the OpenAI API beta. After a few simple experiments, I set out to create a simple test project: generating a good Tinder bio for a specific person.

The ABC of the OpenAI API playground: In the OpenAI API playground, you get a prompt, and then you can write instructions or specific text to trigger a response from the GPT-3 models. There are also a number of preset templates which load a specific kind of prompt and let you generate pre-prepared results.

What are the models available? There are 4 stable models: (1) curie (2) babbage (3) ada (4) davinci. davinci is the strongest of them all and can perform all the downstream tasks the other models can do. There are 2 other new models which OpenAI introduced this year (2021), named davinci-instruct-beta and curie-instruct-beta. These instruction models are specifically built for taking instructions. As the OpenAI blog explains, and as you will also see in our …
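
To show what calling GPT-3 looks like outside the playground, here is a minimal sketch using the openai Python package as it worked in 2021; the prompt text, temperature, and token limit are illustrative choices of mine, not taken from the original project:

import os
import openai

# Assumes the API key is available in the environment
openai.api_key = os.environ["OPENAI_API_KEY"]

prompt = (
    "Write a short, witty Tinder bio for someone who loves hiking, "
    "espresso, and bad puns.\n\nBio:"
)

response = openai.Completion.create(
    engine="davinci",   # the strongest base model; curie, babbage, or ada also work
    prompt=prompt,
    max_tokens=60,
    temperature=0.8,    # higher temperature gives more creative completions
)

print(response["choices"][0]["text"].strip())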