Can we write code automatically with GPT-3?

Introduction:

OpenAI created and released the first versions of GPT-3 at the beginning of 2021. We wrote a few text generation articles at that time and tested how to create a Tinder bio using GPT-3. If you would like to know more about what GPT-3 is, what OpenAI is, or how the server looks, read the Tinder bio article.

In this article, we will explore code generation with OpenAI models.

Multiple blogs and exploratory write-ups have already noted that GPT-3 can even solve LeetCode problems. We will explore how well the OpenAI model can "code" and whether prompt tuning improves or changes that performance.

Basic coding:

We will first look at how GPT-3 performs on a few basic algorithm and data structure tasks.

(a) Merge sort with python:

First, with a 200-word limit, it couldn't complete the code.


Write sample code for merge sort in python.

 

def merge(arr, l, m, r):
    n1 = m - l + 1
    n2 = r - m

    # create temp arrays
    L = [0] * (n1)
    R = [0] * (n2)

    # Copy data to temp arrays L[] and R[]
    for i in range(0, n1):
        L[i] = arr[l + i]

    for j in range(0, n2):
        R[j] = arr[m + 1 + j]

    # Merge the temp arrays back into arr[l..r]
    i = 0     # Initial index of first subarray
    j = 0     # Initial index of second subarray
    k = l     # Initial index of merged subarray

    while i < n1 and j < n2:
        if L[i] <= R[j]:
            arr[k] = L[i]
            i += 1
        else:
            arr[k] = R[j]
            j += 1
        k += 1

    # Copy the remaining elements of L[], if there
    # are any
    while i < n1:
        arr[k] = L[i]
        i += 1
        k += 1

    # Copy the remaining elements of R[], if there
    # are any
    while j < n2:
        arr[k] = R[j]
        j += 1
        k += 1

# l is for left index and r is right index of the
# sub-array of arr to be sorted
def mergeSort(arr, l, r):
    if l < r:

        # Same as (l+r)/2, but avoids overflow for
        # large l and h
        m = (l+(r-1))/2

        # Sort first and second halves
        mergeSort(arr, l, m)
        mergeSort(arr, m+1, r)
        merge(arr, l, m, r)
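
The generated code appears to follow Python 2 conventions: under Python 3, `(l+(r-1))/2` produces a float, which then fails when used as a list index. As a minimal sketch (our own adaptation, not GPT-3 output), the same index-based algorithm with integer division can be checked against Python's built-in `sorted`. The name `merge_sort_in_place` and the random test data are assumptions made for illustration.

```python
import random


def merge(arr, l, m, r):
    """Merge the sorted runs arr[l..m] and arr[m+1..r] in place."""
    left = arr[l:m + 1]
    right = arr[m + 1:r + 1]
    i = j = 0
    k = l
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            arr[k] = left[i]
            i += 1
        else:
            arr[k] = right[j]
            j += 1
        k += 1
    # Copy whatever remains of either run.
    while i < len(left):
        arr[k] = left[i]
        i += 1
        k += 1
    while j < len(right):
        arr[k] = right[j]
        j += 1
        k += 1


def merge_sort_in_place(arr, l, r):
    """Sort arr[l..r] (inclusive bounds) in place."""
    if l < r:
        m = (l + r) // 2  # integer division, so m is a valid index
        merge_sort_in_place(arr, l, m)
        merge_sort_in_place(arr, m + 1, r)
        merge(arr, l, m, r)


data = [random.randint(0, 99) for _ in range(20)]
expected = sorted(data)
merge_sort_in_place(data, 0, len(data) - 1)
print(data == expected)
```

Comparing against `sorted` on random inputs is a cheap way to test any generated sorting routine, whatever its internal structure.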

Next, we tried a higher temperature with a similar length limit, and this time it only came up with a much shorter sketch, closer to pseudocode. So the model is not guaranteed to come up with concrete code every time.


def merge_sort(ls):

    if len(ls) <= 1:
        return ls

    left, right = ls[:len(ls)//2], ls[len(ls)//2:]

    return merge(merge_sort(left), merge_sort(right))

def merge(left, right):

    result = []

    while left and right:
        result.append(left.pop(0) if left[0] <= right[0] else right.pop(0))

    return left or right or result

So a higher temperature can produce bad results like this too. Now we will try a lower temperature with another problem and run multiple tries to see whether it provides real code every time or whether something else comes up again.
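
To make the failure concrete: in the generated `merge`, the final `return left or right or result` returns only the leftover elements whenever one list still has items, so the merged prefix is lost. A minimal corrected sketch (our adaptation, not model output) appends the leftovers to `result` instead:

```python
def merge(left, right):
    """Merge two sorted lists into a new sorted list."""
    result = []
    while left and right:
        # Take the smaller head element; <= keeps the sort stable.
        result.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    # At most one of left/right is non-empty here; keep the merged prefix too.
    return result + left + right


def merge_sort(ls):
    if len(ls) <= 1:
        return ls
    mid = len(ls) // 2
    return merge(merge_sort(ls[:mid]), merge_sort(ls[mid:]))


print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```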

(b) Bubble sort with python:

Prompt:

Write sample code for bubble sort with python.

Temperature is set to 0.25.

Try 1:


def bubbleSort(arr):
    n = len(arr)

    # Traverse through all array elements
    for i in range(n):

        # Last i elements are already in place
        for j in range(0, n-i-1):

            # traverse the array from 0 to n-i-1
            # Swap if the element found is greater
            # than the next element
            if arr[j] > arr[j+1] :
                arr[j], arr[j+1] = arr[j+1], arr[j]

# Driver code to test above
arr = [64, 34, 25, 12, 22, 11, 90]

bubbleSort(arr)

print ("Sorted array is:")
for i in range(len(arr)):
    print ("%d" %arr[i]),

Try 2:

Generates the same.

Try 3:

def bubble_sort(lst):
last    for i in range(len(lst)):
        for j in range(len(lst)-1-i):
            if lst[j] > lst[j+1]:
                lst[j], lst[j+1] = lst[j+1], lst[j]
    return lst

So, even with the same parameter settings, some generations may not be valid code: note the stray "last" token before the first loop in Try 3.
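
Since the same settings can yield either valid or invalid code, it helps to screen generations automatically. A minimal sketch, assuming the generated text is available as a Python string: `ast.parse` from the standard library raises `SyntaxError` for text that is not valid Python. The helper name `is_valid_python` and the sample strings are hypothetical, and this checks syntax only, not correctness.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if source parses as Python; this checks syntax only."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


good = (
    "def bubble_sort(lst):\n"
    "    for i in range(len(lst)):\n"
    "        for j in range(len(lst) - 1 - i):\n"
    "            if lst[j] > lst[j + 1]:\n"
    "                lst[j], lst[j + 1] = lst[j + 1], lst[j]\n"
    "    return lst\n"
)
# Same code with a stray token at the start of the loop line, as in Try 3.
bad = good.replace("    for i in", "last    for i in", 1)

print(is_valid_python(good), is_valid_python(bad))  # True False
```

A filter like this could be used to discard broken generations and retry before a human ever looks at the output.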

Let's try another problem, with an even lower temperature.

(c) Linked list with python:


class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def insert_at_beginning(self, data):
        node = Node(data)
        if self.head is None:
            self.head = node
        else:
            node.next = self.head
            self.head = node

    def insert_at_end(self, data):
        node = Node(data)
        if self.head is None:
            self.head = node
        else:
            current = self.head
            while current.next:
                current = current.next
            current.next = node

    def delete_at_beginning(self):
        if self.head is None:
            return
        self.head = self.head.next

    def delete_at_end(self):
        if self.head is None:
            return
        current = self.head
        while current.next.next:
            current = current.next
        current.next = None

    def traverse(self):
        if self.head is None:
            return
        current = self.head
        while current:
            print(current.data)
            current = current.next
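
One edge case is worth checking in the generated class: `delete_at_end` evaluates `current.next.next`, which raises an `AttributeError` when the list holds a single node. A minimal sketch of a guarded version follows. It is our own adaptation, not model output, and the `to_list` helper is added only to make the result easy to inspect.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


class LinkedList:
    def __init__(self):
        self.head = None

    def insert_at_end(self, data):
        node = Node(data)
        if self.head is None:
            self.head = node
            return
        current = self.head
        while current.next:
            current = current.next
        current.next = node

    def delete_at_end(self):
        if self.head is None:
            return
        if self.head.next is None:
            # Single node: removing it empties the list.
            self.head = None
            return
        current = self.head
        while current.next.next:
            current = current.next
        current.next = None

    def to_list(self):
        """Hypothetical helper: collect the node values into a Python list."""
        values = []
        current = self.head
        while current:
            values.append(current.data)
            current = current.next
        return values


items = LinkedList()
items.insert_at_end(1)
items.delete_at_end()  # the single-node case
print(items.to_list())  # []
```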

So, with the same parameters, we will regenerate to get a second output. With this high temperature, we definitely expect a different output. Let's see what happens.

 

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def insert_at_head(self, data):
        new_node = Node(data)
        new_node.next = self.head
        self.head = new_node

    def insert_at_tail(self, data):
        new_node = Node(data)
        if self.head == None:
            self.head = new_node
            return
        current_node = self.head
        while current_node.next != None:
            current_node = current_node.next
        current_node.next = new_node

    def insert_at_position(self, data, position):
        if position == 0:
            self.insert_at_head(data)
            return
        current_node = self.head
        current_position = 0
        while current_node != None and current_position < position:
            current_position += 1
            current_node = current_node.next
        if current_node == None:
            print("Invalid position!")
            return
        new_node = Node(data)
        new_node.next = current_node.next
        current_node.next = new_node

    def delete_at_position(self, position):
        if self.head == None:
            print("List is empty!")
            return
        if position == 0:
            self.head = self.head.next
            return
        current_node = self.head
        current_position = 0
        while current_node != None and current_position < position:
            current_position += 1
            current_node = current_node.next
        if current_node == None:
            print("Invalid position!")
            return
        current_node.next = current_node.next.next

    def print_list(self):
        if self.head == None:
            print("List is empty!")
            return
        current_node = self.head
        while current_node != None:
            print(current_node.data, end=" ")
            current_node = current_node.next
        print()

Interestingly, it produces a different and more complete implementation of a linked list. It seems to have captured the standard implementation of the data structure. We are not checking how much of this is memorized, but this kind of code looks like it is coming straight from memory. So now we will leave the basics and move on to simple programs.
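
Generated code like this still deserves a quick check. Tracing `insert_at_position` by hand, the loop advances to the node at index `position` and then links the new node after it, so the value ends up at index `position + 1`. Below is a minimal sketch of a version that stops one node earlier. It is our adaptation, not model output, it raises `IndexError` instead of printing a message, and `to_list` is a hypothetical helper for inspection.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


class LinkedList:
    def __init__(self):
        self.head = None

    def insert_at_head(self, data):
        new_node = Node(data)
        new_node.next = self.head
        self.head = new_node

    def insert_at_position(self, data, position):
        """Insert data so that it ends up at index `position` (0-based)."""
        if position < 0:
            raise IndexError("position must be non-negative")
        if position == 0:
            self.insert_at_head(data)
            return
        # Walk to the node just before the target index.
        previous = self.head
        for _ in range(position - 1):
            if previous is None:
                break
            previous = previous.next
        if previous is None:
            raise IndexError("position out of range")
        new_node = Node(data)
        new_node.next = previous.next
        previous.next = new_node

    def to_list(self):
        values = []
        current = self.head
        while current is not None:
            values.append(current.data)
            current = current.next
        return values


numbers = LinkedList()
for value in (30, 20, 10):  # head insertion builds 10 -> 20 -> 30
    numbers.insert_at_head(value)
numbers.insert_at_position(15, 1)
print(numbers.to_list())  # [10, 15, 20, 30]
```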

We aren't going to jump straight into long, text-heavy programming questions, since those need a lot of reasoning. Instead, we will try questions that test simple implementations stated in simple language. Let's start with, for example, a Fibonacci series question.

(a) Write a simple Fibonacci program:


write code for Fibonacci series generation in python.

 

nterms = int(input("How many terms? "))

# first two terms
n1, n2 = 0, 1
count = 0

# check if the number of terms is valid
if nterms <= 0:
   print("Please enter a positive integer")
elif nterms == 1:
   print("Fibonacci sequence upto",terms,":")
   print(n1)
else:
   print("Fibonacci sequence:")
   while count < nterms:
       print(n1)
       nth = n1 + n2
       # update values
       n1 = n2
       n2 = nth
       count += 1

So it writes reasonable code using the simple recurrence, although the branch for nterms == 1 refers to an undefined name, terms, and would raise a NameError. It reads more like a script than a reusable function, but we are only checking the logic here.
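
If a reusable function is preferred over a script, the same recurrence can be wrapped as follows. This is a sketch of ours rather than model output, and the function name `fibonacci` is an assumption.

```python
def fibonacci(nterms):
    """Return the first nterms Fibonacci numbers, starting from 0."""
    if nterms <= 0:
        raise ValueError("nterms must be a positive integer")
    sequence = []
    n1, n2 = 0, 1
    for _ in range(nterms):
        sequence.append(n1)
        n1, n2 = n2, n1 + n2  # same update as n1 = n2; n2 = nth
    return sequence


print(fibonacci(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```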

Language problems:

Let's now see if GPT-3 can handle small problems stated in plain language.

1. Write a program that asks the user for their name and greets them with their name.

Prompt:

Write a program that asks the user for their name and greets them with their name.

---

Output:

name = input("What is your name? ")

print("Hello, " + name + "!")

Write a program that asks the user for a number n and gives them the possibility to choose between computing the sum and computing the product of 1,…,n.

def compute_sum(n):
    total = 0
    for i in range(1, n+1):
        total = total + i
    return total

def compute_product(n):
    product = 1
    for i in range(1, n+1):
        product = product * i
    return product

def main():
    n = int(input("Enter a number: "))
    choice = input("Do you want to compute the sum or the product? ")

    if choice == "sum":
        result = compute_sum(n)
    elif choice == "product":
        result = compute_product(n)
    else:
        print("Invalid choice")
        return
    print("The result is", result)

main()

Okay, so it can solve a simple logical problem too. Let's now step up the logic a bit and try problems with 2-3 line descriptions.

Write a guessing game where the user has to guess a secret number. After every guess the program tells the user whether their number was too large or too small. At the end the number of tries needed should be printed. It counts only as one try if they input the same number multiple times consecutively.

I'm thinking of a number between 1 and 100.

Input:

Too low!

Too high!

Too low!

Correct! You needed 5 tries.

__________________________________________________

This is the first time that code didn't come out. But as you may notice, the word "program" or "code" is not mentioned in the prompt, so let's add it to help the model understand, and see whether it still makes the mistake.

___________________________________________________

Write a program for a guessing game where the user has to guess a secret number. After every guess the program tells the user whether their number was too large or too small. At the end, the number of tries needed should be printed. It counts only as one try if they input the same number multiple times consecutively.

secret = 17
tries = 0
while True:
    guess = int(input("Guess the secret number: "))
    tries += 1
    if guess == secret:
        print("You guessed it! The secret number was", secret)
        print("It took you", tries, "tries to guess the secret number.")
        break
    elif guess > secret:
        print("Too high!")
    else:
        print("Too low!")

__________________________________________

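And so this time it writes a program, although it counts every input as a try and ignores the rule about consecutive repeated guesses. So, let's move on to something a bit more subtle.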
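
The rule that consecutive repeats count as one try needs one extra piece of state: the previous guess. Below is a minimal sketch of ours, not model output, that assumes the guesses arrive as a list so the counting logic can be tested without `input()`. The function name `count_tries` is hypothetical.

```python
def count_tries(guesses, secret):
    """Replay a sequence of guesses; return (hints, tries).

    Consecutive repeats of the same number count as a single try.
    """
    hints = []
    tries = 0
    previous = None
    for guess in guesses:
        if guess != previous:
            tries += 1
        previous = guess
        if guess == secret:
            hints.append("Correct!")
            break
        hints.append("Too high!" if guess > secret else "Too low!")
    return hints, tries


hints, tries = count_tries([50, 50, 10, 17], secret=17)
print(hints, tries)  # ['Too high!', 'Too high!', 'Too low!', 'Correct!'] 3
```

The repeated 50 still gets a hint but does not increase the count, which matches the wording of the prompt.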

Easy level problems:

___________________________________________

Question (prompt):

Implement a data structure for graphs that allows modification (insertion, deletion). It should be possible to store values at edges and nodes. It might be easiest to use a dictionary of (node, edgelist) to do this.

Output:

A graph is a collection of nodes and edges. Nodes are connected by edges. Edges can have values assigned to them. It would be possible to store a graph in a dictionary where the keys are the nodes and the values are the edges. To insert a new node, simply add a new key to the dictionary. To insert a new edge, add a new key-value pair to the dictionary. To delete a node, simply delete the key from the dictionary. To delete an edge, delete the key-value pair from the dictionary.

___________________________________________

So "implement a data structure" also doesn't trigger the model to write code. Just to check, we will explicitly tell the model to write a program for it.

___________________________________________

Write a program to implement a data structure for graphs that allows modification (insertion, deletion). It should be possible to store values at edges and nodes. It might be easiest to use a dictionary of (node, edgelist) to do this.

A graph is a data structure that consists of a set of nodes (vertices) and a set of edges connecting them. There are many different ways to represent a graph, but one common way is to use a dictionary of (node, edgelist) pairs, where edgelist is a list of the nodes that are connected to the node.
For example, the following dictionary represents a graph with three nodes (A, B, and C) and three edges (A-B, B-C, and C-A): graph = { 'A' : ['B', 'C'], 'B' : ['C'], 'C' : ['A'] }
This dictionary can be used to represent any graph, no matter how complicated.

_____________________________________________

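So this is still not working: the model explains the idea and shows an example dictionary, but writes no functions. We will reuse the prompt, but drop the words "a data structure for", and see if it is able to implement it.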

______________________________________________

Write a program to implement graphs that allows modification (insertion, deletion). It should be possible to store values at edges and nodes. It might be easiest to use a dictionary of (node, edgelist) to do this.


 

def graph(nodes, edges):
    graph = {}
    for node in nodes:
        graph[node] = []
    for edge in edges:
        graph[edge[0]].append(edge[1])
        graph[edge[1]].append(edge[0])
    return graph

def add_node(graph, node):
    if node in graph:
        print("Node already in graph")
    else:
        graph[node] = []

def add_edge(graph, edge):
    if edge[0] in graph and edge[1] in graph:
        graph[edge[0]].append(edge[1])
        graph[edge[1]].append(edge[0])
    else:
        print("Nodes not in graph")

def delete_node(graph, node):
    if node in graph:
        del graph[node]
        for key in graph:
            for i in range(len(graph[key])):
                if graph[key][i] == node:
                    del graph[key][i]
                    break
    else:
        print("Node not in graph")

def delete_edge(graph, edge):
    if edge[0] in graph and edge[1] in graph:
        for i in range(len(graph[edge[0]])):
            if graph[edge[0]][i] == edge[1]:
                del graph[edge[0]][i]
                break
        for i in range(len(graph[edge[1]])):
            if graph[edge[1]][i] == edge[0]:
                del graph[edge[1]][i]
                break
    else:
        print("Nodes not in graph")

___________________________________________

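So, yeah. This finally worked, in the sense that we got code with insertion and deletion, though it keeps only adjacency lists and does not store values at nodes or edges.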
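
Storing values at nodes and edges, as the prompt asks, needs a slightly richer layout than a plain adjacency list. A minimal sketch of one way to hold both, using nested dictionaries, is shown here. This is our own layout chosen for illustration, not model output, and the class and method names are hypothetical.

```python
class Graph:
    """Undirected graph storing a value per node and per edge."""

    def __init__(self):
        self.node_values = {}  # node -> value
        self.edges = {}        # node -> {neighbor: edge value}

    def add_node(self, node, value=None):
        self.node_values[node] = value
        self.edges.setdefault(node, {})

    def add_edge(self, a, b, value=None):
        if a not in self.edges or b not in self.edges:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[a][b] = value
        self.edges[b][a] = value

    def delete_edge(self, a, b):
        # pop with a default so deleting a missing edge is a no-op
        self.edges.get(a, {}).pop(b, None)
        self.edges.get(b, {}).pop(a, None)

    def delete_node(self, node):
        # Remove the node from its neighbors' edge maps, then the node itself.
        for neighbor in list(self.edges.get(node, {})):
            self.edges[neighbor].pop(node, None)
        self.edges.pop(node, None)
        self.node_values.pop(node, None)


g = Graph()
g.add_node("A", value=1)
g.add_node("B", value=2)
g.add_node("C", value=3)
g.add_edge("A", "B", value=0.5)
g.add_edge("B", "C", value=1.5)
g.delete_node("B")
print(g.node_values, g.edges)  # {'A': 1, 'C': 3} {'A': {}, 'C': {}}
```

Keeping edge values in a per-node dictionary makes edge lookup and deletion direct, instead of scanning a list as the generated code does.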

Conclusion:

In short, it makes sense to say that GPT-3 can code like a beginner with no extra help, given an explicit prompt that asks for a program. But getting it to write code for more complicated tasks, such as problems stated in longer natural language, will be challenging. So we will need more prompt engineering, as well as ideas on how to automatically break a long problem down into pieces that GPT-3 can handle. I will be writing a follow-up blog on the same topic. Stay tuned.
