Can we write code automatically with GPT-3?


OpenAI released the first versions of GPT-3 back at the beginning of 2021. We wrote a few text generation articles at that time and tested how to create a Tinder bio using GPT-3. If you want to know more about what GPT-3 is, what OpenAI is, and how the server looks, read the Tinder bio article.

In this article, we will explore code generation with OpenAI models.

Multiple blogs and exploratory write-ups have already noted that GPT-3 can even solve LeetCode problems. We will explore how well the OpenAI model can "code" and whether prompt tuning improves or changes that performance.

Basic coding:

We will look at how GPT-3 performs on a few data structure and algorithm coding tasks.

(a) Merge sort with Python:

First, with a 200-word limit, it couldn't complete the code.

Write sample code for merge sort in python.


def merge(arr, l, m, r):
    n1 = m - l + 1
    n2 = r - m

    # create temp arrays
    L = [0] * (n1)
    R = [0] * (n2)

    # Copy data to temp arrays L[] and R[]
    for i in range(0 , n1):
        L[i] = arr[l + i]

    for j in range(0 , n2):
        R[j] = arr[m + 1 + j]

    # Merge the temp arrays back into arr[l..r]
    i = 0     # Initial index of first subarray
    j = 0     # Initial index of second subarray
    k = l     # Initial index of merged subarray

    while i < n1 and j < n2 :
        if L[i] <= R[j]:
            arr[k] = L[i]
            i += 1
        else:
            arr[k] = R[j]
            j += 1
        k += 1

    # Copy the remaining elements of L[], if there
    # are any
    while i < n1:
        arr[k] = L[i]
        i += 1
        k += 1

    # Copy the remaining elements of R[], if there
    # are any
    while j < n2:
        arr[k] = R[j]
        j += 1
        k += 1

# l is for left index and r is right index of the
# sub-array of arr to be sorted
def mergeSort(arr,l,r):
    if l < r:

        # Midpoint of the range; integer division keeps m
        # an integer index
        m = (l+(r-1))//2

        # Sort first and second halves
        mergeSort(arr, l, m)
        mergeSort(arr, m+1, r)
        merge(arr, l, m, r)

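Next we try a higher temperature with a similar length limit, and this time the model only comes up with a compact, pseudo-code-like version. Its merge function ends with "return left or right or result", which discards the elements that were already merged, so the output is not actually sorted. So the model is not guaranteed to come up with concrete, working code every time.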

def merge_sort(ls):

    if len(ls) <= 1:
        return ls

    left, right = ls[:len(ls)//2], ls[len(ls)//2:]

    return merge(merge_sort(left), merge_sort(right))

def merge(left, right):

    result = []

    while left and right:
        result.append(left.pop(0) if left[0] <= right[0] else right.pop(0))

    return left or right or result


So a higher temperature can also produce bad results like this. Next, we will try a lower temperature on another problem and run it several times to see whether it produces real code every time or whether something else comes up again.
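To make the failure concrete: the short merge function above returns only the leftover of one input and throws away everything it has already merged. Below is a minimal corrected sketch. The names merge_fixed and agrees_with_sorted, and the idea of checking a sort function against Python's built-in sorted on random inputs, are our own additions and not part of the model output.

```python
import random

def merge_fixed(left, right):
    """Merge two sorted lists into one sorted list."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # Keep the merged prefix and append whatever is left over.
    return result + left[i:] + right[j:]

def merge_sort(ls):
    """Top-down merge sort that returns a new sorted list."""
    if len(ls) <= 1:
        return ls
    mid = len(ls) // 2
    return merge_fixed(merge_sort(ls[:mid]), merge_sort(ls[mid:]))

def agrees_with_sorted(sort_fn, trials=200):
    """Compare sort_fn against sorted() on random integer lists."""
    rng = random.Random(0)  # fixed seed so the check is repeatable
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        if sort_fn(list(data)) != sorted(data):
            return False
    return True

print(agrees_with_sorted(merge_sort))
```

A check like agrees_with_sorted is a cheap way to test any generated sorting function before trusting it.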

(b) Bubble sort with Python:


Write sample code for bubble sort with python.

Temperature is set to 0.25. 

Try 1:


def bubbleSort(arr):
    n = len(arr)

    # Traverse through all array elements
    for i in range(n):

        # Last i elements are already in place
        for j in range(0, n-i-1):

            # traverse the array from 0 to n-i-1
            # Swap if the element found is greater
            # than the next element
            if arr[j] > arr[j+1] :
                arr[j], arr[j+1] = arr[j+1], arr[j]

# Driver code to test above
arr = [64, 34, 25, 12, 22, 11, 90]

bubbleSort(arr)

print("Sorted array is:")
for i in range(len(arr)):
    print("%d" % arr[i])

Try 2:

Generates the same.

Try 3:

def bubble_sort(lst):
last    for i in range(len(lst)):
        for j in range(len(lst)-1-i):
            if lst[j] > lst[j+1]:
                lst[j], lst[j+1] = lst[j+1], lst[j]
    return lst

So, even with the same parameter settings, the output can differ between generations, and it is not always clean (note the stray "last" token in Try 3).
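The variation between tries comes from sampling. As a rough illustration of what the temperature parameter does, here is a generic softmax-with-temperature sketch. It uses made-up token scores and is not OpenAI's actual implementation. Dividing the scores by a small temperature concentrates the probability on the top token, while a larger temperature flattens the distribution so that less likely tokens get picked more often.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]

low = softmax_with_temperature(logits, 0.25)
high = softmax_with_temperature(logits, 1.5)

print("temperature 0.25:", [round(p, 3) for p in low])
print("temperature 1.5: ", [round(p, 3) for p in high])
```

At the lower temperature almost all of the probability sits on the first token, which is why repeated runs tend to return the same code.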

Let's try another problem, with an even lower temperature.



class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def insert_at_beginning(self, data):
        node = Node(data)
        if self.head is None:
            self.head = node
        else:
            node.next = self.head
            self.head = node

    def insert_at_end(self, data):
        node = Node(data)
        if self.head is None:
            self.head = node
        else:
            current = self.head
            while current.next:
                current = current.next
            current.next = node

    def delete_at_beginning(self):
        if self.head is None:
            return
        self.head = self.head.next

    def delete_at_end(self):
        if self.head is None:
            return
        # A single-node list becomes empty
        if self.head.next is None:
            self.head = None
            return
        current = self.head
        while current.next.next:
            current = current.next
        current.next = None

    def traverse(self):
        if self.head is None:
            return
        current = self.head
        while current:
            print(current.data)
            current = current.next

So, keeping the same parameters, we regenerate to get a second output. Since sampling is still random at this temperature, we expect a different output. Let's see what happens.


class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def insert_at_head(self, data):
        new_node = Node(data)
        new_node.next = self.head
        self.head = new_node

    def insert_at_tail(self, data):
        new_node = Node(data)
        if self.head == None:
            self.head = new_node
            return
        current_node = self.head
        while current_node.next != None:
            current_node = current_node.next
        current_node.next = new_node

    def insert_at_position(self, data, position):
        if position == 0:
            self.insert_at_head(data)
            return
        # Walk to the node just before the target position
        current_node = self.head
        current_position = 0
        while current_node != None and current_position < position - 1:
            current_position += 1
            current_node = current_node.next
        if current_node == None:
            print("Invalid position!")
            return
        new_node = Node(data)
        new_node.next = current_node.next
        current_node.next = new_node

    def delete_at_position(self, position):
        if self.head == None:
            print("List is empty!")
            return
        if position == 0:
            self.head = self.head.next
            return
        # Walk to the node just before the one to delete
        current_node = self.head
        current_position = 0
        while current_node != None and current_position < position - 1:
            current_position += 1
            current_node = current_node.next
        if current_node == None or current_node.next == None:
            print("Invalid position!")
            return
        current_node.next = current_node.next.next

    def print_list(self):
        if self.head == None:
            print("List is empty!")
            return
        current_node = self.head
        while current_node != None:
            print(current_node.data, end=" ")
            current_node = current_node.next


Interestingly, the second run gives a different and more complete implementation of a linked list, with insertion and deletion at a given position. In both runs the model seems to have captured the proper implementation of the data structure. We are not checking how much of this is memorized, but this kind of standard code most likely comes straight from memory. So now we will leave the basics and move on to simple programs.

We aren't going to jump directly into long, text-heavy programming questions, since those need a lot of reasoning. Instead, we will use questions that test simple implementations in simple language. Let's try, for example, Fibonacci series questions.

(a) Write a simple Fibonacci program.


write code for Fibonacci series generation in python.


nterms = int(input("How many terms? "))

# first two terms
n1, n2 = 0, 1
count = 0

# check if the number of terms is valid
if nterms <= 0:
   print("Please enter a positive integer")
elif nterms == 1:
   print("Fibonacci sequence upto",nterms,":")
   print(n1)
else:
   print("Fibonacci sequence:")
   while count < nterms:
       print(n1)
       nth = n1 + n2
       # update values
       n1 = n2
       n2 = nth
       count += 1

So, it writes fine code using the simple recurrence. It does look more like a script than a reusable function, but we are only checking the logic here.
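For comparison, the same logic can be wrapped in a reusable function instead of a script. This is our own minimal sketch, not model output, and the function name fibonacci and the choice to raise ValueError on invalid input are assumptions made for illustration.

```python
def fibonacci(nterms):
    """Return the first nterms Fibonacci numbers, starting from 0."""
    if nterms <= 0:
        raise ValueError("nterms must be a positive integer")
    sequence = []
    n1, n2 = 0, 1
    for _ in range(nterms):
        sequence.append(n1)
        # Same update as in the script: shift the pair forward by one term.
        n1, n2 = n2, n1 + n2
    return sequence

print(fibonacci(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Returning a list rather than printing makes the result easy to test and reuse.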

Language problems:

Let's now see whether GPT-3 can handle small problems stated in plain language.

1. Write a program that asks the user for their name and greets them with their name.


Write a program that asks the user for their name and greets them with their name.  
name = input("What is your name? ")  
print("Hello, " + name + "!")

Write a program that asks the user for a number n and gives them the possibility to choose between computing the sum and computing the product of 1,…,n.

def compute_sum(n):
    total = 0
    for i in range(1, n+1):
        total = total + i
    return total

def compute_product(n):
    product = 1
    for i in range(1, n+1):
        product = product * i
    return product

def main():
    n = int(input("Enter a number: "))
    choice = input("Do you want to compute the sum or the product? ")
    if choice == "sum":
        result = compute_sum(n)
    elif choice == "product":
        result = compute_product(n)
    else:
        print("Invalid choice")
        return

    print("The result is", result)

main()



Okay. So it can solve simple logical problems too. Let's now step the logic up a bit and try problems that take two or three lines to describe.


Write a guessing game where the user has to guess a secret number. After every guess the program tells the user whether their number was too large or too small. At the end the number of tries needed should be printed. It counts only as one try if they input the same number multiple times consecutively.


I'm thinking of a number between 1 and 100.

Too low!

Too high!

Too high!

Too low!

Correct! You needed 5 tries.


So this is the first time that no code came out: the model played the game instead of writing it. But as you may notice, the prompt never explicitly asks for a program or code, so let's add that wording to help the model understand. Let's see if it still makes the same mistake.


Write a program for a guessing game where the user has to guess a secret number. After every guess the program tells the user whether their number was too large or too small. At the end, the number of tries needed should be printed. It counts only as one try if they input the same number multiple times consecutively.


secret = 17
tries = 0
while True:
    guess = int(input("Guess the secret number: "))
    tries += 1
    if guess == secret:
        print("You guessed it! The secret number was", secret)
        print("It took you", tries, "tries to guess the secret number.")
        break
    elif guess > secret:
        print("Too high!")
    else:
        print("Too low!")


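And so, with the explicit wording, it does produce code, although the program counts every guess and so ignores the rule about repeated consecutive guesses. Let's move on to something a bit more subtle.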
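The generated program increments the counter on every guess. A minimal sketch of the missing rule is shown below. It is our own illustration, not model output. The function name count_tries is hypothetical, and it takes a prepared list of guesses instead of reading from input so that the logic can be checked without a user.

```python
def count_tries(secret, guesses):
    """Replay a sequence of guesses and return (messages, tries)."""
    tries = 0
    previous = None
    messages = []
    for guess in guesses:
        # A guess repeated right after itself does not cost a new try.
        if guess != previous:
            tries += 1
        previous = guess
        if guess == secret:
            messages.append("Correct!")
            break
        elif guess > secret:
            messages.append("Too high!")
        else:
            messages.append("Too low!")
    return messages, tries

messages, tries = count_tries(17, [50, 50, 10, 10, 10, 17])
print(messages)
print("Tries:", tries)  # Tries: 3
```

Only the comparison with the previous guess is new. The rest mirrors the loop the model wrote.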

Easy level problems:

Question (prompt):

Implement a data structure for graphs that allows modification (insertion, deletion). It should be possible to store values at edges and nodes. It might be easiest to use a dictionary of (node, edgelist) to do this.  
A graph is a collection of nodes and edges. Nodes are connected by edges. Edges can have values assigned to them. It would be possible to store a graph in a dictionary where the keys are the nodes and the values are the edges. To insert a new node, simply add a new key to the dictionary. To insert a new edge, add a new key-value pair to the dictionary. To delete a node, simply delete the key from the dictionary. To delete an edge, delete the key-value pair from the dictionary.


So, "implement a data structure" also doesn't trigger the model to write code. To check, we will explicitly tell the model to write a program for it.

Write a program to implement a data structure for graphs that allows modification (insertion, deletion). It should be possible to store values at edges and nodes. It might be easiest to use a dictionary of (node, edgelist) to do this.

A graph is a data structure that consists of a set of nodes (vertices) and a set of edges connecting them. There are many different ways to represent a graph, but one common way is to use a dictionary of (node, edgelist) pairs, where edgelist is a list of the nodes that are connected to the node.
For example, the following dictionary represents a graph with three nodes (A, B, and C) and three edges (A-B, B-C, and C-A): graph = { 'A' : ['B', 'C'], 'B' : ['C'], 'C' : ['A'] }
This dictionary can be used to represent any graph, no matter how complicated.

So this is not working yet. We will reuse the prompt, but trim it down, and see if the model is able to implement it.

Write a program to implement graphs that allows modification (insertion, deletion). It should be possible to store values at edges and nodes. It might be easiest to use a dictionary of (node, edgelist) to do this.


def graph(nodes, edges):
    graph = {}
    for node in nodes:
        graph[node] = []
    for edge in edges:
        graph[edge[0]].append(edge[1])
        graph[edge[1]].append(edge[0])
    return graph

def add_node(graph, node):
    if node in graph:
        print("Node already in graph")
    else:
        graph[node] = []

def add_edge(graph, edge):
    if edge[0] in graph and edge[1] in graph:
        graph[edge[0]].append(edge[1])
        graph[edge[1]].append(edge[0])
    else:
        print("Nodes not in graph")

def delete_node(graph, node):
    if node in graph:
        del graph[node]
        for key in graph:
            for i in range(len(graph[key])):
                if graph[key][i] == node:
                    del graph[key][i]
                    break
    else:
        print("Node not in graph")

def delete_edge(graph, edge):
    if edge[0] in graph and edge[1] in graph:
        for i in range(len(graph[edge[0]])):
            if graph[edge[0]][i] == edge[1]:
                del graph[edge[0]][i]
                break
        for i in range(len(graph[edge[1]])):
            if graph[edge[1]][i] == edge[0]:
                del graph[edge[1]][i]
                break
    else:
        print("Nodes not in graph")


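So, yeah. This finally produced code, although it only keeps adjacency lists and does not store values at nodes or edges as the prompt asked.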
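The prompt also asked for values at nodes and edges, which plain adjacency lists cannot hold. Below is a minimal sketch of one way to cover that part with nested dictionaries. It is our own illustration, not model output. The class name Graph, the attribute names, and the choice of an undirected graph are all assumptions.

```python
class Graph:
    """Undirected graph that stores a value on every node and edge."""

    def __init__(self):
        self.node_values = {}   # node -> value
        self.edges = {}         # node -> {neighbour: edge value}

    def add_node(self, node, value=None):
        self.node_values[node] = value
        self.edges.setdefault(node, {})

    def add_edge(self, a, b, value=None):
        if a not in self.edges or b not in self.edges:
            raise KeyError("both nodes must be in the graph")
        # Store the edge in both directions because the graph is undirected.
        self.edges[a][b] = value
        self.edges[b][a] = value

    def delete_edge(self, a, b):
        self.edges[a].pop(b, None)
        self.edges[b].pop(a, None)

    def delete_node(self, node):
        # Remove the node from all of its neighbours first.
        for neighbour in list(self.edges[node]):
            self.edges[neighbour].pop(node, None)
        del self.edges[node]
        del self.node_values[node]

g = Graph()
g.add_node("A", value=1)
g.add_node("B", value=2)
g.add_node("C", value=3)
g.add_edge("A", "B", value="road")
g.add_edge("B", "C", value="rail")
g.delete_node("C")
print(g.node_values)
print(g.edges)
```

Using a dictionary per node for the neighbours makes edge lookup and deletion direct, so no index loop is needed.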


In short, it is fair to say that GPT-3 can code like a beginner when given no extra help beyond an explicit prompt asking for a program. Getting it to write code for more complicated tasks, such as problems stated in plain language, will be challenging. We will need more prompt engineering, as well as ideas on how to automatically break a long problem down into pieces that GPT-3 can handle. I will be writing a follow-up blog on the same topic. Stay tuned.


