
Hugging Face transformers library part 3: Natural language generation

 Introduction:

The part of machine learning that deals with language modeling and textual data can be divided into three broad parts. These three parts are:
(1) Natural language processing (NLP): processing unstructured text data into a structured, analyzed and often quantified form.
(2) Natural language understanding or inference (NLU/NLI): building algorithms and models that understand written language, rather than just statistically analyzing it.
(3) Natural language generation (NLG): building models/frameworks that generate content mimicking the quality of actual human writing.

For years, NLP, NLU and NLG have progressed together, with much variance and overlap between them. With the introduction of RNNs, LSTMs, CNNs and the like, NLG systems have progressed far beyond their earlier state, just like their counterparts. In this article, we will test the performance of NLG tasks using the transformers library; namely, we will test models like GPT-2 and others. Without further introduction, let's dive in.

What are the different NLG tasks?

NLG is the art of making machines generate language the way humans do, so it shows up as a sub-component of many text-based data science tasks. Some of the well-known NLG tasks are listed below (a few of them map directly to ready-made transformers pipelines, as sketched after the list):
(1) Machine translation (seq2seq models)
(2) Abstractive summarization
(3) Dialogue generation (part of advanced chatbots)
(4) Creative writing: storytelling, poetry generation
(5) Freeform QnA systems
(6) Caption generation for images
(7) Dynamic reporting (such as generating dynamic reports from JSONs using seq2seq models)
and others.
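For reference, a few of these tasks map directly to ready-made pipelines in the transformers library. Here is a minimal sketch (the task names follow the pipeline API, the variable names are mine, and each call simply loads whatever default pretrained model the library ships for that task):

from transformers import pipeline

# each task name below loads a default pretrained model for that task
translator = pipeline("translation_en_to_de")   # machine translation
summarizer = pipeline("summarization")          # abstractive summarization
generator = pipeline("text-generation")         # free-form generation, e.g. storytelling

print(translator("the boy lived in the forest")[0]['translation_text'])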

I have already briefly talked about abstractive summarization using BART, T5 and DistilBART in this previous post. If you are interested in learning more about abstractive summarization methods, read that post first and then continue here. For this post, I am planning to explore creative writing, i.e. the storytelling task.
My goal will be to provide a topic, or even a few sentences for the story to start with, and then check how the model continues it. If I see a chance to include some additional logic to improve the quality, we will do that too.

Story-time: generate a story:

First of all, let's not get our hopes up; AI, as we currently know it, still doesn't hold on to an idea for very long. You can read a lot of stories created on this site using AI and human prompts, and most of them are classic text dumps stitched together from a lot of web content. So our first goal will be to get to that level, and then from that point, try to improve further.
For story generation, we have to use the "text-generation" pipeline.

basic code and "once upon a time" story:

We use the sample code format from the transformers examples. The code is simply:
from transformers import pipeline

# load a GPT-2 text-generation pipeline
generator = pipeline('text-generation', model="gpt2", tokenizer="gpt2")

# generate stories of increasing length from the same one-line prompt
story_50 = generator("once upon a time there was a boy", max_length=50)
story_100 = generator("once upon a time there was a boy", max_length=100)
story_1000 = generator("once upon a time there was a boy", max_length=1000)
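Each generator call returns a list with one dictionary per generated sequence, and the text itself sits under the 'generated_text' key. A minimal sketch for printing the stories discussed below, using the variable names from above:

# each result is a list like [{'generated_text': '...'}]
print(story_50[0]['generated_text'])
print(story_100[0]['generated_text'])
print(story_1000[0]['generated_text'])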
Now we will look at the outputs and discuss further. Let's look at the 50-token story first (max_length counts tokens, including the prompt, rather than words).
The story reads like this:
once upon a time there was a boy on Christmas Day, but the boy's parents
couldn't find it. It still seemed as if, with the help of a little word game, a person of goodwill could be found and the person who once tried

Clearly, although these look like sentences, there is no proper character assignment. In the first part, the prompt, I started with a boy; but after the comma,
 "..parents couldn't find it."
Clearly, the boy becomes "it".
Also, the second sentence is even weirder, as it has no meaning at all. So at this short length, the story can safely be said to be failing. We will try to investigate the reason later.
 
Now look at the 100-token story. It reads:
once upon a time there was a boy called John. A child, John had lived with his father in the forest, and when he had met William he gave him a long-awaited return to the city. Though far from alone among his friends, he was a strong
soldier, and to do more for them they needed a leader. So, William put out a message he made in his memory to his father, asking his father to help him find his missing friends. For as long as he lived

Now, clearly, each sentence has meaning, and the context is vaguely maintained. But even though characters exist now, from sentence to sentence the meaning doesn't hold up much better than in the 50-token story; in some sense, it is nonsense too.
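One knob worth trying before anything fancier is the sampling configuration. The text-generation pipeline forwards generation arguments such as do_sample, top_k, top_p, temperature and no_repeat_ngram_size to the underlying generate() call; the values below are only illustrative guesses on my part, not what produced the stories above:

story_100_sampled = generator(
    "once upon a time there was a boy",
    max_length=100,
    do_sample=True,            # make sure sampling is on (rather than greedy search)
    top_k=50,                  # keep only the 50 most likely next tokens
    top_p=0.95,                # nucleus sampling
    temperature=0.9,           # slightly soften the distribution
    no_repeat_ngram_size=3,    # avoid repeating any 3-gram
)
print(story_100_sampled[0]['generated_text'])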

Now, get ready for the masterpiece. The final piece is the 1000-token story from that one-line prompt. We will read a part of it, which will give a view of how good the story is.
Here is a small part of the 1000-token story.
 
once upon a time there was a boy who would give up a great many pleasures on earth, and have never ceased with them, for his delight was to be more numerous than in the world and his labour more abundant. And while he saw what he did, he would think himself so strong for eternity that he could even take with him things which had not to be. And with that in mind he went into Egypt, and he laid before the holy priests men about him his favourite pleasures and learned the art of his trade. He was like Solomon from the first when he was called a God, but a King at once became him, and then afterwards in spite of his fame and riches he was not worth the sight of sight and the ear, and did but have their honour and honour-things made of gold before him. He was in such a disposition and so pure a spirit that a man should go into Egypt to see one of his masters in his youth. And when he saw one of the priests he went, whereupon as he was going out about to show off his prowess to the people, he saw there four men in a car, and of them was the Master. And he said unto them "Told you to go to the king of the Jews, I will come." So they all entered and sat down and went out. And when they had come to the king of the Jews there he brought unto him his first-born son. This is the son of the Master. And, behold, he was on a horse riding through the country: the king of the Jews said unto him, "Thou hast given me gold; take thy hand out of mine shoulders and do these of my sons, and I shall give you the gold of all Israel: then shall ye not forsake this day the Sabbath for the rest of the world: but that thou wilt find my son the master of all the world, thy son the king of the Jews and the true master of the elect and the ruler of all the living creatures for evermore, the same will ye not forsake that day which thou hast been foretold of my son: and I will give thee one more for ever, when thou hast fulfilled the one that I give thee and have given thee thy other two, that thou mightest see my son the king of Israel." 
 
See, this sounds like an authentic Bible story. The story keeps track of most of its characters, maintains a vague sense of a timeline, and mentions events thoroughly. My guess is that this is highly "inspired" by some webpage containing Christian stories and the like. But then again, I have not been able to find these exact passages using a basic Google search. By a slim chance, I might actually have generated an authentic Bible story on my own! ☺
 
I will talk about more complex procedures for story generation, such as social planning and theory of mind, in the next post about story generation. I will update the link for the next post here. Until then, stay tuned!
 
