Prompt Engineering: The tips we discovered

As we discussed in a previous blog post, we make plenty of use of large language models (LLMs) at Daily Digest. They power our summaries of news from across the globe, the creation of condensed news articles, and the headlines and hashtags we share them with. The way we interact with our LLMs today is the product of a lot of experimentation to make sure we get the results we would like to share every day.

We iterated on our prompting plenty of times to get the news articles written the way we wanted. In doing so, we ran into multiple curious scenarios in which our news articles started deviating broadly from what we intended. Every time this happened, we investigated the prompts we sent to our LLMs, looking for improvements hidden in plain sight.

This blog post is a collection of a few such prompt engineering tips that we uncovered along the way. They, among others, ensure that our articles have the quality we would like them to have.

Write how you would like it written

Originally, we started with a prompt that consisted of sentences such as these:

"Summarise this article. Use important information only."

On the surface, that made a lot of sense to us. Long prompts are costly, since the LLM has to process every extra token as natural language. However, the results were less than optimal, with sentences such as "Politicians meet in DC. Discussions happening." Not exactly Pulitzer material. By writing longer sentences that sound like the result we want to achieve, with subclauses and more complex structures, we made our content far less bullet-pointy.
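To make that concrete, here is a minimal sketch of the change. The prompt texts are simplified stand-ins for our production prompts, not the exact wording we use; plug them into whatever LLM client you prefer.

```python
# Before: terse instructions produced terse, bullet-pointy output.
terse_prompt = "Summarise this article. Use important information only."

# After: the prompt itself is written in the style we want back,
# with full sentences and subclauses instead of clipped commands.
styled_prompt = (
    "Write a news article based on the text below. Open with a sentence "
    "that sets the scene, connect the key facts with flowing, complete "
    "sentences, and close with the wider context of the story."
)

print(styled_prompt)
```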

Keywords that you do NOT want

Despite the above, at times we still got the same bullet-point sentences, seemingly at random. It took us quite a while to figure out why, as it all came down to just one word: "summary".

While it is a gross oversimplification, LLMs operate by going through the words in a prompt and assigning each word (or, more precisely, each token) a measure of importance. And if the LLM "thinks" that a word is very important, it tends to get hung up on that premise.

In our case, we told our LLM to "write a summary". The thing is, we are not writing a summary. We want a news article. A summary, in everyday language, is a drastic shortening of information. From that angle, short, choppy sentences fit right in.

So, scan your prompts for important keywords that do not 100% fit your use case. For example, do you really want a "recipe" for some delicious pancakes, or is it more of a "shopping list"?
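Since then, we run a lightweight check over new prompts for keywords that pull the model in the wrong direction. Below is a minimal sketch of that idea; the watchlist is made up for illustration and is not our real one.

```python
# Keywords that, in our experience, drag the output toward a style
# we do not want. This watchlist is illustrative, not our real one.
RISKY_KEYWORDS = {
    "summary": "news article",  # invites clipped, note-like sentences
    "summarise": "write",       # same problem, verb form
    "bullet": "paragraph",      # invites lists instead of prose
}

def flag_risky_keywords(prompt: str) -> list[str]:
    """Return a warning for each keyword that does not fit the use case."""
    lowered = prompt.lower()
    return [
        f'"{keyword}" found; consider "{alternative}" instead'
        for keyword, alternative in RISKY_KEYWORDS.items()
        if keyword in lowered
    ]

print(flag_risky_keywords("Summarise this article into a short summary."))
# ['"summary" found; consider "news article" instead',
#  '"summarise" found; consider "write" instead']
```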

Lead by example

The field of human language is broad. Very broad. It has been around for literally thousands of years, after all, and in more recent history, we piped those thousands of years into LLMs. Sometimes, a single word describing what you want is not enough. A "news article" can be two sentences, or the breaking news of the century covering the entire front page.

Hence, as one of our more recent changes, we included a general example of what we actually want. Phrases such as "Start with an introductory sentence explaining the topic" work wonders. The same goes for enumerations: 1. this, 2. then this, 3. then that.
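As a rough sketch, that structural part of the prompt can be assembled like this; the wording is a simplified stand-in for our production prompt.

```python
# A simplified stand-in for the structural part of our prompt.
structure_hint = (
    "Write the article in this structure:\n"
    "1. Start with an introductory sentence explaining the topic.\n"
    "2. Follow with two or three sentences covering the key facts.\n"
    "3. End with one sentence on the wider context of the story.\n"
)

article_text = "..."  # the source article goes here

prompt = f"{structure_hint}\nArticle:\n{article_text}"
print(prompt)
```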

LLMs hate counting

Speaking of enumerations: as good as LLMs are with words, they are surprisingly bad with numbers. They do okay with smaller quantities ("Give me 10 ways to make an omelette"), but with large quantities, or even worse, counting, LLMs hit their limits.

This is why, for a while, we really struggled with articles that were far too long. Large language models are talkative. They literally produce large language, if you will.

The reason is that most LLMs do not generate output iteratively. If you were to write a news article, you would write a draft, check whether it is too short or too long, and readjust. An LLM starts writing and keeps writing, and when it approaches what we told it the limit is, it cannot go back and readjust; it can only decide whether the content it still wants to generate is worth overshooting that limit. At times, it wanted to keep talking.

We combated this by making our limits very prominent. If you have a character limit, add a word limit as well, perhaps even a sentence limit. We also stated each limit in multiple ways: if we wanted 500 characters, we also spelled out "five hundred" for good measure.
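Put together, the limit section of our prompt looks something like the sketch below; the numbers are illustrative, not the exact values we use in production.

```python
# Illustrative numbers; tune them to your own target length.
CHAR_LIMIT = 500
WORD_LIMIT = 80
SENTENCE_LIMIT = 5

# State the same constraint several ways (digits, spelled out,
# complementary units) so it stays very present for the model.
# Keep the spelled-out "five hundred" in sync with CHAR_LIMIT.
limit_instructions = (
    f"Keep the article under {CHAR_LIMIT} (five hundred) characters. "
    f"That is at most {WORD_LIMIT} words, in no more than "
    f"{SENTENCE_LIMIT} sentences."
)

print(limit_instructions)
```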

With all of the above, we "tamed" our LLMs considerably. Of course, LLMs are unpredictable by nature; that is what powers them. But these tips gave us at least one hand on the wheel, steering our LLMs toward more manageable results.