Prime powers

One of my favorite test projects for AI large language models is identifying prime powers. I consistently fail to get the right answer on the first try.

Here is my latest request to OpenAI’s ChatGPT 4:

what are the first 10 prime powers. a prime power is a prime number raised to an exponent of 2 or more. use an “is prime power” function

I added the suggestion to use an “is prime power” function because, in the past, queries without that suggestion generated obviously incorrect results. This time the error was more subtle. I didn’t notice it at first.

After that, I asked ChatGPT 4 to show me the factorization of the first 25 prime powers past 78120. (A few days ago, I had requested a list of the first 100 prime powers. 78125 is the 100th. I wanted to extend the list.)

I was surprised by this comment at the end of its answer.

It appears there was a mistake in interpreting some numbers as prime powers; specifically, those represented as \(x^2\) where \(x\) is not prime. This discrepancy arises from the function initially designed to identify prime powers without checking if the base is prime for squared values.

It had the insight to see a mistake without my pointing it out. When I asked it to recreate the list, it updated the offending function and gave correct results.
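For reference, here is a minimal sketch of a correct check — my own version, not ChatGPT’s actual function. The idea: a number qualifies when dividing out its smallest prime factor repeatedly leaves 1, and the exponent is at least 2.

```python
def is_prime_power(n):
    # "Prime power" here means p**k with p prime and k >= 2.
    if n < 4:
        return False
    p = 2
    while p * p <= n:
        if n % p == 0:
            # p is n's smallest prime factor. n qualifies only if n is a
            # pure power of p; the exponent is then >= 2 because p*p <= n.
            m = n
            while m % p == 0:
                m //= p
            return m == 1
        p += 1
    return False  # n is prime (or 1), so the exponent would be 1

first_10 = [n for n in range(2, 100) if is_prime_power(n)]
# first_10 == [4, 8, 9, 16, 25, 27, 32, 49, 64, 81]
```

The bug ChatGPT confessed to — accepting x² without checking that x is prime — would wrongly admit numbers like 36 here.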

This remark is an example of ChatGPT 4’s ability to understand what it is creating. Creating the code and then autonomously evaluating its results is a higher-order thinking process.

I’ve never trusted that current AI models would generate correct code without some coaching. I am more impressed by its ability to understand the results, see where they didn’t match my intent, and correct the error.

I tried a similar experiment with ChatGPT 3.5. It gave correct code, but the output it presented from the execution didn’t match a run of that code with a real Python interpreter. By the time I realized that the problem was correct code/incorrect output, the chat had devolved into confused statements trying to explain what happened.

I have been trusting that, if a query develops Python code, there would be an actual Python engine to execute it. The situation is more complicated than this. I’m not sure how to probe version 4 to see whether its Python is running correctly.
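One way to probe — a sketch of the idea, with a hypothetical probe string of my own — is to ask the model to run code whose output is infeasible to produce without genuine execution, such as a cryptographic hash, and compare its answer against a local run.

```python
import hashlib

# Hypothetical probe: a model that merely pattern-matches on its training
# data is very unlikely to produce this digest correctly, while a real
# Python sandbox will. Compare the model's output with a local run.
probe_input = b"probe-string-chosen-fresh-each-time"
expected = hashlib.sha256(probe_input).hexdigest()
print(expected)
```

A fresh, never-before-seen input string matters: a well-known string’s hash could appear in training data.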

The hallucinations of an LLM can be more subtle than I anticipated. Their successes can be subtle as well.

49 Issues and ChatGPT

The New Yorker publishes every Monday. However, they have 4 issues each year that are double issues, each covering 2 weeks.

The first issue for 2024 is a Jan. 1/Jan. 8 double issue. I wondered whether the publisher’s decision was 48 issues a year or 4 double issues each year. I thought it would be interesting to know in which years they have a choice.

In other words, I wondered which years in the 21st century have 53 Mondays instead of the regular 52. When I asked ChatGPT 4 for the answer, I was surprised at how it found it.

Rather than having specific date training that could be consulted, ChatGPT wrote some Python code. It then ran the code giving the answer of 2001, 2007, 2012, 2018, 2024, 2029, 2035, 2040, 2046, 2052, 2057, 2063, 2068, 2074, 2080, 2085, 2091, 2096. If you look at the list, the years are spaced out every 5 or 6 years.

The code was:

from datetime import datetime, timedelta

# Function to check if a year has 53 Mondays
def has_53_mondays(year):
    # January 1 of the year
    jan_1 = datetime(year, 1, 1)
    # Count the number of Mondays
    monday_count = 0
    for i in range(365 + (jan_1.year % 4 == 0 and jan_1.year % 100 != 0 or jan_1.year % 400 == 0)):
        if (jan_1 + timedelta(days=i)).weekday() == 0:
            monday_count += 1
    return monday_count == 53

# List of years in the 21st century with 53 Mondays
years_with_53_mondays = [year for year in range(2001, 2101) if has_53_mondays(year)]
years_with_53_mondays

The code is understandable with a little work and doesn’t make any sophisticated assumptions. To get the answer, it visits every day, taking about 36,525 steps to cover the 21st century. (With a deeper understanding of the calendar, it wouldn’t need the inner loop.) This code offered a solution that is verifiable. (I just did a little research, and Python’s calendar library doesn’t account for 1752.)
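That deeper understanding can be sketched like this — my own version, not ChatGPT’s. A common year is 52 weeks plus one extra weekday, the weekday of Jan. 1; a leap year adds the weekday of Jan. 2 as well. So a year has 53 Mondays exactly when Jan. 1 falls on a Monday, or it is a leap year starting on a Sunday.

```python
from datetime import date

def has_53_mondays_fast(year):
    # 365 days = 52 weeks + 1 day; the extra weekday is Jan. 1's.
    # A leap year has a second extra weekday: Jan. 2's.
    jan1 = date(year, 1, 1).weekday()  # Monday == 0, Sunday == 6
    leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    return jan1 == 0 or (leap and jan1 == 6)

years = [y for y in range(2001, 2101) if has_53_mondays_fast(y)]
```

This checks one date per year — 100 steps instead of 36,525 — and agrees with the day-by-day loop.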

I only checked a few of its answers but the code looks robust.

I was curious what else it could do. I asked it for the first prime power greater than 1000000. The tool wrote some code and gave what looked like a reasonable answer.

# To find the first prime power greater than 1,000,000, we start by finding prime numbers and then checking their powers
# until we find a value greater than 1,000,000.

from sympy import primerange, isprime

def first_prime_power_above(limit):
    for prime in primerange(2, limit):
        power = 2  # Start from the square of the prime
        while True:
            prime_power = prime ** power
            if prime_power > limit:
                return prime_power
            power += 1

# Setting a limit for the search range. The exact limit is a guess; we need a range large enough to find our number.
search_limit = 10000
first_prime_power_above(1000000)

The code has a problem: it can only ever return a power of 2. The inner loop exits only through “return”, so the function never advances past the first prime and simply returns the first power of 2 that exceeds the limit.

Armed with that understanding, I saw that it would fail on 150, so when I asked, it gave me 256. I knew 169 was the right answer. When I pointed out that it was using a greedy algorithm that only returns a power of two, it generated new code, below. It uses Python sets to calculate the result.

# Revisiting the approach to correctly find the first prime power greater than 150, considering all primes and their powers.

def first_correct_prime_power_above_v2(limit):
    prime_powers = set()
    for prime in primerange(2, limit):
        power = 2
        while True:
            prime_power = prime ** power
            if prime_power > limit:
                prime_powers.add(prime_power)
                break
            power += 1
    return min(prime_powers)

first_correct_prime_power_above_v2(150)
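As a cross-check of the corrected code — my own sketch, not ChatGPT output — a direct upward scan that tests each candidate by trial division agrees:

```python
def first_prime_power_above_scan(limit):
    # Walk upward from limit + 1. A candidate qualifies when dividing
    # out its smallest prime factor repeatedly leaves 1 and the
    # exponent is at least 2.
    n = limit + 1
    while True:
        m, p = n, 2
        while p * p <= m and m % p:
            p += 1  # find the smallest prime factor
        exponent = 0
        while m % p == 0:
            m //= p
            exponent += 1
        if m == 1 and exponent >= 2:
            return n
        n += 1

print(first_prime_power_above_scan(150))      # 169 == 13**2
print(first_prime_power_above_scan(1000000))  # 1018081 == 1009**2
```

The second result also shows what the greedy version misses above 1,000,000: it would answer 2**20 == 1048576 instead of 1009**2 == 1018081.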

I was surprised that ChatGPT generated code for its analysis in both cases. The team implementing the engine evidently saw that Python works as a lingua franca for implementing algorithms: it is popular, simple, and has clean syntax.

My experiment demoed a feature that I didn’t expect: I can validate the code. ChatGPT lets me look under the hood in a way that textual responses alone don’t allow. It also shows that even unambiguous mathematical questions can yield incorrect results.

Bard was not useful. On the date question, it gave a list of incorrect answers. Rather than having a verifiable algorithm, it hallucinates a rule to explain its answer. Often its verbal algorithm isn’t correct and may not even make sense. Nor does the answer even follow the heuristic. I pointed out one of the mistakes in its year list and it just came up with a different confused rule it didn’t follow and got an even worse answer. I’ve noticed that Bard is inflexible and will double down on wrong answers.

For the prime power, Bard gave nonsensical steps to explain what it calculated, and then didn’t use that procedure to find the answer.


The first prime power greater than 150 is 169.

Here's the reasoning:

1. Start with the closest prime number after 150, which is 151.

2. Check if 151 is a prime power: 151 is only divisible by 1 and itself, making it a prime number, but not a prime power (prime powers must be raised to an exponent greater than 1).

3. Move on to the next prime number, which is 157. Again, 157 is only divisible by 1 and itself, confirming it's prime.

4. However, 157 raised to any positive exponent will still be an odd number, so it can't be the first prime power greater than 150.

5. Continue this process with the next prime numbers: 163 and 167.

6. 163 is not a prime power (163 = 7 * 23).

7. Finally, 167 is both a prime number and a prime power: 167 = 7^2, fulfilling both criteria.

Therefore, 169 (7^2) is the first prime power greater than 150. 

Bard offered two alternate responses, but they are just as jumbled and confused as this one. Step 4 is irrelevant. Step 6 is worse because 163 is prime. I don’t know why it has 167 and 169 confused, nor why it calls 169 seven squared (169 is 13²).

Rather than leaning on the English language to answer a mathematical question, ChatGPT uses its expertise in programming to answer math-adjacent questions. This experiment demonstrates that it has basic algorithmic reasoning capabilities but is not consistently correct.

Going back to the original question, when I looked at a blow-in ad card, it said there are 4 double issues, so this year, I expect 49 issues with 4 of those being double issues.

GitHub Copilot Honor Pledge

I’ve been using GitHub Copilot recently with Microsoft Visual Studio while programming in C++. GitHub Copilot is an AI based software development assistant.

My initial observation is that it can be helpful. However, I found that it can violate the privacy of users whose code has been ingested by the language model.

In some big successes, it used comments as prompts to generate Win32 code. Once, the generated code revealed the exact function I needed. I didn’t know what to search for to find the API that identifies the edge the taskbar is anchored to. The code wasn’t exactly what I needed, but the results showed me where to look in Microsoft’s Win32 documentation.

When I was exploring a neural network project, it filled in a long list of varied neural network classes. As I customized my code, later suggestions replicated the altered class outline. When it generates correct code after you set up the types and variable names, it can be fun.

My first critique is that it is sluggish. The delay between being presented with a suggestion and Visual Studio registering my acceptance of it can be noticeable. Occasionally it insists on making a change when I want to choose other code. Perhaps my PC needs an upgrade?

It is able to write descriptive comments about code. However, I have found that the descriptions can be incorrect. It can write clear English, but what it says might not reflect an accurate understanding of the code: it can comment on the purpose of code based on insufficient evidence, and it can misinterpret a declaration and synthesize incorrect text. This requires a lot of vigilance. For example, I noticed, days later, that some of the text was superficially correct, but when I read it carefully, I saw that it was wrong. When Copilot generates text, even incorrect text fits the context, and that makes it easier for errors to slip through.

A concern with Copilot is that it uses open-source content without honoring that code’s specific licenses. GitHub provides a website, the GitHub Copilot Trust Center, documenting some of their policies. There’s a lot there to analyze. GitHub Copilot is like ChatGPT and some of the image-generating applications: they all use copyrighted material in ways that haven’t been clarified by legislation and litigation.

Once, as I was filling in the top of a file, Copilot dreamed up some unfamiliar code. When I went looking for it on GitHub, I couldn’t find the specific code but identified that it was derived from a project, RED4ext, that was unrelated to my application.

It was interesting that Bing and DuckDuckGo both give lengthy results when I search with the qualifier site:github.com. In contrast, Google gave minimal results for such a search; apparently it doesn’t index project text.

The “Github Copilot and Copyright” section of the Trust Center claims that their option “GitHub Copilot can allow or block suggestions matching public code” can prevent Copilot from using segments of about 150 characters or more from GitHub. However, Copilot usually generates code a line at a time, which will almost always be less than 150 characters. That makes it a very weak promise.

I was starting a file and noticed a name in the header comments. I was intrigued and started auto-generating title comments for files with minimal code.

My first test alarmed me with this fully ironic text. The same pledge appeared several times in my explorations. The samples had different usernames.

// Honor Pledge:    
//     
// I pledge that I have neither given nor
// received any help on this assignment.
//  
// <<username elided>>

In a file that I had fully written, the header comment started with

// Created by: <<username elided>>
//  Created on: 7/10/2020 10:00:00 PM

Continuing my exploration of what it might generate, I got the full name of a user!

// Created by J*** K*** on 11/2/16.

These privacy violations make me interested in investigating GitHub Copilot’s promotional claims in more depth. As I complete more research, I should re-evaluate my use of the service.

GPT + Microsoft, Bard/Google and Beyond

The public conversation about AI tools is stuck on ChatGPT, ignoring older uses of AI and other tools. It’s easy to use and flashy, but there is more to AI than a chat engine.

Microsoft has a couple of ways of interacting with a GPT-derived chat tool. In the title bar of their Edge browser, there’s a prominent ‘b’ logo that opens a side panel to start a conversation. One enhancement over ChatGPT is that the Edge browser will give links referring to its information sources. A conversation mode is also directly integrated with bing.com search results.

Google has an experimental chat service, Bard, at https://bard.google.com. My limited experience with Bard has been unsatisfactory so I have only used it a few times.

I have a specific use case for the interactive engines: helping me with programming language syntax and techniques. I’m not using them for the high-powered, app-generating miracles I’ve seen described in the media. Instead, I’m using them to supplement conventional help resources. As I learn more, I can ask more complex questions and develop increasingly useful skills.

Bard answers my questions in a very stilted manner. When I ask a programming question, the code it generates tends to be stand-alone: it includes fluff, such as verbiage letting me copy, paste, and directly execute the suggested code. That is frustrating because my goal is to learn how a feature works, not to generate sample code. Bard is evolving, but what I’ve seen so far isn’t compelling me to use it. It prefers to give a specific, narrowly focused answer rather than explain a concept. It doesn’t know the context of my questions and gives examples with minimal insight.

Bing’s search tool is much more useful to me. It remembers the context of my current questions; I don’t have to say “Python” or “JavaScript” every time I ask a new question. It presents example code that is relatively terse, helping me avoid getting bogged down in unrelated details. I don’t expect the code to be stand-alone because I’m not looking for it to write code for me.

However, sometimes a web search is more effective than using chat features. There are a few specialized sites, such as https://stackoverflow.com, that can answer questions. A search on Google, DuckDuckGo, or Bing can go off base and include unneeded results, especially when the correct technical term has other generic uses. In one pleasing interaction, Bing’s top-line short answer was unrelated, but when I opened the chat, it answered the real software issue I was trying to understand.

It seems that ChatGPT is “sucking all of the oxygen out of the room.” News and blog commentaries talk about its threats and promise, forgetting that AI has uses beyond general-purpose conversational tools.

I have been using less prominent, limited AI for quite a while. Edge and Microsoft 365 (Office) both give me auto-complete suggestions that I can accept with a tab press. It is not flashy, but it can save time. The keyboard interface to my iPhone’s messaging app also uses artificial intelligence to predict what I intend. It is helpful.

My realization is that AI is not a new tool. Amazon and others use it to identify potential sales based on analysis of customer search and sales history. This Big Data application of artificial intelligence is old enough that it’s invisible now. It’s just called “the algorithm,” and it is so ubiquitous that it’s often mundane.

If you don’t use the Edge browser and Bing search engine, you’re not going to see these additional ways of using the GPT engine. I find them very productive. AI has a public face in ChatGPT, but the technology is common in many other, quieter forms.