Prime powers

One of my favorite tests for AI large language models is asking them to identify prime powers. The first answer is consistently wrong.

Here is my latest request for OpenAI’s ChatGPT 4:

what are the first 10 prime powers. a prime power is a prime number raised to an exponent of 2 or more. use an “is prime power” function

I added the suggestion to use an “is prime power” function because, in the past, queries without that suggestion generated obviously incorrect results. This time the error was more subtle. I didn’t notice it at first.

After that, I asked ChatGPT 4 to show me the factorization of the first 25 prime powers past 78120. (A few days ago, I had requested a list of the first 100 prime powers. 78125 is the 100th. I wanted to extend the list.)

I was surprised by this comment at the end of its answer.

It appears there was a mistake in interpreting some numbers as prime powers; specifically, those represented as \(x^2\) where \(x\) is not prime. This discrepancy arises from the function initially designed to identify prime powers without checking if the base is prime for squared values.

It had the insight to see a mistake without my pointing it out. When I asked it to recreate the list, it updated the offending function and gave correct results.
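As an illustration of the fix described in that comment, here is a minimal sketch of an “is prime power” check that cannot mistake a composite square such as 36 for a prime power. This is not the code ChatGPT produced, which isn’t reproduced in this post. The function names and the trial-division approach are assumptions chosen for clarity.

```python
def is_prime_power(n):
    """Return True if n == p**k for some prime p and integer k >= 2.

    Illustrative sketch only (hypothetical name, plain trial division).
    """
    if n < 4:
        return False
    # The smallest divisor greater than 1 of any integer is always prime.
    p = 2
    while p * p <= n:
        if n % p == 0:
            break
        p += 1
    else:
        # No divisor found: n itself is prime, so the exponent would be 1.
        return False
    # Divide out p completely; a prime power leaves nothing behind.
    exponent = 0
    while n % p == 0:
        n //= p
        exponent += 1
    return n == 1 and exponent >= 2


def first_prime_powers(count):
    """Collect the first `count` prime powers by scanning upward from 2."""
    found = []
    n = 2
    while len(found) < count:
        if is_prime_power(n):
            found.append(n)
        n += 1
    return found
```

With the definition used in the request (an exponent of 2 or more), `first_prime_powers(10)` gives 4, 8, 9, 16, 25, 27, 32, 49, 64, 81. A number like 36 is rejected because dividing out its smallest prime factor, 2, leaves 9 rather than 1.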

This remark is an example of ChatGPT 4’s ability to understand what it is creating. It’s a higher-order thinking process to create the code and then evaluate its results autonomously.

I’ve never trusted current AI models to generate correct code without some coaching. I am more impressed with its ability to understand the results, see where they didn’t match my intent, and correct the error.

I tried a similar experiment with ChatGPT 3.5. It gave correct code, but the output it presented from the execution didn’t match a run of that code with a real Python interpreter. By the time I realized that the problem was correct code/incorrect output, the chat had devolved into confused statements trying to explain what happened.

I have been trusting that, if a query develops Python code, there would be an actual Python engine to execute it. The situation is more complicated than this. I’m not sure how to probe version 4 to see whether its Python is running correctly.

The hallucinations of an LLM can be more subtle than I anticipated. Its successes can be subtle as well.

49 Issues and ChatGPT

The New Yorker publishes every Monday. However, they have 4 issues each year that are double issues, each covering 2 weeks.

The first issue for 2024 is a Jan. 1/Jan. 8 double issue. I wondered whether the publisher’s rule was 48 issues a year or 4 double issues a year. The two rules give different counts only in a year with 53 Mondays, so I thought it would be interesting to know which years give them a choice.

In other words, I wondered which years in the 21st century have 53 Mondays instead of the regular 52. When I asked ChatGPT 4 for the answer, I was surprised at how it found its answer.

Rather than consulting specific date knowledge from its training, ChatGPT wrote some Python code. It then ran the code, giving the answer 2001, 2007, 2012, 2018, 2024, 2029, 2035, 2040, 2046, 2052, 2057, 2063, 2068, 2074, 2080, 2085, 2091, 2096. If you look at the list, the years are spaced 5 or 6 years apart.

The code was:

from datetime import datetime, timedelta

# Function to check if a year has 53 Mondays
def has_53_mondays(year):
    # January 1 of the year
    jan_1 = datetime(year, 1, 1)
    # Count the number of Mondays
    monday_count = 0
    for i in range(365 + (jan_1.year % 4 == 0 and jan_1.year % 100 != 0 or jan_1.year % 400 == 0)):
        if (jan_1 + timedelta(days=i)).weekday() == 0:
            monday_count += 1
    return monday_count == 53

# List of years in the 21st century with 53 Mondays
years_with_53_mondays = [year for year in range(2001, 2101) if has_53_mondays(year)]
years_with_53_mondays

It’s understandable with a little work and doesn’t make any sophisticated assumptions. To get the answer, it visits every day, so it takes about 36,525 steps to get the 21st century result. (With a deeper understanding of the calendar, it wouldn’t need the inner loop.) This code offered a solution that is verifiable. (I just did a little research, and Python’s calendar library doesn’t account for the 1752 calendar change.)

I only checked a few of its answers but the code looks robust.
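The aside about not needing the inner loop can be sketched as follows. A 365-day year is 52 full weeks plus one day, so it has 53 Mondays exactly when January 1 is a Monday. A leap year has two extra days, so January 1 may also be a Sunday. This is a sketch under that reasoning, not code from the chat, and the name `has_53_mondays_direct` is made up for the example.

```python
import calendar
from datetime import date


def has_53_mondays_direct(year):
    """Decide from January 1 alone whether the year contains 53 Mondays."""
    jan_1_weekday = date(year, 1, 1).weekday()  # Monday == 0, Sunday == 6
    if jan_1_weekday == 0:
        # The one extra day beyond 52 weeks falls on a Monday.
        return True
    # In a leap year the second extra day (January 2's weekday) can be the Monday.
    return calendar.isleap(year) and jan_1_weekday == 6


years_direct = [y for y in range(2001, 2101) if has_53_mondays_direct(y)]
```

This needs one date lookup per year instead of one per day, and it produces the same 18 years as the list above.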

I was curious what else it could do. I asked it for the first prime power greater than 1000000. The tool wrote some code and gave what looked like a reasonable answer.

# To find the first prime power greater than 1,000,000, we start by finding prime numbers and then checking their powers
# until we find a value greater than 1,000,000.

from sympy import primerange, isprime

def first_prime_power_above(limit):
    for prime in primerange(2, limit):
        power = 2  # Start from the square of the prime
        while True:
            prime_power = prime ** power
            if prime_power > limit:
                return prime_power
            power += 1

# Setting a limit for the search range. The exact limit is a guess; we need a range large enough to find our number.
search_limit = 10000
first_prime_power_above(1000000)

The code has a problem: it can only ever return a power of two. The only way out of the inner loop is the “return”, and the first prime the outer loop tries is 2, so the function returns before it ever looks at powers of any other prime.

Armed with that understanding, I saw that it would fail on 150. When I asked, it gave me 256; I knew 169 was the right answer. When I pointed out that it was using a greedy algorithm that only returns a power of two, it generated new code, below. It uses Python sets to calculate the result.

# Revisiting the approach to correctly find the first prime power greater than 150, considering all primes and their powers.

def first_correct_prime_power_above_v2(limit):
    prime_powers = set()
    for prime in primerange(2, limit):
        power = 2
        while True:
            prime_power = prime ** power
            if prime_power > limit:
                prime_powers.add(prime_power)
                break
            power += 1
    return min(prime_powers)

first_correct_prime_power_above_v2(150)
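Since the point of seeing the code is being able to check it, here is a minimal cross-check sketch that avoids sympy and works exponent by exponent rather than prime by prime. For each exponent k starting at 2, it finds the smallest prime p with p**k above the limit and keeps the smallest candidate. Once that prime is 2, higher exponents can only give larger values, so the search stops. The function names are hypothetical, and the approach is a swapped-in technique for comparison, not something from the chat.

```python
def is_prime(n):
    """Trial-division primality test; adequate for the small bases used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def first_prime_power_above_check(limit):
    """Smallest p**k > limit with p prime and k >= 2 (illustrative cross-check)."""
    best = None
    k = 2
    while True:
        # Smallest prime base whose k-th power exceeds the limit.
        p = 2
        while p ** k <= limit or not is_prime(p):
            p += 1
        candidate = p ** k
        if best is None or candidate < best:
            best = candidate
        if p == 2:
            # 2**k already exceeds the limit; larger exponents only grow.
            return best
        k += 1
```

For a limit of 150 this returns 169, matching the expected answer above. For 1,000,000 it returns 1018081 (1009 squared), which is below 1048576, the smallest power of two above that limit.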

I was surprised that ChatGPT would generate code to do its analysis in both cases. The team implementing the engine evidently saw that Python works as a lingua franca for implementing algorithms. Simple Python is popular and has clean syntax.

My experiment demoed a feature that I didn’t expect: I can validate the code. ChatGPT lets me look under the hood in a way that purely textual responses don’t allow. It also shows that even unambiguous mathematical questions can yield incorrect results.

Bard was not useful. On the date question, it gave a list of incorrect answers. Rather than having a verifiable algorithm, it hallucinates a rule to explain its answer. Often its verbal algorithm isn’t correct and may not even make sense. Nor does the answer even follow the heuristic. I pointed out one of the mistakes in its year list and it just came up with a different confused rule it didn’t follow and got an even worse answer. I’ve noticed that Bard is inflexible and will double down on wrong answers.

For the prime power, Bard gave nonsensical steps to explain what it calculated, and then didn’t use that procedure to find the answer.


The first prime power greater than 150 is 169.

Here's the reasoning:

1. Start with the closest prime number after 150, which is 151.

2. Check if 151 is a prime power: 151 is only divisible by 1 and itself, making it a prime number, but not a prime power (prime powers must be raised to an exponent greater than 1).

3. Move on to the next prime number, which is 157. Again, 157 is only divisible by 1 and itself, confirming it's prime.

4. However, 157 raised to any positive exponent will still be an odd number, so it can't be the first prime power greater than 150.

5. Continue this process with the next prime numbers: 163 and 167.

6. 163 is not a prime power (163 = 7 * 23).

7. Finally, 167 is both a prime number and a prime power: 167 = 7^2, fulfilling both criteria.

Therefore, 169 (7^2) is the first prime power greater than 150. 

Bard offered two alternate responses, but they are just as jumbled and confused as this one. Step 4 is irrelevant. Step 6 is worse because 163 is prime. I don’t know why it has 167 and 169 confused, nor why it uses seven squared, which is 49; 169 is 13 squared.

Rather than lean on the English language to answer a mathematical question, ChatGPT uses its expertise in programming to answer math-adjacent questions. This experiment demonstrates that it has basic algorithmic reasoning capabilities but is not consistently correct.

Going back to the original question, when I looked at a blow-in ad card, it said there are 4 double issues, so this year, I expect 49 issues with 4 of those being double issues.