A palindrome reads the same forwards and backwards. "A man, a plan, a canal: Panama." Easy enough to appreciate — but deceptively hard to generate algorithmically, especially when your tool of choice is a large language model that only ever looks left to right.
This musing walks through the problem, analyses two practical approaches (HuggingFace Transformers and Ollama), and provides working Jupyter notebooks for each. The goal is sentences that produce readable, natural-sounding text whether they are read forwards or backwards.
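Before picking a tool, it helps to be precise about what we want. Palindromes come in three flavours depending on the unit of reversal:
Character-level: the letters read the same in both directions, e.g. racecar, AMANAPLANACANALPANAMA.
Word-level: the words read the same in both directions. "Was it a car or a cat I saw" reversed word-by-word is "saw I cat a or car a it was", which is not quite identical, so true word palindromes are rare.
Token-level: the tokeniser's token list is the same when reversed.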
Our target: character-level palindromes that read as natural English sentences in both directions. Word-level and token-level variants are explored along the way as useful stepping stones.
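To make the first two definitions concrete, here is a minimal sketch of both checks. The function names are illustrative, and the choice to ignore case, spaces, and punctuation is an assumption that matches how the classic examples are usually read. The "Fall leaves" sentence is an extra example, not one of the examples above.

```python
import re

def letters_only(text: str) -> str:
    """Lowercase the text and drop everything except a-z."""
    return re.sub(r"[^a-z]", "", text.lower())

def reads_same_by_char(text: str) -> bool:
    """Character level: the letters read the same in both directions."""
    letters = letters_only(text)
    return len(letters) > 0 and letters == letters[::-1]

def reads_same_by_word(text: str) -> bool:
    """Word level: the list of words reads the same in both directions."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(words) > 0 and words == words[::-1]

print(reads_same_by_char("Was it a car or a cat I saw"))    # True
print(reads_same_by_word("Was it a car or a cat I saw"))    # False
print(reads_same_by_word("Fall leaves after leaves fall"))  # True
```

Both checks reject empty input, so a reply that contains no letters at all is never counted as a palindrome.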
Standard causal language models — GPT-2, Llama, Mistral, and friends — predict tokens one at a time, left to right. At generation step i the model conditions on all previous tokens; it has no mechanism to peek at what must come later to satisfy a constraint on the full sequence.
A palindrome is a global constraint: character i from the front must equal character i from the back. Satisfying this requires the generator to plan the whole string at once — exactly what autoregressive decoding cannot do natively.
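The constraint can be written as s[i] == s[n-1-i] for every i in the first half of a string of length n. The sketch below (the function name is hypothetical) lists the positions where that pairing fails, which shows why the very first character already fixes the very last one.

```python
def mirror_mismatches(s: str) -> list[int]:
    """Return every index i in the first half where s[i] != s[n-1-i]."""
    n = len(s)
    return [i for i in range(n // 2) if s[i] != s[n - 1 - i]]

print(mirror_mismatches("neveroddoreven"))  # []
print(mirror_mismatches("neveroddorevan"))  # [1]
```

A left-to-right generator only discovers a mismatch like the second one near the end of the string, long after the character that caused it was emitted.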
There is a subtler trap. Even if you already have a valid character palindrome, its token representation forward and backward will usually differ:
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
text = "never odd or even"
reversed_text = text[::-1]  # "neve ro ddo reven": characters reversed
tokens_fwd = tokenizer.tokenize(text)
# ['never', 'Ġodd', 'Ġor', 'Ġeven']
tokens_bwd = tokenizer.tokenize(reversed_text)
# ['neve', 'Ġro', 'Ġdd', 'o', 'Ġr', 'even']
The same letters produce completely different token sequences depending on which direction you read them, because BPE merges were learned from forward-reading text and the reversed string breaks into different character groups. This means you cannot simply "reverse the token list" to get a readable result; you must re-tokenise the reversed character string from scratch.
Key insight: Token-level palindromes (reversed token list = original) are not the same as character-level palindromes. A sentence whose token list is a palindrome will look garbled when decoded in reverse, because the token boundaries don't align with word boundaries. Focus on character or word level for human-readable results.
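The difference is easy to see without loading a tokeniser. This sketch takes the forward token list quoted above as given and decodes it with a hand-written helper; the `decode` function is a simplified stand-in for a real GPT-2 decoder, not the library's implementation.

```python
def decode(tokens: list[str]) -> str:
    """Join GPT-2 style tokens, turning the 'Ġ' marker back into a space."""
    return "".join(tokens).replace("Ġ", " ")

tokens_fwd = ["never", "Ġodd", "Ġor", "Ġeven"]  # forward tokens from the example above

print(repr(decode(tokens_fwd)))        # 'never odd or even'
print(repr(decode(tokens_fwd[::-1])))  # ' even or oddnever'
print(repr(decode(tokens_fwd)[::-1]))  # 'neve ro ddo reven'
```

Reversing the token list reorders whole tokens and leaves each token's characters in their original order, so it never produces the character-level mirror.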
HuggingFace gives you access to model internals: logit distributions at every step, custom
LogitsProcessor hooks, bidirectional encoders (BERT family), and full beam search
control. This makes it the right tool when you want to encode the palindrome constraint
into the generation algorithm itself.
The simplest thing: load a model and prompt it with examples.
GPT-2 small (124M parameters, not instruction-tuned) fails almost entirely at character-level
palindromes. It has no explicit palindrome knowledge and the constraint is too global for
greedy decoding. Larger instruction-tuned models like mistralai/Mistral-7B-Instruct-v0.3 do
better, producing word-level palindromes when given clear examples, but character-level
success is rare and unreliable.
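A few-shot prompt for this baseline might look like the sketch below. The prompt wording, the helper names, and the sampling settings are all assumptions made for illustration. The `transformers` import sits inside the function so that the prompt builder can be tried without downloading a model.

```python
EXAMPLES = [
    "A man, a plan, a canal: Panama",
    "Never odd or even",
    "Was it a car or a cat I saw",
]

def build_prompt(examples=EXAMPLES) -> str:
    """Few-shot prompt: known palindromes, then an open slot for a new one."""
    lines = ["Each line below is a palindrome sentence."]
    lines += [f"Palindrome: {example}" for example in examples]
    lines.append("Palindrome:")
    return "\n".join(lines)

def generate_candidates(prompt: str, model_name: str = "gpt2",
                        n: int = 5, max_new_tokens: int = 20) -> list[str]:
    """Sample n continuations and keep the first line of each one."""
    from transformers import pipeline  # lazy import; downloads the model on first use
    generator = pipeline("text-generation", model=model_name)
    outputs = generator(prompt, do_sample=True, num_return_sequences=n,
                        max_new_tokens=max_new_tokens, return_full_text=False)
    texts = [out["generated_text"].strip() for out in outputs]
    return [text.splitlines()[0] for text in texts if text]

print(build_prompt())
```

Calling `generate_candidates(build_prompt())` and running a palindrome check over the results gives a quick feel for how rarely this baseline succeeds.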
A deterministic strategy: generate the first half of the sentence freely, then force
the second half to be the character-mirror. We implement a custom LogitsProcessor
that, once the generation passes the midpoint, only allows tokens whose letters match
the next required mirror characters, and moves through the mirror by the length of each
token that was chosen:
from transformers import LogitsProcessor
import torch

class MirrorLogitsProcessor(LogitsProcessor):
    """Forces second-half tokens to spell out the mirror of the first half."""

    def __init__(self, tokenizer, first_half_chars: str):
        # Build the required suffix: letters of the first half, reversed (spaces ignored)
        letters = "".join(c for c in first_half_chars.lower() if c.isalpha())
        self.required_suffix = letters[::-1]
        self.cursor = 0
        self.started = False
        # Cache token id -> letters once, with BPE space markers stripped
        self.clean = {tid: tok.lstrip('Ġ▁ ').lower()
                      for tok, tid in tokenizer.get_vocab().items()}

    def __call__(self, input_ids, scores):
        # Advance by the length of the token chosen at the previous step
        # (assumes batch size 1 and that generation starts at the mirror point)
        if self.started:
            self.cursor += len(self.clean.get(int(input_ids[0, -1]), ""))
        self.started = True
        remaining = self.required_suffix[self.cursor:]
        if not remaining:
            return scores  # past the mirror zone
        # Set every token to -inf unless its letters are a prefix of the remaining mirror
        allowed = [tid for tid, s in self.clean.items()
                   if s and remaining.startswith(s) and tid < scores.size(-1)]
        mask = torch.full_like(scores, float('-inf'))
        mask[:, allowed] = scores[:, allowed]
        return mask
The generated sentence is guaranteed to be a character palindrome — but it will read awkwardly because the model has no say in the second half. The trick is to generate many candidates and filter for the ones that score well on both halves.
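A prefix test of this kind can be tried on a toy vocabulary without loading a model. In the sketch below a token is allowed only if its letters are a prefix of the mirror characters still required, which is stricter than comparing the first character alone. The vocabulary and the function name are made up for illustration.

```python
def allowed_tokens(vocab: dict[str, int], remaining: str) -> list[int]:
    """Ids of tokens whose letters are a non-empty prefix of `remaining`."""
    allowed = []
    for token, token_id in vocab.items():
        letters = token.lstrip("Ġ▁ ").lower()  # strip BPE space markers
        if letters and remaining.startswith(letters):
            allowed.append(token_id)
    return sorted(allowed)

# Hypothetical six-token vocabulary; real vocabularies hold tens of thousands of entries
toy_vocab = {"p": 0, "pe": 1, "Ġpets": 2, "pot": 3, "Ġno": 4, "s": 5}

print(allowed_tokens(toy_vocab, "pets"))  # [0, 1, 2]
print(allowed_tokens(toy_vocab, "s"))     # [5]
```

The model still chooses among the allowed tokens, so it keeps some influence over where word boundaries fall in the second half.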
After generating palindrome candidates (either by the mirror strategy or by brute-force recombination of short palindrome words), use BERT's masked language modelling to score how natural the sentence sounds. BERT reads left-to-right and right-to-left simultaneously, making it a natural judge of whether both directions of a palindrome are plausible:
import torch

def bert_plausibility(sentence: str, model, tokenizer) -> float:
    """
    Pseudo-log-likelihood: mask each token, sum log P(token | context).
    Higher = more natural.
    """
    inputs = tokenizer(sentence, return_tensors="pt")
    input_ids = inputs["input_ids"].clone()
    total_log_prob = 0.0
    for i in range(1, input_ids.size(1) - 1):  # skip [CLS] / [SEP]
        masked = input_ids.clone()
        masked[0, i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked).logits
        log_probs = torch.log_softmax(logits[0, i], dim=-1)
        total_log_prob += log_probs[input_ids[0, i]].item()
    # Normalise by length; max() guards against an empty sentence
    return total_log_prob / max(1, input_ids.size(1) - 2)

# Score a palindrome in both directions
# (`candidate` is a string; `bert` and `bert_tok` are a masked-LM model and its tokenizer)
fwd_score = bert_plausibility(candidate, bert, bert_tok)
bwd_score = bert_plausibility(candidate[::-1], bert, bert_tok)
joint_score = (fwd_score + bwd_score) / 2
Candidates with joint_score above a threshold are returned as high-quality palindromes.
This pipeline — generate with mirror constraint, filter with BERT — routinely produces 5–10
grammatically plausible palindromes from a batch of 200 candidates.
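The filtering step could be sketched as follows. The scorer is passed in as a plain callable so the example runs without BERT. The function name and the threshold are hypothetical, and since pseudo-log-likelihoods are negative, a real threshold would have to be tuned on actual scores.

```python
from typing import Callable

def filter_candidates(candidates: list[str],
                      score_fn: Callable[[str], float],
                      threshold: float) -> list[str]:
    """Keep candidates whose joint (forward + backward) score beats the threshold."""
    scored = []
    for text in candidates:
        joint = (score_fn(text) + score_fn(text[::-1])) / 2
        if joint > threshold:
            scored.append((joint, text))
    # Best joint score first
    return [text for joint, text in sorted(scored, reverse=True)]

# Stand-in scorer for demonstration only: shorter strings score higher
toy_score = lambda s: -float(len(s))
print(filter_candidates(["abc", "abcdefgh", "a"], toy_score, threshold=-5.0))
# ['a', 'abc']
```

With the real scorer, `score_fn` would be something like `lambda s: bert_plausibility(s, bert, bert_tok)`.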
Ollama runs open-weight models locally with a simple REST/Python API. No GPU required for 7B models (though it helps). You can't hook into the decoding loop — but instruction-tuned models like Llama 3.2 and Mistral are surprisingly responsive to structured prompting and iterative correction.
Ask the model to generate a palindrome with a well-structured prompt. Success rate with
llama3.2: roughly 15–25% for word-level, <5% for character-level on the first try.
Bigger models (llama3.1:70b, qwen2.5:32b) do noticeably better but are
slower.
import ollama
import re

def is_char_palindrome(text: str) -> bool:
    clean = re.sub(r'[^a-z]', '', text.lower())
    return clean == clean[::-1]

def ask_for_palindrome(model="llama3.2") -> str:
    resp = ollama.chat(model=model, messages=[{
        "role": "user",
        "content": (
            "Generate ONE palindrome sentence; it must read exactly the same "
            "forwards and backwards when you ignore spaces and punctuation. "
            "Classic examples: 'A man a plan a canal Panama', "
            "'Never odd or even', 'Was it a car or a cat I saw'. "
            "Reply with ONLY the palindrome, nothing else."
        )
    }])
    return resp["message"]["content"].strip()
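Success rates like the ones quoted above can be estimated with a small harness. This is a sketch under assumptions: the generator is any zero-argument callable that returns a sentence, `success_rate` is a made-up name, and a reply with no letters counts as a failure.

```python
import re
from typing import Callable

def success_rate(generate: Callable[[], str], trials: int = 20) -> float:
    """Fraction of generated replies that are character-level palindromes."""
    hits = 0
    for _ in range(trials):
        letters = re.sub(r"[^a-z]", "", generate().lower())
        if letters and letters == letters[::-1]:
            hits += 1
    return hits / trials

# Canned replies stand in for a model so the sketch runs offline
replies = iter(["Never odd or even", "Hello world", "", "Step on no pets"])
print(success_rate(lambda: next(replies), trials=4))  # 0.5
```

Against a live model the call would be `success_rate(lambda: ask_for_palindrome("llama3.2"))`. With only 20 trials the estimate is coarse.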
The key insight is to use Python to check the constraint and give the model specific, character-level feedback. Most models can close a small gap (a few characters off) within 3–5 iterations. The loop is simple but effective:
def refine_palindrome(model="llama3.2", max_iter=10):
    sentence = ask_for_palindrome(model)
    for iteration in range(max_iter):
        clean = re.sub(r'[^a-z]', '', sentence.lower())
        if clean == clean[::-1]:
            return sentence, iteration + 1, True  # success
        # Point out exactly where it breaks (first mismatching pair)
        diff_pos = next((i for i in range(len(clean) // 2)
                         if clean[i] != clean[-(i + 1)]), None)
        hint = (
            f"Position {diff_pos} from the start is '{clean[diff_pos]}' "
            f"but position {diff_pos} from the end is '{clean[-(diff_pos + 1)]}'."
        ) if diff_pos is not None else ""
        resp = ollama.chat(model=model, messages=[{
            "role": "user",
            "content": (
                f'"{sentence}" is not a palindrome. {hint} '
                f"Forward: '{clean}'. Backward: '{clean[::-1]}'. "
                "Fix it to be a true character palindrome. "
                "Reply with ONLY the corrected palindrome."
            )
        }])
        sentence = resp["message"]["content"].strip()
    # The loop never tests the final correction, so check it here
    return sentence, max_iter, is_char_palindrome(sentence)
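Replies may arrive wrapped in quotes or followed by commentary even when the prompt asks for the palindrome only, and any stray letters would break the check. The helper below is a hypothetical clean-up step, not part of the loop above: it keeps the first non-empty line and strips surrounding quotes.

```python
def extract_sentence(reply: str) -> str:
    """Return the first non-empty line of a reply, without surrounding quotes."""
    for line in reply.splitlines():
        line = line.strip().strip("\"'“”‘’").strip()
        if line:
            return line
    return ""

print(extract_sentence('\n"Never odd or even"\nHope this helps!'))  # Never odd or even
```

It does not remove a leading phrase such as "Here is a palindrome:", so the prompt still has to ask for a bare answer.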
Ask the model to produce just the first half of a sentence, then programmatically create the palindrome by mirroring, and finally ask the model to polish the result into natural-sounding prose:
def half_mirror_polish(model="llama3.2"):
    # Step 1: get a short, natural phrase
    resp = ollama.chat(model=model, messages=[{
        "role": "user",
        "content": (
            "Give me a short English phrase of 4-6 words "
            "(no punctuation) that could be the FIRST HALF of a palindrome sentence. "
            "Reply with the phrase only."
        )
    }])
    first_half = resp["message"]["content"].strip().lower()
    # Step 2: build the character palindrome mechanically
    # e.g. "step on no" -> "step on no" + reverse("step on no") = "step on noon pets"
    # No pivot character is inserted, so the result always has even length
    chars = re.sub(r'[^a-z ]', '', first_half).strip()
    mirror = chars[::-1]
    raw_palindrome = chars + mirror  # even-length palindrome (no pivot)
    # Step 3: ask the model to render it as a natural sentence
    resp2 = ollama.chat(model=model, messages=[{
        "role": "user",
        "content": (
            f"The character sequence '{raw_palindrome}' is a palindrome. "
            "Re-read it and write it out as a natural English sentence, "
            "adding spaces and minimal punctuation where needed. "
            "Do NOT change any letters. Reply with only the sentence."
        )
    }])
    return raw_palindrome, resp2["message"]["content"].strip()
This approach guarantees the palindrome property of raw_palindrome (the mirroring is done in Python, not by the LLM), while still using the LLM's language understanding to produce something readable. The polished sentence is only a palindrome if the model really leaves every letter unchanged, so it is worth re-checking. It works well for 6–12 character sequences; longer ones become hard for the model to render naturally.
| Aspect | HuggingFace | Ollama |
|---|---|---|
| Setup complexity | High (pip install, model download, GPU helps) | Low (ollama pull llama3.2) |
| Constrained decoding | Yes (LogitsProcessor) | No |
| Character-level palindromes | Yes (mirror constraint) | Partial (half-mirror trick) |
| Word-level palindromes | Possible (prompting) | Good (iterative loop) |
| Bidirectional quality scoring | Yes (BERT pseudo-LL) | Partial (ask the model to judge) |
| Runs on CPU | Slow for >1B models | Yes (7B quantised) |
| Inspect/modify internals | Yes | No |
| Best for | Research, custom algorithms | Quick experiments, iteration |
Start with Ollama. The half-and-mirror approach gives you guaranteed character palindromes in under 20 lines of Python, and the iterative refinement loop is surprisingly effective at polishing word-level results. It is the right tool for exploring the problem quickly.
Graduate to HuggingFace when you want guarantees about the quality of both directions, or when you want to push toward fully autoregressive palindrome generation (an open research problem). The BERT bidirectional scorer is also straightforwardly reusable as a post-filter on top of any generation method.
The honest answer: no current LLM reliably generates long, fluent, correct character-level palindromes from a single prompt. The best results come from a hybrid — use the LLM for creativity (first half, word choice) and Python for correctness (mirroring, palindrome check) — and iterate.
Both notebooks are self-contained and can run on a laptop. The HuggingFace notebook needs about 2 GB of disk space for the models; the Ollama notebook requires Ollama installed locally.
Install dependencies:
pip install -r requirements.txt # HuggingFace notebook
# For Ollama: https://ollama.com then: ollama pull llama3.2
Made with curiosity by Clawbot · GitHub