MIME bodies disguised as attachments
My mail reader was showing "(no text body)" for a chunk of messages,
including basically everything from iCloud. The body was right there
in the raw .eml. The parser just refused to use it.
What was happening
The parser logic was the standard "for each MIME part, if it has a filename, it's an attachment; otherwise it might be a body". Reasonable in 2003. Wrong in 2026.
iCloud (and apparently some other senders) attaches
filename="unnamed" to the text/plain and text/html parts. Those
parts are still inline body content, but my "has a filename, must be
an attachment" check pulled them out of the body candidates and put
them on the attachments list as useless files.
Net effect: hundreds of messages stored with body_text and
body_html empty, and a phantom unnamed attachment in the metadata.
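To see the failure in isolation, here is a minimal reproduction with Python's standard email package. The sample message is hypothetical: I've assumed the iCloud-style parts carry Content-Disposition: inline; filename="unnamed", and naive_split is an illustrative reconstruction of the old "has a filename" rule, not the original script.

```python
from email import message_from_string
from email.message import Message

# Hypothetical, trimmed-down message: text/plain and text/html parts that
# carry filename="unnamed" but are inline body content (assumed shape).
RAW = """\
From: someone@example.com
To: me@example.com
Subject: hello
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline; filename="unnamed"

Hello in plain text.
--BOUNDARY
Content-Type: text/html; charset=utf-8
Content-Disposition: inline; filename="unnamed"

<p>Hello in HTML.</p>
--BOUNDARY--
"""

def naive_split(msg: Message):
    """The old rule (reconstructed): any part with a filename is an attachment."""
    bodies, attachments = [], []
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers hold no content of their own
        if part.get_filename():
            attachments.append(part)
        else:
            bodies.append(part)
    return bodies, attachments

msg = message_from_string(RAW)
bodies, attachments = naive_split(msg)
print(len(bodies), [a.get_filename() for a in attachments])
# prints: 0 ['unnamed', 'unnamed']
```

Both body parts end up on the attachments list and nothing is left to render, which is exactly the "(no text body)" symptom.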
The fix
text/plain and text/html are body content unless Content-Disposition
explicitly says attachment. The filename is a hint, not
a verdict.
def is_body_part(part):
    content_type = part.get_content_type()
    if content_type not in ("text/plain", "text/html"):
        return False
    cd = (part.get("Content-Disposition") or "").lower()
    # Explicit attachment? Then no, it's an attachment.
    if cd.startswith("attachment"):
        return False
    # Otherwise it's body, regardless of whether it has a filename.
    return True
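Here is a sketch of how that check might slot into a part-walking loop. It swaps the manual header parsing for the stdlib's Message.get_content_disposition(), which returns "inline", "attachment", or None (same behavior for well-formed headers). split_parts and decode_text are hypothetical helpers, the "first text/plain and first text/html win" policy is an assumption, and the sample message is made up.

```python
from email import message_from_string
from email.message import Message

def is_body_part(part: Message) -> bool:
    # Same rule as above: only an explicit "attachment" disposition
    # disqualifies a text/plain or text/html part from being body.
    if part.get_content_type() not in ("text/plain", "text/html"):
        return False
    return part.get_content_disposition() != "attachment"

def decode_text(part: Message) -> str:
    # get_payload(decode=True) undoes base64 / quoted-printable.
    payload = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # unknown charset label; fall back to UTF-8
        return payload.decode("utf-8", errors="replace")

def split_parts(msg: Message):
    """Hypothetical helper: return (body_text, body_html, attachments).

    The first text/plain and first text/html body parts win; later
    body parts of the same type are skipped in this sketch.
    """
    body_text, body_html, attachments = "", "", []
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers have no payload of their own
        if not is_body_part(part):
            attachments.append(part)
        elif part.get_content_type() == "text/plain":
            body_text = body_text or decode_text(part)
        else:
            body_html = body_html or decode_text(part)
    return body_text, body_html, attachments

# Made-up sample: one inline part with a filename, one real text attachment.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="B"

--B
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline; filename="unnamed"

Body text.
--B
Content-Type: text/plain; charset=utf-8
Content-Disposition: attachment; filename="notes.txt"

A real attached text file.
--B--
"""

body_text, body_html, attachments = split_parts(message_from_string(RAW))
print(repr(body_text), [a.get_filename() for a in attachments])
```

The "unnamed" part becomes the body, while notes.txt stays an attachment because its disposition says so, even though both are text/plain with filenames.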
Deployed in two places:
- The live Lambda, so new mail gets parsed correctly going forward.
- A backfill Lambda that scanned DynamoDB for any row with empty body text plus a text/* "attachment", pulled the raw .eml out of S3, re-parsed it, and patched the row in place, including regenerating the snippet field.
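The backfill's selection and patch steps might look roughly like the sketch below. DynamoDB and S3 access is deliberately left out (the raw .eml is passed in as bytes). The row shape (body_text, body_html, snippet, plus an attachments list of dicts with content_type and filename) and the 140-character snippet rule are assumptions for illustration, not the real schema.

```python
import re
from email import message_from_bytes
from email.message import Message

SNIPPET_LEN = 140  # hypothetical snippet length

def needs_backfill(row: dict) -> bool:
    """Empty body text plus at least one text/* "attachment" (assumed row shape)."""
    if row.get("body_text"):
        return False
    return any(a.get("content_type", "").startswith("text/")
               for a in row.get("attachments", []))

def is_body_part(part: Message) -> bool:
    if part.get_content_type() not in ("text/plain", "text/html"):
        return False
    return part.get_content_disposition() != "attachment"

def _decode(part: Message) -> str:
    payload = part.get_payload(decode=True) or b""
    try:
        return payload.decode(part.get_content_charset() or "utf-8", errors="replace")
    except LookupError:  # unknown charset label
        return payload.decode("utf-8", errors="replace")

def reparse_row(row: dict, raw: bytes) -> dict:
    """Return a patched copy of the row, re-derived from the raw .eml bytes."""
    msg = message_from_bytes(raw)
    text, html, attachments = "", "", []
    for part in msg.walk():
        if part.is_multipart():
            continue
        if is_body_part(part):
            if part.get_content_type() == "text/plain":
                text = text or _decode(part)
            else:
                html = html or _decode(part)
        else:
            attachments.append({"content_type": part.get_content_type(),
                                "filename": part.get_filename()})
    patched = dict(row, body_text=text, body_html=html, attachments=attachments)
    # Hypothetical snippet rule: collapse whitespace, then truncate.
    patched["snippet"] = re.sub(r"\s+", " ", text).strip()[:SNIPPET_LEN]
    return patched

# Usage with a made-up row and message.
row = {"id": "msg-1", "body_text": "", "body_html": "", "snippet": "",
       "attachments": [{"content_type": "text/plain", "filename": "unnamed"}]}
raw = b"""MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="B"

--B
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline; filename="unnamed"

Hello   from
iCloud.
--B--
"""
if needs_backfill(row):
    row = reparse_row(row, raw)
print(row["snippet"])
```

Keeping the selection predicate separate from the re-parse makes it easy to dry-run the scan and count affected rows before writing anything back.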
What I'd do differently
I had carried this parser logic over from an older script of mine where it had been "working" for years. It had really been silently losing bodies on a class of senders I just didn't have in the corpus at the time.
Lesson: when you write code that handles a fuzzy spec (and MIME is the canonical example of a fuzzy spec), build a corpus of weird-but-real samples and run them through your parser regularly. iCloud, Outlook, Apple Mail, Postfix-from-cron, mailing-list footers — they all do slightly different things. A parser that handles "the four senders I already get mail from" is not a parser, it's a coincidence.
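One cheap way to act on that is a regression check that runs every sample in a corpus through a body rule and reports the ones that come out with no body at all. This is a sketch: load_corpus assumes a flat directory of .eml files, old_is_body is my reconstruction of the filename rule, and the two inline samples stand in for real captured mail.

```python
from email import message_from_bytes
from email.message import Message
from pathlib import Path
from typing import Callable, Dict, List

def old_is_body(part: Message) -> bool:
    # Reconstructed old rule: a filename means "attachment".
    return part.get_content_type() in ("text/plain", "text/html") and not part.get_filename()

def new_is_body(part: Message) -> bool:
    # Fixed rule: only an explicit attachment disposition disqualifies.
    return (part.get_content_type() in ("text/plain", "text/html")
            and part.get_content_disposition() != "attachment")

def load_corpus(directory: str) -> Dict[str, bytes]:
    """Read every .eml in a corpus directory (layout is an assumption)."""
    return {p.name: p.read_bytes() for p in sorted(Path(directory).glob("*.eml"))}

def bodyless(corpus: Dict[str, bytes], is_body: Callable[[Message], bool]) -> List[str]:
    """Names of samples where the rule finds no body part at all."""
    failures = []
    for name, raw in corpus.items():
        msg = message_from_bytes(raw)
        leaves = [p for p in msg.walk() if not p.is_multipart()]
        if not any(is_body(p) for p in leaves):
            failures.append(name)
    return failures

# In practice: corpus = load_corpus("tests/mime-corpus")  (hypothetical path)
corpus = {
    "plain.eml": b"Content-Type: text/plain\n\nHi.\n",
    "icloud-unnamed.eml": (
        b"Content-Type: text/plain; charset=utf-8\n"
        b'Content-Disposition: inline; filename="unnamed"\n\nHi.\n'
    ),
}
print(bodyless(corpus, old_is_body))  # ['icloud-unnamed.eml']
print(bodyless(corpus, new_is_body))  # []
```

Wired into CI, the old rule would have failed on the first iCloud sample added to the corpus instead of silently emptying hundreds of rows.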