libb.smart_base64

smart_base64(encoded_words)

Decode base64-encoded words with intelligent charset handling.

Splits the input into encoded words as defined in RFC 2047, Section 2, and handles common encoding problems such as subjects split across multiple lines and charset mismatches.

Parameters:

encoded_words (str) – Base64-encoded string (one or more RFC 2047 encoded words) or plain text.

Returns:

Decoded string, or the original string if the input is not encoded.

Return type:

str
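
For context, an RFC 2047 encoded word has the form =?charset?encoding?payload?=, where an encoding of B means the payload is base64. The following is a minimal sketch of decoding such words with the standard library's email.header.decode_header. The helper name decode_words is hypothetical, and this is not necessarily how smart_base64 is implemented; it only illustrates the format that the function consumes.

```python
from email.header import decode_header


def decode_words(header):
    """Decode RFC 2047 encoded words in `header` (hypothetical helper)."""
    parts = []
    for chunk, charset in decode_header(header):
        # decode_header yields bytes for encoded words and may yield
        # str for input that contains no encoded words at all.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or 'ascii', errors='replace')
        parts.append(chunk)
    return ''.join(parts)


print(decode_words('=?UTF-8?B?VGhpcyBpcyBhIGhvcnNleTog8J+Qjg==?='))
print(decode_words('This is plain text'))
```

Unlike the expected outputs documented below, this sketch does no further normalization of the decoded text.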

Basic Usage:

>>> smart_base64('=?utf-8?B?U1RaOiBGNFExNSBwcmV2aWV3IOKAkyBUaGUgc3RhcnQgb2YgdGh'
...              'lIGNhc2ggcmV0dXJuIHN0b3J5PyBQYXRoIHRvICQyMDAgc3RvY2sgcHJpY2U/?=')
'STZ: F4Q15 preview – The start of the cash return story? Path to $200 stock price?'

Multiline Subjects (a common email bug: each line is base64-encoded separately):

>>> smart_base64('=?UTF-8?B?JDEwTU0rIENJVCBHUk9VUCBUUkFERVMgLSBDSVQgNScyMiAxMDLi'
...              'hZ0tMTAz4oWbICBNSw==?=\r\n\t=?UTF-8?B?VA==?=')
"$10MM+ CIT GROUP TRADES - CIT 5'22 102.625-103.125 MK T"

Charset Mismatch (UTF-8 header with Latin-1 content):

>>> smart_base64('=?UTF-8?B?TVMgZW5lcmd5OiByaWcgMTdzIDkxwr4vOTLihZsgMThzIDkzwr4v'
...              'OTTihZsgMjBzIDgywg==?=\r\n\t=?UTF-8?B?vS84Mw==?=')
'MS energy: rig 17s 91.75/92.125 18s 93.75/94.125 20s 82.5/83'
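
In the input above, the first encoded word ends with the byte 0xC2 and the second begins with 0xBD; together they form the UTF-8 encoding of a single character (½), so decoding each word on its own cannot succeed cleanly. One way to cope with input like this is to concatenate the raw bytes of all encoded words before decoding. The sketch below assumes every word is base64 and shares the charset of the first word; the names ENCODED_WORD and join_then_decode are hypothetical, and smart_base64 may take a different approach.

```python
import base64
import re

# Hypothetical pattern for base64 ("B") encoded words: =?charset?B?payload?=
ENCODED_WORD = re.compile(r'=\?([^?]+)\?[Bb]\?([^?]*)\?=')


def join_then_decode(header):
    """Join the raw bytes of all encoded words, then decode once."""
    words = ENCODED_WORD.findall(header)
    if not words:
        # No encoded words found: return the input unchanged.
        return header
    # Assumption: all words use the charset declared by the first one.
    charset = words[0][0]
    raw = b''.join(base64.b64decode(payload) for _, payload in words)
    return raw.decode(charset, errors='replace')


header = ('=?UTF-8?B?TVMgZW5lcmd5OiByaWcgMTdzIDkxwr4vOTLihZsgMThzIDkzwr4v'
          'OTTihZsgMjBzIDgywg==?=\r\n\t=?UTF-8?B?vS84Mw==?=')
print(join_then_decode(header))
```

The sketch leaves the fraction characters as Unicode, whereas the expected output above shows them as decimals (for example 82.5), so it covers only the byte-joining step.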

Unicode Characters:

>>> smart_base64('=?UTF-8?B?VGhpcyBpcyBhIGhvcnNleTog8J+Qjg==?=')
'This is a horsey: \U0001f40e'
>>> smart_base64('=?UTF-8?B?U0xBQiAxIOKFnDogIDEwOSAtIMK9IHYgNzYuMjU=?=')
'SLAB 1.375: 109 - 0.5 v 76.25'

Plain Text Passthrough:

>>> smart_base64('This is plain text')
'This is plain text'