👩🏻‍💻 Perfect Programming Uses for ChatGPT: Weird RegEx One-offs

December 17, 2024  

I will be the first to say that you shouldn’t over-rely on tools like ChatGPT, particularly for programming. I could write a whole post about that. Today, I want to focus on where I think it excels, especially in research and tinkering contexts: strange one-off programming tasks.

The task I had to complete: converting Unicode characters within multiple key-value mappings into their Unicode hex representations. It’s a one-time job, not the sort of task where it makes sense to invest in an optimized solution.

The Example

The aim was to be able to quickly type in a 𝕔𝕠𝕠𝕝 𝕗𝕠𝕟𝕥 𝕝𝕚𝕜𝕖 𝕥𝕙𝕚𝕤 𝕚𝕟 𝕒𝕦𝕥𝕠𝕙𝕠𝕥𝕜𝕖𝕪 𝕨𝕚𝕥𝕙 𝕒 𝕢𝕦𝕚𝕔𝕜𝕤𝕨𝕚𝕥𝕔𝕙𝕖𝕣. However, I realised that AutoHotkey couldn’t read the Unicode literals, so I had to convert them to their hexadecimal representation. The problem involved input data like this:

monospaceCharMap := {"0":"𝟢","1":"𝟷","a":"𝚊","b":"𝚋"}
boldCharMap := {"0":"𝟎","1":"𝟏","a":"𝐚","b":"𝐛"}

Each map needed to go from this:

monospaceCharMap := {"0":"𝟢","1":"𝟷","a":"𝚊","b":"𝚋"}

to this:

monospaceCharMap := {"0":"U+1D7CE","1":"U+1D7CF","a":"U+1D68A","b":"U+1D68B"}

⚠️ It is important to note that my aim was specifically to have the output be literally what you see above, so I could copy and paste it back into my AutoHotkey script.
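For a sense of what the conversion means for a single character, here is a minimal Python sketch. The helper name to_codepoint_label is made up for illustration, and the monospace "a" is written as an escape sequence so the example doesn't depend on the editor's font or file encoding.

```python
def to_codepoint_label(ch: str) -> str:
    """Return 'U+XXXX' for one character (hypothetical helper name)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # ord() gives the integer code point; 04X means uppercase hex,
    # zero-padded to at least four digits (longer values are not truncated).
    return f"U+{ord(ch):04X}"


print(to_codepoint_label("A"))           # U+0041
print(to_codepoint_label("\U0001D68A"))  # U+1D68A
```

In Python 3, len() counts code points, so even a character outside the Basic Multilingual Plane such as the monospace "a" has length 1.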

How ChatGPT Sorted It

In reality, I had 10 dictionary-like maps. It would have been a pain to manually do this. Even just thinking about the regex was giving me a headache for something that was meant to be for fun.

ChatGPT wrote up a Python script that:

  1. Parses multiple key-value mappings using regular expressions.
  2. Converts single-character values to their Unicode hex format using the ord() function and 04X formatting.
  3. Reconstructs the updated mappings and outputs them to a new file.

Here’s the complete script:

import re

# Read the input file
with open("text.txt", "r", encoding="utf-8") as file:
    content = file.read()

# Regular expression to match each map and its key-value pairs
map_pattern = r'([a-zA-Z]+CharMap) := \{([^}]*)\}'
key_value_pattern = r'"([^"]+)":"([^"]+)"'

# Process each map
updated_maps = []
for map_match in re.finditer(map_pattern, content):
    map_name = map_match.group(1)
    key_value_string = map_match.group(2)
    
    # Process the key-value pairs
    updated_pairs = [
        f'"{key}":"U+{ord(value):04X}"' if len(value) == 1 else f'"{key}":"{value}"'
        for key, value in re.findall(key_value_pattern, key_value_string)
    ]
    
    # Reconstruct the map
    updated_map = f'{map_name} := {{{",".join(updated_pairs)}}}'
    updated_maps.append(updated_map)

# Combine all updated maps into the final result
updated_content = "\n".join(updated_maps)

# Save the updated content to a file
with open("output.txt", "w", encoding="utf-8") as outfile:
    outfile.write(updated_content)

print("Conversion complete! Output saved to 'output.txt'.")
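One way to sanity-check the result, which is not part of the script above, is to decode each U+XXXX label back into its character and compare against the original text. The sketch below assumes labels are 4 to 6 uppercase hex digits; decode_labels and the demoCharMap sample are hypothetical names used only for this example.

```python
import re


def decode_labels(text: str) -> str:
    """Replace every U+XXXX label with the character it names (hypothetical helper)."""
    return re.sub(
        r"U\+([0-9A-F]{4,6})",
        # int(..., 16) parses the hex digits; chr() turns the code point back into a character.
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


# Made-up sample data; the escapes are monospace "a" and "b".
converted = 'demoCharMap := {"a":"U+1D68A","b":"U+1D68B"}'
original = 'demoCharMap := {"a":"\U0001D68A","b":"\U0001D68B"}'

print(decode_labels(converted) == original)  # True
```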

Code Breakdown

  1. Regex Matching:

    • map_pattern identifies each map and its key-value pairs.
    • key_value_pattern extracts the individual "key":"value" pairs.
  2. Unicode Conversion:

    • ord(value) gets the Unicode code point for a character.
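    • f"{ord(value):04X}" formats the code point as uppercase hexadecimal, zero-padded to at least four digits; longer code points keep all their digits (e.g., 1D7CE, which the script prefixes with U+ to give U+1D7CE).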
    • if len(value) == 1 ensures that only symbols (single characters) are processed, not plain text values.
  3. Reconstruction:

    • The updated key-value pairs are joined using ",".join() to rebuild the map.
    • Each updated map is stored in a list and written to the output file.
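To see what the two patterns actually capture, here is a small sketch that runs them on an in-memory string instead of a file. The sample line and the demoCharMap name are made up for illustration; the patterns are copied from the script.

```python
import re

# The two patterns from the script above.
map_pattern = r'([a-zA-Z]+CharMap) := \{([^}]*)\}'
key_value_pattern = r'"([^"]+)":"([^"]+)"'

# Made-up sample line; \U0001D68A is the escape for monospace "a".
sample = 'demoCharMap := {"a":"\U0001D68A","b":"plain"}'

match = re.search(map_pattern, sample)
map_name = match.group(1)  # the text before " := "
body = match.group(2)      # everything between the braces

# With two capture groups, findall returns a list of (key, value) tuples.
pairs = re.findall(key_value_pattern, body)

# Only one-character values are converted; "plain" passes through unchanged.
converted = [
    (key, f"U+{ord(value):04X}" if len(value) == 1 else value)
    for key, value in pairs
]

print(map_name)   # demoCharMap
print(converted)  # [('a', 'U+1D68A'), ('b', 'plain')]
```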

What Features Made This a Good Task

Conclusion

Overall, this was a pretty quick and clean solution to an annoying task. ChatGPT even helped prime a few sections of this blog (I take full credit that I’m sitting here writing in my voice though). When I have an extremely discrete regex task, ChatGPT is godlike.

For anybody interested, I will make a post describing the AutoHotkey script that converts keyboard input using specialized character maps such as monospace, bold italic, bold sans, cursive, double-struck, medieval, and italic. It supports both uppercase and lowercase characters by detecting the Shift key.