Parsing ANSI Codes

Dealing with strings that contain ANSI codes for colorizing can be a challenge, especially if you need the actual display length of the embedded strings.

Here is a quick-n-dirty function that can extract all the ansi_codes and hand them to you, return just the embedded string(s), or return just the length of the line without the ansi_codes.

  import re
  # ####################################
  def parse_codes(msg, rtrn_type='dict', *args, **kwargs):
      # ################################
      """
      purpose: grab all ansi_codes from a string and return them - see below
      requires:
          - msg: str      # string that may or may not contain ansi_codes; the codes are separated from the message text
      options:
          - rtrn: str     # default='dict'  a dictionary is returned = {'codes': ['code1', 'code2', 'code3', ...], 'strings': ['str1', 'str2', 'str3', 'str4', ...]}
                          # rtrn='list' returns the list of codes
                          # rtrn='string'|'str' returns the string(s) joined as one string without codes
                          # rtrn='nclen'|'len' returns only the length of the (joined) string without any codes
      returns:
          - default = {'codes': ['code1', 'code2', ...], 'strings': ['str1', 'str2', ...]}
          - rtrn='list' = ['code1', 'code2', ...]
          - rtrn='string' = "".join(['str1', 'str2', ...])
          - rtrn='len' = len("".join(['str1', 'str2', ...]))
      notes:
          - this replaces the deprecated split_codes(), escape_ansi(), and nclen()
          - WIP
          - 20250826-1606
      """
      # """--== Local Import(s) ==--""" #
      # """--== Config ==--""" #
      rtrn_type = arg_val(["rtrn", "type", "rtrntype","rtrn_type"], args, kwargs, dflt=rtrn_type)
      # """--== Init ==--""" #
      pat = r"\x1b\[[0-9;]+m"
      msg = str(msg).strip()
      # msg = clr_coded(msg) 
      rtrn = None
      codes_l = ["",""]
      # """--== Process ==--""" # 
      codes_l = re.findall(pat, msg)
      if not codes_l:
          codes_l = ["",""]
      # re.split() yields an empty string wherever the text starts or ends with a code
      # (or where two codes sit side by side), so drop the empties rather than slicing
      # [1:-1], which would throw away real text when msg does not start with a code
      elems_l = [elem for elem in re.split(pat, msg) if elem]
      # """--== Prepare rtrn ==--""" # 
      if rtrn_type in ('dict', "d", "dictionary"):  # default
          rtrn = {'codes': codes_l, 'strings': elems_l}
      if rtrn_type in ('list', 'lst', 'l'):
          rtrn = codes_l
      if rtrn_type in ('string', 'str', 'joined', 'stripped', 'nocode', 'escaped'):
          rtrn = "".join(elems_l)
      if rtrn_type in ('nclen', 'len'):
          rtrn = len("".join(elems_l))
      # """--== return ==--""" #
      # print(f"{rtrn_type=} {rtrn=}")  # for debugging
      return rtrn

This calls arg_val(), which returns the value for any rtrn_type='xxx'. How that function works is alluded to in previous posts, but I might go back and revisit this matter in the future.
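If you want to try parse_codes() without my library, here is a rough stand-in for arg_val(). This is only a guess at the minimum behavior needed here, not the real function: it assumes the helper takes a list of accepted keyword names plus the caller's args and kwargs, and returns the first matching keyword value or the default. The real arg_val() may also inspect args; this sketch ignores them. The demo() wrapper is hypothetical and exists only to show the call.

```python
def arg_val(names, args, kwargs, dflt=None):
    """Hypothetical stand-in for arg_val(), NOT the real implementation.

    Returns the value of the first keyword in `names` that the caller
    supplied, or `dflt` if none was given. `args` is accepted only to
    match the call signature and is ignored in this sketch.
    """
    for name in names:
        if name in kwargs:
            return kwargs[name]
    return dflt


def demo(msg, rtrn_type="dict", *args, **kwargs):
    # Mirrors the config line in parse_codes(): any of these aliases may set the mode
    return arg_val(["rtrn", "type", "rtrntype", "rtrn_type"], args, kwargs, dflt=rtrn_type)


print(demo("hello"))               # dict
print(demo("hello", rtrn="len"))   # len
print(demo("hello", "list"))       # list
```

The point of the aliases is that rtrn='len', type='len', and rtrn_type='len' all end up selecting the same return mode.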

The line that calls clr_coded(msg) has been commented out here. In my use, that line calls another function that translates embedded color tags into ANSI codes. For example, a string like this “This [blue]is blue[/] and this [red]is red[/]” gets translated into ‘This \x1b[34mis blue\x1b[0m and this \x1b[31mis red\x1b[0m’. Perhaps in a future post I will show how clr_coded works, but that will be a lengthy post.
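To see what the regex does with that translated string, here is a small standalone sketch using only the standard re module. The names ANSI_PAT, colored, codes, parts, and plain are just for illustration; the pattern is the same one used in parse_codes().

```python
import re

# Same pattern as in parse_codes(): ESC, '[', digits and semicolons, then 'm'
ANSI_PAT = r"\x1b\[[0-9;]+m"

colored = "This \x1b[34mis blue\x1b[0m and this \x1b[31mis red\x1b[0m"

codes = re.findall(ANSI_PAT, colored)   # every color code, in order
parts = re.split(ANSI_PAT, colored)     # the text between the codes
plain = "".join(parts)                  # the text with no codes at all

print(codes)   # ['\x1b[34m', '\x1b[0m', '\x1b[31m', '\x1b[0m']
print(parts)   # ['This ', 'is blue', ' and this ', 'is red', '']
print(plain)   # This is blue and this is red
print(len(colored), len(plain))  # 46 28
```

Notice the empty string at the end of parts. re.split() produces one whenever the string starts or ends with a match, so here it shows up only at the end because the text ends with a reset code but starts with plain text. That is why filtering out empty strings is safer than assuming where they fall.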

Thanks for reading…

Contact me if you have any questions.

Enjoy!

Companionway
