Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial auto-sync LoongArch support #2349

Open
wants to merge 9 commits into
base: next
Choose a base branch
from

Conversation

jiegec
Copy link
Contributor

@jiegec jiegec commented May 3, 2024

Your checklist for this pull request

  • I've documented or updated the documentation of every API function and struct this PR changes.
  • I've added tests that prove my fix is effective or that my feature works (if possible)

Detailed description

Add loongarch support to auto-sync. Co-authored by @FurryAcetylCoA.

See capstone-engine/llvm-capstone#47 for llvm changes. The generated code have small changes after generated via:

./src/autosync/ASUpdater.py -a LoongArch -s IncGen Translate
  • MC Tests are generated from llvm
  • Instruction groups are implemented
  • Register accesses are implemented
  • Memory operands are handled for memory instructions
  • Code are formatted using clang-format of LLVM 17

Test plan

...

Closing issues

...

@jiegec jiegec marked this pull request as draft May 3, 2024 07:36
@jiegec
Copy link
Contributor Author

jiegec commented May 3, 2024

Not finished yet, but hope to see some suggestions from @Rot127. The pr can already disassemble some instructions:

$ test_loongarch
****************
Platform: loongarch32
Code:0x0c 0x00 0x08 0x14 0x8c 0xfd 0xbf 0x02
Disasm:
0x1000: lu12i.w $t0, 0x4000
        op_count: 2
                operands[0].type: REG = t0
                operands[1].type: IMM = 0x4000

0x1004: addi.w  $t0, $t0, -1
        op_count: 3
                operands[0].type: REG = t0
                operands[1].type: REG = t0
                operands[2].type: IMM = 0xffffffffffffffff

0x1008:

****************
Platform: loongarch64
Code:0x80 0x80 0x00 0x40 0x63 0x80 0xff 0x02 0x78 0x20 0xc0 0x29 0x00 0x84 0x00 0x01 0x00 0xa4 0x14 0x01
Disasm:
0x1000: beqz    $a0, 0x80
        op_count: 2
                operands[0].type: REG = a0
                operands[1].type: IMM = 0x80

0x1004: addi.d  $sp, $sp, -0x20
        op_count: 3
                operands[0].type: REG = sp
                operands[1].type: REG = sp
                operands[2].type: IMM = 0xffffffffffffffe0

0x1008: st.d    $s1, $sp, 8
        op_count: 3
                operands[0].type: REG = (null)
                operands[1].type: REG = sp
                operands[2].type: IMM = 0x8

0x100c: fadd.s  $fa0, $fa0, $fa1
        op_count: 3
                operands[0].type: REG = fa0
                operands[1].type: REG = fa0
                operands[2].type: REG = fa1

0x1010: movgr2fr.w      $fa0, $zero
        op_count: 2
                operands[0].type: REG = fa0
                operands[1].type: REG = zero

0x1014:

The memory operand need to be handled, and more tests are required.

Copy link
Collaborator

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing job! Thanks a lot to you and @FurryAcetylCoA.

An advice if you are uncertain how to do some of the things below. You can look at the ARM or PPC module. Don't check AArch64 (too complex), Alpha, TriCore (does not always follow the auto-sync design for good reasons) or the other modules (they are heavily outdated).

Now, the biggest things missing are:

  • The details should be filled in by an add_cs_detail() function in LoongArchMapping.c (see comments).
    • The access information is still missing. Again, you can take a look at the ARM or PPC module, how it is done there.
    • Instruction groups are still missing as well if I didn't overlooked it.
  • The generated enums in the header file (register and instructions) should not be copied by hand, but via the HeaderPatcher.py. Just place the comments into the header file and run the script on it.
  • Delete unused inc files.
  • Regarding testing:
    • Disassembler tests are taken from llvm/test/MC/ and llvm/test/MC/Disassembler/, mostly copied by hand into the capstone repo (suite/MC/). I write a script currently to extract them automatically. I'll notify you when it's done.
    • Testing the details (luckily not difficult for LoongArch), can be done in your case by adding test cases to suite/cstest/issues.cs. Just ensure that every possible execution path for register, immediate and memory operand details are taken.

Once these things are done we are almost there. But we should do a second round of review then.

suite/auto-sync/src/autosync/cpptranslator/Differ.py Outdated Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
arch/LoongArch/LoongArchDisassembler.c Outdated Show resolved Hide resolved
arch/LoongArch/LoongArchGenCSFeatureEnum.inc Outdated Show resolved Hide resolved
suite/auto-sync/src/autosync/cpptranslator/patches/Size.py Outdated Show resolved Hide resolved
utils.h Outdated Show resolved Hide resolved
@jiegec jiegec force-pushed the next branch 7 times, most recently from c02fcbd to d17db4f Compare May 3, 2024 10:54
@Rot127
Copy link
Collaborator

Rot127 commented May 3, 2024

Regarding the CI:

  • The auto-sync job can fail. Since we have not yet merged the llvm-18 update
  • The clang-tidy errors should be fixed if it complains about changes in the LoongArch files. But if you have the time, it would be of course awesome if you could fix the one or two errors in cs.c as well.

@jiegec jiegec force-pushed the next branch 5 times, most recently from ce9dd99 to 1d37d04 Compare May 3, 2024 14:43
@jiegec
Copy link
Contributor Author

jiegec commented May 3, 2024

Features added:

  • Import tests from LLVM MC
  • Collect operand type and access
  • Collect registers read/modified

However, in LoongArch assembly, memory operands are not special i.e. they are normal register/immediate operands. I am unsure how to create MEM operands.

@Rot127
Copy link
Collaborator

Rot127 commented May 4, 2024

However, in LoongArch assembly, memory operands are not special i.e. they are normal register/immediate operands. I am unsure how to create MEM operands.

Indeed a little tricky. I checked the td files and the ISA quickly and it seems that all LOAD and STOREs are either of the instruction format 3R, 2RI12 or 2RI14.
In the PrinterCapstone you can now emit these formats (and all the others) as additional information. This is also done for PPC (see Mapping.h and the generated values in the PPCGenCSMappingInsn.inc file).

Here is the code where we generate it for PPC. In your case you can do it just like for PPC there. All LoongArch instructions seem to be derived from the class LAInst. The first inheritance child of this class, is the format class. In the PPC case we search for the class I and then emit the first child of it. Which is the format class. You can do it the same way for the LoongArch instructions. Get the LAInst class and emit the name of the first child. Which is the format.
Additionally to this, you should emit if the instruction loads or stores memory. You can check the CGI->mayLoad and CGI->mayStore flags in PrinterCapstone to figure this out.

In the end it should look something like this:

In loongarch.h

typedef struct {
	loongarch_insn_form form;
	loongarch_mem_access maccess; // LOONGARCH_MEM_LOAD, LOONGARCH_MEM_STORE, LOONGARCH_MEM_NONE
} loongarch_suppl_info; // add this to the union in Mapping.h

A generated entry in LoongArchGenCSMappingInsn.inc should look something like this:

{
  /* <mnemonic> */
  LOONGARCH_LOAD.... /* 337 */, LOONGARCH_INS_LOAD...,
  #ifndef CAPSTONE_DIET
    { 0 }, { 0 }, { 0 }, 0, 0, {{ LOONGARCH_INSN_FORM_2RI12, LOONGARCH_MEM_LOAD }}
  #endif
},

Now, when you add the details about the operands in LoongArchMapping.c you can check the instruction format and the memory access flags you just generated. If the instruction loads memory and you know the format, you also know which operand is the base register, disponent, offset register etc.

Would this work?

@Rot127
Copy link
Collaborator

Rot127 commented May 28, 2024

@jiegec Did you already find time to run the fuzzing? The PR is almost done.

@jiegec
Copy link
Contributor Author

jiegec commented May 29, 2024

@jiegec Did you already find time to run the fuzzing? The PR is almost done.

Thanks, I have run the following fuzzing tests without crashes:

./cstool loongarch32 0 0xffffffff
./cstool -d loongarch32 0 0xffffffff
./cstool loongarch64 0 0xffffffff
./cstool -d loongarch64 0 0xffffffff

Copy link
Collaborator

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the typo. Besides that, I approve it.
Thank you! Awesome job!

@kabeor Please review. I would finish the AArch64 PR, update llvm-capstone, retrigger the CI here to ensure everything is green and then we can merge this one.

README.md Outdated Show resolved Hide resolved
- Accompanied llvm changes: capstone-engine/llvm-capstone#45
- MC Tests are generated from llvm
- Instruction groups are implemented
- Register accesses are implemented
- Memory operands are handled for memory instructions
- Code are formatted using clang-format of LLVM 17

Co-authored-by: CoA <[email protected]>
@XVilka
Copy link
Contributor

XVilka commented Jun 8, 2024

@kabeor @aquynh this one is good to be merged - one more architecture support in Capstone, and the one that gets more popular with time.

tests/test_loongarch.c Outdated Show resolved Hide resolved
#include <capstone/platform.h>

struct platform {
cs_arch arch;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use tabs for indentation, not spaces

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiegec It would be nonetheless nice if you could run clang-format on it. Even if we do a big formatting PR before the next release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done!

@aquynh
Copy link
Collaborator

aquynh commented Jun 13, 2024

excellent work, thank you for your effort!

just few comments on coding convention, please see my comments.

@Rot127 Rot127 mentioned this pull request Jun 15, 2024
6 tasks
@Rot127
Copy link
Collaborator

Rot127 commented Jun 19, 2024

@jiegec See jiegec#2 for the name change

@aquynh
Copy link
Collaborator

aquynh commented Jun 20, 2024

looking good now, except to few minor issues with the CI

@XVilka

This comment was marked as resolved.

@Rot127
Copy link
Collaborator

Rot127 commented Jun 24, 2024

@jiegec jiegec#3 should fix the test. Please check the CI in the PR to be sure.
Additionally, rebase this one after #2391 is merged.

Add test which contains now a tab.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

None yet

5 participants