FSE performance optimizations #112

Open
pingladd opened this issue Apr 12, 2023 · 3 comments

@pingladd

Hello, I am working on FSE performance optimizations and trying to implement 8 interleaved states instead of the default 2, which might speed up decompression by processing the 8 states in parallel. To do this, I changed the bitContainer to the __m128i type and modified all the related functions. In my tests, some files compress and decompress correctly, but others come out corrupted, with minor differences between the decoded file and the original.
I tried to find the bug but couldn't. Would it be possible for someone to check my code or discuss it with me? Thank you!

@Cyan4973
Owner

I would recommend starting with 4-states, using a 64-bit container.
This would be much more straightforward and easier to debug.
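To make the 4-states suggestion concrete, here is a minimal sketch of the bit budget involved. It is a toy, not FSE's code: the reader is LSB-first (FSE's real bitstream layout differs), and `toy_reader_t`, `toy_refill`, `toy_read` and `toy_round` are hypothetical names invented for this illustration. The point it shows is that one refill of a native 64-bit container leaves at least 57 valid bits while input remains, so four states can each take up to 14 bits before the next refill.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NB_STATES 4   /* number of interleaved decoder states (assumption) */

/* Toy LSB-first bit reader; NOT the bitstream layout FSE actually uses. */
typedef struct {
    uint64_t container;   /* native 64-bit bit container */
    unsigned bitsAvail;   /* number of valid bits in container */
    const uint8_t *src;   /* input bytes */
    size_t pos, size;     /* read position and input length */
} toy_reader_t;

/* Top up the container one byte at a time. While input remains, this
 * leaves at least 57 valid bits in the container. */
static void toy_refill(toy_reader_t *r)
{
    while (r->bitsAvail <= 56 && r->pos < r->size) {
        r->container |= (uint64_t)r->src[r->pos++] << r->bitsAvail;
        r->bitsAvail += 8;
    }
}

/* Take nbBits from the container without refilling. */
static unsigned toy_read(toy_reader_t *r, unsigned nbBits)
{
    unsigned v;
    assert(nbBits >= 1 && nbBits <= r->bitsAvail);
    v = (unsigned)(r->container & ((1ULL << nbBits) - 1));
    r->container >>= nbBits;
    r->bitsAvail -= nbBits;
    return v;
}

/* One decoding round: a single refill feeds all four states. */
static void toy_round(toy_reader_t *r, unsigned nbBits, unsigned out[NB_STATES])
{
    int i;
    assert(nbBits >= 1 && nbBits <= 14);   /* 4 * 14 = 56 bits fit in one refill */
    toy_refill(r);
    for (i = 0; i < NB_STATES; i++)
        out[i] = toy_read(r, nbBits);
}
```

Every operation here is a plain shift, mask or add on a native integer, which is what makes this variant easy to step through in a debugger and compare against the 2-states output.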

After such a success, it would be a smaller step to stretch that implementation to 8-states using a 128-bit container.

Note that __m128i is unlikely to be a native type.
This has some rather complex consequences: operations like + or << on such a type are no longer as simple as they look, which affects performance. It is therefore not obvious that such a move would improve performance.
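As an illustration of why `<<` stops being simple on a 128-bit container, here is a minimal sketch that emulates a 128-bit left shift with two native 64-bit words. `u128_t` and `u128_shl` are hypothetical names for this example, not part of FSE. The same concern applies to `__m128i`: as far as I know, the SSE2 intrinsic `_mm_slli_epi64` shifts each 64-bit lane independently, so bits do not carry from the low lane into the high lane, and a full-width bit shift has to be assembled from several instructions in a similar way.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit container built from two native 64-bit words. */
typedef struct {
    uint64_t lo;   /* bits 0..63 */
    uint64_t hi;   /* bits 64..127 */
} u128_t;

/* Left shift by n bits (0 <= n < 128). Bits shifted out of `lo` must be
 * carried into `hi` by hand; a native 64-bit `<<` needs none of this. */
static u128_t u128_shl(u128_t v, unsigned n)
{
    u128_t r;
    assert(n < 128);
    if (n == 0) {
        r = v;                                    /* avoids an undefined shift by 64 below */
    } else if (n < 64) {
        r.hi = (v.hi << n) | (v.lo >> (64 - n));  /* carry the top bits of lo into hi */
        r.lo = v.lo << n;
    } else {
        r.hi = v.lo << (n - 64);                  /* everything moves into the high word */
        r.lo = 0;
    }
    return r;
}
```

What is a single instruction on a 64-bit container becomes a branch plus several shifts and an OR here, and that cost is paid on every read from the bitstream.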

@MarcusJohnson91

MarcusJohnson91 commented Apr 12, 2023 via email

@pingladd
Author

pingladd commented Apr 12, 2023 via email


3 participants