Add support for Numpy 2.x #429

aMarcireau · 2024-06-04T00:26:03Z

The changes are based on recommendations from https://numpy.org/devdocs/numpy_2_0_migration_guide.html#c-api-changes.

The most visible user-facing change is the addition of two feature flags (numpy-1 and numpy-2). By default, both features are enabled and the code is compiled with support for both ABI versions (with runtime checks to select the right function offsets). Functions that are only available in numpy 1 or 2 are not exposed in this case. Disabling default features (for instance numpy = {version = "0.21.0", default-features = false, features = ["numpy-1"]}) exposes version-specific functions and fields but the library will panic if the runtime numpy version does not match.

I have not done much testing; this should be tried on different code bases (ideally ones that use low-level field access) before merging.

This currently uses std::sync::OnceLock to cache the runtime version. I realised too late that this is not compatible with the Minimum Supported Rust Version (it was introduced in 1.70.0). Using pyo3::sync::GILOnceCell isn't straightforward since py is not always available in functions that need to check the version to pick an implementation.
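As a rough illustration of the caching approach described above, the sketch below stores the (ABI, API) version pair in a `std::sync::OnceLock` and derives a NumPy 2 check from it. `query_runtime_versions` and `is_numpy_2` are hypothetical names, and the values returned are placeholders for whatever the loaded NumPy C API would report. Only `NPY_2_0_API_VERSION` and `ABI_API_VERSIONS` appear in the diff.

```rust
use std::os::raw::c_uint;
use std::sync::OnceLock;

/// API version reported by NumPy 2.0 (value taken from the diff under review).
pub const NPY_2_0_API_VERSION: c_uint = 0x0000_0012;

/// (ABI version, API version), filled in once on first use.
pub static ABI_API_VERSIONS: OnceLock<(c_uint, c_uint)> = OnceLock::new();

/// Hypothetical stand-in for querying the loaded NumPy C API.
/// The values returned here are placeholders, not read from NumPy.
fn query_runtime_versions() -> (c_uint, c_uint) {
    (0x0200_0000, NPY_2_0_API_VERSION)
}

/// Returns true when the cached API version is at least the NumPy 2.0 one.
pub fn is_numpy_2() -> bool {
    // The closure runs at most once; later calls reuse the cached pair.
    let (_abi, api) = *ABI_API_VERSIONS.get_or_init(query_runtime_versions);
    api >= NPY_2_0_API_VERSION
}
```

Because `OnceLock::get_or_init` takes no `py` token, a helper of this shape can be called from any function, which is what makes it convenient here despite the MSRV problem.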

By default, the extension is compiled with support for numpy 1 and 2 (with runtime checks to pick the right binary offset where needed). Features or fields that are specific to a version are hidden by default. Users can opt out of numpy 1 + numpy 2 by disabling default features and selecting a version. If only one version is selected, the library panics when the runtime version does not match the version selected at compile time.

Essentially a port of https://github.com/numpy/numpy/blob/97356bc6f0d6538389a9eef475d883a0f4024c2a/numpy/_core/include/numpy/npy_2_compat.h

adamreichold · 2024-06-04T18:54:54Z

Cargo.toml

@@ -31,3 +31,8 @@ nalgebra = { version = "0.32", default-features = false, features = ["std"] }

 [package.metadata.docs.rs]
 all-features = true
+
+[features]
+default = ["numpy-1", "numpy-2"]


What is the cost of enabling a "support level" if it is never used? Since we have to do the check in any case, do the extra features really carry their weight?

adamreichold · 2024-06-04T18:57:02Z

src/npyffi/mod.rs

@@ -10,13 +10,21 @@
 )]

 use std::mem::forget;
+use std::os::raw::c_uint;


Please merge this into the line below.

adamreichold · 2024-06-04T18:58:51Z

src/npyffi/objects.rs

+#[cfg(feature = "numpy-2")]
+#[allow(non_snake_case)]
+#[inline(always)]
+pub fn PyDataType_ISLEGACY(dtype: *const PyArray_Descr) -> bool {


This function and all the similar ones below must be unsafe fn, as they have no way of validating the pointer argument, i.e. the caller has to ensure that a valid pointer is passed.
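A minimal sketch of what the requested change could look like, assuming a heavily simplified descriptor. `Descr` and `LEGACY_TYPE_LIMIT` are hypothetical stand-ins, not the real `PyArray_Descr` layout or NumPy's actual legacy bound. The point is only that the raw-pointer dereference becomes the caller's responsibility.

```rust
/// Hypothetical, heavily simplified descriptor; the real `PyArray_Descr`
/// has more fields.
#[repr(C)]
pub struct Descr {
    pub type_num: i32,
}

/// Placeholder bound for this sketch; not NumPy's real constant.
pub const LEGACY_TYPE_LIMIT: i32 = 1024;

/// # Safety
///
/// `dtype` must be non-null, properly aligned, and point to a live `Descr`.
#[allow(non_snake_case)]
#[inline(always)]
pub unsafe fn PyDataType_ISLEGACY(dtype: *const Descr) -> bool {
    // The function cannot validate the pointer, hence the `unsafe fn`.
    let type_num = unsafe { (*dtype).type_num };
    type_num >= 0 && type_num < LEGACY_TYPE_LIMIT
}
```

Callers then have to write `unsafe { PyDataType_ISLEGACY(ptr) }`, which makes the validity requirement visible at every call site.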

adamreichold · 2024-06-04T19:03:56Z

src/npyffi/mod.rs

+
+pub const NPY_2_0_API_VERSION: c_uint = 0x00000012;
+
+pub static ABI_API_VERSIONS: std::sync::OnceLock<(c_uint, c_uint)> = std::sync::OnceLock::new();


I think GILOnceCell is applicable here as we can just require that the accessor functions like PyDataType_FLAGS take a py: Python token. (We define them here so there is no need to conform exactly to the C signature, just as there is no guarantee that they will stay in sync with the definitions in NumPy's C headers.)

Besides accessor functions, the version is also needed in the API functions that have a different offset in Numpy 1 and 2 (for instance PyArray_CopyInto) or only exist in Numpy 1 or 2 (using them with the wrong runtime version would otherwise result in memory corruption or a segfault).

I'm happy to switch to GILOnceCell but just wanted to check that changing higher-level function signatures was ok. For instance,

rust-numpy/src/dtype.rs

Line 259 in 2170e16

pub fn flags(&self) -> c_char {

will need py: Python so that it can call PyDataType_FLAGS.

If you have a GIL ref &'py PyArray<..> or a bound ref Bound<'py, PyArray<..>>, this implies access to a GIL token via e.g. Bound::py. So I don't think this is an issue.

In any case, this will be a breaking release so we can change the API where necessary.

adamreichold · 2024-06-04T19:04:46Z

src/npyffi/objects.rs

+}
+
+#[cfg(all(feature = "numpy-1", feature = "numpy-2"))]
+macro_rules! DESCR_ACCESSOR {


Suggested change

macro_rules! DESCR_ACCESSOR {

macro_rules! define_descr_accessor {

adamreichold

Thank you for working on this! From looking at the code, I think we can take a simpler approach and always build with support for both NumPy 1.x and 2.x.

I also think GILOnceCell is appropriate to handle the version information.

adamreichold · 2024-06-04T19:10:54Z

src/npyffi/array.rs

+    #[cfg(all(feature = "numpy-1", not(feature = "numpy-2")))]
    impl_api![50; PyArray_CastTo(out: *mut PyArrayObject, mp: *mut PyArrayObject) -> c_int];
+    #[cfg(all(not(feature = "numpy-1"), feature = "numpy-2"))]
+    impl_api![50; PyArray_CopyInto(dst: *mut PyArrayObject, src: *mut PyArrayObject) -> c_int];


Since builds with support for both versions will be common, I think this does not really add that much compile-time safety. I would rather suggest we extend the impl_api macro to do a runtime version check for functions that are only available in one or the other version, e.g.

impl_api![@npy1 50; PyArray_CastTo(out: *mut PyArrayObject, mp: *mut PyArrayObject) -> c_int];
impl_api![@npy2 50; PyArray_CopyInto(dst: *mut PyArrayObject, src: *mut PyArrayObject) -> c_int];

That sounds good! I'll make these changes.
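The runtime check suggested in this thread could be sketched roughly as follows. This is not the crate's actual `impl_api` macro: `Api`, `Slot`, the `numpy_2` field, and the internal `@checked` rule are assumptions made for illustration, and every table slot shares one signature to keep the example short.

```rust
use std::os::raw::c_int;

/// Opaque stand-in for NumPy's array object.
#[repr(C)]
pub struct PyArrayObject {
    _private: [u8; 0],
}

/// Simplification: every slot in this sketch has the same signature.
pub type Slot = unsafe extern "C" fn(*mut PyArrayObject, *mut PyArrayObject) -> c_int;

/// Hypothetical API table plus the version detected at runtime.
pub struct Api {
    pub table: [Slot; 64],
    pub numpy_2: bool,
}

macro_rules! impl_api {
    // Function that only exists in NumPy 1.x.
    (@npy1 $offset:expr; $name:ident($($arg:ident: $t:ty),*) -> $ret:ty) => {
        impl_api!(@checked false, $offset; $name($($arg: $t),*) -> $ret);
    };
    // Function that only exists in NumPy 2.x.
    (@npy2 $offset:expr; $name:ident($($arg:ident: $t:ty),*) -> $ret:ty) => {
        impl_api!(@checked true, $offset; $name($($arg: $t),*) -> $ret);
    };
    // Shared body: panic instead of calling through the wrong table slot.
    (@checked $needs_2:expr, $offset:expr; $name:ident($($arg:ident: $t:ty),*) -> $ret:ty) => {
        #[allow(non_snake_case)]
        pub unsafe fn $name(&self, $($arg: $t),*) -> $ret {
            assert!(
                self.numpy_2 == $needs_2,
                "{} is not available in the NumPy version loaded at runtime",
                stringify!($name)
            );
            unsafe { (self.table[$offset])($($arg),*) }
        }
    };
}

impl Api {
    impl_api![@npy1 50; PyArray_CastTo(out: *mut PyArrayObject, mp: *mut PyArrayObject) -> c_int];
    impl_api![@npy2 50; PyArray_CopyInto(dst: *mut PyArrayObject, src: *mut PyArrayObject) -> c_int];
}
```

Both functions share offset 50, so without the check a call to `PyArray_CastTo` under NumPy 2 would silently run `PyArray_CopyInto`. The assertion turns that into a panic with a readable message.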


aMarcireau · 2024-06-07T10:28:36Z

@adamreichold I just pushed a new version. Here's a brief overview of breaking changes.

The struct numpy::npyffi::PyArray_Descr now only contains public fields with the same position in numpy 1 and numpy 2. The remaining fields (elsize, alignment, metadata, subarray, names, fields, and c_metadata) may be accessed with the functions PyDataType_$field, where $field is ELSIZE, ALIGNMENT and so on.
The following functions of the struct numpy::PyArrayDescr and the trait numpy::PyArrayDescrMethods now have an extra parameter py: Python<'py> in first position: itemsize, alignment, flags, ndim, has_object, is_aligned_struct, has_subarray, and has_fields.
The type of dtype flags is now u64 to support numpy 2. It was c_char before. Negative values were technically possible, but the numpy 1 to numpy 2 backport suggests simply casting flag values to unsigned (C) chars, which is what we do here (in the accessor function).
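The cast mentioned in the last item can be illustrated with a small helper. `widen_legacy_flags` is a hypothetical name used only for this sketch; the real accessor reads the field from the descriptor first.

```rust
use std::os::raw::c_char;

/// Widen a NumPy 1.x `char` flags value to the `u64` used for NumPy 2.x flags.
pub fn widen_legacy_flags(flags: c_char) -> u64 {
    // Reinterpret the byte as unsigned first, so that a set high bit (0x80)
    // is not sign-extended into the upper 56 bits on platforms where
    // `c_char` is signed.
    flags as u8 as u64
}
```

Going through `u8` keeps the result identical on platforms where `c_char` is `i8` and on those where it is `u8`.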

Icxolu · 2024-06-07T18:30:47Z

The following functions of the struct numpy::PyArrayDescr and the trait numpy::PyArrayDescrMethods now have an extra parameter py: Python<'py> in first position: itemsize, alignment, flags, ndim, has_object, is_aligned_struct, has_subarray, and has_fields.

I don't think it is necessary to change the API here. Since these are Python types, we already have the proof that the GIL is held. You can just get the token via self.py(). For the trait, you probably have to remove the default implementation and move them into the impl on Bound to get access to the token.
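The shape of that suggestion can be sketched without pyo3. `GilToken` and `BoundRef` below are local stand-ins for pyo3's `Python<'py>` and `Bound<'py, T>`, not the real types, and `Descr`, `DescrMethods`, and `data_type_flags` are likewise hypothetical. The sketch only shows why the method has to live in the impl rather than in a default trait body.

```rust
use std::marker::PhantomData;

/// Stand-in for a GIL token (pyo3's `Python<'py>` plays this role).
#[derive(Clone, Copy)]
pub struct GilToken<'py>(pub PhantomData<&'py ()>);

/// Stand-in for a bound reference that carries the token with the object.
pub struct BoundRef<'py, T> {
    pub token: GilToken<'py>,
    pub inner: T,
}

impl<'py, T> BoundRef<'py, T> {
    /// Hands out the token held by the reference.
    pub fn py(&self) -> GilToken<'py> {
        self.token
    }
}

/// Hypothetical descriptor with a flags field.
pub struct Descr {
    pub flags: u64,
}

/// Token-gated accessor in the style discussed in this thread.
fn data_type_flags(_py: GilToken<'_>, descr: &Descr) -> u64 {
    descr.flags
}

pub trait DescrMethods {
    // No default body: the trait by itself has no way to obtain a token.
    fn flags(&self) -> u64;
}

impl<'py> DescrMethods for BoundRef<'py, Descr> {
    fn flags(&self) -> u64 {
        // The public signature stays unchanged; the token comes from `self`.
        data_type_flags(self.py(), &self.inner)
    }
}
```

With this arrangement callers keep writing `descr.flags()` and never pass a token explicitly.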

aMarcireau · 2024-06-08T01:13:49Z

@Icxolu Good point! I made the changes that you suggested.

@adamreichold Let me know if you would like me to squash the commits that modified src/dtype, since I ended up rolling back many of the edits.

adamreichold · 2024-06-18T18:46:13Z

Sorry for not getting to this yet. We are in crunch mode at $DAYJOB until next week. Will look into it as soon as I can.

aMarcireau added 3 commits June 4, 2024 09:53

Remove outdated 1.0 functions, add new 2.0 functions

fb7d512

Essentially a port of https://github.com/numpy/numpy/blob/97356bc6f0d6538389a9eef475d883a0f4024c2a/numpy/_core/include/numpy/npy_2_compat.h

Use the new "universal" access functions

17955d3

aMarcireau mentioned this pull request Jun 4, 2024

Support for Numpy 2 #409

Open

Fix runtime tests

7a5a567

adamreichold reviewed Jun 4, 2024


Remove feature flags and always check the version at runtime where appropriate

2014e2a

Avoid API changes by using self.py()

8619b76

Fixup

bea69cd

stinodego mentioned this pull request Jun 16, 2024

Support NumPy 2.0 pola-rs/polars#16998

Open

kylebarron mentioned this pull request Jun 26, 2024

Rust port vincentsarago/color-operations#8

Open
