BUG: `Series.clip` does not work with scalar numpy arrays. #59053

randolf-scholz · 2024-06-19T14:07:23Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd
pd.Series([-1,2,3]).clip(lower=np.array(0))

Results in TypeError: len() of unsized object.

Issue Description

The following line tries to compute len(other), but scalar arrays have no len.

pandas/pandas/core/series.py

Lines 5892 to 5894 in c46fb76

    
    elif isinstance(other, (np.ndarray, list, tuple)):
        if len(other) != len(self):
            raise ValueError("Lengths must be equal")

If we remove the length check (the last two lines above), the example produces the expected result, and clip still raises as expected if, e.g., a list of the wrong size is passed.
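
As a rough illustration of the failure and one possible guard, here is a minimal sketch. The helper `check_length` is hypothetical and stands in for the check quoted above; it is not the pandas implementation. It assumes that 0-dimensional arrays can be detected via `ndim` before `len` is called.

```python
import numpy as np


def check_length(other, expected_len):
    """Hypothetical stand-in for the length check; not pandas' actual code."""
    if isinstance(other, np.ndarray) and other.ndim == 0:
        # 0-dimensional arrays have no len(); treat them like scalars.
        return
    if isinstance(other, (np.ndarray, list, tuple)):
        if len(other) != expected_len:
            raise ValueError("Lengths must be equal")


# len() on a 0-dimensional array raises the TypeError from the report.
try:
    len(np.array(0))
except TypeError as exc:
    print(exc)

check_length(np.array(0), 3)  # passes: treated as a scalar
check_length([0, 0, 0], 3)    # passes: lengths match
```

With this guard, a list of the wrong length would still raise `ValueError`, while a 0-dimensional array skips the check entirely.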

Expected Behavior

Scalar arrays should be treated like scalars.

Installed Versions

INSTALLED VERSIONS

commit : d9cdd2e
python : 3.11.7.final.0
python-bits : 64
OS : Linux
OS-release : 6.5.0-41-generic
Version : #41~22.04.2-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun 3 11:32:55 UTC 2
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.2
numpy : 2.0.0
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 70.1.0
pip : 24.0
Cython : None
pytest : 8.2.2
hypothesis : 6.103.2
sphinx : 7.3.7
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.4
IPython : 8.25.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.6.0
gcsfs : None
matplotlib : 3.9.0
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.4
pandas_gbq : None
pyarrow : 16.1.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.13.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None


rhshadrach · 2024-06-19T21:45:46Z

Thanks for the report. This does indeed appear to me to be an issue, but I wonder if this is wide-spread throughout pandas and what the ramifications of trying to fix this systematically would be. E.g.

from pandas._libs import lib

print(lib.is_scalar(np.array(0)))
# False

Further investigations are welcome!

jbrockmendel · 2024-06-20T00:43:51Z

Lib.itemfromzerodim

Edit [rhshadrach]: lib.item_from_zerodim
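
For readers unfamiliar with the helper mentioned here, a small sketch of what it appears to do: `lib.item_from_zerodim` unwraps a 0-dimensional array into its scalar item and passes other values through. Note that `pandas._libs.lib` is a private module, so this may change between versions; using it as a workaround for `clip` is an assumption for illustration, not a recommendation from the thread.

```python
import numpy as np
import pandas as pd
from pandas._libs import lib  # private module; may change between versions

zero_dim = np.array(0)
unwrapped = lib.item_from_zerodim(zero_dim)

# The 0-dimensional array is not a scalar to pandas, but its unwrapped item is.
print(lib.is_scalar(zero_dim))
print(lib.is_scalar(unwrapped))

# Arrays with one or more dimensions are returned unchanged.
one_dim = np.array([0, 1])
print(lib.item_from_zerodim(one_dim) is one_dim)

# Unwrapping the bound first sidesteps the len() call in Series.clip.
clipped = pd.Series([-1, 2, 3]).clip(lower=unwrapped)
print(clipped.tolist())
```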

randolf-scholz · 2024-06-20T07:39:58Z

I think there are two ways to handle it:

1. Consider only objects that are scalars.
2. Consider objects that can be interpreted as scalars.

Regarding the latter, any element of a 1-dimensional vector space can be considered a scalar, since in this case the vector space and its base field are isomorphic. To this end, numpy and many other libraries offer the .item() method, which returns a scalar if the array contains exactly one element (although it doesn't currently seem to be part of the Python array API standard).

pandas._libs.lib.is_scalar seems to be in line here with numpy.isscalar, which also returns False for np.array(0), since technically this is a 0-dimensional array and hence not a scalar.
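
The distinction can be checked with numpy alone. The following is a small sketch, not tied to any pandas internals, showing that a 0-dimensional array fails `numpy.isscalar` while `.item()` still extracts a plain Python scalar from any single-element array.

```python
import numpy as np

zero_dim = np.array(0)

# A 0-dimensional array is an ndarray with an empty shape, not a scalar.
print(np.isscalar(zero_dim))
print(zero_dim.ndim, zero_dim.shape)

# .item() converts a single-element array to a Python scalar.
item = zero_dim.item()
print(item, type(item))

# This also works for single-element arrays of higher dimension.
nested_item = np.array([[7]]).item()
print(nested_item)
```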

If (1) is preferred by the maintainers, this issue can probably be closed. However, numpy.clip does support passing 0-dimensional arrays, and so does Series.where, which can be used to implement Series.clip:

import numpy as np
import pandas as pd
s = pd.Series([-1,2,3])
s_clipped = s.where(s>np.array(0), np.array(0))
pd.testing.assert_series_equal(s_clipped, s.clip(lower=0))  # ✅
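
For comparison, a minimal sketch of the numpy behavior mentioned above: `numpy.clip` accepts a 0-dimensional array as a bound. The upper bound is passed explicitly as `None` so the call does not rely on default arguments.

```python
import numpy as np

values = np.array([-1, 2, 3])

# A 0-dimensional array works as the lower bound; None means no upper bound.
clipped = np.clip(values, np.array(0), None)
print(clipped)
```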

Whether one wants to go with option (1) or (2) is probably just a matter of taste/design, but using this choice consistently throughout the API seems desirable.

rhshadrach · 2024-06-22T15:34:57Z

> but using this choice consistently throughout the API seems desirable.

Right - I'm not sure how well this is supported throughout pandas. You mentioned clip, but I think there are a number of other methods that take scalars like this. It seems to me the next steps are to determine which methods support this, and from that we can find a reasonable way to achieve consistency.
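
One way to start such a survey is sketched below. The function name `survey_zero_dim_support` and the list of methods are illustrative assumptions, not an exhaustive or authoritative list; the outcome for each method depends on the installed pandas version, so no results are shown here.

```python
import numpy as np
import pandas as pd


def survey_zero_dim_support():
    """Try a few Series methods with a 0-dimensional array argument.

    Returns a dict mapping a label to "ok" or the name of the exception raised.
    The method list is an illustrative assumption, not a complete survey.
    """
    s = pd.Series([-1, 2, 3])
    zero = np.array(0)
    candidates = {
        "clip(lower=...)": lambda: s.clip(lower=zero),
        "where(cond, other)": lambda: s.where(s > 0, zero),
        "fillna(value)": lambda: s.fillna(zero),
        "add(other)": lambda: s.add(zero),
        "eq(other)": lambda: s.eq(zero),
    }
    results = {}
    for label, call in candidates.items():
        try:
            call()
            results[label] = "ok"
        except Exception as exc:  # record whichever error type is raised
            results[label] = type(exc).__name__
    return results


for label, outcome in survey_zero_dim_support().items():
    print(f"{label}: {outcome}")
```

Extending the `candidates` dict with further methods would give a first picture of where 0-dimensional arrays are accepted and where they are not.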

randolf-scholz added the Bug and Needs Triage labels on Jun 19, 2024

rhshadrach added the Needs Discussion label and removed the Needs Triage label on Jun 19, 2024
