Add support for UFloat in PintArray (#139) #140

Closed
58 commits
b5954fd
Add support for UFloat in PintArray (#139)
MichaelTiemannOSC Oct 15, 2022
0ad1cf9
Fix failures and errors found by test_pandas_extensions test suite.
MichaelTiemannOSC Oct 19, 2022
52ab185
Preserve incoming np.array when promoting float to ufloat in PintArray
MichaelTiemannOSC Oct 21, 2022
f3cdcad
Fix logic to detect heterogeneous arrays of Ufloats and floats.
MichaelTiemannOSC Oct 25, 2022
3ffb617
Add support for UFloat in PintArray (#139)
MichaelTiemannOSC Oct 15, 2022
2f89897
Fix failures and errors found by test_pandas_extensions test suite.
MichaelTiemannOSC Oct 19, 2022
dce2668
Preserve incoming np.array when promoting float to ufloat in PintArray
MichaelTiemannOSC Oct 21, 2022
d4ca9f0
Fix logic to detect heterogeneous arrays of Ufloats and floats.
MichaelTiemannOSC Oct 25, 2022
c375aeb
Merge branch 'ducks-unlimited' of https://github.com/MichaelTiemannOS…
MichaelTiemannOSC Nov 3, 2022
9fffcc5
Update pint_array.py
MichaelTiemannOSC Jan 2, 2023
8b06708
Update pint_array.py
MichaelTiemannOSC Jan 3, 2023
c5b7926
Update pint_array.py
MichaelTiemannOSC Jan 3, 2023
232857c
Merge branch 'master' into ducks-unlimited
MichaelTiemannOSC Jun 26, 2023
0b0e4d4
Fix and blacken merge
MichaelTiemannOSC Jun 26, 2023
959570f
Fix ruff complaints in testsuite
MichaelTiemannOSC Jun 26, 2023
5270a46
Fix numerous regressions in test_pandas_extensiontests
MichaelTiemannOSC Jun 28, 2023
6ddf204
Update pint_array.py
MichaelTiemannOSC Jun 28, 2023
e1d367c
Update to us pd.NA instead of np.nan / _ufloat_nan
MichaelTiemannOSC Jul 2, 2023
dbf5ad1
Update pint_array.py
MichaelTiemannOSC Jul 2, 2023
3c6eff4
Progress: 2608 pass, 97 skip, 84 xfail, 6 xpass
MichaelTiemannOSC Jul 5, 2023
a0625f8
Make ruff and black happy
MichaelTiemannOSC Jul 5, 2023
94d3524
Make ruff happy (na_frame fixture import vs F811)
MichaelTiemannOSC Jul 5, 2023
a6c4040
Make black happy
MichaelTiemannOSC Jul 5, 2023
1506df2
Make black happy
MichaelTiemannOSC Jul 5, 2023
772636b
Fix DataFrame reduction for upcoming Pandas
MichaelTiemannOSC Jul 23, 2023
b759adb
Make black happy...
MichaelTiemannOSC Jul 23, 2023
bfb4a99
Update pint_array.py
MichaelTiemannOSC Jul 24, 2023
9d169f1
Switch to np.nan as NaN value
MichaelTiemannOSC Jul 28, 2023
289c604
Updated to Pandas 2.1.0.dev0+1401.gb0bfd0effd
MichaelTiemannOSC Aug 6, 2023
602a804
Keep up with pandas21_compat changes
MichaelTiemannOSC Aug 12, 2023
866bf7a
Merge branch 'master' into ducks-unlimited
MichaelTiemannOSC Aug 12, 2023
9f723f9
Merge remote-tracking branch 'upstream/master' into ducks-unlimited
MichaelTiemannOSC Aug 14, 2023
f0c7e64
Cleanups after merge
MichaelTiemannOSC Aug 15, 2023
5b39a3e
Update test_issues.py
MichaelTiemannOSC Aug 15, 2023
8ed4c1d
Add uncertainties to CI/CD
MichaelTiemannOSC Aug 15, 2023
2cb50f4
Update CI/CD to anticipate, not install or test uncertainties
MichaelTiemannOSC Aug 15, 2023
8c4bf7d
Update CHANGES
MichaelTiemannOSC Aug 15, 2023
37c6f6d
Merge branch 'master' into ducks-unlimited
MichaelTiemannOSC Sep 8, 2023
fc2814a
Merge branch 'master' into ducks-unlimited
MichaelTiemannOSC Sep 15, 2023
e365cbc
Test with Pint-0.23rc0 and uncertainties in ci/cd
MichaelTiemannOSC Sep 18, 2023
108cb71
2nd attempt integrating uncertainties and CI/CD
MichaelTiemannOSC Sep 18, 2023
a5758e7
Use `include` to handle uncertainties testing
MichaelTiemannOSC Sep 19, 2023
82442ab
Only test `uncertainties` in ci.yml
MichaelTiemannOSC Sep 19, 2023
5d351fc
Update ci.yml
MichaelTiemannOSC Sep 19, 2023
c4e3a06
Update ci.yml
MichaelTiemannOSC Sep 19, 2023
8d5feb9
Update ci.yml
MichaelTiemannOSC Sep 19, 2023
bfe9a77
Update ci.yml
MichaelTiemannOSC Sep 19, 2023
25adf35
Update ci.yml
MichaelTiemannOSC Sep 19, 2023
afc3eb4
Update ci.yml
MichaelTiemannOSC Sep 19, 2023
e281dfc
Update ci.yml
MichaelTiemannOSC Sep 19, 2023
a208163
Update ci.yml
MichaelTiemannOSC Sep 19, 2023
296bbdc
Update ci.yml
MichaelTiemannOSC Sep 19, 2023
585f38d
Update ci.yml
MichaelTiemannOSC Sep 19, 2023
75d4c56
Update ci.yml
MichaelTiemannOSC Sep 19, 2023
6988212
Update ci.yml
MichaelTiemannOSC Sep 19, 2023
764b609
Update ci.yml
MichaelTiemannOSC Sep 19, 2023
9ed23c1
Update ci-*.yml files
MichaelTiemannOSC Sep 19, 2023
b609001
Merge branch 'master' into ducks-unlimited
MichaelTiemannOSC Nov 14, 2023
15 changes: 13 additions & 2 deletions .github/workflows/ci.yml
@@ -8,8 +8,15 @@ jobs:
matrix:
python-version: [3.9, "3.10", "3.11"]
numpy: ["numpy>=1.20.3,<2.0.0"]
- pandas: ["pandas==2.0.2", "pandas==2.1.0rc0" ]
- pint: ["pint>=0.21.1", "pint==0.22"]
+ pandas: ["pandas==2.0.2", "pandas>=2.1.0" ]
+ pint: ["pint>=0.21.1,<0.22", "pint==0.22", "pint>=0.23rc0"]
+ uncertainties: [""]
+ include:
+ - python-version: 3.9
+ numpy: "numpy>=1.20.3,<2.0.0"
+ pandas: "pandas>=2.1.0"
+ pint: "pint==0.23rc0"
+ uncertainties: "uncertainties==3.1.7"

runs-on: ubuntu-latest

@@ -57,6 +64,10 @@ jobs:
if: ${{ matrix.pandas != null }}
run: pip install "${{matrix.pandas}}"

- name: Install uncertainties
if: ${{ matrix.uncertainties != null }}
run: pip install "${{matrix.uncertainties}}"

- name: Run Tests
run: |
pytest $TEST_OPTS
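With `include`, the matrix expands to more jobs than the axis lists alone suggest. The Python sketch below reproduces that expansion under one assumption, GitHub's documented rule for `include`: an entry is merged into every existing combination whose original matrix values it would not change, and becomes a separate job if it fits none. Because `pint==0.23rc0` is not one of the `pint` axis values, the entry here should end up as a single extra job, the only one that installs `uncertainties`.

```python
from itertools import product

# Matrix axes copied from the updated ci.yml hunk (YAML's 3.9 written as "3.9").
matrix = {
    "python-version": ["3.9", "3.10", "3.11"],
    "numpy": ["numpy>=1.20.3,<2.0.0"],
    "pandas": ["pandas==2.0.2", "pandas>=2.1.0"],
    "pint": ["pint>=0.21.1,<0.22", "pint==0.22", "pint>=0.23rc0"],
    "uncertainties": [""],
}
include = [
    {
        "python-version": "3.9",
        "numpy": "numpy>=1.20.3,<2.0.0",
        "pandas": "pandas>=2.1.0",
        "pint": "pint==0.23rc0",
        "uncertainties": "uncertainties==3.1.7",
    }
]


def expand(matrix, include):
    """Cartesian product of the axes, then apply each include entry (sketch)."""
    keys = list(matrix)
    jobs = [dict(zip(keys, combo)) for combo in product(*matrix.values())]
    for entry in include:
        merged = False
        for job in jobs:
            # An entry may extend a job only if it changes no original matrix value.
            if all(job[k] == v for k, v in entry.items() if k in matrix):
                job.update(entry)
                merged = True
        if not merged:
            jobs.append(dict(entry))  # otherwise it becomes a job of its own
    return jobs


jobs = expand(matrix, include)
print(len(jobs))  # 19: 3 * 1 * 2 * 3 * 1 = 18 combinations plus the include job
print(sum(job["uncertainties"] != "" for job in jobs))  # 1
```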
3 changes: 2 additions & 1 deletion CHANGES
@@ -4,10 +4,10 @@ pint-pandas Changelog
0.6 (unreleased)
----------------

- Support for uncertainties as magnitudes in PintArrays. #140
- Fix dequantify duplicate column failure #202
- Fix astype issue #196


0.5 (2023-09-07)
----------------

@@ -50,6 +50,7 @@ pint-pandas Changelog
- Tests reorganised #131
- Shortened form of dimensionless unit now in dtype, eg 'pint[]' #151
- Fixed bug preventing PintArrays with offset units being printed. #150
- Allow UFloat as type of magnitude supported in PintArray. #139

0.2 (2021-03-23)
----------------
113 changes: 107 additions & 6 deletions pint_pandas/pint_array.py
@@ -26,6 +26,16 @@
# Magic 'unit' flagging columns with no unit support, used in
# quantify/dequantify
NO_UNIT = "No Unit"
from pint.compat import HAS_UNCERTAINTIES

# from pint.facets.plain.quantity import PlainQuantity as _Quantity
# from pint.facets.plain.unit import PlainUnit as _Unit

if HAS_UNCERTAINTIES:
from uncertainties import ufloat, UFloat
from uncertainties import unumpy as unp

_ufloat_nan = ufloat(np.nan, 0)

pandas_version = version("pandas")
pandas_version_info = tuple(
@@ -330,6 +340,36 @@ def __setitem__(self, key, value):
key = check_array_indexer(self, key)
# Filter out invalid values for our array type(s)
try:
if HAS_UNCERTAINTIES and is_object_dtype(self._data):
from pandas.api.types import is_scalar, is_numeric_dtype

def value_to_ufloat(value):
if pd.isna(value) or isinstance(value, UFloat):
return value
if is_numeric_dtype(type(value)):
return ufloat(value, 0)
raise ValueError

try:
any_ufloats = next(
True for i in self._data if isinstance(i, UFloat)
)
if any_ufloats:
if is_scalar(key):
if is_list_like(value):
# cannot do many:1 setitem
raise ValueError
# 1:1 setitem
value = value_to_ufloat(value)
elif is_list_like(value):
# many:many setitem
value = [value_to_ufloat(v) for v in value]
else:
# broadcast 1:many
value = value_to_ufloat(value)
except StopIteration:
# If array is full of nothingness, we can put anything inside it
pass
self._data[key] = value
except IndexError as e:
msg = "Mask is wrong length. {}".format(e)
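The new `__setitem__` branch only converts incoming values once the object-dtype array already holds at least one UFloat. A minimal, dependency-free sketch of that branching follows; `Uncertain` is a hypothetical stand-in for `uncertainties.UFloat`, `is_missing` is a rough stand-in for `pd.isna`, and `numbers.Real` replaces the PR's `is_numeric_dtype(type(value))` check.

```python
import math
from dataclasses import dataclass
from numbers import Real


@dataclass(frozen=True)
class Uncertain:
    """Hypothetical stand-in for uncertainties.UFloat (nominal value, std dev)."""

    nominal: float
    std_dev: float = 0.0


def is_missing(value):
    """Rough stand-in for pd.isna on scalars: None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def to_uncertain(value):
    """Promote a real number to Uncertain(value, 0); pass missing/Uncertain through."""
    if is_missing(value) or isinstance(value, Uncertain):
        return value
    if isinstance(value, Real):
        return Uncertain(float(value), 0.0)
    raise ValueError(f"cannot store {value!r} alongside uncertain values")


def coerce_for_setitem(data, key, value):
    """Mirror the branching the PR adds to PintArray.__setitem__ for object data."""
    if not any(isinstance(x, Uncertain) for x in data):
        return value  # nothing uncertain stored yet: accept the value unchanged
    key_is_scalar = isinstance(key, int)
    value_is_list = isinstance(value, (list, tuple))
    if key_is_scalar and value_is_list:
        raise ValueError("cannot assign several values to a single position")
    if value_is_list:
        return [to_uncertain(v) for v in value]  # many:many
    return to_uncertain(value)  # 1:1, or one scalar broadcast to many positions


data = [Uncertain(1.0, 0.1), 2.0]
print(coerce_for_setitem(data, 1, 3))  # Uncertain(nominal=3.0, std_dev=0.0)
print(coerce_for_setitem(data, slice(0, 2), [4, float("nan")]))
```

Checking for an existing UFloat first keeps plain-float arrays untouched, so only arrays that already carry uncertainties pay the conversion cost.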
@@ -381,6 +421,14 @@ def isna(self):
-------
missing : np.array
"""
if HAS_UNCERTAINTIES:
# GH https://github.com/lebigot/uncertainties/issues/164
if len(self._data) == 0:
# True or False doesn't matter--we just need the value for the type
return np.full((0), True)
# NumpyEADtype('object') doesn't know about UFloats...
if is_object_dtype(self._data.dtype):
return np.array([pd.isna(x) or unp.isnan(x) for x in self._data])
return self._data.isna()

def astype(self, dtype, copy=True):
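As the comment in the hunk notes, pandas' own missing-value check does not recognise a UFloat whose nominal value is NaN, so the new `isna` ORs `pd.isna(x)` with `unp.isnan(x)`. A sketch of the same element-wise rule with a hypothetical `Uncertain` stand-in; it assumes, as we understand `unp.isnan` to behave, that only the nominal value decides, so `0 ± nan` is not treated as missing.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Uncertain:
    """Hypothetical stand-in for uncertainties.UFloat (nominal value, std dev)."""

    nominal: float
    std_dev: float = 0.0


def is_missing(x):
    """Missing if None, a float NaN, or an Uncertain whose nominal value is NaN."""
    if x is None:
        return True
    if isinstance(x, Uncertain):
        # Assumption: only the nominal value decides, so 0 +/- nan is not missing.
        return math.isnan(x.nominal)
    return isinstance(x, float) and math.isnan(x)


def isna_mask(values):
    """Element-wise mask, analogous to the list comprehension in the PR's isna()."""
    return [is_missing(v) for v in values]


values = [
    1.5,
    None,
    float("nan"),
    Uncertain(2.0, 0.1),
    Uncertain(float("nan"), 0.0),
    Uncertain(0.0, float("nan")),
]
print(isna_mask(values))  # [False, True, True, False, True, False]
```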
@@ -542,6 +590,9 @@ def _from_sequence(cls, scalars, dtype=None, copy=False):
(item.to(dtype.units).magnitude if hasattr(item, "to") else item)
for item in scalars
]
# When creating empty arrays, make them large enough to hold UFloats in case we need to do so later
if HAS_UNCERTAINTIES and len(scalars) == 0:
return cls([_ufloat_nan], dtype=dtype, copy=copy)[1:]
return cls(scalars, dtype=dtype, copy=copy)

@classmethod
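`_from_sequence` builds an empty array by constructing a one-element array from `_ufloat_nan` and slicing it off with `[1:]`, so the result carries an object dtype instead of the float64 an empty list would give. The NumPy sketch below shows that effect, using `fractions.Fraction` as an assumed stand-in for a UFloat to avoid the extra dependency.

```python
from fractions import Fraction

import numpy as np

# An empty array built from an empty list defaults to float64.
plain_empty = np.array([])

# Build from one object, then slice it away: the object dtype survives.
# Fraction stands in for a UFloat here (assumption, to avoid the dependency).
object_empty = np.array([Fraction(0)])[1:]

print(plain_empty.dtype, len(plain_empty))    # float64 0
print(object_empty.dtype, len(object_empty))  # object 0

# Why dtype matters: a float64 slot converts the object to a float,
# while an object slot stores the Python object itself.
float_slot = np.zeros(1)
float_slot[0] = Fraction(1, 3)
object_slot = np.array([Fraction(0)])
object_slot[0] = Fraction(1, 3)
print(type(float_slot[0]).__name__, type(object_slot[0]).__name__)  # float64 Fraction
```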
@@ -565,9 +616,37 @@ def _values_for_factorize(self):
# provided dtype. This may be revisited in the future, see GH#48476.
arr = self._data
if arr.dtype.kind == "O":
if (
HAS_UNCERTAINTIES
and arr.size > 0
and unp.isnan(arr[~pd.isna(arr)]).any()
):
# Canonicalize uncertain NaNs and pd.NA to np.nan
arr = np.array(
[np.nan if pd.isna(x) or unp.isnan(x) else x for x in arr]
)
return np.array(arr, copy=False), self.dtype.na_value
return arr._values_for_factorize()

def _values_for_argsort(self) -> np.ndarray:
"""
Return values for sorting.
Returns
-------
ndarray
The transformed values should maintain the ordering between values
within the array.
"""
# In this case we want to return just the magnitude array stripped of units
# Must replace uncertain NaNs with np.nan
if HAS_UNCERTAINTIES:
arr = self._data[~pd.isna(self._data)]
if arr.size > 0 and unp.isnan(arr).any():
return np.array(
[np.nan if pd.isna(x) or unp.isnan(x) else x for x in self._data]
)
return self._data

def value_counts(self, dropna=True):
"""
Returns a Series containing counts of each category.
@@ -592,16 +671,27 @@ def value_counts(self, dropna=True):

# compute counts on the data with no nans
data = self._data
nafilt = pd.isna(data)
na_value = pd.NA # NA value for index, not data, so not quantified
if HAS_UNCERTAINTIES:
nafilt = np.array([pd.isna(x) or unp.isnan(x) for x in data])
else:
nafilt = pd.isna(data)
na_value_for_index = pd.NA
data = data[~nafilt]
index = list(set(data))
if HAS_UNCERTAINTIES and data.dtype.kind == "O":
# This is a work-around for unhashable UFloats
unique_data = []
for item in data:
if item not in unique_data:
unique_data.append(item)
index = list(unique_data)
else:
index = list(set(data))

data_list = data.tolist()
array = [data_list.count(item) for item in index]

if not dropna:
index.append(na_value)
index.append(na_value_for_index)
array.append(nafilt.sum())

return Series(np.asarray(array), index=index)
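The old `value_counts` collected its index with `list(set(data))`, which needs every element to be hashable. The work-around replaces the set with a list membership test. A dependency-free sketch of the same idea; `Measured` is a hypothetical class that, like the UFloats the PR works around, defines `__eq__` without `__hash__`.

```python
class Measured:
    """Hypothetical value type: defines __eq__ but no __hash__, so Python sets
    __hash__ to None and instances cannot go into a set or dict."""

    def __init__(self, nominal, std_dev):
        self.nominal = nominal
        self.std_dev = std_dev

    def __eq__(self, other):
        if not isinstance(other, Measured):
            return NotImplemented
        return (self.nominal, self.std_dev) == (other.nominal, other.std_dev)

    def __repr__(self):
        return f"Measured({self.nominal}, {self.std_dev})"


def value_counts_unhashable(items):
    """Order-preserving value counts that rely only on ==, never hash().

    Same shape as the PR's work-around: collect uniques with a list membership
    test, then count each with list.count. Cost is O(n * k) for k uniques."""
    uniques = []
    for item in items:
        if item not in uniques:
            uniques.append(item)
    return [(u, items.count(u)) for u in uniques]


data = [Measured(1, 0.1), Measured(2, 0.0), Measured(1, 0.1)]

try:
    set(data)  # what list(set(data)) in the old code would need
except TypeError as exc:
    print("set() fails:", exc)

print(value_counts_unhashable(data))  # [(Measured(1, 0.1), 2), (Measured(2, 0.0), 1)]
```

The quadratic cost is the price of avoiding `hash()`; for the modest arrays typical in tests it is unlikely to matter.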
@@ -613,10 +703,21 @@ def unique(self):
-------
uniques : PintArray
"""
from pandas import unique

data = self._data
return self._from_sequence(unique(data), dtype=self.dtype)
na_value = self.dtype.na_value
if HAS_UNCERTAINTIES and data.dtype.kind == "O":
# This is a work-around for unhashable UFloats
unique_data = []
for item in data:
if item is pd.NA or unp.isnan(item):
item = na_value
if item not in unique_data:
unique_data.append(item)
return self._from_sequence(
pd.array(unique_data, dtype=data.dtype), dtype=self.dtype
)
return self._from_sequence(data.unique(), dtype=self.dtype)

def __contains__(self, item) -> bool:
if not isinstance(item, _Quantity):
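The new `unique` also maps `pd.NA` and NaN-valued entries to the dtype's NA value before deduplicating. Besides giving a single canonical missing marker, this matters for list membership: NaN never compares equal to itself, so distinct NaN objects would each survive an `item not in unique_data` test, while one shared object matches itself by identity. A sketch with a hypothetical `NA` sentinel standing in for the dtype's NA value:

```python
import math

NA = object()  # hypothetical sentinel standing in for the dtype's NA value


def is_nan_like(item):
    return item is None or (isinstance(item, float) and math.isnan(item))


def naive_unique(items):
    """List-membership dedup with no NaN handling."""
    uniques = []
    for item in items:
        if item not in uniques:
            uniques.append(item)
    return uniques


def unique_with_na(items):
    """Same dedup, but every NaN-like entry is first replaced by the single NA object."""
    uniques = []
    for item in items:
        if is_nan_like(item):
            item = NA
        if item not in uniques:  # `in` checks identity before ==, so NA matches itself
            uniques.append(item)
    return uniques


values = [1.0, float("nan"), 2.0, float("nan"), None, 1.0]
print(len(naive_unique(values)))    # 5: the two distinct NaN objects never compare equal
print(len(unique_with_na(values)))  # 3: 1.0, NA, 2.0
```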
57 changes: 54 additions & 3 deletions pint_pandas/testsuite/test_issues.py
@@ -9,6 +9,28 @@
from pandas.tests.extension.base.base import BaseExtensionTests
from pint.testsuite import helpers

try:
import uncertainties.unumpy as unp
from uncertainties import ufloat
from uncertainties.core import AffineScalarFunc # noqa: F401

def AffineScalarFunc__hash__(self):
if not self._linear_part.expanded():
self._linear_part.expand()
combo = tuple(iter(self._linear_part.linear_combo.items()))
if len(combo) > 1 or combo[0][1] != 1.0:
return hash(combo)
# The unique value that comes from a unique variable (which it also hashes to)
return id(combo[0][0])

AffineScalarFunc.__hash__ = AffineScalarFunc__hash__

_ufloat_nan = ufloat(np.nan, 0)
HAS_UNCERTAINTIES = True
except ImportError:
unp = np
HAS_UNCERTAINTIES = False

from pint_pandas import PintArray, PintType
from pint_pandas.pint_array import pandas_version_info
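The test module patches a `__hash__` onto `AffineScalarFunc`. Python sets `__hash__` to `None` on any class that defines `__eq__` without also defining `__hash__`, which is what makes such objects unhashable in the first place; assigning a function to the class attribute afterwards restores hashing. A simplified sketch with a hypothetical `Affine` class (its hash is not the PR's exact formula): the only requirement shown is that objects comparing equal produce equal hashes.

```python
class Affine:
    """Hypothetical linear form: maps variable names to coefficients."""

    def __init__(self, terms):
        self.terms = dict(terms)

    def __eq__(self, other):
        if not isinstance(other, Affine):
            return NotImplemented
        return self.terms == other.terms


# Defining __eq__ without __hash__ makes instances unhashable.
print(Affine.__hash__ is None)  # True


def affine_hash(self):
    # Equal objects have equal term dicts, hence equal sorted item tuples and
    # equal hashes: the invariant that sets and dicts depend on.
    return hash(tuple(sorted(self.terms.items())))


# Patch the hash onto the existing class, as the test module does for
# AffineScalarFunc.
Affine.__hash__ = affine_hash

a = Affine({"x": 2.0, "y": 1.0})
b = Affine({"y": 1.0, "x": 2.0})
print(len({a, b}))  # 1
```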

@@ -52,12 +74,16 @@ def test_force_ndarray_like(self):
pint.set_application_registry(prev_appreg)


@pytest.mark.skipif(
not HAS_UNCERTAINTIES,
reason="this test depends entirely on HAS_UNCERTAINTIES being True",
)
class TestIssue21(BaseExtensionTests):
@pytest.mark.filterwarnings("ignore::RuntimeWarning")
def test_offset_concat(self):
q_a = ureg.Quantity(np.arange(5), ureg.Unit("degC"))
q_b = ureg.Quantity(np.arange(6), ureg.Unit("degC"))
q_a_ = np.append(q_a, np.nan)
q_a = ureg.Quantity(np.arange(5) + ufloat(0, 0), ureg.Unit("degC"))
q_b = ureg.Quantity(np.arange(6) + ufloat(0, 0), ureg.Unit("degC"))
q_a_ = np.append(q_a, ureg.Quantity(np.nan, ureg.Unit("degC")))

a = pd.Series(PintArray(q_a))
b = pd.Series(PintArray(q_b))
@@ -171,6 +197,31 @@ def test_issue_127():
assert a == b


@pytest.mark.skipif(
not HAS_UNCERTAINTIES,
reason="this test depends entirely on HAS_UNCERTAINTIES being True",
)
def test_issue_139():
q1 = 1.234
q2 = 5.678
q_nan = np.nan

u1 = ufloat(1, 0)
u2 = ufloat(3, 0)
u_nan = ufloat(np.nan, 0.0)
u_plus_or_minus_nan = ufloat(0.0, np.nan)
u_nan_plus_or_minus_nan = ufloat(np.nan, np.nan)

a_m = PintArray(
[q1, u1, q2, u2, q_nan, u_nan, u_plus_or_minus_nan, u_nan_plus_or_minus_nan],
ureg.m,
)
a_cm = a_m.astype("pint[cm]")
assert np.all(a_m[0:4] == a_cm[0:4])
for x, y in zip(a_m[4:], a_cm[4:]):
assert unp.isnan(x) == unp.isnan(y)


class TestIssue174(BaseExtensionTests):
def test_sum(self):
if pandas_version_info < (2, 1):