Conversation

@jorisvandenbossche
Member

@jorisvandenbossche jorisvandenbossche commented Jun 10, 2024

In the context of CoW, we want modifications to shallow copies of a DataFrame/Series not to propagate to the parent/child. For direct modification of values, we use the reference tracking mechanism to perform a delayed copy when needed. But the underlying EA can also have mutable attributes.
By returning a view of the array values in the shallow copy of the DataFrame/Series (i.e. in the shallow copy of the Block) instead of the identical array object, we also prevent that kind of mutation from propagating.
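
As a rough illustration of the idea above (not pandas code), here is a minimal pure-Python sketch. `ToyArray`, its `label` attribute, and the two `shallow_copy_*` helpers are hypothetical names invented for this example. `label` stands in for any mutable attribute an EA might carry.

```python
class ToyArray:
    """Hypothetical stand-in for an ExtensionArray with a mutable attribute."""

    def __init__(self, data, label=None):
        self._data = data    # underlying buffer (a plain list in this sketch)
        self.label = label   # mutable per-object attribute

    def view(self):
        # New wrapper object around the *same* buffer.
        return ToyArray(self._data, self.label)


def shallow_copy_sharing_object(arr):
    # Shallow copy that holds the identical array object.
    return arr


def shallow_copy_with_view(arr):
    # Shallow copy that holds a view of the array instead.
    return arr.view()


# Sharing the identical object: the attribute change leaks to the parent.
parent_a = ToyArray([1, 2, 3], label="original")
child_a = shallow_copy_sharing_object(parent_a)
child_a.label = "changed"
leaked_label = parent_a.label

# Taking a view: the buffer is still shared, but the attribute is not.
parent_b = ToyArray([1, 2, 3], label="original")
child_b = shallow_copy_with_view(parent_b)
child_b.label = "changed"
isolated_label = parent_b.label
buffer_still_shared = child_b._data is parent_b._data
```

In this sketch the view costs only a new wrapper object and the data buffer is not copied, so modifications of the values themselves would still rely on the reference tracking mechanism.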


For context, this might help towards #63215 (comment) (see last paragraph of that linked comment). If shallow copies don't share array objects (through taking views), then the problem of mutable attributes of arrays (incorrectly) propagating would also be solved.

It's not providing the method to update the underlying array, but it would "fix" the propagation of array attributes as illustrated in the example in #63215 (comment)

@github-actions
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jul 11, 2024
@jbrockmendel
Member

Plan to pursue this?

@jorisvandenbossche
Member Author

Updated this old PR to see what CI says, because it might also help towards #63215 (comment) (see last paragraph of that linked comment). If shallow copies don't share array objects (through taking views), then the problem of mutable attributes of arrays (incorrectly) propagating would also be solved.

It's not providing the method to update the underlying array, but it would "fix" the propagation of array attributes as illustrated in the example in #63215 (comment)

@jorisvandenbossche jorisvandenbossche added this to the 3.0 milestone Dec 26, 2025
             values = values.copy()
             refs = None
         else:
+            values = values.view()
Member Author

This is the actual change (taking a view of the values when doing a shallow copy). The other changes below in this file are a few fixes to ensure we return self instead of a shallow copy for inplace operations.
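
To make the branch in the diff fragment concrete, here is a self-contained toy sketch. `ToyValues` and `ToyBlock` are hypothetical classes, not the pandas `Block`. The handling of `refs` in the shallow branch is an assumption, since that line is not shown in the fragment.

```python
class ToyValues:
    """Hypothetical array wrapper: a buffer plus per-object identity."""

    def __init__(self, buffer):
        self.buffer = buffer

    def copy(self):
        # Deep copy: new wrapper and a new buffer.
        return ToyValues(list(self.buffer))

    def view(self):
        # View: new wrapper, same buffer.
        return ToyValues(self.buffer)


class ToyBlock:
    """Hypothetical stand-in for a Block; not the pandas class."""

    def __init__(self, values, refs=None):
        self.values = values
        self.refs = refs

    def copy(self, deep=True):
        values = self.values
        if deep:
            values = values.copy()
            refs = None
        else:
            values = values.view()
            # Assumption: a shallow copy keeps sharing the reference tracker.
            refs = self.refs
        return ToyBlock(values, refs=refs)


block = ToyBlock(ToyValues([1, 2, 3]), refs=object())
shallow = block.copy(deep=False)
deep = block.copy(deep=True)
```

Here `shallow.values` is a different object from `block.values` but wraps the same buffer, while `deep.values` owns a separate buffer.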

@jorisvandenbossche jorisvandenbossche marked this pull request as ready for review December 26, 2025 10:42

     def __array__(self, dtype=None, copy=None):
-        return self.data
+        return np.asarray(self.data, dtype=dtype)
Member

Is this change just to honor the dtype keyword (in which case we should probably honor copy too)? Or is it important that we not return the object self.data? If the latter, is that a requirement for EAs that should be added to the interface tests?
