-
-
Notifications
You must be signed in to change notification settings - Fork 19.5k
API/CoW: take view of underlying Block values for shallow (deep=False) copies #58966
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
API/CoW: take view of underlying Block values for shallow (deep=False) copies #58966
Conversation
|
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
|
Plan to pursue this? |
|
Updated this old PR to see what CI says, because it might also help towards #63215 (comment) (see last paragraph of that linked comment). If shallow copies don't share array objects (through taking views), then the problem of mutable attributes of arrays (incorrectly) propagating would also be solved. It's not providing the method to update the underlying array, but it would "fix" the propagation of array attributes as illustrated in the example in #63215 (comment) |
| values = values.copy() | ||
| refs = None | ||
| else: | ||
| values = values.view() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the actual change (taking a view of the values when doing a shallow copy), the other changes below in this file are a few fixes to ensure we return self instead of a shallow copy for inplace operations.
|
|
||
| def __array__(self, dtype=None, copy=None): | ||
| return self.data | ||
| return np.asarray(self.data, dtype=dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this change just to honor the dtype keyword (in which case we should probably honor copy too?)? or is it important that we not return the object self.data? if the latter, is that a requirement for EAs that should be added to the interface tests?
In context of CoW, we want that modifications to shallow copies of DataFrame/Series don't propagate the the parent/child. For direct modification of values, we use the reference tracking mechanism to perform a delayed copy when needed. But the underlying EA can also have mutable attributes.
By returning a view of the array values in the shallow copy of the DataFrame/Series (i.e. and shallow copy of the Block) instead of the identical array object, we also avoid that kind of mutations to propagate.
For context, this might help towards #63215 (comment) (see last paragraph of that linked comment). If shallow copies don't share array objects (through taking views), then the problem of mutable attributes of arrays (incorrectly) propagating would also be solved.
It's not providing the method to update the underlying array, but it would "fix" the propagation of array attributes as illustrated in the example in #63215 (comment)