ENH: speed up wide DataFrame.line plots by using a single LineCollection #61764

EvMossan · 2025-07-03T08:16:19Z

What does this PR change?

Speeds up DataFrame.plot(kind="line") when the frame is “wide”.
If the DataFrame has more than 200 columns and a numeric index (e.g. a RangeIndex
or integer/float values), and the plot is not a time-series plot, is not stacked,
and has no error bars, we now draw everything with a single
matplotlib.collections.LineCollection instead of one Line2D per column.
There are no API changes; behaviour is identical for smaller plots and for the
excluded cases above.
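To make the idea concrete, here is a minimal standalone sketch of drawing every column of a wide frame with one LineCollection. It is not the PR's code in pandas/plotting/_matplotlib/core.py; the function name `plot_wide_as_collection` is hypothetical, and the sketch assumes a numeric index and the headless Agg backend.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.collections import LineCollection


def plot_wide_as_collection(df: pd.DataFrame, ax=None) -> LineCollection:
    """Hypothetical helper: draw every column of df against its index as one artist."""
    if ax is None:
        _, ax = plt.subplots()
    x = np.asarray(df.index, dtype=float)  # assumes a numeric index
    y = df.to_numpy(dtype=float)  # shape (n_rows, n_cols)
    # One (n_rows, 2) array of (x, y) vertices per column.
    segments = [np.column_stack([x, y[:, j]]) for j in range(y.shape[1])]
    # Reuse matplotlib's default color cycle ("C0".."C9") so columns stay distinguishable.
    colors = [f"C{j % 10}" for j in range(y.shape[1])]
    lc = LineCollection(segments, colors=colors)
    ax.add_collection(lc)
    ax.autoscale_view()  # make sure the view limits include the new collection
    return lc
```

The collection is a single artist, so matplotlib manages one object instead of one Line2D per column; that is where the speed-up is expected to come from.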

Performance numbers

| 500 rows × 2000 cols (RangeIndex) | master | this PR | speed-up |
|---|---|---|---|
| `df.plot(legend=False)` | 0.342 s | 0.069 s | 5× |

Benchmarked on pandas 3.0.0.dev0+2183.g94ff63adb2, matplotlib 3.10.3, NumPy 2.2.6
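The numbers above were measured with `df.plot(legend=False)` on the PR branch. As a rough, hypothetical way to see the same effect without the branch, the sketch below times plain matplotlib: one `ax.plot` call that creates one Line2D per column versus a single LineCollection, including a full `canvas.draw()`. Absolute timings will differ from the table, since pandas does additional per-column work of its own.

```python
import time

import matplotlib

matplotlib.use("Agg")  # headless backend; timings include rendering to a buffer

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection


def time_draw(n_rows: int, n_cols: int, use_collection: bool) -> float:
    """Seconds to build and fully render one figure with n_cols lines."""
    rng = np.random.default_rng(0)
    y = rng.standard_normal((n_rows, n_cols)).cumsum(axis=0)
    x = np.arange(n_rows, dtype=float)

    start = time.perf_counter()
    fig, ax = plt.subplots()
    if use_collection:
        # Single artist holding every column as one segment.
        segments = [np.column_stack([x, y[:, j]]) for j in range(n_cols)]
        ax.add_collection(LineCollection(segments))
        ax.autoscale_view()
    else:
        ax.plot(x, y)  # a 2-D y array yields one Line2D per column
    fig.canvas.draw()  # force a full draw so rendering cost is counted
    elapsed = time.perf_counter() - start
    plt.close(fig)
    return elapsed


if __name__ == "__main__":
    # Same shape as the benchmark in the table (500 rows x 2000 columns).
    for flag in (False, True):
        print(f"use_collection={flag}: {time_draw(500, 2000, flag):.3f} s")
```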

Notes

This PR does not change anything for DatetimeIndex plots; those remain on the original per-column path. A follow-up could combine LineCollection with the x_compat=True workaround (see #61398) to speed up time-series plots in a similar way.
The threshold (> 200 columns) is a heuristic and can be tuned in review.
More generally, the fast path activates only for numeric indices: datetime, period, and timedelta
indices still use the original per-column draw, so behaviour there is
unchanged.
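The eligibility rules from the description and notes can be written as a small predicate. This is a hypothetical sketch, not the PR's code: `can_use_linecollection` is an invented name, and "not a time-series plot" is approximated here by requiring a numeric index dtype (the PR itself checks `self._is_ts_plot()`).

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Threshold from the PR description; the notes call it a tunable heuristic.
COLUMN_THRESHOLD = 200


def can_use_linecollection(
    df: pd.DataFrame,
    *,
    stacked: bool = False,
    has_error_bars: bool = False,
    threshold: int = COLUMN_THRESHOLD,
) -> bool:
    """Hypothetical stand-in for the fast-path eligibility check."""
    # is_numeric_dtype is False for datetime64, timedelta64 and period dtypes,
    # so time-like indices fall back to the per-column path.
    numeric_index = is_numeric_dtype(df.index.dtype)
    return (
        numeric_index
        and not stacked
        and not has_error_bars
        and len(df.columns) > threshold
    )
```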

closes #61532 (ENH: speed up DataFrame.plot using LineCollection)
tests added / passed (pytest pandas/tests/plotting -q)
code checks passed (pre-commit run --all-files)
added entry in doc/source/whatsnew/v3.0.0.rst

cc @shadnikn @arthurlw – happy to take any feedback 🙂

mroeschke · 2025-07-03T16:12:57Z

pandas/plotting/_matplotlib/core.py

+        threshold = 200  # switch when DataFrame has more than this many columns
+        can_use_lc = (
+            not self._is_ts_plot()  # not a TS plot
+            and not self.stacked  # stacking not requested
+            and not com.any_not_none(*self.errors.values())  # no error bars
+            and len(self.data.columns) > threshold
+        )
+        if can_use_lc:


I would prefer not to have special casing like this because it's difficult to maintain parity between a "fast path" and the existing path.

Is there a way to refactor our plotting here to generalize the plotting to this form rather than the iterative approach below?

Thanks for the suggestion, @mroeschke

Removed the early-return fast path; use_collection now only decides how we draw after the shared loop, so there’s one unified code path.

Let me know if you’d like anything tweaked.
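One way to read "one unified code path" is sketched below: a single shared loop prepares per-column data and colors, and only the final draw step depends on `use_collection`. The function `draw_columns` and its parameters are hypothetical illustrations, not the refactored `LinePlot._make_plot`.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection


def draw_columns(ax, x, columns, labels, use_collection: bool):
    """Hypothetical single code path: one shared loop, then one draw step."""
    segments, colors = [], []
    # Shared loop: everything except the final draw happens here,
    # so both branches see identical data and colors.
    for i, y in enumerate(columns):
        segments.append(np.column_stack([x, y]))
        colors.append(f"C{i % 10}")

    if use_collection:
        # Labels are not attached to the collection; a legend would need proxies.
        lc = LineCollection(segments, colors=colors)
        ax.add_collection(lc)
        ax.autoscale_view()
        return [lc]

    artists = []
    for seg, color, label in zip(segments, colors, labels):
        (line,) = ax.plot(seg[:, 0], seg[:, 1], color=color, label=label)
        artists.append(line)
    return artists


if __name__ == "__main__":
    fig, ax = plt.subplots()
    x = np.arange(5.0)
    draw_columns(ax, x, [x, 2 * x], ["a", "b"], use_collection=True)
```

Because both branches consume the same `segments` and `colors`, keeping them in sync is easier, which is the parity concern raised in review; the collection branch still needs separate legend handling.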

pandas/plotting/_matplotlib/core.py

pandas/tests/plotting/frame/test_linecollection_speedup.py

EvMossan · 2025-07-15T13:12:36Z

@jbrockmendel Done in the latest commit, thanks!

mroeschke · 2025-07-15T17:16:30Z

pandas/plotting/_matplotlib/core.py

+            label_str = self._mark_right_label(pprint_thing(label), index=i)
+            kwds["label"] = label_str
+
+            if use_collection:


I'm still not generally fond of having a different code path if some condition is met, especially since the condition requires a magic number threshold

This is a reasonable concern. Is there a downside to always using LineCollection?

@mroeschke @jbrockmendel, if we want to get rid of the path split and the magic threshold number completely, I would have to patch a few things:

pandas/plotting/_matplotlib/core.py (LinePlot._make_plot)
• Remove the current threshold (use_collection) condition.
• Always render DataFrame line plots using a single LineCollection.
• Add tiny proxy Line2D objects (invisible) to keep legends working as usual.
• Stacked plots and error-bar plots remain unchanged (they use separate code paths already).

pandas/plotting/_matplotlib/tools.py
• Adjust get_all_lines to return segments from any existing LineCollection.
(Needed for existing tests and autoscaling.)
• Adjust get_xlim similarly to compute limits directly from the LineCollection vertices.

pandas/tests/plotting/common.py and plotting tests
• Update tests to handle the new structure. Instead of direct access like ax.lines[...], tests will use a helper function aware of the new single-collection setup.

Documentation and Release Notes
• Clearly note in docs/whatsnew that ax.lines will be empty for DataFrame line plots.
• Users accessing line data directly should switch to pandas.plotting.get_all_lines(ax) or check ax.collections[0].

(No changes for Series plots or other plot types like scatter, area, bar, etc.)
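Two pieces of this plan, invisible proxy Line2D handles for the legend and computing x-limits from collection vertices, can be sketched with public matplotlib APIs. Both function names are hypothetical; they are not pandas' `get_all_lines` or `get_xlim` in tools.py, only an illustration of the approach.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection
from matplotlib.lines import Line2D


def add_collection_with_legend(ax, x, ys, labels):
    """Draw all columns as one LineCollection, with a legend built from proxies."""
    colors = [f"C{i % 10}" for i in range(len(ys))]
    lc = LineCollection([np.column_stack([x, y]) for y in ys], colors=colors)
    ax.add_collection(lc)
    ax.autoscale_view()
    # Proxy Line2D handles are never added to the axes, so ax.lines stays
    # empty, but the legend still gets one colored entry per column.
    proxies = [Line2D([], [], color=c, label=lab) for c, lab in zip(colors, labels)]
    ax.legend(handles=proxies)
    return lc


def collection_xlim(ax):
    """x-range over every LineCollection vertex on ax (None if there are none)."""
    xs = [
        seg[:, 0]
        for coll in ax.collections
        if isinstance(coll, LineCollection)
        for seg in coll.get_segments()
    ]
    if not xs:
        return None
    all_x = np.concatenate(xs)
    return float(all_x.min()), float(all_x.max())


if __name__ == "__main__":
    fig, ax = plt.subplots()
    x = np.linspace(0.0, 1.0, 50)
    add_collection_with_legend(ax, x, [x, x**2, x**3], ["x", "x^2", "x^3"])
    print(collection_xlim(ax), len(ax.lines))
```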

⸻

Advantages:
• One simple and predictable rendering path for all DataFrame line plots.
• Significant speed-up for large DataFrames, negligible overhead for small DataFrames.
• Lower memory use (single artist instead of many) and easier future maintenance.

⸻

Potential Downsides (but manageable):
• Users relying on ax.lines[i] directly must adapt (addressed in the docs and a deprecation shim).
• Interactive plots using “picker” callbacks may need minor code updates.
• A small batch of tests will need straightforward adjustments.
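To illustrate the picker caveat: with one Line2D per column, a pick callback can identify the column from `event.artist`; with one LineCollection all columns share a single artist, so the picked columns have to come from `event.ind` (the indices of the picked segments). The helper names below are hypothetical, and actually triggering picks requires an interactive backend.

```python
import matplotlib

matplotlib.use("Agg")  # headless here; real picking needs an interactive backend

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection


def make_pick_handler(lc, labels, picked):
    """Build a pick_event callback that records which columns were picked."""

    def on_pick(event):
        if event.artist is not lc:
            return  # ignore picks on other artists
        # For a collection, event.ind holds the indices of the picked
        # segments, i.e. the positions of the picked columns.
        picked.extend(labels[i] for i in event.ind)

    return on_pick


def connect_column_picker(fig, lc, labels):
    """Enable picking on the collection and wire up the handler."""
    picked = []
    lc.set_picker(True)
    cid = fig.canvas.mpl_connect("pick_event", make_pick_handler(lc, labels, picked))
    return cid, picked


if __name__ == "__main__":
    fig, ax = plt.subplots()
    x = np.arange(5.0)
    lc = LineCollection([np.column_stack([x, k * x]) for k in range(3)])
    ax.add_collection(lc)
    ax.autoscale_view()
    cid, picked = connect_column_picker(fig, lc, ["a", "b", "c"])
    # In an interactive session, clicking near a line appends its label to picked.
```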

If you're comfortable with this, I can start an implementation. Are there any additional concerns I should keep in mind before coding?

Happy to iterate!

@EvMossan the unfortunate situation is that there aren't any maintainers with expertise in matplotlib, so the idea of reviewing everything you described is daunting. Is there a minimal version?

@jbrockmendel, I’ve tried every variant I can think of, but I still can’t get a single-path implementation that both preserves the ~5× speed-up and passes the full test suite. At this point I’m stuck and would welcome any ideas or guidance.

> and passes the full test suite

How many failures are we looking at, and how bad are they? E.g. no one really cares about ax.lines[0] or whatever as long as the graphs look right.

The ideas that come to mind are 1) convince @mroeschke to be OK with multiple code paths, 2) ask a matplotlib maintainer for help, 3) spend a lot of time on this myself, 4) decide the affected tests are OK to change.

I'm hoping that 4 is viable. Keep in mind that if we go that route, you're tacitly volunteering to have me ping you next time an issue comes up in this part of the code/tests.
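If option 4 is taken, affected tests could assert on what was drawn rather than on `ax.lines[0]`. A hypothetical helper along those lines (names invented for illustration) accepts either one Line2D per column or a LineCollection and compares the plotted y-values with the frame's columns.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for test runs

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.collections import LineCollection


def plotted_ydata(ax):
    """One y-array per drawn column, from Line2D artists or LineCollection segments."""
    ys = [np.asarray(line.get_ydata(), dtype=float) for line in ax.lines]
    for coll in ax.collections:
        if isinstance(coll, LineCollection):
            ys.extend(seg[:, 1] for seg in coll.get_segments())
    return ys


def assert_plot_matches_frame(ax, df):
    """Check what was drawn against df, whatever artist type drew it."""
    ys = plotted_ydata(ax)
    assert len(ys) == df.shape[1], (len(ys), df.shape[1])
    for y, (_, col) in zip(ys, df.items()):
        np.testing.assert_allclose(y, col.to_numpy(dtype=float))


if __name__ == "__main__":
    df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 3)))
    # Current pandas draws one Line2D per column; the same check would
    # also accept a single LineCollection.
    assert_plot_matches_frame(df.plot(legend=False), df)
    plt.close("all")
```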

github-actions · 2025-08-22T00:08:14Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

EvMossan marked this pull request as ready for review July 3, 2025 08:59

EvMossan marked this pull request as draft July 3, 2025 09:00

EvMossan force-pushed the plot-linecollection-speedup branch 2 times, most recently from 7bf84c2 to 0febdd9 July 3, 2025 10:23

EvMossan marked this pull request as ready for review July 3, 2025 11:19

mroeschke requested changes Jul 3, 2025

EvMossan force-pushed the plot-linecollection-speedup branch 2 times, most recently from f4f499e to 0febdd9 July 4, 2025 13:17

simonjayhawkins added the Visualization and Performance labels Jul 5, 2025

EvMossan added 6 commits July 6, 2025 09:56

STYLE: apply ruff / isort auto-formatting on core.py

7e9cbd8

TST: use default_rng to satisfy Ruff NPY002

6910da7

DOC: add Performance improvement bullet for LineCollection speed-up

8b7b0df

TST: skip speedup test when matplotlib is not installed

d9ac7a6

DOC: whatsnew entry for LineCollection speed-up

a490e24

REF: unify _make_plot; single path with LineCollection option

308f6a6

EvMossan force-pushed the plot-linecollection-speedup branch from 0febdd9 to 308f6a6 July 6, 2025 06:56

EvMossan added 4 commits July 6, 2025 10:05

DOC: whatsnew entry for LineCollection speed-up

08c0fa9

TYP: silence mypy warnings in unified _make_plot

1a6f47b

TYP: align ignore hints after line shifts; drop unused ignore

4e26644

MAINT: replace ambiguous space in plotting docstring

3badad1

EvMossan requested a review from mroeschke July 6, 2025 12:10

EvMossan closed this Jul 8, 2025

EvMossan reopened this Jul 8, 2025

jbrockmendel reviewed Jul 14, 2025

pandas/plotting/_matplotlib/core.py (resolved)

jbrockmendel reviewed Jul 14, 2025

pandas/tests/plotting/frame/test_linecollection_speedup.py (resolved)

jbrockmendel reviewed Jul 14, 2025

pandas/tests/plotting/frame/test_linecollection_speedup.py (outdated, resolved)

DOC/TST: add GH#61764 tag and rename test_linecollection.py

706fb5e

mroeschke requested changes Jul 15, 2025

github-actions bot added the Stale label Aug 22, 2025


4 participants
