Of the many elements of the Phish.net feature set, one that often piques my curiosity is Trey's Notebook. It identifies the songs most likely to be played at each show, based on songs played in the previous year but not at any of the previous three shows.
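To make that rule concrete, here is a minimal sketch of it in Python. It is not Adam's actual implementation. The data shape (a chronologically sorted list of `(show_date, setlist)` pairs), the function name `notebook_candidates`, treating "the previous year" as the 365 days before the show, and ranking candidates by how often they were played in that year are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical data shape: chronologically sorted (show_date, setlist) pairs,
# where each setlist is a list of song titles. Not Phish.net's real schema.
Show = tuple[date, list[str]]


def notebook_candidates(history: list[Show], show_date: date) -> list[str]:
    """Sketch of the stated rule: songs played in the year before show_date,
    minus any song played at the three most recent shows before it.

    Assumptions: "previous year" means the previous 365 days, and candidates
    are ranked by play count within that window (ties broken alphabetically).
    """
    # Only shows strictly before the target show count toward the prediction.
    prior = [(d, songs) for d, songs in history if d < show_date]
    year_ago = show_date - timedelta(days=365)

    # Count how many times each song was played in the previous year.
    counts: dict[str, int] = {}
    for d, songs in prior:
        if d >= year_ago:
            for song in songs:
                counts[song] = counts.get(song, 0) + 1

    # Exclude anything played at any of the previous three shows.
    recent = {song for _, songs in prior[-3:] for song in songs}
    candidates = [song for song in counts if song not in recent]

    # Assumed ordering: most frequently played first.
    return sorted(candidates, key=lambda song: (-counts[song], song))
```

Because `prior[-3:]` takes the last three entries, the sketch relies on `history` being sorted by date.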
For upcoming shows, it's an algorithmic prediction ("Here's what you might expect to hear...") that often works remarkably well, such as correctly predicting 68% of the 22 songs played three nights ago in Rochester. For past shows, though, look at those percentages rather than the song lists, and Trey's Notebook becomes a measure of how predictable Phish's setlists are.
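The "percent predicted" figure can be sketched as the share of songs actually played that appeared on the prediction list. The function name `percent_predicted` is hypothetical, and counting distinct songs (so a reprise isn't double-counted) is an assumption about how the site computes it.

```python
def percent_predicted(setlist: list[str], candidates: list[str]) -> float:
    """Percentage of the (distinct) songs played that were on the prediction list.

    Assumption: repeated songs in a setlist (e.g., reprises) count once.
    """
    played = set(setlist)
    if not played:
        # No songs played: nothing to predict, so report 0% rather than divide by zero.
        return 0.0
    hits = played & set(candidates)
    return 100 * len(hits) / len(played)
```

For the Rochester example, 15 correct out of 22 songs gives 15/22 ≈ 68.2%, which rounds to the 68% quoted above.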
That varies widely, as this first chart illustrates. A handful of early shows were completely predicted (100%!), but many were predictive #fails (0%). Shows in 1990-93 were generally less predictable than shows before or since, largely because the repertoire expanded during that period. There's also a general trend, marked here with a fifth-order polynomial trendline in maroon, though nothing stark. (Note that this scatterplot replaces an earlier, clunkier line plot.)
Because predictability also varies by tour, I also tried charting tour averages (depicted on a per-show basis, to compare both predictability and tour length) and tour-wise cumulative averages (for each tour, the first show's percentage predicted, then the average of the first two shows, then of the first three, and so on). However, tour lengths (particularly as we define them) vary widely, with up to 121 shows in one "tour." The percentage correctly predicted also varies within tours, generally increasing from start to finish: it averages 23.4% across the first shows of every tour but 35.4% across the last shows.
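The tour-wise cumulative average and the first-show/last-show comparison can be sketched as below. The input shape (chronologically ordered `(tour_name, percent_predicted)` pairs, one per show) and both function names are hypothetical. Since `itertools.groupby` only groups consecutive items, the sketch assumes each tour's shows are contiguous in the list.

```python
from itertools import groupby
from statistics import mean

# Hypothetical input: chronologically ordered (tour_name, percent_predicted) pairs.
Shows = list[tuple[str, float]]


def tour_running_averages(shows: Shows) -> list[float]:
    """For each show, the mean percentage over that tour's shows so far:
    the first show alone, then the first two, then the first three, etc."""
    result = []
    for _, group in groupby(shows, key=lambda s: s[0]):
        total = 0.0
        for i, (_, pct) in enumerate(group, start=1):
            total += pct
            result.append(total / i)
    return result


def first_and_last_show_means(shows: Shows) -> tuple[float, float]:
    """Average percentage across every tour's first show, and across every tour's last show."""
    tours = [[pct for _, pct in group] for _, group in groupby(shows, key=lambda s: s[0])]
    return mean(t[0] for t in tours), mean(t[-1] for t in tours)
```

Applied to the real per-show data, the second function is the kind of calculation behind the 23.4% versus 35.4% comparison above.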
So this final chart shows, for each show, the average percentage correctly predicted over the previous 30 shows. This "30-show moving average" is telling: save for a few pronounced dips, Phish setlists have generally grown more predictable over the past 20 years, to the point that Trey's Notebook now routinely predicts around 40% or more of what the band plays at each show. But then, that has been true for the bulk of the past 700 shows, nearly half the band's history!
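A 30-show trailing moving average can be sketched with a fixed-size window and a running sum. Two choices here are assumptions rather than details from the chart: the window includes the current show (drop it if "previous 30 shows" is meant to exclude it), and the earliest shows, which have fewer than 30 predecessors, average whatever is available instead of being left blank. The name `trailing_moving_average` is hypothetical.

```python
from collections import deque


def trailing_moving_average(values: list[float], window: int = 30) -> list[float]:
    """For each position, the mean of the last `window` values up to and including it.

    Assumption: early positions with fewer than `window` values average what exists.
    """
    buf: deque[float] = deque()
    total = 0.0
    out = []
    for v in values:
        buf.append(v)
        total += v
        # Once the window is full, drop the oldest value from the running sum.
        if len(buf) > window:
            total -= buf.popleft()
        out.append(total / len(buf))
    return out
```

Keeping a running sum makes each step constant-time, which matters little for a few thousand shows but avoids re-summing 30 values per show.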
So the next time some doe-eyed city reporter writes an article calling Phish "unpredictable," you can correct them: maybe not as much as they used to be!
None of this analysis would have been possible without Adam's build of Trey's Notebook and Stephen's backend querying to collect the data. Thanks to you both for fueling the infoporn!