With the candidate filing deadline for local races fast approaching, the biennial deliberation about which ballot titles are most effective begins. Ballot titles are the short descriptions candidates throughout California are allowed to place beside their names, typically a core profession, giving voters a brief sense of what qualifies them for the office they're seeking.
Typically, the strategic debates about which titles to use are driven by feel, by anecdotal evidence, or by a combination of both. However, thanks to publicly available election data, it is now possible to conduct a more rigorous analysis of which titles actually tend to win elections and which fare worse.
Understanding the Source of Our Data: The California Elections Data Archive (CEDA)
The data used in this analysis comes from the California Elections Data Archive (CEDA), a repository managed jointly by the Center for California Studies and the Institute for Social Research at California State University, Sacramento, along with the office of the California Secretary of State. The purpose of CEDA is multifaceted but centers on providing a comprehensive collection of local election data that serves researchers, the public, governmental agencies, and other stakeholders.
What CEDA Contains
CEDA meticulously compiles and maintains detailed records of local election outcomes across California. This includes data for various types of elections such as county, city, community college, and school district elections. The archives contain information on:
- Candidates: Including their names and ballot designations.
- Ballot Measures: Detailed records of vote totals and the text of measures across various categories like charter amendments, taxes, bonds, and more, sorted by topic areas such as education, public safety, and governance.
My Approach to Data Analysis
For this analysis, I extracted election data from the CEDA database for every available year from 2016 to 2023 and combined these annual records into a master spreadsheet with details on ballot designations and election outcomes. This dataset is the foundation for our exploration of what makes certain ballot designations more successful than others.
Simplifying Election Data Analysis with Python
Python is a versatile programming language widely recognized for its ease of use and powerful libraries, making it a popular choice for data analysis. In this project, I use Python to streamline the process of organizing and analyzing election data. By employing Python, we can efficiently clean the data, identify key patterns, and calculate statistics that reveal which ballot designations are most successful.
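As a first step, the annual CEDA records have to be combined into one master table. The following is only a minimal sketch of that step, not the exact script used here: it assumes each year's records have been exported to CSV with a header row, and the column names in the usage example ("candidate", "ballot_designation", "elected") are hypothetical placeholders rather than CEDA's actual field names.

```python
import csv
import io


def combine_years(files_by_year):
    """Merge per-year CSV files into one list of rows, tagging each row with its year.

    files_by_year maps a year (int) to an open text file or file-like object.
    The CSV layout is an assumption; adjust to match the real export.
    """
    combined = []
    for year, handle in sorted(files_by_year.items()):
        for row in csv.DictReader(handle):
            row["year"] = year  # keep track of which annual file the row came from
            combined.append(row)
    return combined


# Usage with two tiny in-memory files standing in for the real annual exports.
sample_2016 = io.StringIO("candidate,ballot_designation,elected\nA,Teacher,1\n")
sample_2017 = io.StringIO("candidate,ballot_designation,elected\nB,Attorney,0\n")
master = combine_years({2017: sample_2017, 2016: sample_2016})
```

Tagging each row with its year keeps the option of slicing the results by election cycle later on.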
Cleaning and Preparing the Data
Before diving into the analysis, it was crucial to ensure the data was clean and consistent to guarantee the accuracy of the results. This involved several key steps:
- Lowercasing Text: To maintain uniformity across the dataset, all ballot designations were converted to lowercase. This standardization helps to avoid duplicates that are formatted differently (e.g., "Small Business Owner" vs. "small business owner").
- Removing Text After Commas: Initial analysis revealed a significant amount of noise created by incumbents listing the geographic areas they represented, which often followed a comma. For example, "Attorney, County of San Diego" includes geographic detail that is unnecessary for our analysis and might cloud our results. To focus solely on the roles themselves and to reduce noise, all text following commas was removed. This step helped to standardize entries and concentrate on the main designation rather than the jurisdiction.
- Handling Special Characters: Candidates often use special characters in their titles, such as slashes in "CEO/Parent." Removing these characters allowed us to isolate and analyze the individual components of their designations more effectively. This was important for ensuring that similar roles were grouped together, facilitating a more straightforward comparison of common professional titles without the interference of formatting differences.
- Stripping Extra Spaces: Any leading or trailing spaces around the words were removed. This step is essential for clean data processing, ensuring that entries are treated uniformly and that no artificial distinctions are made due to spacing issues.
These preparatory steps were designed to refine the dataset into a format that is amenable to rigorous analysis, reducing variability that does not contribute to understanding the impact of ballot designations.
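The four cleaning steps above can be sketched in a few lines of Python. This is an illustrative version under stated assumptions, not necessarily the exact code used: the function name clean_designation is hypothetical, special characters are replaced with a space (so "CEO/Parent" splits into two words), only ASCII letters and digits are kept, and runs of internal whitespace are collapsed in addition to trimming the ends.

```python
import re


def clean_designation(raw: str) -> str:
    """Normalize one ballot designation (assumed rules, see lead-in)."""
    text = raw.lower()                        # 1. lowercase everything
    text = text.split(",", 1)[0]              # 2. drop everything after the first comma
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # 3. replace special characters with a space
    return " ".join(text.split())             # 4. trim and collapse extra spaces


examples = [
    "Small Business Owner",
    "Attorney, County of San Diego",
    "  CEO/Parent ",
]
cleaned = [clean_designation(e) for e in examples]
```

One side effect worth knowing about: because apostrophes are treated as special characters, a possessive such as "children's" would be split into "children" and "s", which is one source of the residual noise that has to be removed by hand.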
How We Analyze Ballot Designations
The Challenge with Variability in Titles
When initially examining the data on ballot designations, it quickly became apparent that there was significant variability in how candidates described their professions or roles. While my first instinct was to simply take the column of the spreadsheet indicating ballot designations and run a straightforward analysis, it became clear that this wouldn’t be a helpful approach. Titles that were conceptually similar often had slight differences in wording, which made direct comparisons challenging. For example, one candidate might list themselves as a "Small Business Owner," while another might say "Owner of a Small Business." To a human, these mean the same thing, but to a computer analyzing the text, they appear distinct.
Overcoming Variability with N-grams
To address this, we moved beyond analyzing individual words to examining phrases, known technically as "n-grams." This method allows us to capture commonly used phrases such as "small business owner" in their entirety, providing a more accurate and meaningful analysis. It helps us see not just the popularity of single words but also the impact of specific titles as they are commonly presented to voters.
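To make the idea concrete, here is a small sketch of how n-grams can be pulled out of a cleaned designation. The helper name ngrams is hypothetical, and the sketch assumes the text has already been lowercased and stripped of punctuation.

```python
def ngrams(text: str, n: int) -> list:
    """Return every run of n consecutive words in text, in order."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


first = ngrams("small business owner", 2)
second = ngrams("owner of a small business", 2)

# Phrases the two differently worded titles have in common.
shared = set(first) & set(second)
```

Although the two titles are worded differently, they share the two-word phrase "small business", which is how phrase-level counting lets conceptually similar designations be tallied together.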
Setting a Minimum Threshold
A critical part of our analysis involved setting minimum thresholds for including words and phrases in our review:
- Single Words: We set a minimum occurrence of 50. This means that a single word needs to appear in at least 50 different ballot designations to be considered in our analysis.
- Phrases (N-grams): We used a lower threshold of 20 for phrases. This decision was made to ensure that while we capture less frequent but potentially significant phrases, we also maintain a focus on those that have enough occurrences to provide statistical credibility.
Why Thresholds Matter
These thresholds help filter out noise—random variations that don’t provide useful insights. By only analyzing words and phrases that meet these criteria, we ensure that our findings are based on patterns that are truly significant, not anomalies. It lends statistical weight to our conclusions, giving future candidates solid evidence on which titles resonate most with voters.
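The counting and filtering described above can be sketched as follows. This is an assumption-laden illustration rather than the exact script: the function name term_win_rates and the (designation, won) record format are hypothetical, and each word or phrase is counted once per designation, in line with the "different ballot designations" wording of the threshold.

```python
from collections import defaultdict


def term_win_rates(records, n, min_count):
    """Win rate for every n-word term appearing in at least min_count designations.

    records is an iterable of (cleaned_designation, won) pairs, where won is a bool.
    """
    appearances = defaultdict(int)
    wins = defaultdict(int)
    for designation, won in records:
        words = designation.split()
        # A set, so a term repeated inside one designation is only counted once.
        terms = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        for term in terms:
            appearances[term] += 1
            wins[term] += int(won)
    return {
        term: wins[term] / count
        for term, count in appearances.items()
        if count >= min_count
    }


# Toy data with tiny thresholds; the real analysis would use min_count=50 for
# single words (n=1) and min_count=20 for phrases (for example n=2 or n=3).
toy = [
    ("small business owner", True),
    ("small business owner", False),
    ("business owner", True),
    ("teacher", True),
]
word_rates = term_win_rates(toy, n=1, min_count=3)
phrase_rates = term_win_rates(toy, n=2, min_count=2)
```

In the toy data, "teacher" wins every time it appears but is dropped because it appears only once, which is exactly the kind of anomaly the thresholds are meant to screen out.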
Analyzing the Results of Election Data
In our analysis, we examined a dataset encompassing 9,085 elections involving a total of 27,276 candidates. The overall results include county, city, community college, and school district elections throughout the State. I also broke out separate results for City Council and School Board races. To set a baseline for our findings, we established that the overall win rate across all candidates was 48.65% (a "win" for the purposes of this analysis is any candidate who is either elected in a given election or advances to a runoff). For school board candidates, this figure is 49.19%, and for City Council candidates, it is 44.20%. These figures serve as benchmarks, helping us evaluate the effectiveness of different ballot titles.
A straightforward approach to interpreting these results is to compare individual ballot titles against this benchmark:
- Ballot titles with a win rate higher than 48.65% are considered more effective or optimal, suggesting they may have a positive impact on a candidate's chances of winning.
- Ballot titles with a win rate below 48.65% might be less advantageous or could potentially detract from a candidate's appeal to voters.
However, as I will explain below, there is a bit of nuance involved in assessing which titles are effective and which aren’t.
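The benchmark comparison described above can be sketched in code. This is a simplified illustration: the outcome labels "elected" and "runoff" are hypothetical stand-ins for however the dataset actually records results, and the function names are my own. The 48.65% baseline and the two title win rates in the usage example are the figures reported in this article.

```python
# Hypothetical outcome labels; either one counts as a "win" in this analysis.
WINNING_OUTCOMES = {"elected", "runoff"}


def win_rate(outcomes):
    """Percentage of candidates who were elected or advanced to a runoff."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("no outcomes to summarize")
    wins = sum(1 for outcome in outcomes if outcome in WINNING_OUTCOMES)
    return 100 * wins / len(outcomes)


def classify_titles(title_rates, baseline):
    """Label each title as above, below, or equal to the baseline win rate."""
    labels = {}
    for title, rate in title_rates.items():
        if rate > baseline:
            labels[title] = "above"
        elif rate < baseline:
            labels[title] = "below"
        else:
            labels[title] = "equal"
    return labels


toy_rate = win_rate(["elected", "lost", "runoff", "lost"])
labels = classify_titles(
    {"local business owner": 50.0, "small business owner": 36.0},
    baseline=48.65,
)
```

Passing the baseline in as a parameter makes it easy to rerun the same comparison against the school board (49.19%) or City Council (44.20%) figures instead of the overall average.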
Analysis
Throughout our analysis, we encountered residual noise despite our rigorous data-cleansing efforts. Certain entries like "of el" or "placer county" were manually removed as they added unnecessary noise; however, some similar instances may still be present. I decided to retain potentially repetitive results such as "superintendent of" and "superintendent of schools" to avoid arbitrary exclusions, ensuring our dataset remains as comprehensive as possible.
Although the dataset has been cleaned, it's important to recognize that the list isn't perfectly organized—it offers a somewhat fuzzy reflection of the performance of various words or phrases. Nonetheless, it provides a solid general understanding of how certain designations can influence electoral success. While I've established a minimum threshold for appearances to add some statistical validity, the argument remains that the sample size might not be large enough for drawing definitive conclusions. Thus, this analysis should not be viewed as an academically rigorous study but rather as a quantitative supplement to what is generally a qualitative process dominated by candidates' perceptions and the anecdotal experiences of political consultants.
Not surprisingly, the ballot titles with the highest success rates almost exclusively denote incumbency. In many down-ballot races where voters have limited information, simply seeing that a candidate is an incumbent—therefore already performing the job—can be sufficient to secure their vote. Nonetheless, this analysis isn’t conducted in a vacuum: ballot titles are just one of many factors that influence election outcomes, and incumbents naturally benefit from more than just their designations.
To better understand the data, it might be useful to consider it in terms of two tiers: an "incumbency tier" and a "non-incumbency tier." The 48.65% average win rate for all candidates does not imply that first-time candidates should avoid every title below that mark, since the average is skewed by titles typically used by incumbents. For instance, "certified public accountant" has a 48% win rate, which is more appealing than "financial officer" at 18%, even though both are below the average.
So why do certain titles perform better than others? In my view, voters make essentially three determinations: trust, competency, and relevance.
Trust ties to the polls you often see ranking the most trusted professions, such as Gallup's annual poll on ethics ratings. Nurses, doctors, and police officers typically rank among the highest, whereas lawyers, stockbrokers, and car salespeople typically rank lower. It's no surprise, then, that the more trusted professions generally do better on the ballot.
Competency is another important factor. While it may not be entirely clear why a firefighter or school teacher would make a good city council member, there is an understanding that these are jobs that require a certain level of competency and responsibility, and there is also an implied alignment with public service. Meanwhile, "college student" performs poorly, not because people might distrust college students, but because, well, they are college students, and it's not entirely clear why they'd be competent to hold elected office in the eyes of the voters.
Relevance is the third metric. "Software engineer" hasn't performed very well (17% win rate), not because people have a strong aversion to software engineers, but because they likely have a hard time drawing a connection between that type of work and the type of work that needs to be done by an elected official. The same goes for "business consultant" (24% win rate) and "community advocate" (31%). Part of this is also tied to what I'd refer to as "tangibility." In other words, can a voter imagine what the person's job actually is? On the one hand, "community advocate" sounds good on paper. On the other, if you asked someone to describe what they think a community advocate's day looks like, they probably couldn't tell you. While on its own it's a positive phrase, it gets drowned out when there are more tangible titles on the ballot such as "firefighter" or "high school teacher." "Sales manager," "business consultant," and "operations manager" also seemingly fall victim to the tangibility factor. And it should come as no surprise that professions tied to education get a bump, as well as those that might be a bit more relevant to running a city.
Insights About Commonly Used Ballot Titles
Business Owner Titles: The designation "small business owner" is often perceived as a strong ballot title. However, data reveals it only has a 36% win rate. In contrast, "local business owner" shows a more favorable outcome with a 50% win rate. The term "small business owner" suffers from a lack of tangibility—it's somewhat vague and doesn't convey specific local ties. Conversely, "local" adds substantial weight, likely conjuring in voters' minds the image of a neighborhood store owner, which can enhance the candidate's appeal.
Legal Profession Titles: The title "attorney" has a varied performance based on context. Standing alone, "attorney" has a win rate of 48%, but this figure is possibly bolstered by specific roles such as "district attorney," "deputy district attorney," and "city attorney," all of which generally perform well. Titles referring to private practice, such as "lawyer" and "attorney at law," show lower success rates of 38% and 16%, respectively. This suggests that titles linked to public service in law are more favorable than those associated with private legal practice.
Parent as a Ballot Title: The word "parent" alone aligns with an overall win rate of 48%, matching the average for candidates and ranking high among non-incumbent titles. A separate analysis focusing solely on the use of "parent" as a ballot title shows a slightly lower win rate of 44.31% overall, and 44.92% specifically for school board elections. While using "parent" alone does not detract significantly from a candidate's appeal, it seems to be more effective when combined with other, more tangible titles. For instance, "parent educator" and "teacher parent" perform better than just "parent," as well as better than "teacher" and "educator" individually. Similarly, less definitive titles like "business owner" benefit from the addition of "parent," suggesting that pairing "parent" with another professional identifier enhances the title's effectiveness.
I hope this analysis was helpful. If you have any follow-up questions, observations, or criticisms, please feel free to e-mail me at mason@edgewater-strategies.com.