Posted on behalf of Elliot Berkman (Psychology)
Following on the article “Applying the Yardstick, Department by Department” that Dr. Bothun recommended on this blog, I was inspired to think through what a more faculty-driven set of metrics might look like. The article quotes Bret Danilowicz, Dean of CAS at Oklahoma State, on the metrics system they implemented there:
Mr. Danilowicz thought there was a better way, starting with getting lower-level administrators, department heads, and faculty members to participate in the assessment process every year.
“To me, the president and provost are too high a level for this,” he says. “If the goal is to give your departments a chance to take a good look at themselves, and with an eye toward improvement, you need to get faculty and chairs more involved in the process.”
A marine biologist by training, Mr. Danilowicz had served as dean of science and technology at Georgia Southern University. Now he wanted to understand the wider range of disciplines he was sizing up at Oklahoma State. It was important, he thought, to develop qualitative measures, not just numbers that could be mashed up.
“I’m a scientist, so grants and publications were very important to me,” he says. But as he talked with chairs and professors outside the sciences, he saw that each discipline brought its own yardstick.
“I came to learn that people in humanities want to review the quality of their scholarship. And arts people value creativity, which is really hard to measure,” he says. “The more I learned, the more it seemed natural to have the departments develop their own criteria and do their own assessments, and for my office to give them my thoughts on what they come up with.”
This process could start with a broad set of values and principles that can be different for each department. For example, in Psychology, I might articulate some of our values as they pertain to scholarship as:
- High-quality, high-impact research
- Diversity, equity, and inclusion in our research
- Professional training and mentorship
- Interdisciplinary, team science
- Open and reproducible scholarship
Note that this is my personal articulation of some of our values and is not necessarily representative or universal. The set of values needs to be a product of the entire department. I can imagine that having a series of high-level conversations within a department about what values it hopes to promote with its scholarship might be a useful and interesting exercise in its own right.
But, for now, let’s start with this initial set of values. How might these be translated into measurable variables? In psychology, for better or worse, the unit of scholarship is the peer-reviewed publication, typically in a journal. So, I can attempt to articulate ways that the values above could be translated into metrics on a per-paper level. For each paper, I can ask the following yes/no questions:
- Is the paper published in what my department considers to be a high-quality, high-impact journal?
- Is the sample or authorship team diverse, equitable, and inclusive?
- Is a graduate student or postdoc first-author (or, possibly, any author)?
- Is the research team interdisciplinary, as defined by bridging subdisciplines or spanning fields?
- Are the methods open and reproducible?
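These five yes/no questions amount to a simple per-paper checklist, so the scoring is just a count of criteria met. A minimal sketch of how that might be encoded (the criterion names and the example answers are my own illustration, not an official rubric):

```python
# Hypothetical per-paper rubric: each criterion is a yes/no judgment
# corresponding to one of the five departmental values above.
CRITERIA = [
    "high_impact_journal",
    "diverse_sample_or_authors",
    "trainee_first_author",
    "interdisciplinary_team",
    "open_and_reproducible",
]

def score_paper(answers):
    """Count how many of the five values a paper satisfies.

    `answers` maps each criterion name to True/False; a missing
    criterion counts as False.
    """
    return sum(1 for c in CRITERIA if answers.get(c, False))

# Example: a paper meeting all five criteria scores 5.
example = {c: True for c in CRITERIA}
print(score_paper(example))  # 5
```

The point of the code is only that the per-paper judgment stays qualitative (a human answers each question); the arithmetic that follows is trivial by design.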
How would this work with actual papers? Here are a few recent papers from my lab with scores:
- Cosme, D, *Mobasser, A., Zeithamova, D., Berkman, E.T., & Pfeifer, J.H. (in press). Choosing to regulate: Does choice enhance craving regulation? Social Cognitive and Affective Neuroscience.
- High-quality journal? ✅ This journal is considered a top journal in my field.
- Diverse sample or authorship team? ✅ The sample is not particularly diverse but the authorship team is.
- Student or postdoc first author? ✅ Cosme is a graduate student in psychology.
- Interdisciplinary? ✅ The authors span 3 of the 4 areas within our department.
- Open science? ✅ The data and code are publicly available.
- Berkman, E.T. (2018). Value-based choice: An integrative, neuroscience-informed model of health goals. Psychology & Health, 33, 40-57.
- High-quality journal? Eh. It’s a niche journal but not what I’d consider top-tier.
- Diverse sample or authorship team? Nope.
- Student or postdoc first author? Nope.
- Interdisciplinary? The content is sorta interdisciplinary, but the team is not.
- Open science? This is a theory paper, but it is not open in the sense that the journal is not open access.
- Giuliani, N.R., Merchant, J.S., *Cosme, D., & Berkman, E.T. (in press). Neural predictors of eating behavior and dietary change. Annals of the New York Academy of Sciences.
- High-quality journal? Moderate. This journal would probably not make a selective list.
- Diverse sample or authorship team? ✅ Diverse authorship team.
- Student or postdoc first author? No, Giuliani is faculty in the College of Education.
- Interdisciplinary? ✅ Yes, Giuliani is in the COE.
- Open science? ✅ This is a review, but the paper and some of the materials we used are publicly available.
So what do I make of these ratings? As an author of these papers, this ordering (Cosme et al. > Giuliani et al. > Berkman) comports with my understanding of the “excellence” of these papers as I think of that term. Does this mean I think the Berkman (2018) paper is bad or low-quality? Not at all. In fact, I think there are some good ideas in that paper and that it might be influential in the field (a hypothesis I could test by watching how often it gets cited in the next few years). What it means is that the paper doesn’t advance my department’s values as much as the other two. I still get “credit” for it – it goes toward my publication count – but this system allows for a way to differentiate among my papers. The system prescribes a simple, moderately objective rubric for quickly assessing whether a paper promotes the values that I want to advance with my scholarship.
The incentives for our department are to publish papers that are in the journals we think are good; use diverse samples and authors; are authored by students and postdocs; are cross-disciplinary; and use open data and materials. I can game this system by publishing more papers like Cosme et al. Will I stop writing solo-authored theory pieces like the Berkman (2018) paper? No, because sometimes they’re fun and useful, and there’s not really a direct disincentive for me to write them (again, they still “count” and go on my CV and will be part of my P&T and merit reviews).
What would this process look like in practice? I can imagine that faculty score their own papers annually. When we send our CVs to CAS (which we already do each year), we could score all the new papers we published that year. Perhaps we could cap the score at some number, say 4, even if there are more than 4 values, so there are multiple ways for a paper to achieve the highest possible score. The departmental executive committee (or comparable), which already reviews files as part of merit reviews, could provide a sanity check on the scores produced by faculty.
At the department level, what we’d get is a count of total papers produced by the department that year, as well as an average values score for the papers in that department. Perhaps we could also supplement those metrics with some basic and readily available additional data such as citations and media mentions. Those decisions would be made at the department level.
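Putting the capped per-paper score and the department-level summary together, a rough sketch of the arithmetic (the cap of 4 and the answer-dict shape are illustrative assumptions, not a settled design):

```python
def capped_score(answers, cap=4):
    """Per-paper score: number of yes answers, capped so that
    multiple combinations of values can reach the maximum."""
    return min(sum(bool(v) for v in answers.values()), cap)

def department_summary(papers):
    """papers: list of per-paper answer dicts for one year.

    Returns (total paper count, average capped values score),
    the two department-level numbers described above.
    """
    scores = [capped_score(p) for p in papers]
    count = len(scores)
    avg = sum(scores) / count if count else 0.0
    return count, avg

# Example: three papers meeting 5, 2, and 3 criteria
# score 4 (capped), 2, and 3 respectively.
papers = [
    {"a": True, "b": True, "c": True, "d": True, "e": True},
    {"a": True, "b": True, "c": False, "d": False, "e": False},
    {"a": True, "b": True, "c": True, "d": False, "e": False},
]
count, avg = department_summary(papers)
print(count, avg)  # 3 3.0
```

Note that the cap means a department could prioritize different subsets of its values and still produce top-scoring papers, which is the intent of allowing "multiple ways to achieve the highest possible score."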
In the end, the average values scores are not interpretable on their own. They would need to be contextualized primarily by year-over-year trends — as a department we want to see the scores go up next year — and possibly by similar data from comparator departments. We could gather that data for a small number of other departments ourselves, or, by making our process open and transparent, encourage other departments to start collecting it themselves.