Evaluating Work in the Age of AI: A Manager's Perspective
Why clarity matters more than ever, and how to bring AI-shaped work into performance conversations.
AI is already showing up in performance conversations. Sometimes it's an explicit mention; sometimes it's implicit and simply assumed. Either way, it's there, often before companies have the process or structure to handle it well.
A while back, I helped run an AI fluency session at my previous organization. It was meant for everyone, not just applied scientists. As I walked out of the room that day, one question stayed with me:
If people are learning to work with these tools, how should that show up when we talk about performance?
This wasn't abstract for me. I was already using Claude to check ideas or debug code faster. PMs around me were building small prototypes in an afternoon using Replit or Lovable. UX researchers were running deeper studies without waiting for engineering support. The usage varied across roles, but the acceleration was unmistakable.
Around the same time, conversations with friends echoed the same shift. A friend at Airbnb mentioned AI usage appearing in performance check-ins. Someone at Shopify pointed me to what Tobi Lütke has been saying, that AI fluency is becoming table stakes. Then Meta announced that "AI-shaped work" would now be part of performance evaluations.
It wasn't a slow pattern anymore. It was already here, shaping how teams work and how performance is judged.
Across companies, AI isn't just showing up in tools. It is now embedded in expectations and is becoming a real part of how work is evaluated today.
Which means the real question isn't whether AI should affect performance, but rather:
How do we evaluate AI-shaped work in a way that stays fair, clear, and human?
Why This Matters to Me
Ambiguity in performance systems has two predictable outcomes: bias and confusion.
Years ago, someone on my team delivered work that didn't align with what I had in mind. It took me a moment to realize the mismatch wasn't about effort or skill. It was simply that what "good" looked like was far clearer in my head than it was in theirs.
Nothing dramatic happened, but the discomfort stayed with me. It reminded me that even well-intentioned managers can leave too much to interpretation. Ambiguity feels harmless until you see how it lands on someone else. That was the moment I stopped assuming clarity was shared.
A different moment taught me the other half of the lesson. I remember a mid-cycle conversation, one of those quick check-ins where you are just trying to understand how someone is doing. We talked through a project, clarified expectations, and adjusted the plan. A few weeks later, their trajectory shifted in a way that felt natural and grounded. It wasn't because they suddenly worked harder. It was because the system gave both of us room to learn and adapt.
Those two moments shaped how I think about performance today. They taught me that fairness is not created by a single rubric or a well-written review. It comes from systems that reduce ambiguity, surface the actual work, and treat performance as a living loop rather than an end-of-year verdict.
Performance falls apart quietly when expectations are vague. Strong systems reduce ambiguity before it turns into bias.
Performance management is never fully objective or fully subjective. It's always a mix of shared heuristics, clear expectations, and enough structure to catch our own blind spots. The goal isn't precision. It's fairness. And that is why AI-shaped work matters so much in this conversation. If expectations are shifting, and they clearly are, then the systems around them need to be more intentional, more grounded, and more humane.
The Current Reality: How Teams Are Actually Using AI
Inside organizations today, AI usage generally falls into two broad buckets.
1. Intentional, high-quality usage
People use copilots to reduce friction. Teams use AI to speed up research, explore more design options, test assumptions, or automate small workflows. This usage shortens cycles and brings more clarity earlier in the process.
2. AI slop
Quick prompting. Shallow output. A lot of activity that looks productive but doesn't move the work forward. We've all seen this. It isn't harmful, but it isn't helpful either.
Most teams live somewhere between these two extremes. This is why so many managers are now asking simple questions in regular check-ins:
Where did AI help? Where did it slow things down? Did it improve the work, or did it create rework?
These conversations matter because they reveal the real patterns.
Where Bias Creeps In
New tools don't erase bias; they shift where it hides. A few patterns are already showing up:
Confidence becomes a proxy for competence. Someone who talks fluently about their AI usage sounds more advanced, even if someone else is using the tools more effectively but more quietly.
Roles with more artifacts appear more "AI-forward." Engineers have visible outputs. A PM's strategic clarity doesn't always create obvious "AI-shaped" artifacts.
Narrative skill gets mistaken for impact. People who explain their workflow well can appear more capable than those who made better decisions but spoke less.
Early-career employees get unfair comparisons. They use AI more for support, while senior folks rely more on judgment, and both can be misread without context.
Visible activity becomes a false signal. Dozens of prompts look busy. Deep thinking before generating anything looks like low usage.
None of these biases are malicious. They are subtle, and subtle is exactly where most unfairness grows. Each pattern seems small on its own, but together they add up quickly.
The Easy Failure Mode: Measuring the Wrong Things
It's tempting to track what's simple to count:
- token usage
- prompt volume
- hours "saved"
- tool-open time
- number of AI-generated files
These metrics look objective, but they say nothing about judgment, clarity, or quality.
High usage might reflect mastery or confusion. Low usage might reflect senior confidence or hesitation. Prompt volume might reflect exploration or thrashing.
And none of these tell you whether the work got better.
Performance isn't about activity. It's about outcomes and decisions.
The Real Path Forward: Fold AI Into What We Already Value
We don't need a new metric for AI. We need better ways to observe its influence on real work.
A simple approach is to treat AI as an accelerator and ask:
- Did it shorten the exploration loop?
- Did it unblock work sooner?
- Did it bring clarity earlier?
- Did it expand the range of ideas?
- Did it reduce repetitive toil?
- Did it help someone learn faster?
These questions map directly onto the dimensions we already value.
- Craft: clearer writing, cleaner code, better structure, stronger reasoning.
- Impact: shorter cycles, better documentation, improved team velocity, broader exploration.
- Systems Thinking: redesigned processes, less friction, new paths that were not possible before.
- Collaboration: shared workflows, reusable artifacts, insights that help adjacent teams.
- Cross-Team Uplift: removing toil for MLOps, creating templates, accelerating other teams.
- Learning: deeper understanding through tools, and sharing that learning widely.
And these reflections shouldn't wait for talent reviews. They belong in 1:1s, retros, sprint reviews, design critiques; anywhere the work already lives.
AI-shaped work becomes visible through the work itself, not through a separate checkbox.
Closing
AI won't redefine what good work is. The fundamentals still matter most: clarity, judgment, craft, and the ability to uplift others.
What AI does change is the pace of exploration and the shape of everyday work. And our systems need to reflect that shift.
Not with new scorecards or complicated rules, but with clearer expectations, better questions, and more thoughtful loops.
We don't need perfect answers yet. We just need systems honest enough, and human enough, to keep up with the reality that's already here.