AI systems need either a reliable mechanism for autonomous self-improvement or human evaluators who can detect errors and give feedback if they are to improve at knowledge work, according to Ahmad Al-Dahle, CTO of Airbnb (via VentureBeat). He argues that the industry has invested heavily in autonomous mechanisms while neglecting robust human evaluation.
Al-Dahle notes that hiring of new graduates by major tech companies has fallen by half since 2019. He points to tasks such as document review and code review that have been automated, a shift economists call “displacement” and companies call “efficiency.” The trend raises concerns about who will be able to evaluate AI output in the future.
The limits of self-improvement in knowledge work stem from the fluid nature of professional domains, Al-Dahle says. Successful reinforcement learning systems, like AlphaZero, thrive in stable environments with definitive rules and rewards. In knowledge work, by contrast, the rules keep changing, so AI evaluation needs ongoing human guidance.
Many current AI systems are trained on the expertise of experienced workers. However, the automation of entry-level jobs, which traditionally cultivate such expertise, limits the next generation’s capacity for judgment. Al-Dahle warns that knowledge could atrophy not from an external catastrophe, but from individual economic decisions that eliminate the need for certain expertise.
He elaborates on the potential collapse in demand for fields such as advanced mathematics and coding. When organizations no longer need such expertise for day-to-day operations, the incentive to pursue these careers diminishes, leading to fewer skilled professionals and, eventually, a reduced capacity to innovate.
Al-Dahle explains that while automation can replicate results in structural engineering or other fields, producing those results is not the same as understanding the foundational knowledge behind them. This creates a “hollowing out” effect: surface-level performance persists even as the human expertise needed to validate and correct it in context is lost.
Current evaluation methods are primarily rubric-based and capture only measurable criteria. Al-Dahle states that while these techniques aim to reduce reliance on human evaluators, they miss the deeper judgments and intuitions that cannot be codified into a rubric.
Al-Dahle believes future advances could close the evaluation gap, but stresses that such solutions are not yet available. He advocates treating the evaluation gap with the same urgency as capability development, so that humans can continue to participate effectively in the evaluation process. The steady loss of human evaluators, he says, is happening not as a strategic choice but as the result of accumulated rational decisions.
“The thing AI most needs from humans is the thing we’re least focused on preserving,” Al-Dahle stated, highlighting the risk of ignoring this critical aspect of AI development.