Three levels of evidence: the Mentu Labs MEL standard

How do you know whether an EdTech tool really works? The obvious answer is to check whether students learn more. But most technology projects in education measure things that are much easier to capture and much less relevant. At Mentu Labs, we developed a three-level MEL (Monitoring, Evaluation, and Learning) standard that forces us to answer the right questions.

The problem of empty indicators

Most technology projects in education measure engagement: number of sessions, time on the platform, content completed. These metrics are easy to collect, easy to report, and generally easy to manipulate. A good gamification design can make usage indicators skyrocket without any real learning taking place.

The problem is not that these metrics are useless; it is that they are insufficient. A teacher who uses the tool five hours a week but whose students learn no more than before is not generating impact. They are generating data.

Level 1: Do they use it?

The first level of evidence is necessary but not sufficient. Measuring whether teachers and students actually use the tool is the starting point of any honest evaluation. Without real use, there can be no real impact. But real use does not guarantee impact.

Level 1 metrics: percentage of weekly active teachers, average sessions per month, time per session, drop-out rate in the first 30 days, and percentage of features used. We set minimum thresholds on these metrics before moving to Level 2.
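As a rough illustration of how some of these metrics could be computed from a session log, here is a minimal Python sketch. The `Session` record, the function names, the definition of early drop-out (a teacher whose last session falls within 30 days of their first), and the threshold values are all assumptions made for this example; they are not the actual Mentu Labs definitions or thresholds.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Session:
    teacher_id: str
    day: date
    minutes: float


def weekly_active_pct(sessions, enrolled, week_start):
    """Share of enrolled teachers with at least one session in the 7 days from week_start."""
    week_end = week_start + timedelta(days=7)
    active = {s.teacher_id for s in sessions if week_start <= s.day < week_end}
    return 100.0 * len(active & set(enrolled)) / len(enrolled)


def avg_minutes_per_session(sessions):
    """Mean session length in minutes across all recorded sessions."""
    return sum(s.minutes for s in sessions) / len(sessions)


def early_dropout_pct(sessions, window_days=30):
    """Share of teachers whose last session falls within window_days of their first.

    Assumed definition; only meaningful if every teacher has been observed
    for longer than window_days.
    """
    first, last = {}, {}
    for s in sessions:
        first[s.teacher_id] = min(first.get(s.teacher_id, s.day), s.day)
        last[s.teacher_id] = max(last.get(s.teacher_id, s.day), s.day)
    dropped = [t for t in first if (last[t] - first[t]).days < window_days]
    return 100.0 * len(dropped) / len(first)


# Placeholder thresholds for illustration only, not real programme values.
THRESHOLDS = {"weekly_active_pct_min": 60.0, "early_dropout_pct_max": 25.0}


def passes_level_1(weekly_active, early_dropout, thresholds=THRESHOLDS):
    """Gate: usage must clear the minimum and drop-out must stay under the maximum."""
    return (weekly_active >= thresholds["weekly_active_pct_min"]
            and early_dropout <= thresholds["early_dropout_pct_max"])


# Made-up log: four enrolled teachers, three of whom ever opened the tool.
sessions = [
    Session("t1", date(2024, 3, 4), 40),
    Session("t1", date(2024, 4, 15), 35),
    Session("t2", date(2024, 3, 5), 20),
    Session("t3", date(2024, 3, 6), 30),
    Session("t3", date(2024, 4, 16), 45),
]
enrolled = ["t1", "t2", "t3", "t4"]

weekly_active = weekly_active_pct(sessions, enrolled, date(2024, 3, 4))
early_dropout = early_dropout_pct(sessions)
print(weekly_active, early_dropout, passes_level_1(weekly_active, early_dropout))
```

In this toy log, usage in the first week looks healthy, but one of the three teachers who started never came back, so the gate stays closed. That is the point of setting thresholds on several metrics rather than on a single headline number.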

Level 2: Do they change how they teach?

This is the most difficult level to measure and the most revealing. If an AI tool does not change pedagogical practice, it is unlikely to change learning outcomes either. Practice change is the causal mechanism that connects the tool with student learning.

“If the tool doesn’t change how the teacher teaches, it’s digital decoration. Pretty, potentially expensive, and useless for students.” — Carlos Méndez, Research Director, Mentu Labs

To measure practice change, we use a combination of structured classroom observations, teacher interviews, lesson plan analysis, and, where possible, session recordings with informed consent. It is a significant methodological investment. And it is exactly that investment that allows us to make honest claims about our impact.

Level 3: Do students learn?


The gold standard is the Randomized Controlled Trial (RCT). In an RCT, participating institutions are randomly assigned to intervention or control groups, which isolates the causal effect of the tool from other contextual factors. It is expensive, logistically complex, and requires large sample sizes. It is also the most rigorous way to know if something works.
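The random assignment step can be sketched in a few lines of Python. This is a simplified illustration that assumes simple randomization at the institution level into two arms of near-equal size; the function name, the school identifiers, and the seed are hypothetical. A real trial would often stratify by region or school size, which this sketch does not do.

```python
import random


def assign_institutions(institution_ids, seed):
    """Randomly split institutions into intervention and control arms of near-equal size."""
    rng = random.Random(seed)  # fixed seed makes the assignment reproducible and auditable
    shuffled = sorted(institution_ids)  # sort first so input order cannot affect the result
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"intervention": sorted(shuffled[:half]), "control": sorted(shuffled[half:])}


schools = [f"school_{i:02d}" for i in range(1, 11)]  # hypothetical identifiers
arms = assign_institutions(schools, seed=2024)
print(arms["intervention"])
print(arms["control"])
```

Recording the seed alongside the list of institutions lets a third party rerun the assignment and confirm that nobody chose which schools received the tool.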

Pre/post tests with randomized control groups to measure incremental learning.
Systematic classroom observations before and after implementation.
Teacher performance rubrics validated by pedagogical experts.
Qualitative interviews with students, teachers, and directors.
Cohort follow-up at 6 and 12 months to measure impact persistence.
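To make "incremental learning" concrete, here is a minimal Python sketch of a difference-in-differences estimate from pre/post test scores: the average gain in the intervention group minus the average gain in the control group. The function names and the scores are invented for illustration. The sketch produces only a point estimate; a real analysis would also need confidence intervals and would have to account for students being clustered within schools.

```python
from statistics import mean


def mean_gain(pre, post):
    """Average post-minus-pre score change for paired student scores."""
    if len(pre) != len(post):
        raise ValueError("pre and post must contain one score per student")
    return mean(b - a for a, b in zip(pre, post))


def incremental_learning(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences: intervention gain minus control gain."""
    return mean_gain(treat_pre, treat_post) - mean_gain(ctrl_pre, ctrl_post)


# Made-up scores on a 0-100 test, for illustration only.
treat_pre, treat_post = [50, 55, 60, 45], [62, 66, 70, 58]
ctrl_pre, ctrl_post = [52, 54, 58, 48], [57, 60, 62, 53]

effect = incremental_learning(treat_pre, treat_post, ctrl_pre, ctrl_post)
print(effect)  # 6.5 points of gain beyond what the control group achieved
```

Both groups improve in this toy data, which is why a pre/post comparison without a control group would overstate the effect: the control group's 5-point gain reflects ordinary schooling, not the tool.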

Standard Summary:

Level 1: Adoption and sustained use of the tool.

Level 2: Measurable change in pedagogical practice.

Level 3: Real student learning.

Why this standard matters

In an ecosystem where noise abounds—announcements of ‘transformative impact’ without supporting data, vanity metrics presented as evidence—the three-level MEL standard is an act of responsibility. Not because we want to appear rigorous, but because we believe that students in Latin America deserve interventions that really work.

Adopting this standard has a cost. Projects last longer, the methodology is more complex, and results are sometimes uncomfortable—we have identified tools that don’t work and have abandoned them. But that honesty is the only basis on which real trust can be built with educators, institutions, and students.