Abstract:
Fairness auditing of machine learning models is essential for ensuring equitable treatment across demographic groups, yet real-world deployments often impose strict privacy and access constraints. In this work, we address a practical yet underexplored scenario in which an auditor holds a protected private dataset for evaluation while the uncooperative target model offers only limited query access. Conventional evaluation methods typically rely on either revealing the private dataset or requiring extensive cooperation from the model provider, both of which are precluded in this setting. To overcome this challenge, we propose a novel fairness auditing framework based on model behavioral comparison. Our method constructs a pool of reference models with controlled demographic biases, trained on the private dataset, together with an embedding space that captures fairness similarity. The fairness of the target model is then estimated from its proximity to the reference models in this embedding space. We further develop an efficient probe sampling strategy that significantly reduces the number of queries to the target model while maintaining estimation accuracy. We demonstrate the effectiveness of our framework on facial age estimation and gender classification models, evaluating both gender and racial bias. Internally trained models and external pre-trained models are evaluated using the FairFace and UTKFace datasets to simulate the private dataset and the probe data. The experimental results show that our framework produces estimates close to the ground-truth fairness metrics, with low mean absolute errors. Compared to baselines that rely on annotated public datasets or on leaking private data, our framework achieves accurate and efficient fairness evaluation while preserving data confidentiality and operating under limited model access. It thus enables practical, privacy-preserving fairness auditing in regulatory and certification contexts where mutual distrust between auditors and service providers creates operational constraints.
Type: Journal paper in IEEE Access
Date: November 2025
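
To make the behavioral-comparison idea in the abstract concrete, here is a minimal Python sketch of proximity-based fairness estimation under black-box query access. Everything in it is an illustrative assumption rather than the paper's implementation: the function names (behavioral_signature, estimate_fairness), the plain L2 distance standing in for the learned fairness-similarity embedding, the inverse-distance weighting, and the neighborhood size k are all hypothetical.

```python
# Hypothetical sketch of estimating a target model's fairness metric from
# its proximity to reference models with known, controlled bias levels.
# Not the paper's method: the learned embedding is replaced here by raw
# output signatures compared under L2 distance.
import numpy as np

def behavioral_signature(model, probes):
    """Stack a model's outputs on the shared probe set into one vector.

    `model` is any callable mapping an input to predictions; only
    black-box query access is assumed, matching the limited-access setting.
    """
    return np.concatenate([np.atleast_1d(model(x)).ravel() for x in probes])

def estimate_fairness(target_model, reference_models, reference_fairness,
                      probes, k=5):
    """Estimate the target's fairness from its k nearest reference models.

    reference_models:   callables trained with controlled demographic biases
    reference_fairness: ground-truth fairness metric of each reference model,
                        computed on the auditor's private dataset
    probes:             the query-efficient probe set sent to the target
    """
    target_sig = behavioral_signature(target_model, probes)
    ref_sigs = np.stack([behavioral_signature(m, probes)
                         for m in reference_models])
    # L2 distance in signature space is a stand-in for proximity in the
    # learned fairness-similarity embedding space.
    dists = np.linalg.norm(ref_sigs - target_sig, axis=1)
    nearest = np.argsort(dists)[:k]
    # Inverse-distance weighting over the k nearest references.
    weights = 1.0 / (dists[nearest] + 1e-8)
    return float(np.average(np.asarray(reference_fairness)[nearest],
                            weights=weights))
```

Under these assumptions, the number of target queries equals the size of `probes`, which is why a probe sampling strategy that keeps the probe set small directly controls the query budget.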