Abstract: |
We consider a regulator aiming to encourage sustainable behavior among agents, including individuals and businesses. Individuals face choices regarding goods, food, services, and mobility, while businesses make decisions on production modes, internal organization, and more. However, agents' selfish utility-maximizing choices often conflict with sustainability. To promote sustainable choices, the regulator can use tailored "measures," such as incentives, subsidies, prices, taxes, or bans. Personalized policies adjust these measures to each agent's needs and preferences [1].
To implement a personalized policy, the regulator must learn agents' preferences by observing their past choices. Proposed personalized policies rely on the classic discrete choice modeling assumption that agents are rational and honest, always choosing to maximize their utility. However, this assumption breaks down under personalized policies: agents may behave deceptively, hiding their true preferences to manipulate the regulator into granting a "deception premium," i.e., a more favorable measure than they deserve.
Recent literature models deceptive behavior as follows: a deceptive user "selects a simulated utility" different from their true one and makes choices that maximize it in order to mislead the regulator [2]. The standard assumption is that the regulator immediately knows the user's simulated utility and assigns measures accordingly. This assumption is overly pessimistic, however, as it makes misleading the regulator too easy.
In this paper, we adopt the more realistic assumption that the regulator learns the user's utility from their observed choices. To mislead the regulator, the user must therefore act consistently with the simulated utility, forgoing the maximization of their true utility. This forgone utility is the cost of deception. It is unclear a priori when this cost falls below the deception premium.
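One way to write this trade-off (the notation is introduced here for illustration and may differ from the paper's formal definitions): let $u$ be the user's true utility, $\tilde{u}$ the simulated one, $x_t$ the choice consistent with $\tilde{u}$ in period $t$, $x_t^*$ the true-utility-maximizing choice, and $\Pi(\cdot)$ the measure granted to a user perceived to hold a given utility. Deception pays off only when the premium exceeds the cost:
\[
  \underbrace{\Pi(\tilde{u}) - \Pi(u)}_{\text{deception premium}}
  \;>\;
  \underbrace{\sum_{t=1}^{T} \bigl[\, u(x_t^{*}) - u(x_t) \,\bigr]}_{\text{cost of deception}} .
\]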
Our overarching aim is to analyze and enhance the robustness of personalized incentive policies to deceptive agents. While regulators can prevent all deception by withholding incentives, doing so does not shift users' behavior toward sustainable choices and thus does not improve social welfare. At the other extreme, excessive incentives may encourage deceptive behavior. Hence, the question we raise is: "What is the optimal level of incentive that maximizes social welfare, and what is the corresponding level of deception, if any?"
To tackle this question, we formalize the setting in a game-theoretical framework as a leader-follower game. We model the regulator's process of learning users' preferences as a Markov chain and users' sequential decision-making as a Markov decision process. This formulation allows us to characterize optimal user strategies, i.e., sequences of choices, as well as the incentive level below which deceptive behavior is discouraged. These characterizations can inform the design of incentive policies that limit deceptive behavior and thereby ensure social welfare improvement.
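To make the leader-follower dynamic concrete, the following minimal Python sketch builds a toy instance. It is not the paper's model: the two-type Bayesian regulator, the choice likelihoods, the horizon, and the linear incentive rule are all illustrative assumptions. A user whose true utility favors a "gray" option solves a finite-horizon MDP over the regulator's belief state; the scan at the bottom exposes the incentive threshold below which honesty remains optimal.

```python
# Toy leader-follower instance (illustrative assumptions throughout, not the
# paper's model). Each period the user picks "g" (green) or "b" (gray); their
# TRUE utility favors "b". The regulator holds a belief that the user is a
# green type, updates it Bayes-style after each observed choice, and pays an
# incentive proportional to that belief.

T = 10                                   # number of observed choices
U_TRUE = {"g": 0.0, "b": 1.0}            # true per-period utility
P_GREEN = {"green": 0.9, "gray": 0.2}    # regulator's model: P(choose g | type)

def belief_update(belief, choice):
    """Posterior P(type = green) after observing one choice."""
    lg = P_GREEN["green"] if choice == "g" else 1 - P_GREEN["green"]
    lb = P_GREEN["gray"] if choice == "g" else 1 - P_GREEN["gray"]
    return belief * lg / (belief * lg + (1 - belief) * lb)

def optimal_value(t, belief, incentive, memo):
    """Backward induction over the belief-state MDP: the user's best
    remaining payoff (true utility + incentives) from period t onward."""
    if t == T:
        return 0.0
    key = (t, round(belief, 9))
    if key not in memo:
        memo[key] = max(
            U_TRUE[c] + incentive * belief
            + optimal_value(t + 1, belief_update(belief, c), incentive, memo)
            for c in ("g", "b")
        )
    return memo[key]

def honest_value(incentive):
    """Payoff of the truthful policy: always pick 'b', letting belief decay."""
    belief, total = 0.5, 0.0
    for _ in range(T):
        total += U_TRUE["b"] + incentive * belief
        belief = belief_update(belief, "b")
    return total

# Scan incentive levels for the threshold below which deception is discouraged.
for i in range(9):
    incentive = 0.5 * i
    opt = optimal_value(0, 0.5, incentive, {})
    honest = honest_value(incentive)
    status = "deception pays" if opt > honest + 1e-9 else "honest optimal"
    print(f"incentive={incentive:3.1f}  optimal={opt:6.2f}  honest={honest:6.2f}  {status}")
```

Under these assumptions, the output exhibits exactly the threshold behavior discussed above: for small incentives the honest policy is optimal, while beyond some level the belief-inflating (deceptive) policy dominates because the premium outweighs the cost of deception.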
References
[1] Y. Xie, R. Seshadri, Y. Zhang, A. Akinepally, M. E. Ben-Akiva, Real-time personalized tolling for managed lanes, Transportation Research Part C: Emerging Technologies 163 (2024) 104629.
[2] Q. Dawkins, M. Han, H. Xu, The limits of optimal pricing in the dark, Advances in Neural Information Processing Systems 34 (2021) 26649–26660.