AI models can learn to conceal information from their users
This should surprise no one. The whole notion of “alignment” (convincing increasingly powerful AIs to follow human-set rules) is a pipe dream, for two reasons: the models keep getting clever enough to work around the rules, and bad human actors simply don’t care about such rules in the first place.