Publications

Laurène Vaugrante, Anietta Weckauff, and Thilo Hagendorff. Emergently Misaligned Language Models Show Behavioral Self-Awareness That Shifts With Subsequent Realignment. 2026.