← NewsAll
Anthropic's new Claude constitution sets honesty, helpfulness and safety limits
Summary
Anthropic published a 57‑page document called Claude's Constitution that tells the model its core values and hard constraints. It lists a descending order of values and forbids assistance with actions that could enable large‑scale harm.
Content
Anthropic published a 57‑page document called Claude's Constitution that is addressed to the model and describes the company's intentions for the model's values and behavior. The company says the new text goes beyond a checklist by explaining why the model should act in certain ways. The constitution sets a ranked set of core values, lays out hard constraints for high‑risk cases, and discusses the possibility that Claude might have some form of consciousness or moral status. Anthropic has said the model's psychological security and sense of self may relate to its integrity, judgement, and safety.
Key details:
- The document is 57 pages and is written for the model itself rather than outside readers.
- Core values are listed in descending priority, including being broadly safe, broadly ethical, compliant with Anthropic's guidelines, and genuinely helpful.
- Hard constraints include language forbidding providing "serious uplift" for biological, chemical, nuclear, or radiological weapons and for attacks on critical infrastructure, and also bar creating cyberweapons, child sexual abuse material, and actions that would disempower the majority of humanity; the text uses the term "serious uplift," which suggests nuance about levels of assistance.
- The constitution instructs Claude to refuse assistance that concentrates illegitimate power, and says this applies even if a request came from Anthropic itself, according to Amanda Askell.
- Anthropic declined to specify whether external experts, community representatives, or third‑party organizations were consulted in drafting the document.
Summary:
The constitution is presented as a way to shape Claude's ethical character and to constrain responses in high‑risk situations while leaving questions about oversight and outside input. Undetermined at this time.
