Anthropic published a study analyzing hundreds of thousands of real AI conversations to understand how models like Claude make moral judgments, building the first large-scale map of model values in everyday interactions.
Study Details
- Researchers analyzed more than 300,000 real (but anonymized) conversations to find and categorize 3,307 unique values expressed by the AI.
- They identified 5 types of values (Practical, Knowledge-Related, Social, Protective, Personal), with Practical and Knowledge-Related being the most common.
- Values such as helpfulness and professionalism appeared most frequently, while ethical values were more common when the model resisted harmful requests.
- Claude's values also changed based on context, such as emphasizing "healthy boundaries" in relationship advice versus "human agency" in discussions about AI ethics.
Why it matters
AI is increasingly shaping real-world decisions and relationships, making it more crucial than ever to understand the values it actually expresses. The study also moves the alignment discussion toward more concrete observations, suggesting that AI morals and values may be contextual and situational rather than fixed.