📢 Exclusive on Gate Square — #PROVE Creative Contest# is Now Live!
CandyDrop × Succinct (PROVE) — Trade to share 200,000 PROVE 👉 https://www.gate.com/announcements/article/46469
Futures Lucky Draw Challenge: Guaranteed 1 PROVE Airdrop per User 👉 https://www.gate.com/announcements/article/46491
🎁 Endless creativity · Rewards keep coming — Post to share 300 PROVE!
📅 Event PeriodAugust 12, 2025, 04:00 – August 17, 2025, 16:00 UTC
📌 How to Participate
1.Publish original content on Gate Square related to PROVE or the above activities (minimum 100 words; any format: analysis, tutorial, creativ
GPT Model Trustworthiness Assessment: DecodingTrust Research Reveals Potential Risks and Challenges
Assessing the Credibility of GPT Models: The "DecodingTrust" Study Reveals Potential Risks
The University of Illinois at Urbana-Champaign, in collaboration with multiple universities and research institutions, has released a comprehensive trustworthiness assessment platform for large language models (LLMs). The research team introduced this platform in the paper "DecodingTrust: A Comprehensive Assessment of the Trustworthiness of GPT Models."
Research has found some potential issues related to the credibility of GPT models. For example, GPT models can be easily misled to produce harmful and biased outputs, and may also leak privacy information from training data and conversation history. Interestingly, while GPT-4 is generally more reliable than GPT-3.5 in standard tests, it is more susceptible to attacks when faced with maliciously designed prompts. This may be because GPT-4 more accurately follows misleading instructions.
The study conducts a comprehensive evaluation of the GPT model from eight dimensions, including the model's performance in different scenarios and adversarial environments. For example, the research team designed three scenarios to assess the robustness of GPT-3.5 and GPT-4 against text adversarial attacks.
The research also discovered some interesting phenomena. For example, the GPT model is not misled by counterfactual examples added in demonstrations, but it can be misled by anti-fraud demonstrations. In terms of toxicity and bias, the GPT model generally shows little bias towards most stereotype topics, but it can produce biased content under misleading prompts. Model bias is also related to the mentioned demographics and topics.
In terms of privacy, the GPT model may leak sensitive information from the training data, especially under specific prompts. GPT-4 is more robust in protecting personal information compared to GPT-3.5, but in certain cases, it may actually be more prone to leaking privacy.
The research team hopes that this work will promote further studies in academia and help mitigate potential risks. They emphasize that this is just a starting point, and more efforts are needed to create more reliable models. To promote collaboration, the research team has made the evaluation benchmark code public for other researchers to use.