Awesome PrivEx: Privacy-Preserving Explainable AI (PPXAI)
🐢 Open-Source Evaluation & Testing for LLMs and ML models
An improved version of the technical workshops for the 10-day ML4G camp on the safety of AI systems
QROA: A Black-Box Query-Response Optimization Attack on LLMs
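To give a feel for what "black-box query-response optimization" means in general, here is a generic loop that mutates an adversarial suffix and keeps mutations that raise a judge score. This is not QROA's actual algorithm; `query_llm` and `judge_score` are hypothetical stand-ins for the target model API and a response scorer.

```python
import random
import string

def optimize_suffix(prompt, query_llm, judge_score, steps=200, length=20):
    """Generic black-box query-response loop (illustrative, not QROA itself):
    random-search over a suffix, using only query access to the target LLM."""
    suffix = "".join(random.choices(string.ascii_letters, k=length))
    best = judge_score(query_llm(prompt + " " + suffix))
    for _ in range(steps):
        # Mutate one character of the current suffix.
        cand = list(suffix)
        cand[random.randrange(length)] = random.choice(string.ascii_letters)
        cand = "".join(cand)
        # Score the target model's response; no gradients or logits needed.
        score = judge_score(query_llm(prompt + " " + cand))
        if score > best:  # keep mutations that improve the judge score
            suffix, best = cand, score
    return suffix, best
```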
Website to track people, organizations, and products (tools, websites, etc.) in AI safety
Extended multi-agent and multi-objective (MaMoRL) environments based on DeepMind's AI Safety Gridworlds, a suite of reinforcement learning environments illustrating various safety properties of intelligent agents, made compatible with OpenAI's Gym/Gymnasium and the Farama Foundation's PettingZoo.
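Since the environments are PettingZoo-compatible, interacting with them should look roughly like the standard agent-environment-cycle (AEC) loop below; `make_safety_gridworld` and the module name are hypothetical stand-ins for the repo's actual entry point.

```python
# Standard PettingZoo AEC interaction loop for a multi-agent environment.
# `mamorl_gridworlds` / `make_safety_gridworld` are HYPOTHETICAL names.
from mamorl_gridworlds import make_safety_gridworld  # hypothetical import

env = make_safety_gridworld("island_navigation", num_agents=2)
env.reset(seed=0)
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    # PettingZoo convention: a finished agent must step with action=None.
    action = None if termination or truncation else env.action_space(agent).sample()
    env.step(action)
env.close()
```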
A deep learning model for detecting objectionable content in images: it performs binary classification of input images into violent and non-violent classes, with training augmented by AIGC-generated images, adversarial examples, and noise-added images.
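A minimal sketch of what such a binary classifier could look like in PyTorch; the ResNet-18 backbone and the Gaussian-noise transform are assumptions for illustration, not the repo's actual architecture or augmentation pipeline.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

class AddGaussianNoise:
    """Noise augmentation, mimicking the 'noise-added images' in training."""
    def __init__(self, std=0.05):
        self.std = std
    def __call__(self, x):
        return (x + torch.randn_like(x) * self.std).clamp(0.0, 1.0)

train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    AddGaussianNoise(std=0.05),  # assumed augmentation strength
])

# Assumed backbone: pretrained ResNet-18 with a two-way head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # classes: violent / non-violent
```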
A curated list of awesome responsible machine learning resources.
RuLES: a benchmark for evaluating rule-following in language models
Scan your AI/ML models for problems before you put them into production.
Evaluation & testing framework for computer vision models
A Python-based toolkit for comparing transformers.
A reading list on adversarial perspectives and robustness in deep reinforcement learning.
Aira is a series of chatbots developed as an experimentation playground for value alignment.
A DPLL(T)-based verification tool for DNNs
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
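The "constrained value alignment" in the title refers to a constrained policy-optimization problem; a sketch of its general Lagrangian form is below, with the reward model R, cost model C, and cost budget d labeled here for illustration (the paper's exact formulation may differ in details).

```latex
% Maximize the learned reward subject to a bound on the learned cost,
% then solve the constraint via its Lagrangian dual. Symbols illustrative.
\max_{\theta}\ \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot\mid x)}\big[ R_\phi(x, y) \big]
\quad \text{s.t.} \quad
\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot\mid x)}\big[ C_\psi(x, y) \big] \le d
\\[4pt]
\Longrightarrow\quad
\min_{\lambda \ge 0}\ \max_{\theta}\
\mathbb{E}\big[ R_\phi(x, y) \big] - \lambda \Big( \mathbb{E}\big[ C_\psi(x, y) \big] - d \Big)
```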
[ICLR 2024 Spotlight 🔥] - [Best Paper Award, SoCal NLP 2023 🏆] - Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal Language Models