SECURITY
Heretic censorship removal tool
+++ Someone built automation to strip safety guardrails from LLMs and shared it publicly, which is either bold transparency or a masterclass in not understanding information security incentives. +++
Heretic: Automatic censorship removal for language models
💬 HackerNews Buzz: 106 comments
LOWKEY SLAPS
🎯 Hyperparameter optimization • Censorship removal • Safety alignment
💬 "Basically any time you aren't sure about the perfect value, throw Optuna on it"
• "Heretic is a tool that removes censorship (aka 'safety alignment') from transformer-based language models"