About Idiomator

Idiomator is an AI-powered, multilingual idiom extraction tool. It detects idiomatic expressions in English and Spanish, with more languages coming soon. Whether you're analyzing PDFs, social media posts, or research papers, Idiomator automatically identifies the idioms in your text using a trained NLP model backed by a rule-based fallback.

* Idioms are notoriously hard to detect, even for humans. Our AI-based system currently achieves ~70% token-level F1 and ~45% span-level F1, so the tool is still in beta. We’ve also implemented a rule-based fallback system to improve coverage, but extraction accuracy may vary.
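To illustrate why the span-level score sits so far below the token-level one, here is a minimal sketch of how the two metrics can differ on the same prediction. It assumes a simple B/I/O tagging scheme and exact-match span scoring; the function names and the example sentence are hypothetical and are not taken from Idiomator's actual evaluation code.

```python
from typing import List, Set, Tuple


def f1(true_pos: int, n_pred: int, n_gold: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing matches."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


def spans(tags: List[str]) -> Set[Tuple[int, int]]:
    """Collect (start, end) index pairs, end exclusive, for each idiom span.

    Assumes B/I/O tags: "B" opens a span, "I" continues it, "O" is outside.
    """
    found = set()
    start = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes an open span
        if start is not None and tag != "I":
            found.add((start, i))
            start = None
        if tag == "B" or (tag == "I" and start is None):
            start = i
    return found


def token_f1(gold: List[str], pred: List[str]) -> float:
    """Score each token independently: is it part of an idiom or not?"""
    gold_idx = {i for i, t in enumerate(gold) if t != "O"}
    pred_idx = {i for i, t in enumerate(pred) if t != "O"}
    return f1(len(gold_idx & pred_idx), len(pred_idx), len(gold_idx))


def span_f1(gold: List[str], pred: List[str]) -> float:
    """Score whole spans: a span counts only if both boundaries match."""
    gold_spans, pred_spans = spans(gold), spans(pred)
    return f1(len(gold_spans & pred_spans), len(pred_spans), len(gold_spans))


# Hypothetical sentence: "He kicked the bucket yesterday ."
gold = ["O", "B", "I", "I", "O", "O"]  # idiom: "kicked the bucket"
pred = ["O", "B", "I", "O", "O", "O"]  # model found only "kicked the"

print(round(token_f1(gold, pred), 2))  # 0.8
print(round(span_f1(gold, pred), 2))   # 0.0
```

In this sketch the prediction gets most of the idiom's tokens right, so token-level F1 is high, but it misses the final word, so the span does not match exactly and span-level F1 drops to zero. Partial detections like this are one plausible reason the two numbers diverge.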

Why We Created Idiomator

One of the biggest barriers in learning, and especially in language learning, is not knowing what you don't know. Idioms often fall into that blind spot. Idiomator exists to surface that hidden knowledge directly from your own texts and materials.

What’s Next

We plan to expand Idiomator to support more languages, including low-resource ones. We're also building complementary tools for grammar and syntax. On the modeling side, we’re experimenting with language-specific heads and adapter layers to boost both accuracy and recall.

Who Built This

I’m Shishir Maddineni, a student developer and language enthusiast. I’ve always struggled with idioms while learning languages — there’s no easy way to master them. So I built Idiomator to make that process faster and smarter.

Acknowledgements

This project uses the ID10M dataset and related research as a foundation for training and evaluation. We also rely heavily on Wiktionary for idiom collection and gloss extraction. Thanks to the open-source and language learning communities for their ongoing support.

ID10M paper: https://aclanthology.org/2022.findings-naacl.208/