Seems like a natural fit, right? What better data set than decades of email, complete with metadata?
I would think that spam filtering may have been one of mankind’s first forays into machine learning. My Google-based email has been saving me from spam for a decade. We know that spam evolves to fight the machines that keep it down, and yet the machines seem to be winning. Surely that’s some form of machine learning and not entirely hand-written filters.
We also have some evidence in the form of Gmail’s tabs (the Promo Tab, et al.), which attempt to sort your email by type, and Inbox’s bundles, which attempt to intelligently group your email types. That reeks of machine learning. Anthony Dm. has done some home-grown work in this area as well.
Today I wondered what would happen if I grabbed a bunch of unlabeled emails, put them all together in one black box and let a machine figure out what to do with them.
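To make the black-box idea concrete, here is a minimal sketch of what "let a machine figure it out" could look like. It is not what Gmail or anyone else actually runs. It assumes a single-pass "leader" clustering over bag-of-words vectors with cosine similarity, and the sample emails, the `threshold` value, and all function names are made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical sample data; real input would be message bodies from a mailbox export.
EMAILS = [
    "Huge sale this weekend, 50% off all shoes",
    "Your invoice for March is attached",
    "Weekend sale: shoes and bags 30% off",
    "Invoice attached for your April payment",
]


def vectorize(text):
    """Turn text into a bag-of-words count vector (lowercased alphabetic tokens)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(emails, threshold=0.3):
    """Assign each email to the most similar existing cluster, or start a new one.

    No labels are used anywhere: groups emerge only from word overlap.
    The threshold is an arbitrary assumption, not a tuned value.
    """
    clusters = []  # each cluster: summed word counts plus member indices
    for i, text in enumerate(emails):
        vec = vectorize(text)
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [i]})
        else:
            best["centroid"].update(vec)  # fold the new email into the cluster
            best["members"].append(i)
    return [c["members"] for c in clusters]


if __name__ == "__main__":
    for group in cluster(EMAILS):
        print([EMAILS[i] for i in group])
```

On this toy input the two sale emails end up in one group and the two invoice emails in another, with nobody telling the code what a "promo" or a "bill" is. A real mailbox would need better tokenizing and probably a proper algorithm like k-means, but the shape of the experiment is the same.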
Google outright tells us this is the technology they employ, although this research paper is about fancier things, like improving email search using actual behavior:
we leverage implicit feedback (namely clicks) provided by the users themselves. Using click logs as training data in a learning-to-rank setting is intriguing, since there is a vast and continuous supply of fresh training data.
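For a rough feel of how clicks can become training data, here is a toy sketch. It is not the method from the paper. It assumes a pairwise perceptron, treats every clicked result as preferred over every unclicked result in the same search, and uses an invented click log with two made-up features per result.

```python
# Hypothetical click log: one entry per search, holding the feature vectors of
# the results shown and the index of the result the user clicked.
# Features (assumed for illustration): [query-term overlap, recency score].
CLICK_LOG = [
    {"results": [[0.9, 0.1], [0.2, 0.8], [0.1, 0.2]], "clicked": 1},
    {"results": [[0.8, 0.2], [0.3, 0.9]], "clicked": 1},
    {"results": [[0.4, 0.7], [0.9, 0.1]], "clicked": 0},
]


def score(weights, features):
    """Linear relevance score: higher means ranked earlier."""
    return sum(w * x for w, x in zip(weights, features))


def preference_pairs(log):
    """Yield (clicked, not_clicked) feature pairs from each logged search.

    Simplifying assumption: the clicked result beats every other result shown.
    Real click data is noisier (position bias, accidental clicks).
    """
    for entry in log:
        clicked = entry["results"][entry["clicked"]]
        for i, other in enumerate(entry["results"]):
            if i != entry["clicked"]:
                yield clicked, other


def train(log, epochs=20, learning_rate=0.1):
    """Pairwise perceptron: nudge weights whenever a pair is ranked wrongly."""
    weights = [0.0, 0.0]
    for _ in range(epochs):
        for better, worse in preference_pairs(log):
            if score(weights, better) <= score(weights, worse):
                weights = [
                    w + learning_rate * (b - c)
                    for w, b, c in zip(weights, better, worse)
                ]
    return weights


if __name__ == "__main__":
    learned = train(CLICK_LOG)
    print("learned weights:", learned)
```

In this made-up log the users keep clicking the more recent message, so the learned weights end up favoring recency over term overlap. The appeal the quote describes is visible even here: nobody hand-labeled anything, the clicks did the labeling.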
There is even a company focusing on this entirely: Knowmail!
We’ve heard you hate email, so we fixed it!
Personalized artificial intelligence to help you focus on things that matter most, do more with less effort, and balance work and life.