Revolutionizing AI with Selective Forgetting

In a notable step for artificial intelligence (AI) research, a team of computer scientists has built a more agile, adaptable machine learning model using a technique they call “periodic forgetting.” The new approach is not about to replace the huge models that power today's major applications, but it could reveal more about how such programs come to understand language.

Jea Kwon, an AI engineer at South Korea’s Institute for Basic Science, calls the work a significant advance for the field. Today's AI language engines are built on artificial neural networks, in which each neuron is essentially a small mathematical function that receives signals from other neurons, performs a calculation, and passes the result on through multiple layers of the network. At first, information flows through these connections more or less at random; as the model trains on vast amounts of text, it adjusts the connections between neurons until it can handle language competently.

Sources: https://www.wired.com/story/how-selective-forgetting-can-help-ai-learn-better/ and https://www.quantamagazine.org/how-selective-forgetting-can-help-ai-learn-better-20240228/

However, this training paradigm, especially for multilingual models, demands enormous amounts of computation and offers little flexibility for adding new languages after training is finished. Mikel Artetxe, a co-founder of the AI startup Reka and a co-author of the study, explains how difficult it is to expand a model's language repertoire under conventional training regimes.

To address these constraints, Artetxe and his team ran an experiment: they trained a neural network in one language, then deliberately erased what it knew about word tokens, the contents of its embedding layer, and retrained the model in a new language. Despite having to relearn every token from scratch, the model still learned and processed the second language well, suggesting that the deeper layers of the network store abstract concepts that transcend any particular language and make it easier to pick up a new one.
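To make the idea concrete, here is a minimal PyTorch-style sketch, not the authors' code: the model, sizes, and initialization below are illustrative assumptions. It shows the core operation of wiping only a network's token embeddings while leaving the deeper layers intact before retraining on a new language.

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Toy stand-in for a language model: token embeddings plus deeper layers."""
    def __init__(self, vocab_size: int, d_model: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)            # language-specific token embeddings
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers=2)    # deeper, more abstract layers

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.body(self.embed(token_ids))

model = TinyEncoder(vocab_size=8000)
# ... pretrain on language A here ...

# "Forgetting": re-initialize only the embedding layer; the body keeps what it learned.
nn.init.normal_(model.embed.weight, std=0.02)

# Retrain on language B. Optionally freeze the body so that only the new
# language's token embeddings are learned during retraining.
for p in model.body.parameters():
    p.requires_grad = False
```

The actual study worked with a much larger RoBERTa-style model rather than a toy encoder, but the spirit of the operation is the same: only the token embeddings are erased.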

Yihong Chen, the study's lead researcher, points out that people conceptualize the same objects and ideas no matter which language they speak, and the model seems to mirror this: it appears to capture those higher-level concepts rather than simple word-to-word mappings. That insight led the team to refine the training method itself, periodically resetting the embedding layer throughout pretraining, which, as Artetxe explains, accustoms the model to starting over and makes it simpler to add more languages later.
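A sketch of that periodic reset is below, again with toy sizes, a random-data stand-in objective, and an arbitrary reset interval chosen purely for illustration; the real work pretrains a much larger masked language model with its own settings.

```python
import torch
import torch.nn as nn

vocab, d_model, reset_every = 1000, 32, 200      # illustrative values, not the paper's settings

embed = nn.Embedding(vocab, d_model)
body = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, vocab))
opt = torch.optim.AdamW(list(embed.parameters()) + list(body.parameters()), lr=1e-3)

for step in range(1000):
    tokens = torch.randint(0, vocab, (8, 16))     # dummy batch of token ids
    logits = body(embed(tokens))                  # toy next-token prediction objective
    loss = nn.functional.cross_entropy(
        logits[:, :-1].reshape(-1, vocab), tokens[:, 1:].reshape(-1)
    )
    opt.zero_grad()
    loss.backward()
    opt.step()

    # Periodic forgetting: wipe the token embeddings but keep the body's weights,
    # so the deeper layers are repeatedly forced to cope with brand-new embeddings.
    # A fuller implementation might also reset the optimizer state for the embeddings.
    if (step + 1) % reset_every == 0:
        nn.init.normal_(embed.weight, std=0.02)
```

Because the body never gets to rely on any one embedding table, it is pushed toward the kind of language-agnostic representations the researchers describe, which is what makes adding a new language cheaper later.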

Their modified RoBERTa language model, trained with this “forgetting” technique, scored slightly worse than its conventionally trained counterpart in initial accuracy tests. But when the models were adapted to new languages using much smaller datasets and tighter computational budgets, the forgetting model adapted considerably better, learning the new languages more efficiently.

Evgenii Nikishin of Mila, a Quebec-based deep learning research center, and Benjamin Levy, a neuroscientist at the University of San Francisco, draw parallels between the model’s learning mechanism and human cognitive processes, highlighting the benefits of abstracted memory and adaptive forgetting in achieving more nuanced and flexible AI performance.

The implications of this research are broad. Artetxe hopes the approach can help bring cutting-edge language processing to underrepresented languages like Basque, and Chen envisions a future full of diverse, rapidly adaptable language models that reshapes how AI research is done and applied. This “periodic forgetting” approach not only offers a clearer picture of how AI learns language but also points toward more accessible and equitable AI technologies across the world's linguistic landscapes.