GSoC 2017 - Week 4
Published:
This blog post is dedicated to the fourth week of Google Summer of Code (June 24 - July 1). This week concentrated on cross-testing and analysis of the API with some challenging tests.
Hence, this kind of technique is effective for “on-the-fly” or “zero-shot” training. Additionally, adding a regularizer term that forces the weight matrix to stay close to the identity matrix helps in few-shot learning as well.
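The identity-regularizer idea above can be sketched as follows. The weight matrix `W`, the coefficient `lam`, and the function name are illustrative stand-ins, not taken from any of the cited articles:

```python
import numpy as np

def identity_regularizer(W, lam=0.1):
    """Penalty pulling a square weight matrix W toward the identity.

    Adds lam * ||W - I||_F^2 to the task loss, discouraging the learned
    transform from drifting far from a pass-through mapping -- a useful
    bias when only a few labeled examples are available.
    """
    I = np.eye(W.shape[0])
    return lam * np.sum((W - I) ** 2)

# The identity matrix itself incurs zero penalty.
print(identity_regularizer(np.eye(4)))  # 0.0
```

In practice this term would simply be added to whatever task loss is being minimized, with `lam` trading off task fit against staying near the identity.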
The datasets explored here are : AG News and Reddit
A few limitations
Article : https://maelfabien.github.io/machinelearning/NLP_5/#
This article proposes using pre-trained models for classification in the few-shot setting, and notes that recent papers take several approaches to few-shot learning.
The article further explores method 3: using the average of the embeddings to represent each class and assigning the class with the closest centroid at test time. Another method the article uses is applying k-nearest neighbours over the embeddings to determine the label of a test example.
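A minimal sketch of the centroid approach described above, assuming sentence embeddings have already been produced by some encoder (the toy 2-D vectors below are purely illustrative):

```python
import numpy as np

def centroid_classify(support, query):
    """Assign `query` to the class whose mean support embedding is closest.

    `support` maps each label to an array of embeddings for its few
    labeled examples; distance here is plain Euclidean.
    """
    centroids = {label: np.mean(vecs, axis=0) for label, vecs in support.items()}
    return min(centroids, key=lambda label: np.linalg.norm(query - centroids[label]))

# Toy 2-D "embeddings": two few-shot classes with two examples each.
support = {
    "sports":   np.array([[1.0, 0.0], [0.9, 0.1]]),
    "politics": np.array([[0.0, 1.0], [0.1, 0.9]]),
}
print(centroid_classify(support, np.array([0.8, 0.2])))  # sports
```

Swapping the `min` over centroid distances for a vote among the k nearest individual support embeddings gives the k-NN variant the article also mentions.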
Article : https://aixplain.com/2021/09/23/zero-shot-learning-in-natural-language-processing/
This article uses the natural language inference (NLI) task, where, given a premise and a hypothesis, we say whether the hypothesis is true (entailment), false (contradiction), or undetermined (neutral).
Consider the premise to be the text you would like to classify, and the hypothesis to be a sentence of the form “This text is about {label_name}”, where label_name is the label you want to predict. The NLI model predicts probabilities for entailment, contradiction, and neutral, and the entailment probability tells whether the text should be classified under the provided label.
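The NLI recipe above can be sketched as follows. The `entailment_score` function stands in for a real NLI model (in practice one would call a pre-trained NLI model here); the keyword-overlap toy scorer exists only to make the example runnable:

```python
def zero_shot_classify(text, labels, entailment_score):
    """Pick the label whose hypothesis the NLI model most strongly entails.

    `entailment_score(premise, hypothesis)` stands in for a real NLI
    model's entailment probability; each candidate label is turned into
    the hypothesis "This text is about {label}."
    """
    scores = {
        label: entailment_score(text, f"This text is about {label}.")
        for label in labels
    }
    return max(scores, key=scores.get)

# Toy scorer: "entailment" is just keyword overlap, for illustration only.
def toy_score(premise, hypothesis):
    words = set(premise.lower().split())
    return sum(w in words for w in hypothesis.lower().rstrip(".").split())

print(zero_shot_classify("the match ended in a draw after extra time",
                         ["sports match", "politics"], toy_score))
```

The key point is that no labeled training data for the target classes is needed: the classes exist only as strings slotted into the hypothesis template.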
Article : https://joeddav.github.io/blog/2020/05/29/ZSL.html
This article explores three methods :
Classification as a cloze task :
Pattern-Exploiting Training (PET) reformulates text classification as a cloze task. A cloze question considers a sequence which is partially masked and requires predicting the missing value(s) from the context. PET requires a human practitioner to construct several task-appropriate cloze-style templates. A pre-trained masked language model is then tasked with choosing the most likely value for the masked (blank) word from among the possible class names for each cloze sentence.
The result is a set of noisy class predictions for each data point. This process alone serves as a basic zero-shot classifier. In addition, the authors introduce a sort of knowledge distillation procedure: after generating a set of predictions from the cloze task, these predicted values are used as proxy labels on which a new classifier is trained from scratch.
Although PET significantly outperforms the other methods described here, it also makes use of data which the other approaches do not assume access to: multiple task-specific, hand-crafted cloze sentences and a large set of unlabeled data for the distillation/self-learning step.
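The cloze-scoring step of PET can be sketched as below. The `mask_fill_probs` function stands in for a real masked language model, and the template, verbalizer words, and fixed probabilities are all illustrative, not from the paper:

```python
def cloze_classify(text, template, verbalizers, mask_fill_probs):
    """Zero-shot classification via a cloze template, PET-style.

    `template` contains a [MASK] slot (e.g. "{text} This topic is [MASK].");
    `verbalizers` maps class names to the single word standing in for them;
    `mask_fill_probs(cloze)` stands in for a masked language model and
    returns a word -> probability dict for the masked position.
    """
    cloze = template.format(text=text)
    probs = mask_fill_probs(cloze)
    # Restrict the MLM's prediction to the verbalizer words and take the best.
    return max(verbalizers, key=lambda label: probs.get(verbalizers[label], 0.0))

# Stand-in "MLM": fixed probabilities, for illustration only.
fake_mlm = lambda cloze: {"sports": 0.7, "politics": 0.2}
label = cloze_classify("United won 2-0 last night.",
                       "{text} This topic is [MASK].",
                       {"Sports": "sports", "Politics": "politics"},
                       fake_mlm)
print(label)  # Sports
```

In the full PET pipeline, these noisy cloze predictions over unlabeled data would then become the proxy labels for training the distilled classifier.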
Paper : https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8693837
This paper explores few-shot transfer learning methods for text classification with lightweight word-embedding-based models.
Netease and Cnews are two public Chinese text classification datasets. Netease has 6 categories with only 4000 samples per category. Cnews is a subset of THUCNews, a Chinese text classification dataset produced by the Natural Language Processing and Computational Social Science Lab at Tsinghua University; it has 10 categories with 5000/500/1000 samples per category for train/validation/test. The samples in Netease and Cnews are long text documents, averaging 582 and 530 tokens respectively.

As for the English datasets, AG News and Yahoo are topic categorization datasets with 4 and 10 categories, respectively; their samples average 43 and 104 words. DBpedia is an ontology classification dataset with 14 categories and an average of 57 words per sample.

For few-shot transfer learning, M classes are first sampled, and then N support (“shot”) samples and Q query samples are randomly drawn from each of the sampled classes; this is called the M-way N-shot Q-query setting. For example, in a 5-way 1-shot 15-query setting, each evaluation episode has 1 support sample and 15 testing samples for each of the 5 sampled classes.
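The M-way N-shot Q-query episode construction described above can be sketched like this; the dataset layout (a dict of class name to sample list) and function name are my own assumptions for illustration:

```python
import random

def sample_episode(dataset, m_way, n_shot, q_query, seed=None):
    """Build one M-way N-shot Q-query evaluation episode.

    `dataset` maps each class name to its list of samples; M classes are
    drawn first, then N support and Q query samples (disjoint) are drawn
    from each selected class.
    """
    rng = random.Random(seed)
    classes = rng.sample(sorted(dataset), m_way)
    support, query = {}, {}
    for c in classes:
        picked = rng.sample(dataset[c], n_shot + q_query)
        support[c], query[c] = picked[:n_shot], picked[n_shot:]
    return support, query

# Toy dataset: 6 classes of 20 samples each, sampled 5-way 1-shot 15-query.
data = {f"class{i}": list(range(20)) for i in range(6)}
support, query = sample_episode(data, m_way=5, n_shot=1, q_query=15, seed=0)
print(len(support), [len(v) for v in query.values()])
```

Drawing the N + Q samples in one call and splitting them guarantees the support and query sets for a class never overlap within an episode.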