ISubGVQA.datasets.gqa
Classes
GQADataset: A custom dataset class for the GQA (Graph Question Answering) dataset.
Functions
build_text_vocab(tokenizer): Builds a vocabulary from the questions in the training and validation datasets.
gqa_collate(data)
Module Contents
- ISubGVQA.datasets.gqa.build_text_vocab(tokenizer)
Builds a vocabulary from the questions in the training and validation datasets.
Args:
tokenizer (callable): A function or callable object that takes a string and returns a list of tokens.
Returns:
vocab: A vocabulary object containing the unique tokens from the questions, with special tokens added.
Raises:
AssertionError: If a token in the questions is not found in the vocabulary.
Special tokens:
"<unk>", "<pad>", "<sos>", "<eos>", and "<self>" are added to the vocabulary.
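The vocabulary-building steps described above can be sketched in plain Python. This is only an illustration under stated assumptions: `build_vocab_sketch`, the `stoi`/`itos` names, the in-memory question list, and the placement of the special tokens at the start of the index are all hypothetical, and the actual function reads the GQA training and validation questions and returns its own vocabulary object.

```python
from collections import Counter

# Special tokens named in the description above.
SPECIAL_TOKENS = ["<unk>", "<pad>", "<sos>", "<eos>", "<self>"]


def build_vocab_sketch(questions, tokenizer):
    """Hypothetical helper: map every unique question token to an integer index."""
    counts = Counter()
    for question in questions:
        counts.update(tokenizer(question))

    # Assumption: special tokens come first so their indices stay stable.
    itos = list(SPECIAL_TOKENS)
    itos += sorted(tok for tok in counts if tok not in SPECIAL_TOKENS)
    stoi = {tok: idx for idx, tok in enumerate(itos)}

    # Mirror the documented check: every question token must be in the vocabulary.
    for question in questions:
        for tok in tokenizer(question):
            assert tok in stoi, f"token {tok!r} missing from vocabulary"
    return stoi, itos


# Toy questions stand in for the real training and validation data.
questions = ["is the cat on the mat", "what color is the cat"]
stoi, itos = build_vocab_sketch(questions, str.split)
print(len(itos), stoi["cat"])
```

Any callable that turns a string into a list of tokens can be passed as `tokenizer`; `str.split` is used here only to keep the sketch dependency-free.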
- class ISubGVQA.datasets.gqa.GQADataset(split, ans2label_path='./ISubGVQA/meta_info/trainval_ans2label.json', label2ans_path='./ISubGVQA/meta_info/trainval_label2ans.json')
Bases: torch.utils.data.Dataset
A custom dataset class for the GQA (Graph Question Answering) dataset.
Attributes:
tokenizer (CLIPTokenizerFast): Tokenizer for processing text data.
ans2label (dict): Dictionary mapping answers to labels.
label2ans (dict): Dictionary mapping labels to answers.
split (str): The dataset split, one of 'train', 'valid', or 'testdev'.
sg_feature_lookup (GQASceneGraphs): Lookup for scene graph features.
data (dict): Loaded question data for the specified split.
idx2sampleId (list): List of sample IDs.
sg_cache (dict): Cache for scene graphs.
Methods:
- __init__(split, ans2label_path, label2ans_path):
Initializes the dataset with the specified split and paths to answer-label mappings.
- __getitem__(idx):
Retrieves the data sample at the specified index.
- __len__():
Returns the total number of data samples.
- num_answers (property):
Returns the number of unique answers in the dataset.
- indices_to_string(indices, words=False):
Converts word indices to a sentence string.
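As a rough illustration of how the members listed above fit together, the sketch below follows the same map-style dataset protocol without depending on PyTorch or the GQA files. `ToyQADataset`, its `samples` argument, and the shape of the returned item are assumptions made for this example; the real `__getitem__` presumably also involves the tokenizer and scene graph lookup, which are not modeled here.

```python
class ToyQADataset:
    """Hypothetical stand-in that follows the map-style dataset protocol."""

    def __init__(self, samples, ans2label, label2ans):
        self.samples = samples      # list of (question, answer) pairs
        self.ans2label = ans2label  # answer string -> integer label
        self.label2ans = label2ans  # integer label -> answer string

    def __getitem__(self, idx):
        # Return the question text and the integer label of its answer.
        question, answer = self.samples[idx]
        return question, self.ans2label[answer]

    def __len__(self):
        return len(self.samples)

    @property
    def num_answers(self):
        # Accessed as an attribute, without parentheses.
        return len(self.ans2label)


ans2label = {"yes": 0, "no": 1, "red": 2}
label2ans = {label: ans for ans, label in ans2label.items()}
samples = [("is the cat black", "no"), ("what color is the car", "red")]

dataset = ToyQADataset(samples, ans2label, label2ans)
question, label = dataset[1]
print(len(dataset), dataset.num_answers, label2ans[label])
```

Keeping both `ans2label` and `label2ans` lets the dataset encode answers as labels for training and decode predicted labels back to answer strings.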
- tokenizer
- ans2label
- label2ans
- split
- sg_feature_lookup
- idx2sampleId
- sg_cache
- __getitem__(idx)
- __len__()
- property num_answers
- classmethod indices_to_string(indices, words=False)
Convert word indices (torch.Tensor) to a sentence (string).
Args:
indices: torch.Tensor or numpy.ndarray of shape (T) or (T, 1).
words: boolean, whether to also return the list of words.
Returns:
sentence: the converted sentence, as a string.
words: (optional) the list of words, as list[string].
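The conversion can be sketched as follows. This is a minimal, dependency-free illustration rather than the method itself: `indices_to_string_sketch` and the explicit `itos` list (index to token) are hypothetical, and whether the real classmethod drops special tokens or stops at "<eos>" is not stated here, so the sketch keeps every token.

```python
def indices_to_string_sketch(indices, itos, words=False):
    """Hypothetical re-implementation over a plain index-to-token list."""
    # Tensors and arrays expose tolist(); plain lists pass through unchanged.
    if hasattr(indices, "tolist"):
        indices = indices.tolist()
    # Flatten shape (T, 1) to (T) by unwrapping single-element rows.
    flat = [row[0] if isinstance(row, (list, tuple)) else row for row in indices]
    tokens = [itos[i] for i in flat]
    sentence = " ".join(tokens)
    return (sentence, tokens) if words else sentence


itos = ["<unk>", "<pad>", "<sos>", "<eos>", "is", "the", "cat", "black"]

# Shape (T): a flat sequence of indices.
sentence = indices_to_string_sketch([4, 5, 6, 7], itos)

# Shape (T, 1): one index per row, also requesting the word list.
sentence_2d, tokens = indices_to_string_sketch([[4], [5], [6], [7]], itos, words=True)
print(sentence)
```

With `words=True` the sketch returns a `(sentence, words)` pair, matching the optional second return value described above.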
- ISubGVQA.datasets.gqa.gqa_collate(data)