ISubGVQA.datasets.gqa

Classes

GQADataset

A custom dataset class for the GQA (Graph Question Answering) dataset.

Functions

`build_text_vocab`(tokenizer)	Builds a vocabulary from the questions in the training and validation datasets.
`gqa_collate`(data)

Module Contents

ISubGVQA.datasets.gqa.build_text_vocab(tokenizer)

Builds a vocabulary from the questions in the training and validation datasets.

Args:

tokenizer (callable): A function or callable object that takes a string and returns a list of tokens.

Returns: vocab: A vocabulary object containing the unique tokens from the questions, with special tokens added.
Raises: AssertionError: If a token in the questions is not found in the vocabulary.
Special tokens: "<unk>", "<pad>", "<sos>", "<eos>", and "<self>" are added to the vocabulary.
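The behaviour described above can be illustrated with a minimal, self-contained sketch. It is not the module's implementation: `build_vocab_sketch`, the `stoi`/`itos` return values, the in-memory `questions` list, and the choice to give the special tokens the lowest ids are all assumptions made for illustration. The real function takes only `tokenizer` and reads the training and validation questions itself.

```python
from collections import Counter

# Special tokens listed in the documentation above.
SPECIAL_TOKENS = ("<unk>", "<pad>", "<sos>", "<eos>", "<self>")


def build_vocab_sketch(questions, tokenizer):
    """Hypothetical stand-in for build_text_vocab: maps tokens to integer ids."""
    counter = Counter()
    for question in questions:
        counter.update(tokenizer(question))

    # Assumption: special tokens take the lowest ids, other tokens follow by frequency.
    itos = list(SPECIAL_TOKENS) + [
        tok for tok, _ in counter.most_common() if tok not in SPECIAL_TOKENS
    ]
    stoi = {tok: i for i, tok in enumerate(itos)}

    # Mirror the documented check: every question token must be in the vocabulary.
    for question in questions:
        for tok in tokenizer(question):
            assert tok in stoi, f"token {tok!r} missing from vocabulary"
    return stoi, itos


questions = ["what color is the cat", "is the cat on the table"]
stoi, itos = build_vocab_sketch(questions, str.split)
```

Any callable that maps a string to a list of tokens can be passed as the tokenizer; `str.split` is used here only to keep the sketch dependency-free.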

class ISubGVQA.datasets.gqa.GQADataset(split, ans2label_path='./ISubGVQA/meta_info/trainval_ans2label.json', label2ans_path='./ISubGVQA/meta_info/trainval_label2ans.json')

Bases: torch.utils.data.Dataset

A custom dataset class for the GQA (Graph Question Answering) dataset.

Attributes:

tokenizer (CLIPTokenizerFast): Tokenizer for processing text data.
ans2label (dict): Dictionary mapping answers to labels.
label2ans (dict): Dictionary mapping labels to answers.
split (str): The dataset split, one of 'train', 'valid', or 'testdev'.
sg_feature_lookup (GQASceneGraphs): Lookup for scene graph features.
data (dict): Loaded question data for the specified split.
idx2sampleId (list): List of sample IDs.
sg_cache (dict): Cache for scene graphs.

Methods:

__init__(split, ans2label_path, label2ans_path): Initializes the dataset with the specified split and paths to answer-label mappings.
__getitem__(idx): Retrieves the data sample at the specified index.
__len__(): Returns the total number of data samples.
num_answers: Property. Returns the number of unique answers in the dataset.
indices_to_string(indices, words=False): Class method. Converts word indices to a sentence string.
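How the attributes and methods above fit together can be shown with a toy class. This is a sketch under stated assumptions, not the real GQADataset: `ToyQADataset`, its constructor arguments, the record layout, and the tuple returned by `__getitem__` are all hypothetical. The real class loads its mappings from the JSON paths in the signature and also returns scene graph features.

```python
class ToyQADataset:
    """Hypothetical miniature of the documented interface (not the real GQADataset)."""

    def __init__(self, samples, ans2label):
        self.ans2label = ans2label
        # Inverse mapping, so a predicted label can be turned back into an answer.
        self.label2ans = {label: ans for ans, label in ans2label.items()}
        self.data = samples                       # sample id -> question record
        self.idx2sampleId = list(samples.keys())  # integer index -> sample id

    def __getitem__(self, idx):
        sample_id = self.idx2sampleId[idx]
        record = self.data[sample_id]
        return sample_id, record["question"], self.ans2label[record["answer"]]

    def __len__(self):
        return len(self.idx2sampleId)

    @property
    def num_answers(self):
        # Accessed without parentheses, matching the property in the listing.
        return len(self.ans2label)


toy = ToyQADataset(
    {
        "q1": {"question": "is the cat black", "answer": "yes"},
        "q2": {"question": "what is on the table", "answer": "cup"},
    },
    {"yes": 0, "no": 1, "cup": 2},
)
sample_id, question, label = toy[1]
```

The point of the sketch is the round trip: `ans2label` turns an answer string into a class label for training, and `label2ans` turns a predicted label back into text.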

tokenizer

ans2label

label2ans

split

sg_feature_lookup

idx2sampleId

sg_cache

__getitem__(idx)

__len__()

property num_answers

classmethod indices_to_string(indices, words=False)

Converts word indices (torch.Tensor) to a sentence (string).

Args:

indices: torch.Tensor or numpy.ndarray of shape (T) or (T, 1).
words (bool): Whether to also return the list of words.

Returns: sentence (str): The converted sentence.
words (list[str], optional): The list of words.
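The conversion can be sketched without the module's vocabulary object. Everything in the block below is an assumption made for illustration: the function name `indices_to_string_sketch`, the explicit `itos` list (index to token), and the decisions to stop at "<eos>" and to skip "<pad>". The documentation does not state how the real method treats special tokens.

```python
def indices_to_string_sketch(indices, itos, words=False, eos="<eos>", pad="<pad>"):
    """Hypothetical re-implementation: decode token ids into a sentence."""
    # Accept tensors and arrays (via .tolist()) as well as plain lists.
    if hasattr(indices, "tolist"):
        indices = indices.tolist()

    tokens = []
    for idx in indices:
        # A (T, 1) input yields one-element rows; unwrap them.
        if isinstance(idx, (list, tuple)):
            idx = idx[0]
        token = itos[int(idx)]
        if token == eos:
            break      # assumed: decoding stops at the end-of-sentence token
        if token == pad:
            continue   # assumed: padding is skipped
        tokens.append(token)

    sentence = " ".join(tokens)
    return (sentence, tokens) if words else sentence


itos = ["<unk>", "<pad>", "<sos>", "<eos>", "<self>", "the", "cat"]
flat = indices_to_string_sketch([5, 6, 3, 1], itos)
column, word_list = indices_to_string_sketch([[5], [6], [3]], itos, words=True)
```

Both calls decode to the same sentence, which shows that inputs of shape (T) and (T, 1) are handled alike.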

ISubGVQA.datasets.gqa.gqa_collate(data)