ISubGVQA.datasets.gqa
=====================

.. py:module:: ISubGVQA.datasets.gqa


Classes
-------

.. autoapisummary::

   ISubGVQA.datasets.gqa.GQADataset


Functions
---------

.. autoapisummary::

   ISubGVQA.datasets.gqa.build_text_vocab
   ISubGVQA.datasets.gqa.gqa_collate


Module Contents
---------------

.. py:function:: build_text_vocab(tokenizer)

   Builds a vocabulary from the questions in the training and validation datasets.

   Args:
       tokenizer (callable): A function or callable object that takes a string and returns a list of tokens.

   Returns:
       vocab: A vocabulary object containing the unique tokens from the questions, with special tokens added.

   Raises:
       AssertionError: If a token in the questions is not found in the vocabulary.

   Special tokens:
       ("", "", "", "", "") are added to the vocabulary.


.. py:class:: GQADataset(split, ans2label_path='./ISubGVQA/meta_info/trainval_ans2label.json', label2ans_path='./ISubGVQA/meta_info/trainval_label2ans.json')

   Bases: :py:obj:`torch.utils.data.Dataset`


   A custom dataset class for the GQA (Graph Question Answering) dataset.

   Attributes:
       tokenizer (CLIPTokenizerFast): Tokenizer for processing text data.
       ans2label (dict): Dictionary mapping answers to labels.
       label2ans (dict): Dictionary mapping labels to answers.
       split (str): The dataset split, one of 'train', 'valid', or 'testdev'.
       sg_feature_lookup (GQASceneGraphs): Lookup for scene graph features.
       data (dict): Loaded question data for the specified split.
       idx2sampleId (list): List of sample IDs.
       sg_cache (dict): Cache for scene graphs.

   Methods:
       __init__(split, ans2label_path, label2ans_path): Initializes the dataset with the specified split and paths to answer-label mappings.
       __getitem__(idx): Retrieves the data sample at the specified index.
       __len__(): Returns the total number of data samples.
       num_answers (property): Returns the number of unique answers in the dataset.
       indices_to_string(indices, words=False): Converts word indices to a sentence string.


   .. py:attribute:: tokenizer


   .. py:attribute:: ans2label


   .. py:attribute:: label2ans


   .. py:attribute:: split


   .. py:attribute:: sg_feature_lookup


   .. py:attribute:: idx2sampleId


   .. py:attribute:: sg_cache


   .. py:method:: __getitem__(idx)


   .. py:method:: __len__()


   .. py:property:: num_answers


   .. py:method:: indices_to_string(indices, words=False)
      :classmethod:


      Convert word indices (torch.Tensor) to a sentence (string).

      Args:
          indices: torch.tensor or numpy.array of shape (T) or (T, 1)
          words: boolean, whether to also return the list of words

      Returns:
          sentence: string type of converted sentence
          words: (optional) list[string] type of words list


.. py:function:: gqa_collate(data)
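
The vocabulary construction and the index-to-string conversion documented above can be illustrated with a minimal, dependency-free sketch. It is not the module's implementation: ``build_vocab_sketch`` and ``indices_to_string_sketch`` are hypothetical names, plain Python lists stand in for tensors and for the vocabulary object, and the ``"<pad>"`` and ``"<unk>"`` specials are placeholders chosen for illustration only. How the real ``indices_to_string`` treats special or padding tokens is not stated here, so the sketch simply maps every index to its token.

```python
from typing import Callable, Iterable, Sequence


def build_vocab_sketch(
    questions: Iterable[str],
    tokenizer: Callable[[str], list[str]],
    specials: Sequence[str],
) -> tuple[list[str], dict[str, int]]:
    """Collect unique tokens; special tokens occupy the first indices."""
    itos = list(specials)                          # index -> token
    stoi = {tok: i for i, tok in enumerate(itos)}  # token -> index
    questions = list(questions)
    for question in questions:
        for tok in tokenizer(question):
            if tok not in stoi:
                stoi[tok] = len(itos)
                itos.append(tok)
    # Mirror the documented check: every question token must be in the vocabulary.
    for question in questions:
        assert all(tok in stoi for tok in tokenizer(question))
    return itos, stoi


def indices_to_string_sketch(indices, itos, words=False):
    """Map indices of shape (T) or (T, 1) to a sentence, optionally with the word list."""
    # Unwrap single-element rows so that (T, 1) input behaves like (T).
    flat = [row[0] if isinstance(row, (list, tuple)) else row for row in indices]
    tokens = [itos[int(i)] for i in flat]
    sentence = " ".join(tokens)
    return (sentence, tokens) if words else sentence


# Hypothetical placeholder specials; the module's actual special tokens are not shown here.
itos, stoi = build_vocab_sketch(
    ["is the cup red", "is the dog small"],
    tokenizer=str.split,
    specials=("<pad>", "<unk>"),
)
ids = [[stoi[t]] for t in "is the dog red".split()]  # nested lists emulate shape (T, 1)
sentence, tokens = indices_to_string_sketch(ids, itos, words=True)
```

With ``words=True`` the sketch returns both the joined sentence and the token list, matching the two documented return values; with the default ``words=False`` only the sentence is returned.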