ISubGVQA.datasets.gqa
=====================

.. py:module:: ISubGVQA.datasets.gqa


Classes
-------

.. autoapisummary::

   ISubGVQA.datasets.gqa.GQADataset


Functions
---------

.. autoapisummary::

   ISubGVQA.datasets.gqa.build_text_vocab
   ISubGVQA.datasets.gqa.gqa_collate


Module Contents
---------------

.. py:function:: build_text_vocab(tokenizer)

   Builds a vocabulary from the questions in the training and validation datasets.

   The special tokens ``<unk>``, ``<pad>``, ``<sos>``, ``<eos>``, and ``<self>`` are added to the vocabulary.

   :param tokenizer: A function or callable object that takes a string and returns a list of tokens.
   :type tokenizer: callable
   :returns: A vocabulary object containing the unique tokens from the questions, with special tokens added.
   :raises AssertionError: If a token in the questions is not found in the vocabulary.
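
   The vocabulary construction described above can be sketched as follows. This is a minimal illustration, not the module's implementation: ``build_vocab_sketch``, the in-memory question list, and the plain ``dict``/``list`` vocabulary are assumptions made for the example, and the real function may rely on a library vocabulary class and read the questions from disk.

   .. code-block:: python

      from collections import Counter

      # Special tokens listed in the documentation above.
      SPECIAL_TOKENS = ("<unk>", "<pad>", "<sos>", "<eos>", "<self>")


      def build_vocab_sketch(questions, tokenizer):
          """Hypothetical stand-in: map every unique question token to an index."""
          counts = Counter()
          for question in questions:
              counts.update(tokenizer(question))

          # Special tokens take the first indices; the remaining tokens are
          # sorted so that the resulting mapping is deterministic.
          itos = list(SPECIAL_TOKENS) + sorted(
              token for token in counts if token not in SPECIAL_TOKENS
          )
          stoi = {token: index for index, token in enumerate(itos)}

          # Mirror the documented check: each question token must be present.
          for question in questions:
              for token in tokenizer(question):
                  assert token in stoi, f"token {token!r} missing from vocabulary"
          return stoi, itos


      # Example questions and a whitespace tokenizer, both chosen for illustration.
      stoi, itos = build_vocab_sketch(
          ["what color is the cat", "is the cat small"], str.split
      )

   Placing the special tokens first keeps their indices stable regardless of which questions are loaded, which is a common design choice but an assumption here.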


.. py:class:: GQADataset(split, ans2label_path='./ISubGVQA/meta_info/trainval_ans2label.json', label2ans_path='./ISubGVQA/meta_info/trainval_label2ans.json')

   Bases: :py:obj:`torch.utils.data.Dataset`


   A custom dataset class for the GQA (Graph Question Answering) dataset.

   Attributes:

   * ``tokenizer`` (CLIPTokenizerFast): Tokenizer for processing text data.
   * ``ans2label`` (dict): Dictionary mapping answers to labels.
   * ``label2ans`` (dict): Dictionary mapping labels to answers.
   * ``split`` (str): The dataset split, one of ``'train'``, ``'valid'``, or ``'testdev'``.
   * ``sg_feature_lookup`` (GQASceneGraphs): Lookup for scene graph features.
   * ``data`` (dict): Loaded question data for the specified split.
   * ``idx2sampleId`` (list): List of sample IDs.
   * ``sg_cache`` (dict): Cache for scene graphs.

   Methods and properties:

   * ``__init__(split, ans2label_path, label2ans_path)``: Initializes the dataset with the specified split and paths to the answer-label mappings.
   * ``__getitem__(idx)``: Retrieves the data sample at the specified index.
   * ``__len__()``: Returns the total number of data samples.
   * ``num_answers`` (property): The number of unique answers in the dataset.
   * ``indices_to_string(indices, words=False)`` (class method): Converts word indices to a sentence string.
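
   The relationship between the two answer mappings can be sketched as follows. This is an illustration under stated assumptions: the tiny inline JSON string stands in for the real mapping files, the inverse mapping is derived here rather than loaded from ``label2ans_path``, and counting the entries of ``ans2label`` is only one plausible way to obtain the number of answers.

   .. code-block:: python

      import json

      # Hypothetical contents standing in for the answer-to-label JSON file.
      ans2label = json.loads('{"yes": 0, "no": 1, "cat": 2}')

      # Invert the mapping so that a predicted label can be turned back into text.
      label2ans = {label: answer for answer, label in ans2label.items()}

      # Assumed definition of the answer count: one label per unique answer.
      num_answers = len(ans2label)

      # Round trip: answer -> label -> answer.
      round_trip = label2ans[ans2label["cat"]]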


   .. py:attribute:: tokenizer


   .. py:attribute:: ans2label


   .. py:attribute:: label2ans


   .. py:attribute:: split


   .. py:attribute:: sg_feature_lookup


   .. py:attribute:: idx2sampleId


   .. py:attribute:: sg_cache


   .. py:method:: __getitem__(idx)


   .. py:method:: __len__()


   .. py:property:: num_answers


   .. py:method:: indices_to_string(indices, words=False)
      :classmethod:


      Convert word indices to a sentence string.

      :param indices: ``torch.Tensor`` or ``numpy.ndarray`` of shape ``(T)`` or ``(T, 1)``.
      :param words: Whether to return the list of words.
      :type words: bool
      :returns: ``sentence``, the converted sentence as a string, and optionally ``words``, the list of words as ``list[str]``.
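
      The conversion can be sketched as below. This is a hedged illustration rather than the class method itself: ``indices_to_string_sketch`` and its explicit ``itos`` argument (an index-to-token list) are hypothetical, and returning a ``(sentence, words)`` tuple when ``words`` is true is an assumption about how the optional value is delivered.

      .. code-block:: python

         def indices_to_string_sketch(indices, itos, words=False):
             """Hypothetical stand-in: turn token indices into a sentence."""
             # Tensors and arrays both expose .tolist(); lists pass through.
             raw = indices.tolist() if hasattr(indices, "tolist") else list(indices)
             # Flatten shape (T, 1) to (T) by unwrapping single-element rows.
             flat = [row[0] if isinstance(row, list) else row for row in raw]
             tokens = [itos[index] for index in flat]
             sentence = " ".join(tokens)
             # Assumed behaviour: also return the token list when requested.
             return (sentence, tokens) if words else sentence


         # Illustrative vocabulary and inputs of shape (T) and (T, 1).
         itos = ["<unk>", "<pad>", "what", "color"]
         flat_result = indices_to_string_sketch([2, 3], itos)
         column_result = indices_to_string_sketch([[2], [3]], itos, words=True)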


.. py:function:: gqa_collate(data)