Core objects#

The core objects represent the data returned by the API and are also used to pass data to the API.

Source objects#

BaseSource#

class wordcab.core_objects.BaseSource(filepath=None, url=None, url_headers=None)#

Base class for AudioSource and GenericSource objects. It is not meant to be used directly.

Parameters:
  • filepath (Optional[Union[str, Path]], optional) – Path to the local file, by default None.

  • url (Optional[str], optional) – URL to the remote file, by default None.

  • url_headers (Optional[Dict[str, str]], optional) – Headers to retrieve the file from the URL, by default None. Useful if the file requires authentication to be retrieved.

Raises:
  • ValueError – If neither filepath nor url is provided.

  • ValueError – If both filepath and url are provided.

  • TypeError – If filepath is not a string or a Path object.

  • FileNotFoundError – If filepath does not exist or is not accessible.
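The checks listed under Raises can be pictured with a small stand-alone sketch. The helper name validate_source_args is hypothetical and not part of wordcab, the order of the checks is an assumption, and the "remote" return value is assumed by analogy with the 'local' value shown in the GenericSource example; the library's own implementation may differ.

```python
from pathlib import Path
from typing import Optional, Union


def validate_source_args(
    filepath: Optional[Union[str, Path]] = None,
    url: Optional[str] = None,
) -> str:
    """Hypothetical helper (not part of wordcab) mirroring the documented checks.

    Returns "local" for a file path and "remote" for a URL ("remote" is assumed).
    """
    if filepath is None and url is None:
        raise ValueError("Provide either filepath or url.")
    if filepath is not None and url is not None:
        raise ValueError("Provide only one of filepath or url, not both.")
    if url is not None:
        return "remote"
    if not isinstance(filepath, (str, Path)):
        raise TypeError("filepath must be a str or a Path object.")
    if not Path(filepath).exists():
        raise FileNotFoundError(f"File not found: {filepath}")
    return "local"
```

Under these assumptions, calling the helper with no arguments or with both arguments raises ValueError, which matches the two ValueError cases documented above.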

source#

The source identifier, for example 'assembly_ai' for an AssemblyAISource or 'deepgram' for a DeepgramSource.

Type:

str

source_type#

How the source was provided, for example 'local' when it is created from a filepath.

Type:

str

_stem#

The stem of the file.

Type:

str

_suffix#

The suffix of the file.

Type:

str

Returns:

The source object.

Return type:

BaseSource

prepare_headers()#

Prepare headers.

Return type:

Dict[str, str]

prepare_payload()#

Prepare payload.

Return type:

str | bytes | Dict[str, bytes]

AudioSource#

class wordcab.core_objects.AudioSource(filepath=None, url=None, url_headers=None)#

The AudioSource object is required to create a job that uses an audio file as input.

Parameters:
  • filepath (Union[str, Path]) – The path to the local file.

  • url (str) – The URL to the remote file.

  • url_headers (Optional[Dict[str, str]], optional) – Headers to retrieve the file from the URL, by default None.

Raises:
  • ValueError – If the file format is not supported.

  • ValueError – If both filepath and url are provided.

  • TypeError – If the path is not a string or a Path object.

  • FileNotFoundError – If the file does not exist or is not accessible.

Examples

>>> from wordcab.core_objects import AudioSource
>>> audio_source = AudioSource(filepath="path/to/audio/file.mp3")  
>>> audio_source  
AudioSource(...)

Returns:

The audio source object.

Return type:

AudioSource

prepare_headers()#

Prepare headers for API request.

Return type:

dict

prepare_payload()#

Prepare payload for API request.

Return type:

Dict[str, bytes]

GenericSource#

class wordcab.core_objects.GenericSource(filepath=None, url=None, url_headers=None)#

Generic source object.

The GenericSource object is required to create a job that uses a generic file as input, such as a .txt or .json file.

Parameters:
  • filepath (Union[str, Path]) – The path to the local file.

  • url (str) – The URL to the remote file.

  • url_headers (Optional[Dict[str, str]], optional) – Headers to retrieve the file from the URL, by default None.

Raises:
  • ValueError – If the file format is not supported.

  • ValueError – If both filepath and url are provided.

  • TypeError – If the path is not a string or a Path object.

  • FileNotFoundError – If the file does not exist or is not accessible.

Examples

>>> from wordcab.core_objects import GenericSource
>>> generic_source = GenericSource(filepath="path/to/generic/file.txt")  
>>> generic_source  
GenericSource(...)
>>> generic_source.file_object  
b'Hello, world!'
>>> generic_source.source_type  
'local'
>>> generic_source._suffix  
'.txt'
>>> generic_source._stem  
'file'

Returns:

The generic source object.

Return type:

GenericSource

prepare_headers()#

Prepare headers for API request.

Return type:

Dict[str, str]

prepare_payload()#

Prepare payload for API request.

Return type:

str

InMemorySource#

class wordcab.core_objects.InMemorySource(obj=None)#

In-memory source object.

The in-memory source object is a special case of the generic source object. It is used to pass a pre-processed transcript to the API.

Parameters:

obj (Union[Dict[str, List[str]], List[str]]) – The in-memory object. It can be a list of strings or a dict with a transcript key and a list of strings as value.

Raises:
  • ValueError – If the in-memory object does not have a transcript key.

  • TypeError – If the in-memory object does not have a list as value for the transcript key.

  • TypeError – If the in-memory object is not a list or a dict.
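The accepted shapes of obj and the three documented errors can be illustrated with a short stand-alone sketch. The function name extract_transcript is hypothetical and not part of wordcab, and returning the list of utterances is an assumption made only so the result is easy to inspect.

```python
from typing import Dict, List, Union

InMemoryObj = Union[Dict[str, List[str]], List[str]]


def extract_transcript(obj: InMemoryObj) -> List[str]:
    """Hypothetical helper (not part of wordcab): validate obj and return its utterances."""
    if isinstance(obj, dict):
        if "transcript" not in obj:
            raise ValueError('A dict input must have a "transcript" key.')
        if not isinstance(obj["transcript"], list):
            raise TypeError('The "transcript" value must be a list.')
        return obj["transcript"]
    if isinstance(obj, list):
        return obj
    raise TypeError("obj must be a list or a dict.")


# Both accepted shapes yield the same list of utterances in this sketch.
utterances = ["SPEAKER A: Hello.", "SPEAKER B: Hi."]
print(extract_transcript(utterances))
print(extract_transcript({"transcript": utterances}))
```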

Examples

>>> from wordcab.core_objects import InMemorySource
>>> transcript = {"transcript": ["SPEAKER A: Hello.", "SPEAKER B: Hi."]}
>>> in_memory_source = InMemorySource(obj=transcript)
>>> in_memory_source
InMemorySource(...)
>>> in_memory_source.obj

Returns:

The in-memory source object.

Return type:

InMemorySource

prepare_headers()#

Prepare headers for API request.

Return type:

Dict[str, str]

prepare_payload()#

Prepare payload for API request.

Return type:

str

WordcabTranscriptSource#

class wordcab.core_objects.WordcabTranscriptSource(transcript_id=None)#

Wordcab transcript source object using a Wordcab transcript ID.

Parameters:

transcript_id (str) – The Wordcab transcript ID to use as input.

Raises:

ValueError – If the transcript_id is not provided.

Examples

>>> from wordcab.core_objects import WordcabTranscriptSource
>>> wordcab_transcript_source = WordcabTranscriptSource(transcript_id="transcript_12345")
>>> wordcab_transcript_source
WordcabTranscriptSource(transcript_id=transcript_12345)

Returns:

The Wordcab transcript source object.

Return type:

WordcabTranscriptSource

prepare_headers()#

Prepare headers for API request.

Return type:

Dict[str, str]

prepare_payload()#

Prepare payload for API request.

Return type:

None

AssemblyAISource#

class wordcab.core_objects.AssemblyAISource(filepath=None, url=None, url_headers=None)#

AssemblyAI source object using a local or remote AssemblyAI JSON file.

Parameters:
  • filepath (Union[str, Path]) – The path to the local file.

  • url (str) – The URL to the remote file.

  • url_headers (Optional[Dict[str, str]], optional) – Headers to retrieve the file from the URL, by default None.

Raises:

ValueError – If the file format is not valid.

Examples

>>> from wordcab.core_objects import AssemblyAISource
>>> assemblyai_source = AssemblyAISource(filepath="path/to/assemblyai/file.json")  
>>> assemblyai_source  
AssemblyAISource(...)
>>> assemblyai_source.source  
'assembly_ai'

Returns:

The AssemblyAI source object.

Return type:

AssemblyAISource

prepare_headers()#

Prepare headers for API request.

Return type:

Dict[str, str]

prepare_payload()#

Prepare payload for API request.

Return type:

str

DeepgramSource#

class wordcab.core_objects.DeepgramSource(filepath=None, url=None, url_headers=None)#

Deepgram source object using a local or remote Deepgram JSON file.

Parameters:
  • filepath (Union[str, Path]) – The path to the local file.

  • url (str) – The URL to the remote file.

  • url_headers (Optional[Dict[str, str]], optional) – Headers to retrieve the file from the URL, by default None.

Raises:

ValueError – If the file format is not valid.

Examples

>>> from wordcab.core_objects import DeepgramSource
>>> deepgram_source = DeepgramSource(filepath="path/to/deepgram/file.json")  
>>> deepgram_source  
DeepgramSource(...)
>>> deepgram_source.source  
'deepgram'

Returns:

The Deepgram source object.

Return type:

DeepgramSource

prepare_headers()#

Prepare headers for API request.

Return type:

Dict[str, str]

prepare_payload()#

Prepare payload for API request.

Return type:

str

RevSource#

class wordcab.core_objects.RevSource(filepath=None, url=None, url_headers=None)#

Rev.ai source object using a local or remote Rev.ai JSON file.

Parameters:
  • filepath (Union[str, Path]) – The path to the local file.

  • url (str) – The URL to the remote file.

  • url_headers (Optional[Dict[str, str]], optional) – Headers to retrieve the file from the URL, by default None.

Raises:

ValueError – If the file format is not valid.

Examples

>>> from wordcab.core_objects import RevSource
>>> rev_source = RevSource(filepath="path/to/rev/file.json")  
>>> rev_source  
RevSource(...)
>>> rev_source.source  
'rev_ai'

Returns:

The Rev.ai source object.

Return type:

RevSource

prepare_headers()#

Prepare headers for API request.

Return type:

Dict[str, str]

prepare_payload()#

Prepare payload for API request.

Return type:

str

VTTSource#

class wordcab.core_objects.VTTSource(filepath=None, url=None, url_headers=None)#

VTT source object using a local or remote VTT file.

Parameters:
  • filepath (Union[str, Path]) – The path to the local file.

  • url (str) – The URL to the remote file.

  • url_headers (Optional[Dict[str, str]], optional) – Headers to retrieve the file from the URL, by default None.

Raises:

ValueError – If the file format is not valid.

Examples

>>> from wordcab.core_objects import VTTSource
>>> vtt_source = VTTSource(filepath="path/to/vtt/file.vtt")  
>>> vtt_source  
VTTSource(...)
>>> vtt_source.source  
'vtt'

Returns:

The VTT source object.

Return type:

VTTSource

prepare_headers()#

Prepare headers for API request.

Return type:

Dict[str, str]

prepare_payload()#

Prepare payload for API request.

Return type:

bytes

SignedURLSource#

class wordcab.core_objects.SignedURLSource(filepath=None, url=None, url_headers=None)#

Signed URL source object.

Parameters:
  • filepath (str | Path | None) –

  • url (str | None) –

  • url_headers (Dict[str, str] | None) –

Job objects#

class wordcab.core_objects.BaseJob(display_name, job_name, source, job_status='Pending', metadata=None, settings=None, source_lang=None, target_lang=None, tags=None, time_started=None, time_completed=None, transcript_details=None, transcript_id=None)#

Wordcab API BaseJob object.

Parameters:
  • display_name (str) –

  • job_name (str) –

  • source (str) –

  • job_status (str | None) –

  • metadata (Dict[str, str] | None) –

  • settings (JobSettings | None) –

  • source_lang (str | None) –

  • target_lang (str | None) –

  • tags (List[str] | None) –

  • time_started (str | None) –

  • time_completed (str | None) –

  • transcript_details (Dict[str, str] | None) –

  • transcript_id (str | None) –

job_update(parameters)#

Update the job attributes.

Parameters:

parameters (Dict[str, str]) –

Return type:

None
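As a rough illustration of what updating job attributes from a Dict[str, str] could look like, the sketch below uses a stand-in dataclass with a few of the BaseJob fields. JobSketch is hypothetical and not the library's class, the "Completed" status value is made up for the example, and silently ignoring unknown keys is an assumption; the real method may handle them differently.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class JobSketch:
    """Stand-in with a subset of the BaseJob fields (not the real class)."""

    display_name: str
    job_name: str
    source: str
    job_status: Optional[str] = "Pending"
    transcript_id: Optional[str] = None

    def job_update(self, parameters: Dict[str, str]) -> None:
        """Copy known keys onto the instance; unknown keys are ignored (assumption)."""
        for key, value in parameters.items():
            if hasattr(self, key):
                setattr(self, key, value)


job = JobSketch(display_name="My job", job_name="job_123", source="generic")
# "Completed" is a hypothetical status value used only for this example.
job.job_update({"job_status": "Completed", "transcript_id": "transcript_12345"})
print(job.job_status, job.transcript_id)
```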

class wordcab.core_objects.ExtractJob(display_name, job_name, source, job_status='Pending', metadata=None, settings=None, source_lang=None, target_lang=None, tags=None, time_started=None, time_completed=None, transcript_details=None, transcript_id=None)#

Wordcab API ExtractJob object.

Parameters:
  • display_name (str) –

  • job_name (str) –

  • source (str) –

  • job_status (str | None) –

  • metadata (Dict[str, str] | None) –

  • settings (JobSettings | None) –

  • source_lang (str | None) –

  • target_lang (str | None) –

  • tags (List[str] | None) –

  • time_started (str | None) –

  • time_completed (str | None) –

  • transcript_details (Dict[str, str] | None) –

  • transcript_id (str | None) –

class wordcab.core_objects.JobSettings(ephemeral_data=False, pipeline='default', only_api=True, split_long_utterances=False)#

Wordcab API Job Settings object.

Parameters:
  • ephemeral_data (bool | None) –

  • pipeline (str) –

  • only_api (bool | None) –

  • split_long_utterances (bool | None) –
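The defaults in the JobSettings signature can be read off more easily from a stand-in dataclass. JobSettingsSketch below is hypothetical: it only copies the field names, types and default values shown above and says nothing about how the real class validates or uses them.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class JobSettingsSketch:
    """Stand-in with the fields and defaults from the documented signature (not the real class)."""

    ephemeral_data: Optional[bool] = False
    pipeline: str = "default"
    only_api: Optional[bool] = True
    split_long_utterances: Optional[bool] = False


# Override a single setting and keep the other defaults.
settings = JobSettingsSketch(split_long_utterances=True)
print(asdict(settings))
```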

class wordcab.core_objects.ListJobs(page_count, next_page, results)#

Wordcab API ListJobs object.

Parameters:
  • page_count (int) –

  • next_page (str) –

  • results –

class wordcab.core_objects.SummarizeJob(display_name, job_name, source, job_status='Pending', metadata=None, settings=None, source_lang=None, target_lang=None, tags=None, time_started=None, time_completed=None, transcript_details=None, transcript_id=None, summary_details=None)#

Wordcab API SummarizeJob object.

Parameters:
  • display_name (str) –

  • job_name (str) –

  • source (str) –

  • job_status (str | None) –

  • metadata (Dict[str, str] | None) –

  • settings (JobSettings | None) –

  • source_lang (str | None) –

  • target_lang (str | None) –

  • tags (List[str] | None) –

  • time_started (str | None) –

  • time_completed (str | None) –

  • transcript_details (Dict[str, str] | None) –

  • transcript_id (str | None) –

  • summary_details (Dict[str, str] | None) –

Stats object#

class wordcab.core_objects.Stats(account_email, plan, monthly_request_limit, request_count, minutes_summarized, transcripts_summarized, metered_charge, min_created, max_created, tags=None)#

Stats object for the Wordcab API.

Parameters:
  • account_email (str) –

  • plan (str) –

  • monthly_request_limit (str) –

  • request_count (int) –

  • minutes_summarized (int) –

  • transcripts_summarized (int) –

  • metered_charge (str) –

  • min_created (str) –

  • max_created (str) –

  • tags (List[str] | None) –

Summary objects#

class wordcab.core_objects.BaseSummary(job_status, summary_id, display_name=None, job_name=None, process_time=None, speaker_map=None, source=None, source_lang=None, summary_type=None, summary=None, target_lang=None, transcript_id=None, time_started=None, time_completed=None)#

Summary object.

Parameters:
  • job_status (str) –

  • summary_id (str) –

  • display_name (str | None) –

  • job_name (str | None) –

  • process_time (str | None) –

  • speaker_map (Dict[str, str] | None) –

  • source (str | None) –

  • source_lang (str | None) –

  • summary_type (str | None) –

  • summary (Dict[str, Any] | None) –

  • target_lang (str | None) –

  • transcript_id (str | None) –

  • time_started (str | None) –

  • time_completed (str | None) –

get_formatted_summaries(add_context=False)#

Format the summaries in a human-readable format.

Return the summaries as a dictionary with the summary length as key and the human-readable summaries as values.

Parameters:

add_context (bool, optional) – If True, add the context items to the summary, by default False.

Returns:

The summaries as a dictionary with the summary length as key and the summaries, formatted to be human-readable, as values.

Return type:

Dict[str, str]

get_summaries()#

Get the summaries as a dictionary with the summary length as key and the summaries as values.

Returns:

The summaries as a dictionary with the summary length as key and the summaries as values. If the summary type is brief, the summaries are returned as a list of lists of str; otherwise they are returned as a list of str.

Return type:

Dict[str, List[Union[str, List[str]]]]
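The documented return type mixes plain strings and nested lists, so a consumer has to handle both shapes. The sketch below shows one possible way to flatten such a dictionary into one string per summary length. The function flatten_summaries, the join characters and the sample data are all assumptions for illustration; this is not how get_formatted_summaries is known to format its output.

```python
from typing import Dict, List, Union

Summaries = Dict[str, List[Union[str, List[str]]]]


def flatten_summaries(summaries: Summaries) -> Dict[str, str]:
    """Hypothetical helper: join each entry into one line, then join lines per length."""
    formatted: Dict[str, str] = {}
    for length, items in summaries.items():
        lines = []
        for item in items:
            # Nested lists (the "brief" shape) are joined with spaces.
            lines.append(" ".join(item) if isinstance(item, list) else item)
        formatted[length] = "\n".join(lines)
    return formatted


# Made-up sample data in the two documented shapes.
sample: Summaries = {
    "3": ["First point.", "Second point."],
    "1": [["A.", "B."], ["C."]],
}
print(flatten_summaries(sample))
```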

class wordcab.core_objects.ListSummaries(page_count, next_page, results)#

List summaries object.

Parameters:
  • page_count (int) –

  • next_page (str) –

  • results (List[BaseSummary]) –
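The page_count, next_page and results fields suggest a paginated listing. The sketch below shows a generic way to walk such pages. PageSketch, iter_results and fetch_page are hypothetical names, plain strings stand in for BaseSummary objects, and the assumption that next_page is empty or None on the last page is not confirmed by this reference.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional


@dataclass
class PageSketch:
    """Stand-in for a ListSummaries-like page (not the real class)."""

    page_count: int
    next_page: Optional[str]
    results: List[str]


def iter_results(
    first_page: PageSketch, fetch_page: Callable[[str], PageSketch]
) -> Iterator[str]:
    """Yield results from every page, following next_page until it is empty."""
    page = first_page
    while True:
        yield from page.results
        if not page.next_page:
            break
        page = fetch_page(page.next_page)


# In-memory pages replace real API calls in this example.
pages: Dict[str, PageSketch] = {"page-2": PageSketch(2, None, ["summary_3"])}
first = PageSketch(2, "page-2", ["summary_1", "summary_2"])
print(list(iter_results(first, pages.__getitem__)))
```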

class wordcab.core_objects.StructuredSummary(summary, context=None, summary_html=None, end=None, end_index=None, start=None, start_index=None, timestamp_end=None, timestamp_start=None, transcript_segment=None)#

Structured summary object.

Parameters:
  • summary (str | Dict[str, str]) –

  • context (Dict[str, str | List[str] | Dict[str, str | List[str]]] | None) –

  • summary_html (str | Dict[str, str] | None) –

  • end (str | None) –

  • end_index (int | None) –

  • start (str | None) –

  • start_index (int | None) –

  • timestamp_end (int | None) –

  • timestamp_start (int | None) –

  • transcript_segment (List[Dict[str, str | int]] | None) –

Transcript objects#

class wordcab.core_objects.BaseTranscript(transcript_id, job_id_set=<factory>, summary_id_set=<factory>, transcript=<factory>, speaker_map=<factory>, question_answers=None)#

Transcript object.

Parameters:
  • transcript_id (str) –

  • job_id_set (List[str]) –

  • summary_id_set (List[str]) –

  • transcript (List[TranscriptUtterance]) –

  • speaker_map (Dict[str, str]) –

  • question_answers (List[Dict[str, str]] | None) –

update_speaker_map(speaker_map)#

Update the speaker map for the transcript.

Parameters:

speaker_map (Dict[str, str]) –

Return type:

None

class wordcab.core_objects.ListTranscripts(page_count, next_page, results)#

List transcripts object.

Parameters:
  • page_count (int) –

  • next_page (str) –

  • results (List[BaseTranscript]) –

class wordcab.core_objects.TranscriptUtterance(text, speaker, end=None, end_index=None, start=None, start_index=None, timestamp_end=None, timestamp_start=None)#

Transcript utterance object.

Parameters:
  • text (str) –

  • speaker (str) –

  • end (str | None) –

  • end_index (int | None) –

  • start (str | None) –

  • start_index (int | None) –

  • timestamp_end (int | None) –

  • timestamp_start (int | None) –
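To show how the text and speaker fields of an utterance might be combined with a Dict[str, str] speaker map like the one on BaseTranscript, here is a small stand-alone sketch. UtteranceSketch and render_transcript are hypothetical, and the idea that speaker map keys match the speaker values of the utterances is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UtteranceSketch:
    """Stand-in with the two required TranscriptUtterance fields (not the real class)."""

    text: str
    speaker: str


def render_transcript(
    utterances: List[UtteranceSketch], speaker_map: Dict[str, str]
) -> List[str]:
    """Render "NAME: text" lines, falling back to the raw speaker label when unmapped."""
    return [f"{speaker_map.get(u.speaker, u.speaker)}: {u.text}" for u in utterances]


utterances = [UtteranceSketch("Hello.", "A"), UtteranceSketch("Hi.", "B")]
print(render_transcript(utterances, {"A": "Alice"}))
```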