Core objects#

The core objects represent the data returned by the API and are also used to pass data to the API.

Source objects#

BaseSource#

class wordcab.core_objects.BaseSource(filepath=None, url=None, url_headers=None)#

Base class for AudioSource and GenericSource objects. It is not meant to be used directly.

Parameters:

filepath (Optional[Union[str, Path]], optional) – Path to the local file, by default None.
url (Optional[str], optional) – URL to the remote file, by default None.
url_headers (Optional[Dict[str, str]], optional) – Headers to retrieve the file from the URL, by default None. Useful if the file requires authentication to be retrieved.
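
To illustrate why url_headers matters for protected files, here is a minimal sketch that builds an authenticated request with the standard library. The build_request helper, the URL, and the token are hypothetical and not part of wordcab; the library's own download logic may differ.

```python
from typing import Dict, Optional
from urllib.request import Request


def build_request(url: str, url_headers: Optional[Dict[str, str]] = None) -> Request:
    """Build (but do not send) a GET request carrying the given headers."""
    # Hypothetical helper: it only illustrates how headers travel with the URL.
    return Request(url, headers=url_headers or {})


request = build_request(
    "https://example.com/private/audio.mp3",
    url_headers={"Authorization": "Bearer <token>"},
)
print(request.get_header("Authorization"))
```

No network call is made here; the request object is only constructed so the headers can be inspected.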

Raises:

ValueError – If neither filepath nor url is provided.
ValueError – If both filepath and url are provided.
TypeError – If filepath is not a string or a Path object.
FileNotFoundError – If filepath does not exist or is not accessible.
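
The checks listed above can be sketched roughly as follows. This is not the library's code: check_source_args is a hypothetical helper, the order of the checks is assumed, and the 'remote' label is an assumption (only 'local' appears in the examples on this page).

```python
from pathlib import Path
from typing import Optional, Union


def check_source_args(
    filepath: Optional[Union[str, Path]] = None,
    url: Optional[str] = None,
) -> str:
    """Validate the arguments and return an illustrative source type label."""
    if filepath is None and url is None:
        raise ValueError("Provide either filepath or url.")
    if filepath is not None and url is not None:
        raise ValueError("Provide filepath or url, not both.")
    if url is not None:
        return "remote"  # assumed label for URL-based sources
    if not isinstance(filepath, (str, Path)):
        raise TypeError("filepath must be a str or a Path.")
    if not Path(filepath).exists():
        raise FileNotFoundError(f"File not found: {filepath}")
    return "local"


print(check_source_args(url="https://example.com/audio.mp3"))  # remote
```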

source#

The source name, for example 'assembly_ai' or 'deepgram' for the provider-specific subclasses.

Type:: str

source_type#

Where the file comes from, for example 'local' for a file given by filepath.

Type:: str

_stem#

The stem of the file.

Type:: str

_suffix#

The suffix of the file.

Type:: str

Returns:

The source object.

Return type:

BaseSource

prepare_headers()#

Prepare headers.

Return type:: Dict[str, str]

prepare_payload()#

Prepare payload.

Return type:: str | bytes | Dict[str, bytes]

AudioSource#

class wordcab.core_objects.AudioSource(filepath=None, url=None, url_headers=None)#

The AudioSource object is required to create a job that uses an audio file as input.

Parameters:

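filepath (Optional[Union[str, Path]], optional) – The path to the local file, by default None.
url (Optional[str], optional) – The URL to the remote file, by default None.
url_headers (Optional[Dict[str, str]], optional) – Headers to retrieve the file from the URL, by default None.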

Raises:

ValueError – If the file format is not supported.
ValueError – If both filepath and url are provided.
TypeError – If the path is not a string or a Path object.
FileNotFoundError – If the file does not exist or is not accessible.

Examples

>>> from wordcab.core_objects import AudioSource

>>> audio_source = AudioSource(filepath="path/to/audio/file.mp3")  
>>> audio_source  
AudioSource(...)

Returns:

The audio source object.

Return type:

AudioSource

prepare_headers()#

Prepare headers for API request.

Return type:: dict

prepare_payload()#

Prepare payload for API request.

Return type:: Dict[str, bytes]

GenericSource#

class wordcab.core_objects.GenericSource(filepath=None, url=None, url_headers=None)#

Generic source object.

The GenericSource object is required to create a job that uses a generic file as input, such as a .txt or .json file.

Parameters:

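filepath (Optional[Union[str, Path]], optional) – The path to the local file, by default None.
url (Optional[str], optional) – The URL to the remote file, by default None.
url_headers (Optional[Dict[str, str]], optional) – Headers to retrieve the file from the URL, by default None.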

Raises:

ValueError – If the file format is not supported.
ValueError – If both filepath and url are provided.
TypeError – If the path is not a string or a Path object.
FileNotFoundError – If the file does not exist or is not accessible.

Examples

>>> from wordcab.core_objects import GenericSource

>>> generic_source = GenericSource(filepath="path/to/generic/file.txt")  
>>> generic_source  
GenericSource(...)
>>> generic_source.file_object  
b'Hello, world!'
>>> generic_source.source_type  
'local'
>>> generic_source._suffix  
'.txt'
>>> generic_source._stem  
'file'
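
The _stem and _suffix values in the example above match what pathlib reports for the same path. The snippet below shows that correspondence; whether the library derives the attributes this way is an assumption.

```python
from pathlib import Path

# Same path as in the GenericSource example above.
path = Path("path/to/generic/file.txt")

print(path.stem)    # file
print(path.suffix)  # .txt
```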

Returns:

The generic source object.

Return type:

GenericSource

prepare_headers()#

Prepare headers for API request.

Return type:: Dict[str, str]

prepare_payload()#

Prepare payload for API request.

Return type:: str

InMemorySource#

class wordcab.core_objects.InMemorySource(obj=None)#

In-memory source object.

The in-memory source object is a special case of the generic source object. It is used to pass a pre-processed transcript to the API.

Parameters:

obj (Union[Dict[str, List[str]], List[str]]) – The in-memory object. It can be a list of strings or a dict with a transcript key and a list of strings as value.

Raises:

ValueError – If the in-memory object does not have a transcript key.
TypeError – If the in-memory object does not have a list as value for the transcript key.
TypeError – If the in-memory object is not a list or a dict.
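
The accepted shapes and the errors listed above can be sketched as a small validator. check_in_memory_obj is a hypothetical helper written for illustration; the library's actual checks and messages may differ.

```python
from typing import Dict, List, Union

TranscriptObj = Union[Dict[str, List[str]], List[str]]


def check_in_memory_obj(obj: TranscriptObj) -> List[str]:
    """Validate obj and return the list of utterance strings."""
    if isinstance(obj, list):
        return obj
    if isinstance(obj, dict):
        if "transcript" not in obj:
            raise ValueError("The dict must have a 'transcript' key.")
        if not isinstance(obj["transcript"], list):
            raise TypeError("The 'transcript' value must be a list.")
        return obj["transcript"]
    raise TypeError("obj must be a list or a dict.")


print(check_in_memory_obj({"transcript": ["SPEAKER A: Hello.", "SPEAKER B: Hi."]}))
```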

Examples

>>> from wordcab.core_objects import InMemorySource

>>> transcript = {"transcript": ["SPEAKER A: Hello.", "SPEAKER B: Hi."]}
>>> in_memory_source = InMemorySource(obj=transcript)
>>> in_memory_source
InMemorySource(...)
>>> in_memory_source.obj

Returns:: The in-memory source object.
Return type:: InMemorySource

prepare_headers()#

Prepare headers for API request.

Return type:: Dict[str, str]

prepare_payload()#

Prepare payload for API request.

Return type:: str

WordcabTranscriptSource#

class wordcab.core_objects.WordcabTranscriptSource(transcript_id=None)#

Wordcab transcript source object using a Wordcab transcript ID.

Parameters:: transcript_id (str) – The Wordcab transcript ID to use as input.
Raises:: ValueError – If the transcript_id is not provided.

Examples

>>> from wordcab.core_objects import WordcabTranscriptSource

>>> wordcab_transcript_source = WordcabTranscriptSource(transcript_id="transcript_12345")
>>> wordcab_transcript_source
WordcabTranscriptSource(transcript_id=transcript_12345)

Returns:: The Wordcab transcript source object.
Return type:: WordcabTranscriptSource

prepare_headers()#

Prepare headers for API request.

Return type:: Dict[str, str]

prepare_payload()#

Prepare payload for API request.

Return type:: None

AssemblyAISource#

class wordcab.core_objects.AssemblyAISource(filepath=None, url=None, url_headers=None)#

AssemblyAI source object using a local or remote AssemblyAI JSON file.

Parameters:

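filepath (Optional[Union[str, Path]], optional) – The path to the local file, by default None.
url (Optional[str], optional) – The URL to the remote file, by default None.
url_headers (Optional[Dict[str, str]], optional) – Headers to retrieve the file from the URL, by default None.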

Raises:

ValueError – If the file format is not valid.

Examples

>>> from wordcab.core_objects import AssemblyAISource

>>> assemblyai_source = AssemblyAISource(filepath="path/to/assemblyai/file.json")  
>>> assemblyai_source  
AssemblyAISource(...)
>>> assemblyai_source.source  
'assembly_ai'

Returns:

The AssemblyAI source object.

Return type:

AssemblyAISource

prepare_headers()#

Prepare headers for API request.

Return type:: Dict[str, str]

prepare_payload()#

Prepare payload for API request.

Return type:: str

DeepgramSource#

class wordcab.core_objects.DeepgramSource(filepath=None, url=None, url_headers=None)#

Deepgram source object using a local or remote Deepgram JSON file.

Parameters:

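filepath (Optional[Union[str, Path]], optional) – The path to the local file, by default None.
url (Optional[str], optional) – The URL to the remote file, by default None.
url_headers (Optional[Dict[str, str]], optional) – Headers to retrieve the file from the URL, by default None.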

Raises:

ValueError – If the file format is not valid.

Examples

>>> from wordcab.core_objects import DeepgramSource

>>> deepgram_source = DeepgramSource(filepath="path/to/deepgram/file.json")  
>>> deepgram_source  
DeepgramSource(...)
>>> deepgram_source.source  
'deepgram'

Returns:

The Deepgram source object.

Return type:

DeepgramSource

prepare_headers()#

Prepare headers for API request.

Return type:: Dict[str, str]

prepare_payload()#

Prepare payload for API request.

Return type:: str

RevSource#

class wordcab.core_objects.RevSource(filepath=None, url=None, url_headers=None)#

Rev.ai source object using a local or remote Rev.ai JSON file.

Parameters:

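filepath (Optional[Union[str, Path]], optional) – The path to the local file, by default None.
url (Optional[str], optional) – The URL to the remote file, by default None.
url_headers (Optional[Dict[str, str]], optional) – Headers to retrieve the file from the URL, by default None.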

Raises:

ValueError – If the file format is not valid.

Examples

>>> from wordcab.core_objects import RevSource

>>> rev_source = RevSource(filepath="path/to/rev/file.json")  
>>> rev_source  
RevSource(...)
>>> rev_source.source  
'rev_ai'

Returns:

The Rev.ai source object.

Return type:

RevSource

prepare_headers()#

Prepare headers for API request.

Return type:: Dict[str, str]

prepare_payload()#

Prepare payload for API request.

Return type:: str

VTTSource#

class wordcab.core_objects.VTTSource(filepath=None, url=None, url_headers=None)#

VTT source object using a local or remote VTT file.

Parameters:

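filepath (Optional[Union[str, Path]], optional) – The path to the local file, by default None.
url (Optional[str], optional) – The URL to the remote file, by default None.
url_headers (Optional[Dict[str, str]], optional) – Headers to retrieve the file from the URL, by default None.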

Raises:

ValueError – If the file format is not valid.

Examples

>>> from wordcab.core_objects import VTTSource

>>> vtt_source = VTTSource(filepath="path/to/vtt/file.vtt")  
>>> vtt_source  
VTTSource(...)
>>> vtt_source.source  
'vtt'

Returns:

The VTT source object.

Return type:

VTTSource

prepare_headers()#

Prepare headers for API request.

Return type:: Dict[str, str]

prepare_payload()#

Prepare payload for API request.

Return type:: bytes

SignedURLSource#

class wordcab.core_objects.SignedURLSource(filepath=None, url=None, url_headers=None)#

Signed URL source object.

Parameters:

filepath (str | Path | None) –
url (str | None) –
url_headers (Dict[str, str] | None) –

Job objects#

class wordcab.core_objects.BaseJob(display_name, job_name, source, job_status='Pending', metadata=None, settings=None, source_lang=None, target_lang=None, tags=None, time_started=None, time_completed=None, transcript_details=None, transcript_id=None)#

Wordcab API BaseJob object.

Parameters:

display_name (str) –
job_name (str) –
source (str) –
job_status (str | None) –
metadata (Dict[str, str] | None) –
settings (JobSettings | None) –
source_lang (str | None) –
target_lang (str | None) –
tags (List[str] | None) –
time_started (str | None) –
time_completed (str | None) –
transcript_details (Dict[str, str] | None) –
transcript_id (str | None) –

job_update(parameters)#

Update the job attributes.

Parameters:: parameters (Dict[str, str]) –
Return type:: None
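
One plausible reading of job_update is that it copies values from the parameters dictionary onto the matching job attributes. The sketch below uses a stand-in class with a few of the BaseJob fields listed above. JobSketch, the choice to ignore unknown keys, and the 'Done' status value are all assumptions, not the library's behavior.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class JobSketch:
    """Stand-in with a subset of the documented BaseJob fields."""

    display_name: str
    job_name: str
    source: str
    job_status: Optional[str] = "Pending"
    time_completed: Optional[str] = None

    def job_update(self, parameters: Dict[str, str]) -> None:
        # Assumed behavior: set known attributes, silently skip unknown keys.
        for key, value in parameters.items():
            if hasattr(self, key):
                setattr(self, key, value)


job = JobSketch(display_name="demo", job_name="job_123", source="generic")
job.job_update({"job_status": "Done", "unknown_key": "ignored"})
print(job.job_status)  # Done
```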

class wordcab.core_objects.ExtractJob(display_name, job_name, source, job_status='Pending', metadata=None, settings=None, source_lang=None, target_lang=None, tags=None, time_started=None, time_completed=None, transcript_details=None, transcript_id=None)#

Wordcab API ExtractJob object.

Parameters:

display_name (str) –
job_name (str) –
source (str) –
job_status (str | None) –
metadata (Dict[str, str] | None) –
settings (JobSettings | None) –
source_lang (str | None) –
target_lang (str | None) –
tags (List[str] | None) –
time_started (str | None) –
time_completed (str | None) –
transcript_details (Dict[str, str] | None) –
transcript_id (str | None) –

class wordcab.core_objects.JobSettings(ephemeral_data=False, pipeline='default', only_api=True, split_long_utterances=False)#

Wordcab API Job Settings object.

Parameters:

ephemeral_data (bool | None) –
pipeline (str) –
only_api (bool | None) –
split_long_utterances (bool | None) –

class wordcab.core_objects.ListJobs(page_count, next_page, results)#

Wordcab API ListJobs object.

Parameters:

page_count (int) –
next_page (str) –
results (List[ExtractJob | SummarizeJob]) –

class wordcab.core_objects.SummarizeJob(display_name, job_name, source, job_status='Pending', metadata=None, settings=None, source_lang=None, target_lang=None, tags=None, time_started=None, time_completed=None, transcript_details=None, transcript_id=None, summary_details=None)#

Wordcab API SummarizeJob object.

Parameters:

display_name (str) –
job_name (str) –
source (str) –
job_status (str | None) –
metadata (Dict[str, str] | None) –
settings (JobSettings | None) –
source_lang (str | None) –
target_lang (str | None) –
tags (List[str] | None) –
time_started (str | None) –
time_completed (str | None) –
transcript_details (Dict[str, str] | None) –
transcript_id (str | None) –
summary_details (Dict[str, str] | None) –

Stats object#

class wordcab.core_objects.Stats(account_email, plan, monthly_request_limit, request_count, minutes_summarized, transcripts_summarized, metered_charge, min_created, max_created, tags=None)#

Stats object for the Wordcab API.

Parameters:

account_email (str) –
plan (str) –
monthly_request_limit (str) –
request_count (int) –
minutes_summarized (int) –
transcripts_summarized (int) –
metered_charge (str) –
min_created (str) –
max_created (str) –
tags (List[str] | None) –

Summary objects#

class wordcab.core_objects.BaseSummary(job_status, summary_id, display_name=None, job_name=None, process_time=None, speaker_map=None, source=None, source_lang=None, summary_type=None, summary=None, target_lang=None, transcript_id=None, time_started=None, time_completed=None)#

Summary object.

Parameters:

job_status (str) –
summary_id (str) –
display_name (str | None) –
job_name (str | None) –
process_time (str | None) –
speaker_map (Dict[str, str] | None) –
source (str | None) –
source_lang (str | None) –
summary_type (str | None) –
summary (Dict[str, Any] | None) –
target_lang (str | None) –
transcript_id (str | None) –
time_started (str | None) –
time_completed (str | None) –

get_formatted_summaries(add_context=False)#

Format the summaries in a human-readable format.

Return the summaries as a dictionary in a human-readable format, with the summary length as key and the formatted summaries as values.

Parameters:: add_context (bool, optional) – If True, add the context items to the summary, by default False.
Returns:: The summaries as a dictionary with the summary length as key and the summaries, formatted in a human-readable format, as values.
Return type:: Dict[str, str]

get_summaries()#

Get the summaries as a dictionary with the summary length as key and the summaries as values.

Returns:: The summaries as a dictionary with the summary length as key and the summaries as values. If the summary type is brief, the summaries are returned as a list of lists of str; otherwise they are returned as a list of str.
Return type:: Dict[str, List[Union[str, List[str]]]]
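
To make the two return shapes concrete, the sketch below converts a dictionary shaped like the get_summaries() result into one shaped like the get_formatted_summaries() result. The format_summaries helper, the joining rules, and the "3" key are hypothetical; the library's real formatting is not shown on this page.

```python
from typing import Dict, List, Union

Summaries = Dict[str, List[Union[str, List[str]]]]


def format_summaries(summaries: Summaries) -> Dict[str, str]:
    """Join each summary-length entry into one readable string."""
    formatted: Dict[str, str] = {}
    for length, items in summaries.items():
        lines: List[str] = []
        for item in items:
            # A nested list (the brief case) is joined into a single line.
            lines.append(" ".join(item) if isinstance(item, list) else item)
        formatted[length] = "\n".join(lines)
    return formatted


example: Summaries = {"3": ["First point.", "Second point."]}
print(format_summaries(example)["3"])
```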

class wordcab.core_objects.ListSummaries(page_count, next_page, results)#

List summaries object.

Parameters:

page_count (int) –
next_page (str) –
results (List[BaseSummary]) –

class wordcab.core_objects.StructuredSummary(summary, context=None, summary_html=None, end=None, end_index=None, start=None, start_index=None, timestamp_end=None, timestamp_start=None, transcript_segment=None)#

Structured summary object.

Parameters:

summary (str | Dict[str, str]) –
context (Dict[str, str | List[str] | Dict[str, str | List[str]]] | None) –
summary_html (str | Dict[str, str] | None) –
end (str | None) –
end_index (int | None) –
start (str | None) –
start_index (int | None) –
timestamp_end (int | None) –
timestamp_start (int | None) –
transcript_segment (List[Dict[str, str | int]] | None) –

Transcript objects#

class wordcab.core_objects.BaseTranscript(transcript_id, job_id_set=<factory>, summary_id_set=<factory>, transcript=<factory>, speaker_map=<factory>, question_answers=None)#

Transcript object.

Parameters:

transcript_id (str) –
job_id_set (List[str]) –
summary_id_set (List[str]) –
transcript (List[TranscriptUtterance]) –
speaker_map (Dict[str, str]) –
question_answers (List[Dict[str, str]] | None) –

update_speaker_map(speaker_map)#

Update the speaker map for the transcript.

Parameters:: speaker_map (Dict[str, str]) –
Return type:: None

class wordcab.core_objects.ListTranscripts(page_count, next_page, results)#

List transcripts object.

Parameters:

page_count (int) –
next_page (str) –
results (List[BaseTranscript]) –

class wordcab.core_objects.TranscriptUtterance(text, speaker, end=None, end_index=None, start=None, start_index=None, timestamp_end=None, timestamp_start=None)#

Transcript utterance object.

Parameters:

text (str) –
speaker (str) –
end (str | None) –
end_index (int | None) –
start (str | None) –
start_index (int | None) –
timestamp_end (int | None) –
timestamp_start (int | None) –
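
As an illustration of how utterances relate to the list-of-strings input accepted by InMemorySource, the sketch below turns utterance-like objects into "speaker: text" lines. UtteranceSketch is a stand-in with a subset of the fields listed above, and both the to_transcript_lines helper and the speaker labels are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UtteranceSketch:
    """Stand-in with a subset of the documented TranscriptUtterance fields."""

    text: str
    speaker: str
    start: Optional[str] = None
    end: Optional[str] = None


def to_transcript_lines(utterances: List[UtteranceSketch]) -> List[str]:
    """Render each utterance as a 'speaker: text' line."""
    return [f"{u.speaker}: {u.text}" for u in utterances]


utterances = [
    UtteranceSketch(text="Hello.", speaker="SPEAKER A"),
    UtteranceSketch(text="Hi.", speaker="SPEAKER B"),
]
print(to_transcript_lines(utterances))
```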