Core objects#
The core objects represent the data returned by the API and carry the data you pass to the API.
Source objects#
BaseSource#
- class wordcab.core_objects.BaseSource(filepath=None, url=None, url_headers=None)#
Base class for AudioSource and GenericSource objects. It is not meant to be used directly.
- Parameters:
filepath (Optional[Union[str, Path]], optional) – Path to the local file, by default None.
url (Optional[str], optional) – URL to the remote file, by default None.
url_headers (Optional[Dict[str, str]], optional) – Headers to retrieve the file from the URL, by default None. Useful if the file requires authentication to be retrieved.
- Raises:
ValueError – If neither filepath nor url is provided.
ValueError – If both filepath and url are provided.
TypeError – If filepath is not a string or a Path object.
FileNotFoundError – If filepath does not exist or is not accessible.
- source#
The source name, which identifies the kind of source (for example, 'deepgram' or 'vtt').
- Type:
str
- source_type#
The source type, such as 'local' for a file loaded from filepath.
- Type:
str
- _stem#
The stem of the file.
- Type:
str
- _suffix#
The suffix of the file.
- Type:
str
- Returns:
The source object.
- Return type:
BaseSource
- prepare_headers()#
Prepare headers.
- Return type:
Dict[str, str]
- prepare_payload()#
Prepare payload.
- Return type:
str | bytes | Dict[str, bytes]
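The checks listed under Raises for BaseSource can be sketched as a standalone function. This is a minimal illustration, not the library's implementation: the name `validate_source_args` and the 'remote' return value are assumptions, while 'local' matches the `source_type` shown in the GenericSource example.

```python
from pathlib import Path
from typing import Optional, Union


def validate_source_args(
    filepath: Optional[Union[str, Path]] = None,
    url: Optional[str] = None,
) -> str:
    """Hypothetical helper that mirrors the documented BaseSource checks.

    Returns 'local' for a file path and 'remote' for a URL
    ('remote' is an assumed label, not taken from the documentation).
    """
    # Exactly one of filepath or url must be given.
    if filepath is None and url is None:
        raise ValueError("Provide either filepath or url.")
    if filepath is not None and url is not None:
        raise ValueError("Provide only one of filepath or url, not both.")

    if filepath is not None:
        # The path must be a str or a Path and must exist.
        if not isinstance(filepath, (str, Path)):
            raise TypeError("filepath must be a str or a Path object.")
        if not Path(filepath).exists():
            raise FileNotFoundError(f"File not found: {filepath}")
        return "local"

    return "remote"
```

Passing a URL skips the filesystem checks entirely, which is why only the filepath branch can raise TypeError or FileNotFoundError in this sketch.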
AudioSource#
- class wordcab.core_objects.AudioSource(filepath=None, url=None, url_headers=None)#
The AudioSource object is required to create a job that uses an audio file as input.
- Parameters:
filepath (Union[str, Path]) – The path to the local file.
url (str) – The URL to the remote file.
url_headers (Dict[str, str] | None) –
- Raises:
ValueError – If the file format is not supported.
ValueError – If both filepath and url are provided.
TypeError – If the path is not a string or a Path object.
FileNotFoundError – If the file does not exist or is not accessible.
Examples
>>> from wordcab.core_objects import AudioSource
>>> audio_source = AudioSource(filepath="path/to/audio/file.mp3")
>>> audio_source
AudioSource(...)
- Returns:
The audio source object.
- Return type:
AudioSource
- prepare_headers()#
Prepare headers for API request.
- Return type:
dict
- prepare_payload()#
Prepare payload for API request.
- Return type:
Dict[str, bytes]
GenericSource#
- class wordcab.core_objects.GenericSource(filepath=None, url=None, url_headers=None)#
Generic source object.
The GenericSource object is required to create a job that uses a generic file as input, such as a .txt or .json file.
- Parameters:
filepath (Union[str, Path]) – The path to the local file.
url (str) – The URL to the remote file.
url_headers (Dict[str, str] | None) –
- Raises:
ValueError – If the file format is not supported.
ValueError – If both filepath and url are provided.
TypeError – If the path is not a string or a Path object.
FileNotFoundError – If the file does not exist or is not accessible.
Examples
>>> from wordcab.core_objects import GenericSource
>>> generic_source = GenericSource(filepath="path/to/generic/file.txt")
>>> generic_source
GenericSource(...)
>>> generic_source.file_object
b'Hello, world!'
>>> generic_source.source_type
'local'
>>> generic_source._suffix
'.txt'
>>> generic_source._stem
'file'
- Returns:
The generic source object.
- Return type:
GenericSource
- prepare_headers()#
Prepare headers for API request.
- Return type:
Dict[str, str]
- prepare_payload()#
Prepare payload for API request.
- Return type:
str
InMemorySource#
- class wordcab.core_objects.InMemorySource(obj=None)#
In-memory source object.
The in-memory source object is a special case of the generic source object. It is used to pass a pre-processed transcript to the API.
- Parameters:
obj (Union[Dict[str, List[str]], List[str]]) – The in-memory object. It can be a list of strings or a dict with a transcript key and a list of strings as value.
- Raises:
ValueError – If the in-memory object does not have a transcript key.
TypeError – If the in-memory object does not have a list as value for the transcript key.
TypeError – If the in-memory object is not a list or a dict.
Examples
>>> from wordcab.core_objects import InMemorySource
>>> transcript = {"transcript": ["SPEAKER A: Hello.", "SPEAKER B: Hi."]}
>>> in_memory_source = InMemorySource(obj=transcript)
>>> in_memory_source
InMemorySource(...)
>>> in_memory_source.obj
- Returns:
The in-memory source object.
- Return type:
InMemorySource
- prepare_headers()#
Prepare headers for API request.
- Return type:
Dict[str, str]
- prepare_payload()#
Prepare payload for API request.
- Return type:
str
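The two accepted shapes for obj, and the errors documented for InMemorySource, can be sketched as follows. The function name `extract_utterances` is hypothetical and the code is an illustration of the documented rules, not the library's own validation.

```python
from typing import Dict, List, Union

InMemoryObj = Union[Dict[str, List[str]], List[str]]


def extract_utterances(obj: InMemoryObj) -> List[str]:
    """Hypothetical helper: return the list of utterances from an in-memory object."""
    # A plain list of strings is accepted as-is.
    if isinstance(obj, list):
        return obj

    # A dict must carry a "transcript" key whose value is a list.
    if isinstance(obj, dict):
        if "transcript" not in obj:
            raise ValueError("The in-memory object needs a 'transcript' key.")
        if not isinstance(obj["transcript"], list):
            raise TypeError("The 'transcript' value must be a list.")
        return obj["transcript"]

    # Anything else is rejected.
    raise TypeError("The in-memory object must be a list or a dict.")
```

Both `["SPEAKER A: Hello."]` and `{"transcript": ["SPEAKER A: Hello."]}` yield the same list of utterances in this sketch.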
WordcabTranscriptSource#
- class wordcab.core_objects.WordcabTranscriptSource(transcript_id=None)#
Wordcab transcript source object using a Wordcab transcript ID.
- Parameters:
transcript_id (str) – The Wordcab transcript ID to use as input.
- Raises:
ValueError – If the transcript_id is not provided.
Examples
>>> from wordcab.core_objects import WordcabTranscriptSource
>>> wordcab_transcript_source = WordcabTranscriptSource(transcript_id="transcript_12345")
>>> wordcab_transcript_source
WordcabTranscriptSource(transcript_id=transcript_12345)
- Returns:
The Wordcab transcript source object.
- Return type:
WordcabTranscriptSource
- prepare_headers()#
Prepare headers for API request.
- Return type:
Dict[str, str]
- prepare_payload()#
Prepare payload for API request.
- Return type:
None
AssemblyAISource#
- class wordcab.core_objects.AssemblyAISource(filepath=None, url=None, url_headers=None)#
AssemblyAI source object using a local or remote AssemblyAI JSON file.
- Parameters:
filepath (Union[str, Path]) – The path to the local file.
url (str) – The URL to the remote file.
url_headers (Dict[str, str] | None) –
- Raises:
ValueError – If the file format is not valid.
Examples
>>> from wordcab.core_objects import AssemblyAISource
>>> assemblyai_source = AssemblyAISource(filepath="path/to/assemblyai/file.json")
>>> assemblyai_source
AssemblyAISource(...)
>>> assemblyai_source.source
'assembly_ai'
- Returns:
The AssemblyAI source object.
- Return type:
AssemblyAISource
- prepare_headers()#
Prepare headers for API request.
- Return type:
Dict[str, str]
- prepare_payload()#
Prepare payload for API request.
- Return type:
str
DeepgramSource#
- class wordcab.core_objects.DeepgramSource(filepath=None, url=None, url_headers=None)#
Deepgram source object using a local or remote Deepgram JSON file.
- Parameters:
filepath (Union[str, Path]) – The path to the local file.
url (str) – The URL to the remote file.
url_headers (Dict[str, str] | None) –
- Raises:
ValueError – If the file format is not valid.
Examples
>>> from wordcab.core_objects import DeepgramSource
>>> deepgram_source = DeepgramSource(filepath="path/to/deepgram/file.json")
>>> deepgram_source
DeepgramSource(...)
>>> deepgram_source.source
'deepgram'
- Returns:
The Deepgram source object.
- Return type:
DeepgramSource
- prepare_headers()#
Prepare headers for API request.
- Return type:
Dict[str, str]
- prepare_payload()#
Prepare payload for API request.
- Return type:
str
RevSource#
- class wordcab.core_objects.RevSource(filepath=None, url=None, url_headers=None)#
Rev.ai source object using a local or remote Rev.ai JSON file.
- Parameters:
filepath (Union[str, Path]) – The path to the local file.
url (str) – The URL to the remote file.
url_headers (Dict[str, str] | None) –
- Raises:
ValueError – If the file format is not valid.
Examples
>>> from wordcab.core_objects import RevSource
>>> rev_source = RevSource(filepath="path/to/rev/file.json")
>>> rev_source
RevSource(...)
>>> rev_source.source
'rev_ai'
- Returns:
The Rev.ai source object.
- Return type:
RevSource
- prepare_headers()#
Prepare headers for API request.
- Return type:
Dict[str, str]
- prepare_payload()#
Prepare payload for API request.
- Return type:
str
VTTSource#
- class wordcab.core_objects.VTTSource(filepath=None, url=None, url_headers=None)#
VTT source object using a local or remote VTT file.
- Parameters:
filepath (Union[str, Path]) – The path to the local file.
url (str) – The URL to the remote file.
url_headers (Dict[str, str] | None) –
- Raises:
ValueError – If the file format is not valid.
Examples
>>> from wordcab.core_objects import VTTSource
>>> vtt_source = VTTSource(filepath="path/to/vtt/file.vtt")
>>> vtt_source
VTTSource(...)
>>> vtt_source.source
'vtt'
- Returns:
The VTT source object.
- Return type:
VTTSource
- prepare_headers()#
Prepare headers for API request.
- Return type:
Dict[str, str]
- prepare_payload()#
Prepare payload for API request.
- Return type:
bytes
SignedURLSource#
- class wordcab.core_objects.SignedURLSource(filepath=None, url=None, url_headers=None)#
Signed URL source object.
- Parameters:
filepath (str | Path | None) –
url (str | None) –
url_headers (Dict[str, str] | None) –
Job objects#
- class wordcab.core_objects.BaseJob(display_name, job_name, source, job_status='Pending', metadata=None, settings=None, source_lang=None, target_lang=None, tags=None, time_started=None, time_completed=None, transcript_details=None, transcript_id=None)#
Wordcab API BaseJob object.
- Parameters:
display_name (str) –
job_name (str) –
source (str) –
job_status (str | None) –
metadata (Dict[str, str] | None) –
settings (JobSettings | None) –
source_lang (str | None) –
target_lang (str | None) –
tags (List[str] | None) –
time_started (str | None) –
time_completed (str | None) –
transcript_details (Dict[str, str] | None) –
transcript_id (str | None) –
- job_update(parameters)#
Update the job attributes.
- Parameters:
parameters (Dict[str, str]) –
- Return type:
None
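One way to picture job_update is as a method that copies values from the parameters dict onto matching attributes. The sketch below uses a reduced stand-in class, `JobSketch`, with only a few of the documented fields. Ignoring unknown keys is an assumption made for illustration; the library may handle them differently.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class JobSketch:
    """Hypothetical, reduced stand-in for BaseJob (not the library class)."""

    display_name: str
    job_name: str
    source: str
    job_status: Optional[str] = "Pending"
    transcript_id: Optional[str] = None

    def job_update(self, parameters: Dict[str, str]) -> None:
        """Copy each known key from parameters onto the job."""
        for name, value in parameters.items():
            # Assumption: keys that do not match an attribute are skipped.
            if hasattr(self, name):
                setattr(self, name, value)


job = JobSketch(display_name="Demo", job_name="job_demo", source="generic")
job.job_update({"job_status": "Completed", "transcript_id": "transcript_12345"})
```

After the call, `job.job_status` and `job.transcript_id` hold the new values, and the remaining fields are unchanged.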
- class wordcab.core_objects.ExtractJob(display_name, job_name, source, job_status='Pending', metadata=None, settings=None, source_lang=None, target_lang=None, tags=None, time_started=None, time_completed=None, transcript_details=None, transcript_id=None)#
Wordcab API ExtractJob object.
- Parameters:
display_name (str) –
job_name (str) –
source (str) –
job_status (str | None) –
metadata (Dict[str, str] | None) –
settings (JobSettings | None) –
source_lang (str | None) –
target_lang (str | None) –
tags (List[str] | None) –
time_started (str | None) –
time_completed (str | None) –
transcript_details (Dict[str, str] | None) –
transcript_id (str | None) –
- class wordcab.core_objects.JobSettings(ephemeral_data=False, pipeline='default', only_api=True, split_long_utterances=False)#
Wordcab API Job Settings object.
- Parameters:
ephemeral_data (bool | None) –
pipeline (str) –
only_api (bool | None) –
split_long_utterances (bool | None) –
- class wordcab.core_objects.ListJobs(page_count, next_page, results)#
Wordcab API ListJobs object.
- Parameters:
page_count (int) –
next_page (str) –
results (List[ExtractJob | SummarizeJob]) –
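ListJobs, like the other list objects on this page, exposes page_count, next_page, and results. A possible way to walk through every page is sketched below. `PageSketch`, `collect_results`, and the `fetch_page` callable are hypothetical names, and the sketch assumes next_page is empty or None on the last page, which the documentation does not state.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PageSketch:
    """Hypothetical stand-in for a list object such as ListJobs."""

    page_count: int
    next_page: Optional[str]
    results: List[str]


def collect_results(
    first_page: PageSketch,
    fetch_page: Callable[[str], PageSketch],
) -> List[str]:
    """Gather results from the first page and every page that follows it."""
    results = list(first_page.results)
    page = first_page
    # Assumption: a falsy next_page marks the last page.
    while page.next_page:
        page = fetch_page(page.next_page)
        results.extend(page.results)
    return results
```

Here `fetch_page` stands for whatever call retrieves the page behind a next_page value; it is passed in so that the loop stays independent of any particular client.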
- class wordcab.core_objects.SummarizeJob(display_name, job_name, source, job_status='Pending', metadata=None, settings=None, source_lang=None, target_lang=None, tags=None, time_started=None, time_completed=None, transcript_details=None, transcript_id=None, summary_details=None)#
Wordcab API SummarizeJob object.
- Parameters:
display_name (str) –
job_name (str) –
source (str) –
job_status (str | None) –
metadata (Dict[str, str] | None) –
settings (JobSettings | None) –
source_lang (str | None) –
target_lang (str | None) –
tags (List[str] | None) –
time_started (str | None) –
time_completed (str | None) –
transcript_details (Dict[str, str] | None) –
transcript_id (str | None) –
summary_details (Dict[str, str] | None) –
Stats object#
- class wordcab.core_objects.Stats(account_email, plan, monthly_request_limit, request_count, minutes_summarized, transcripts_summarized, metered_charge, min_created, max_created, tags=None)#
Stats object for the Wordcab API.
- Parameters:
account_email (str) –
plan (str) –
monthly_request_limit (str) –
request_count (int) –
minutes_summarized (int) –
transcripts_summarized (int) –
metered_charge (str) –
min_created (str) –
max_created (str) –
tags (List[str] | None) –
Summary objects#
- class wordcab.core_objects.BaseSummary(job_status, summary_id, display_name=None, job_name=None, process_time=None, speaker_map=None, source=None, source_lang=None, summary_type=None, summary=None, target_lang=None, transcript_id=None, time_started=None, time_completed=None)#
Summary object.
- Parameters:
job_status (str) –
summary_id (str) –
display_name (str | None) –
job_name (str | None) –
process_time (str | None) –
speaker_map (Dict[str, str] | None) –
source (str | None) –
source_lang (str | None) –
summary_type (str | None) –
summary (Dict[str, Any] | None) –
target_lang (str | None) –
transcript_id (str | None) –
time_started (str | None) –
time_completed (str | None) –
- get_formatted_summaries(add_context=False)#
Format the summaries in a human-readable format.
Return the summaries as a dictionary, with the summary length as key and the summaries, formatted as human-readable text, as values.
- Parameters:
add_context (bool, optional) – If True, add the context items to the summary, by default False.
- Returns:
The summaries as a dictionary, with the summary length as key and the summaries, formatted as human-readable text, as values.
- Return type:
Dict[str, str]
- get_summaries()#
Get the summaries as a dictionary with the summary length as key and the summaries as values.
- Returns:
The summaries as a dictionary with the summary length as key and the summaries as values. If the summary type is brief, the summaries are returned as a list of lists of str; otherwise, they are returned as a list of str.
- Return type:
Dict[str, List[Union[str, List[str]]]]
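The relationship between the two return types above, Dict[str, List[Union[str, List[str]]]] and Dict[str, str], can be illustrated with a small conversion function. This is a sketch only: `format_summaries` is a hypothetical name, and the separators used to join the text are assumptions rather than the library's actual formatting.

```python
from typing import Dict, List, Union

Summaries = Dict[str, List[Union[str, List[str]]]]


def format_summaries(summaries: Summaries) -> Dict[str, str]:
    """Hypothetical helper: flatten each summary list into one readable string."""
    formatted: Dict[str, str] = {}
    for length, items in summaries.items():
        paragraphs = []
        for item in items:
            # Nested lists (the brief case) are joined into one paragraph;
            # plain strings are used unchanged.
            paragraphs.append(" ".join(item) if isinstance(item, list) else item)
        # Assumption: paragraphs are separated by a blank line.
        formatted[length] = "\n\n".join(paragraphs)
    return formatted
```

The keys are passed through untouched, so the summary length remains the lookup key in both representations.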
- class wordcab.core_objects.ListSummaries(page_count, next_page, results)#
List summaries object.
- Parameters:
page_count (int) –
next_page (str) –
results (List[BaseSummary]) –
- class wordcab.core_objects.StructuredSummary(summary, context=None, summary_html=None, end=None, end_index=None, start=None, start_index=None, timestamp_end=None, timestamp_start=None, transcript_segment=None)#
Structured summary object.
- Parameters:
summary (str | Dict[str, str]) –
context (Dict[str, str | List[str] | Dict[str, str | List[str]]] | None) –
summary_html (str | Dict[str, str] | None) –
end (str | None) –
end_index (int | None) –
start (str | None) –
start_index (int | None) –
timestamp_end (int | None) –
timestamp_start (int | None) –
transcript_segment (List[Dict[str, str | int]] | None) –
Transcript objects#
- class wordcab.core_objects.BaseTranscript(transcript_id, job_id_set=<factory>, summary_id_set=<factory>, transcript=<factory>, speaker_map=<factory>, question_answers=None)#
Transcript object.
- Parameters:
transcript_id (str) –
job_id_set (List[str]) –
summary_id_set (List[str]) –
transcript (List[TranscriptUtterance]) –
speaker_map (Dict[str, str]) –
question_answers (List[Dict[str, str]] | None) –
- update_speaker_map(speaker_map)#
Update the speaker map for the transcript.
- Parameters:
speaker_map (Dict[str, str]) –
- Return type:
None
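The `<factory>` defaults in the BaseTranscript signature suggest dataclass fields created with `default_factory`. The sketch below shows that pattern together with a speaker-map update. `TranscriptSketch`, `UtteranceSketch`, and `display_lines` are hypothetical names, and merging the new map into the existing one (rather than replacing it) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UtteranceSketch:
    """Hypothetical, reduced stand-in for TranscriptUtterance."""

    text: str
    speaker: str


@dataclass
class TranscriptSketch:
    """Hypothetical, reduced stand-in for BaseTranscript."""

    transcript_id: str
    # default_factory gives every instance its own list and dict.
    transcript: List[UtteranceSketch] = field(default_factory=list)
    speaker_map: Dict[str, str] = field(default_factory=dict)

    def update_speaker_map(self, speaker_map: Dict[str, str]) -> None:
        # Assumption: new entries are merged into the existing map.
        self.speaker_map.update(speaker_map)

    def display_lines(self) -> List[str]:
        """Render each utterance, using the mapped speaker name when one exists."""
        return [
            f"{self.speaker_map.get(u.speaker, u.speaker)}: {u.text}"
            for u in self.transcript
        ]
```

Speakers without an entry in the map keep their original label, so a partial map is enough to rename only some of them.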
- class wordcab.core_objects.ListTranscripts(page_count, next_page, results)#
List transcripts object.
- Parameters:
page_count (int) –
next_page (str) –
results (List[BaseTranscript]) –
- class wordcab.core_objects.TranscriptUtterance(text, speaker, end=None, end_index=None, start=None, start_index=None, timestamp_end=None, timestamp_start=None)#
Transcript utterance object.
- Parameters:
text (str) –
speaker (str) –
end (str | None) –
end_index (int | None) –
start (str | None) –
start_index (int | None) –
timestamp_end (int | None) –
timestamp_start (int | None) –