GFAParser: module to parse and write GFA format

class gfagraphs.gfaparser.GFAParser

This class implements static methods to get information about the contents of a GFA file and to parse them.

Returns

All methods are static: call them directly on the class, passing the required arguments, without instantiating it.

Return type

None

Raises

OSError – The file does not exist
IOError – File is empty
IOError – File descriptor is invalid
NotImplementedError – Byte-array or array is saved to GFA
ValueError – Data format not in GFA standards

static get_gfa_format(gfa_file_path: str | list[str]) → str | list[str]

Given one or more files, returns the GFA subtype of each, and raises an error if a file is invalid or does not exist. The objective is to assess the GFA subformat of files for pre-processing purposes or to choose an algorithm.

Parameters

gfa_file_path (str | list[str]) – a series of paths, or a single one

Returns

per path, a tag identifying the gfa type

Return type

str | list[str]

Raises

OSError – Specified file does not exist
IOError – File descriptor is invalid
IOError – File is empty
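
To illustrate the kind of check described here, the following is a minimal sketch and not the library's implementation. It assumes the subformat can be read from the `VN:Z:` tag of the header (`H`) line, as laid out in the GFA specification. The helper name `sniff_gfa_version` and the `'unknown'` fallback are hypothetical.

```python
import os
import tempfile


def sniff_gfa_version(gfa_file_path: str) -> str:
    """Hypothetical helper: return the VN:Z: value of the header line, or 'unknown'."""
    if not os.path.isfile(gfa_file_path):
        raise OSError(f"{gfa_file_path} does not exist")
    if os.path.getsize(gfa_file_path) == 0:
        raise IOError(f"{gfa_file_path} is empty")
    with open(gfa_file_path, "r", encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if fields[0] == "H":
                # The header line carries optional tags such as VN:Z:1.0
                for field in fields[1:]:
                    if field.startswith("VN:Z:"):
                        return field[len("VN:Z:"):]
    return "unknown"


# Usage on a throwaway file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.gfa")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("H\tVN:Z:1.0\nS\t1\tACGT\n")
    detected = sniff_gfa_version(path)
print(detected)
```

The tags actually returned by `get_gfa_format` may differ from the raw version string used in this sketch.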

static get_gfa_type(tag_type: str) → Union[type, Callable]

Interprets GFA tags as Python-compatible types. Given a letter used as a tag type in the GFA standard, returns the type or function to cast the data with. This function is used in input scenarios, to read a file from disk and interpret its content.

Parameters

tag_type (str) – a GFA tag

Returns

a cast descriptor to use on the data

Return type

type | Callable

Raises

NotImplementedError – Byte-array or array
ValueError – Type identifier is not in the GFA-spec
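
The mapping from a type letter to a caster can be sketched as below. This is an illustrative stand-in, not the code of `get_gfa_type`: the name `tag_caster` is hypothetical, and the letter-to-type table is an assumption based on the optional-field types listed in the GFA 1 specification (`A`, `i`, `f`, `Z`, `J`, `H`, `B`).

```python
import json
from typing import Callable, Union


def tag_caster(tag_type: str) -> Union[type, Callable]:
    """Hypothetical stand-in: map a GFA type letter to a Python caster."""
    if tag_type in ("H", "B"):
        # Byte arrays and numeric arrays are left out, mirroring the documented error
        raise NotImplementedError("Byte-array or array is not handled in this sketch")
    casters = {"i": int, "f": float, "A": str, "Z": str, "J": json.loads}
    if tag_type not in casters:
        raise ValueError(f"Type identifier {tag_type!r} is not in the GFA-spec")
    return casters[tag_type]


length = tag_caster("i")("1500")    # cast with int
coverage = tag_caster("f")("12.5")  # cast with float
```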

static get_python_type(data: object) → str

Given a Python variable, tries to identify the best-suited tag and validates it. See http://gfa-spec.github.io/GFA-spec/GFA1.html#optional-fields for more details.

Parameters: data (object) – the data we try to add to the GFA file
Returns: a one-letter code for an optional field of the GFA-spec
Return type: str
Raises: ValueError – data type could not be encoded in the GFA-spec
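
The reverse direction, from a Python value to a one-letter code, might look like the sketch below. It is not the library's code: the name `tag_letter` is hypothetical, and the choices of rejecting `bool` and encoding both `dict` and `list` as JSON (`J`) are assumptions made for illustration.

```python
def tag_letter(data: object) -> str:
    """Hypothetical stand-in: pick a GFA optional-field type letter for a Python value."""
    # bool is a subclass of int in Python; this sketch rejects it rather than guessing
    if isinstance(data, bool):
        raise ValueError("bool could not be encoded in the GFA-spec")
    if isinstance(data, int):
        return "i"
    if isinstance(data, float):
        return "f"
    if isinstance(data, str):
        return "Z"
    if isinstance(data, (dict, list)):
        # Assumption: containers are serialised as JSON
        return "J"
    raise ValueError(f"{type(data).__name__} could not be encoded in the GFA-spec")


letters = [tag_letter(value) for value in (42, 3.14, "chr1", {"depth": 7})]
```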

static read_gfa_line(datas: list[str], load_sequence_in_memory: bool = True, regexp_pattern: str = '.*', memory_mode: bool = False) → tuple[str, gfagraphs.abstractions.GFALine, dict]

Calls methods to parse a GFA line according to its fields, as described in the GFA-spec GitHub repository. Parses a single line and returns the information it contains.

Parameters

datas (list[str]) – the list of tab-separated elements of the GFA line.
load_sequence_in_memory (bool, optional) – if the line is a node, whether its sequence should be loaded, by default True
regexp_pattern (str, optional) – a pattern that path names must match to be kept, by default “.*”
memory_mode (bool, optional) – if additional information should be loaded in the struct, by default False

Returns

A tuple containing the identifier of the line, the type of the line, and the data of the line

Return type

tuple[str, GFALine, dict]
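
The `datas` argument is the line already split on tabs. The sketch below shows that splitting step and a simplified parser for segment (`S`) lines only. It is an illustration under assumptions, not the behaviour of `read_gfa_line`: the function `parse_segment_line`, the plain string used for the line type, and the keys `length` and `seq` are all hypothetical.

```python
def parse_segment_line(datas: list[str], load_sequence: bool = True) -> tuple[str, str, dict]:
    """Hypothetical: parse an S line into (name, line type, fields)."""
    if datas[0] != "S":
        raise ValueError("this sketch only handles segment (S) lines")
    name, sequence = datas[1], datas[2]
    # '*' means the sequence is not stored in the file, so its length is unknown here
    fields = {"length": None if sequence == "*" else len(sequence)}
    if load_sequence:
        fields["seq"] = sequence
    return name, "S", fields


raw_line = "S\ts1\tACGTAC\n"
datas = raw_line.rstrip("\n").split("\t")  # the tab-separated elements of the line
name, line_type, fields = parse_segment_line(datas)
```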

static save_graph(graph, output_path: str, force_format: gfagraphs.abstractions.GFAFormat | bool = False, minimal_graph: bool = False) → None

Given a gfa Graph object, saves the Graph to a valid gfa file.

Parameters

graph (Graph) – the graph object loaded in memory
output_path (str) – a path to a file on the disk, which may or may not already exist
force_format (GFAFormat | bool, optional) – the output gfa subformat, by default False
minimal_graph (bool, optional) – if only mandatory tags should be kept, by default False

static save_subgraph(graph, output_path: str, nodes: set[str], force_format: gfagraphs.abstractions.GFAFormat | bool = False, minimal_graph: bool = False) → None

Given a gfa Graph object and a set of nodes, saves the corresponding subgraph to a valid gfa file.

Parameters

graph (Graph) – the graph object loaded in memory
output_path (str) – a path to a file on the disk, which may or may not already exist
nodes (set[str]) – the names of the nodes to keep in the saved subgraph
force_format (GFAFormat | bool, optional) – the output gfa subformat, by default False
minimal_graph (bool, optional) – if only mandatory tags should be kept, by default False
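
Restricting a graph to a set of nodes before writing can be sketched as follows. This does not use the library's Graph object, whose attributes are not described here. It works on plain dictionaries and tuples, and it assumes that a link is kept only when both of its endpoints are in `nodes`. The function name `subgraph_lines` and the `0M` overlap are hypothetical choices.

```python
def subgraph_lines(
    segments: dict[str, str],
    links: list[tuple[str, str, str, str]],
    nodes: set[str],
) -> list[str]:
    """Hypothetical: build GFA1 lines for the subgraph induced by `nodes`."""
    lines = ["H\tVN:Z:1.0"]
    for name, sequence in segments.items():
        if name in nodes:
            lines.append(f"S\t{name}\t{sequence}")
    for source, source_orient, target, target_orient in links:
        # Assumption: drop any link that leaves the selected node set
        if source in nodes and target in nodes:
            lines.append(f"L\t{source}\t{source_orient}\t{target}\t{target_orient}\t0M")
    return lines


segments = {"1": "ACGT", "2": "GG", "3": "TT"}
links = [("1", "+", "2", "+"), ("2", "+", "3", "-")]
lines = subgraph_lines(segments, links, nodes={"1", "2"})
```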

static set_gfa_type(tag_type: str) → Union[type, Callable]

Interprets tags of GFA as a Python-compatible format. Given a letter used as a tag in the GFA standard, return the type or function to cast the data to. This function is used in output scenarios, to write a file to disk.

Parameters: tag_type (str) – a GFA tag
Returns: a cast descriptor to use on the data
Return type: type | Callable

static supplementary_datas(datas: list, length_condition: int) → dict

Computes the optional tags of a gfa line and returns them as a dict.

Parameters

datas (list) – a list of tags and their values
length_condition (int) – the number of mandatory fields, which have already been processed

Returns

interpreted tags in their right types

Return type

dict
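
Optional fields in GFA follow the `TAG:TYPE:VALUE` layout, so parsing them amounts to skipping the mandatory fields and casting each remaining value. The sketch below illustrates this under assumptions: the function name `optional_tags` and the `CASTERS` table are hypothetical and do not reproduce the library's code.

```python
import json

# Assumed subset of the GFA optional-field types, for illustration only
CASTERS = {"i": int, "f": float, "A": str, "Z": str, "J": json.loads}


def optional_tags(datas: list, length_condition: int) -> dict:
    """Hypothetical: cast every field after the mandatory ones into a dict."""
    tags = {}
    for field in datas[length_condition:]:
        # maxsplit=2 keeps any ':' inside the value intact
        name, type_letter, value = field.split(":", 2)
        tags[name] = CASTERS[type_letter](value)
    return tags


datas = "S\ts1\tACGT\tLN:i:4\tRC:i:120\tUR:Z:file:///tmp/x.fa".split("\t")
tags = optional_tags(datas, 3)  # an S line has three mandatory fields
```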

gfagraphs.gfaparser.path_allocator(path_to_validate: str, particle: str | None = None, default_name: str = 'file', always_yes: bool = True) → str

Checks whether a file exists at the given location and whether its directory tree exists. If the directory tree does not exist, creates it.

Parameters

path_to_validate (str) – a string path to the file
particle (str | None, optional) – file extension, by default None
default_name (str) – a name to use if the name is empty
always_yes (bool, optional) – if the file shall be erased, by default True

Returns

the path to the file, with extension

Return type

str
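
The directory-creation step can be sketched with the standard library alone. This is an illustration, not the code of `path_allocator`: the name `ensure_path` is hypothetical, and the handling of `always_yes` (erasing an existing file) is deliberately left out.

```python
import os
import tempfile
from typing import Optional


def ensure_path(path_to_validate: str, particle: Optional[str] = None, default_name: str = "file") -> str:
    """Hypothetical: create missing parent directories and return the final path."""
    directory, name = os.path.split(path_to_validate)
    if not name:
        # The path ended with a separator, so fall back on the default file name
        name = default_name
    if particle and not name.endswith(particle):
        name += particle
    if directory:
        os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, name)


with tempfile.TemporaryDirectory() as tmp:
    target = ensure_path(os.path.join(tmp, "out", "graphs", "sample"), particle=".gfa")
    created = os.path.isdir(os.path.dirname(target))
```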