GFAParser: module to parse and write GFA format

class gfagraphs.gfaparser.GFAParser

This class implements static methods to get information about the contents of a GFA file and to parse them.

Returns

Methods are static and should be called directly on the class, passing all arguments explicitly.

Return type

None

Raises
  • OSError – The file does not exist

  • IOError – File is empty

  • IOError – File descriptor is invalid

  • NotImplementedError – Attempting to save a byte array or an array to GFA

  • ValueError – Data format does not conform to the GFA standard

static get_gfa_format(gfa_file_path: str | list[str]) -> str | list[str]

Given one or more file paths, returns the GFA subtype of each, and raises an error if a file is invalid or does not exist. The goal is to assess the GFA subformat of files for pre-processing purposes or to guide algorithm choices.

Parameters

gfa_file_path (str | list[str]) – a series of paths, or a single one

Returns

for each path, a tag identifying the GFA subtype

Return type

str | list[str]

Raises
  • OSError – Specified file does not exist

  • IOError – File descriptor is invalid

  • IOError – File is empty
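To illustrate how a GFA subformat can be assessed, here is a minimal sketch that inspects line types and segment tags. The helper `guess_gfa_subformat` and the labels it returns are hypothetical and need not match the tags gfagraphs actually produces; it also takes a list of lines rather than a file path. The detection rules are a simplification: record types that only exist in GFA2, the rGFA `SN`/`SO`/`SR` segment tags, `W` (walk) lines introduced in GFA 1.1 and `J` (jump) lines introduced in GFA 1.2.

```python
def guess_gfa_subformat(lines: list[str]) -> str:
    """Guess a GFA subformat label from the lines of a file (hypothetical helper)."""
    line_types: set[str] = set()
    segment_tags: set[str] = set()
    for raw_line in lines:
        line = raw_line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        line_types.add(fields[0])
        if fields[0] == "S":
            # GFA1 optional tags start at the 4th field and look like TAG:TYPE:VALUE.
            segment_tags.update(field.split(":", 1)[0] for field in fields[3:])
    if not line_types:
        raise IOError("File is empty")
    if line_types & {"E", "F", "G", "O", "U"}:
        return "GFA2"  # record types that only exist in GFA2
    if {"SN", "SO", "SR"} <= segment_tags:
        return "rGFA"  # stable-coordinate tags on segments
    if "J" in line_types:
        return "GFA1.2"  # jump lines
    if "W" in line_types:
        return "GFA1.1"  # walk lines
    return "GFA1"


print(guess_gfa_subformat(["H\tVN:Z:1.0", "S\t1\tACGT", "L\t1\t+\t1\t+\t0M"]))
```

The checks run from the most to the least specific format, so a file mixing walk and jump lines is labeled with the newer version.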

static get_gfa_type(tag_type: str) -> Union[type, Callable]

Interprets GFA tags in a Python-compatible format. Given a letter used as a tag type in the GFA standard, returns the type or function used to cast the data. This function is used in input scenarios, when reading a file from disk and interpreting its content.

Parameters

tag_type (str) – a GFA tag

Returns

a cast descriptor to use on the data

Return type

type | Callable

Raises
  • NotImplementedError – Tag type is a byte array (H) or an array (B)

  • ValueError – Type identifier is not in the GFA spec
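A minimal sketch of this kind of lookup, assuming the GFA1 optional-field types `A`, `i`, `f`, `Z`, `J`, `H` and `B`. `gfa_tag_to_python` is a hypothetical stand-in, not the gfagraphs implementation; it mirrors the documented errors, so byte arrays (`H`) and arrays (`B`) raise `NotImplementedError` and unknown letters raise `ValueError`.

```python
import json
from typing import Callable, Union

# Cast applied to the VALUE part of a TAG:TYPE:VALUE optional field (GFA1 types).
_GFA_CASTS: dict[str, Union[type, Callable]] = {
    "A": str,         # single printable character
    "i": int,         # signed integer
    "f": float,       # floating-point number
    "Z": str,         # printable string
    "J": json.loads,  # JSON, decoded into Python objects
}


def gfa_tag_to_python(tag_type: str) -> Union[type, Callable]:
    """Return the cast for a GFA type letter (hypothetical helper)."""
    if tag_type in ("H", "B"):
        raise NotImplementedError(f"type {tag_type!r} (byte array / array) is not supported")
    try:
        return _GFA_CASTS[tag_type]
    except KeyError:
        raise ValueError(f"type identifier {tag_type!r} is not in the GFA spec") from None


cast = gfa_tag_to_python("i")
print(cast("42") + 1)  # the string "42" is cast to an int before the addition
```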

static get_python_type(data: object) -> str

Given a Python variable, tries to identify the best-suited tag type and validates it. See http://gfa-spec.github.io/GFA-spec/GFA1.html#optional-fields for more details.

Parameters

data (object) – the data we try to add to the GFA file

Returns

a one-letter code for an optional field of the GFA spec

Return type

str

Raises

ValueError – data type could not be encoded in the GFA-spec
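The inverse mapping can be sketched as below. `python_to_gfa_tag` is hypothetical, and its choices are assumptions rather than the library's rules: strings always map to `Z` (never `A`), booleans and JSON-serializable lists or dicts map to `J`, and `H`/`B` are never produced.

```python
import json


def python_to_gfa_tag(data: object) -> str:
    """Pick a GFA1 optional-field type letter for a Python value (hypothetical helper)."""
    if isinstance(data, bool):
        # bool is a subclass of int; encoding it as JSON keeps true/false distinct.
        return "J"
    if isinstance(data, int):
        return "i"
    if isinstance(data, float):
        return "f"
    if isinstance(data, str):
        # Z values must not contain tabs or newlines; isprintable() rejects both.
        if data.isprintable():
            return "Z"
        raise ValueError(f"string {data!r} cannot be stored in a Z field")
    if isinstance(data, (list, dict)):
        try:
            json.dumps(data)  # validate that the structure is JSON-serializable
        except (TypeError, ValueError) as err:
            raise ValueError(f"{data!r} cannot be encoded as JSON") from err
        return "J"
    raise ValueError(f"type {type(data).__name__} cannot be encoded in the GFA spec")


print(python_to_gfa_tag(3), python_to_gfa_tag(2.5), python_to_gfa_tag({"k": [1]}))
```

Checking `bool` before `int` matters: otherwise `True` would silently be written as the integer tag `i`.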

static read_gfa_line(datas: list[str], load_sequence_in_memory: bool = True, regexp_pattern: str = '.*', memory_mode: bool = False) -> tuple[str, gfagraphs.abstractions.GFALine, dict]

Calls helper methods to parse a single GFA line according to its fields, as described in the GFA-spec GitHub repository, and returns the information it contains.

Parameters
  • datas (list[str]) – the list of tab-separated elements of the GFA line.

  • load_sequence_in_memory (bool, optional) – for a segment (node) line, whether its sequence should be loaded, by default True

  • regexp_pattern (str, optional) – a pattern selecting which path names to keep, by default “.*”

  • memory_mode (bool, optional) – whether additional information should be loaded into the structure, by default False

Returns

A tuple containing id_of_line, type_of_line and datas_of_line

Return type

tuple[str, GFALine, dict]
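As a rough illustration of how these parameters interact, the sketch below handles only the mandatory fields of GFA1 `S`, `L` and `P` lines. `parse_gfa1_line`, the dictionary keys and the `"1+2-"` style edge identifier are hypothetical; optional tags and `memory_mode` are ignored, and returning `None` for a path whose name does not match `regexp_pattern` is an assumption about how such paths might be skipped.

```python
import re
from typing import Optional


def parse_gfa1_line(
    datas: list[str],
    load_sequence_in_memory: bool = True,
    regexp_pattern: str = ".*",
) -> Optional[tuple[str, str, dict]]:
    """Parse the mandatory fields of a GFA1 S, L or P line (hypothetical helper)."""
    line_type = datas[0]
    if line_type == "S":
        name, sequence = datas[1], datas[2]
        data: dict = {"length": None if sequence == "*" else len(sequence)}
        if load_sequence_in_memory:
            data["seq"] = sequence  # only stored when requested, to save memory on large graphs
        return name, "S", data
    if line_type == "L":
        source, source_ori, target, target_ori, overlap = datas[1:6]
        edge_id = f"{source}{source_ori}{target}{target_ori}"  # hypothetical edge id
        data = {
            "source": source,
            "target": target,
            "orientations": (source_ori, target_ori),
            "overlap": overlap,
        }
        return edge_id, "L", data
    if line_type == "P":
        name = datas[1]
        if not re.fullmatch(regexp_pattern, name):
            return None  # path name filtered out by the pattern
        # Steps look like "1+,2-": a segment name followed by its orientation.
        steps = [(step[:-1], step[-1]) for step in datas[2].split(",")]
        return name, "P", {"path": steps}
    raise ValueError(f"unsupported line type {line_type!r}")


line = "S\t1\tACGT"
print(parse_gfa1_line(line.split("\t"), load_sequence_in_memory=False))
```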

static save_graph(graph, output_path: str, force_format: gfagraphs.abstractions.GFAFormat | bool = False, minimal_graph: bool = False) -> None

Given a GFA Graph object, saves it to a valid GFA file.

Parameters
  • graph (Graph) – the graph object loaded in memory

  • output_path (str) – a path to a file on disk, which may or may not already exist

  • force_format (GFAFormat | bool, optional) – the output gfa subformat, by default False

  • minimal_graph (bool, optional) – if only mandatory tags should be kept, by default False
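The core of saving is serializing each record back into tab-separated fields. The sketch below is a minimal, hypothetical stand-in (`write_gfa1` is not part of gfagraphs) that writes plain dictionaries rather than a `Graph` object, always emits a GFA1 header (so it ignores `force_format`), and shows how `minimal_graph` can drop optional tags while keeping the mandatory fields. Storing segment tags as `{tag: (type_letter, value)}` and links as tuples is an assumption.

```python
import io
import json
from typing import TextIO


def write_gfa1(
    out: TextIO,
    segments: dict[str, dict],
    links: list[tuple[str, str, str, str, str]],
    minimal_graph: bool = False,
) -> None:
    """Write segments and links as GFA1 lines (hypothetical helper)."""
    out.write("H\tVN:Z:1.0\n")
    for name, segment in segments.items():
        fields = ["S", name, segment["seq"]]
        if not minimal_graph:
            # Optional tags are assumed to be stored as {tag: (type_letter, value)}.
            for tag, (tag_type, value) in segment.get("tags", {}).items():
                text = json.dumps(value) if tag_type == "J" else str(value)
                fields.append(f"{tag}:{tag_type}:{text}")
        out.write("\t".join(fields) + "\n")
    # Each link is (source, source_orientation, target, target_orientation, overlap).
    for source, source_ori, target, target_ori, overlap in links:
        out.write("\t".join(["L", source, source_ori, target, target_ori, overlap]) + "\n")


buffer = io.StringIO()
write_gfa1(
    buffer,
    {"1": {"seq": "ACGT", "tags": {"LN": ("i", 4)}}, "2": {"seq": "GG"}},
    [("1", "+", "2", "-", "0M")],
)
print(buffer.getvalue(), end="")
```

Writing to any text stream keeps the serialization testable; a file opened with `open(output_path, "w")` can be passed in the same way.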

static save_subgraph(graph, output_path: str, nodes: set[str], force_format: gfagraphs.abstractions.GFAFormat | bool = False, minimal_graph: bool = False) -> None

Given a GFA Graph object, saves the part of the graph restricted to the given set of nodes to a valid GFA file.

Parameters
  • graph (Graph) – the graph object loaded in memory

  • output_path (str) – a path to a file on disk, which may or may not already exist

  • nodes (set[str]) – the identifiers of the nodes to keep in the subgraph

  • force_format (GFAFormat | bool, optional) – the output gfa subformat, by default False

  • minimal_graph (bool, optional) – if only mandatory tags should be kept, by default False
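One plausible way to select what goes into a subgraph is sketched below, assuming segments stored as a dict keyed by name and links as `(source, source_orientation, target, target_orientation, overlap)` tuples. `restrict_to_nodes` is hypothetical, and the rule that a link is kept only when both of its endpoints are in `nodes` is an assumption; paths or walks would need similar filtering, which the sketch omits.

```python
def restrict_to_nodes(
    segments: dict[str, dict],
    links: list[tuple[str, str, str, str, str]],
    nodes: set[str],
) -> tuple[dict[str, dict], list[tuple[str, str, str, str, str]]]:
    """Keep only the given segments and the links joining two kept segments (hypothetical helper)."""
    kept_segments = {name: segment for name, segment in segments.items() if name in nodes}
    # A link survives only if both its source (index 0) and target (index 2) are kept.
    kept_links = [link for link in links if link[0] in nodes and link[2] in nodes]
    return kept_segments, kept_links


segments = {"1": {"seq": "ACGT"}, "2": {"seq": "GG"}, "3": {"seq": "T"}}
links = [("1", "+", "2", "+", "0M"), ("2", "+", "3", "-", "0M")]
print(restrict_to_nodes(segments, links, {"1", "2"}))
```

The filtered segments and links can then be serialized exactly like a full graph.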

static set_gfa_type(tag_type: str) -> Union[type, Callable]

Interprets GFA tags in a Python-compatible format. Given a letter used as a tag type in the GFA standard, returns the type or function used to cast the data. This function is used in output scenarios, when writing a file to disk.

Parameters

tag_type (str) – a GFA tag

Returns

a cast descriptor to use on the data

Return type

type | Callable
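For output, the cast runs in the other direction: each value must become text. A hedged sketch, assuming `str()` suffices for `A`, `i`, `f` and `Z` and compact JSON for `J`; `gfa_tag_serializer` and `format_tag` are hypothetical names, and rejecting `H`/`B` here mirrors the input-side limitation rather than documented behavior of `set_gfa_type`.

```python
import json
from typing import Callable

# Serializer turning a Python value into the VALUE part of TAG:TYPE:VALUE.
_GFA_SERIALIZERS: dict[str, Callable[[object], str]] = {
    "A": str,
    "i": str,
    "f": str,
    "Z": str,
    # json.dumps without indent emits a single line and escapes tabs inside strings.
    "J": lambda value: json.dumps(value, separators=(",", ":")),
}


def gfa_tag_serializer(tag_type: str) -> Callable[[object], str]:
    """Return the serializer for a GFA type letter (hypothetical helper)."""
    if tag_type not in _GFA_SERIALIZERS:
        raise ValueError(f"type identifier {tag_type!r} is not supported")
    return _GFA_SERIALIZERS[tag_type]


def format_tag(tag: str, tag_type: str, value: object) -> str:
    """Build one optional field such as 'LN:i:4'."""
    return f"{tag}:{tag_type}:{gfa_tag_serializer(tag_type)(value)}"


print(format_tag("LN", "i", 4))
```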

static supplementary_datas(datas: list, length_condition: int) -> dict

Parses the optional tags of a GFA line and returns them as a dict.

Parameters
  • datas (list) – a list of tags and their values

  • length_condition (int) – the number of mandatory fields (already processed) that precede the optional tags

Returns

the interpreted tags, cast to their proper types

Return type

dict
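A minimal sketch of this step, assuming `length_condition` is the count of mandatory fields to skip (3 for a GFA1 `S` line: type, name, sequence) and that the result maps each tag name directly to its cast value. The real dictionary layout in gfagraphs may differ, and `parse_optional_tags` is hypothetical.

```python
import json

# Casts for GFA1 optional-field types (H and B are not handled in this sketch).
_CASTS = {"A": str, "i": int, "f": float, "Z": str, "J": json.loads}


def parse_optional_tags(datas: list[str], length_condition: int) -> dict:
    """Turn the fields after the mandatory ones into {tag: value} (hypothetical helper)."""
    tags = {}
    for field in datas[length_condition:]:
        # maxsplit=2 keeps colons inside Z strings or JSON values intact.
        tag, tag_type, value = field.split(":", 2)
        if tag_type not in _CASTS:
            raise ValueError(f"type identifier {tag_type!r} is not supported")
        tags[tag] = _CASTS[tag_type](value)
    return tags


print(parse_optional_tags(["S", "1", "ACGT", "LN:i:4", "DP:f:1.5"], 3))
```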

gfagraphs.gfaparser.path_allocator(path_to_validate: str, particle: str | None = None, default_name: str = 'file', always_yes: bool = True) -> str

Checks whether a file exists at the given location and whether its directory tree exists. If the directory tree does not exist, creates it.

Parameters
  • path_to_validate (str) – a string path to the file

  • particle (str | None, optional) – file extension, by default None

  • default_name (str) – a file name to use if the name is empty

  • always_yes (bool, optional) – whether an existing file may be overwritten without confirmation, by default True

Returns

the path to the file, with extension

Return type

str
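A minimal sketch of this behavior using `pathlib`, with assumptions worth stating: `allocate_path` is hypothetical, `particle` is taken to include its leading dot (e.g. `".gfa"`), and when `always_yes` is False the sketch raises `FileExistsError` instead of prompting the user as an interactive tool might. The file itself is neither created nor erased; only its parent directories are created.

```python
import tempfile
from pathlib import Path
from typing import Optional


def allocate_path(
    path_to_validate: str,
    particle: Optional[str] = None,
    default_name: str = "file",
    always_yes: bool = True,
) -> str:
    """Validate an output path and create its parent directories (hypothetical helper)."""
    path = Path(path_to_validate)
    # A trailing separator or an existing directory means no file name was given.
    if path_to_validate.endswith(("/", "\\")) or path.is_dir():
        path = path / default_name
    # Append the extension (e.g. ".gfa") unless the name already ends with it.
    if particle and not path.name.endswith(particle):
        path = path.with_name(path.name + particle)
    if path.exists() and not always_yes:
        # Stand-in for asking the user before overwriting.
        raise FileExistsError(f"{path} already exists")
    path.parent.mkdir(parents=True, exist_ok=True)
    return str(path)


with tempfile.TemporaryDirectory() as tmp:
    print(allocate_path(f"{tmp}/results/graph", particle=".gfa"))
```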