dgenies.bin package

Submodules

dgenies.bin.clean_jobs module

dgenies.bin.clean_jobs.parse_data_folders(app_data, gallery_jobs, now, max_age, fake=False)[source]

Parse the data folders and remove jobs that are too old

Parameters
  • app_data (str) – folder where jobs are stored

  • gallery_jobs (list) – ids of jobs which are in the gallery

  • now (float) – current timestamp

  • max_age (dict) – remove all files & folders older than this age, defined for each category (uploads, data, error, …)

  • fake (bool) – if True, only print the files to delete, without actually deleting them

dgenies.bin.clean_jobs.parse_database(app_data, max_age, fake=False)[source]

Parse the database and remove jobs that are too old (from the database and from disk)

Parameters
  • app_data (str) – folder where jobs are stored

  • max_age (dict) – remove all files & folders older than this age, defined for each category (uploads, data, error, …)

  • fake (bool) – if True, only print the files to delete, without actually deleting them

Returns

ids of jobs which are in the gallery (these jobs are never removed, regardless of their age)

Return type

list

dgenies.bin.clean_jobs.parse_upload_folders(upload_folder, now, max_age, fake=False)[source]

Parse upload folders and remove files and folders that are too old

Parameters
  • upload_folder (str) – upload folder path

  • now (float) – current timestamp

  • max_age (dict) – remove all files & folders older than this age, defined for each category (uploads, data, error, …)

  • fake (bool) – if True, only print the files to delete, without actually deleting them

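A minimal usage sketch combining these three functions, assuming a configured D-GeNies instance; the paths, ages and category keys below are hypothetical placeholders:

    import time

    from dgenies.bin.clean_jobs import (parse_data_folders, parse_database,
                                        parse_upload_folders)

    # Hypothetical paths and ages: adapt them to your installation
    app_data = "/var/lib/dgenies/data"
    upload_folder = "/var/lib/dgenies/uploads"
    max_age = {"uploads": 1, "data": 7, "error": 1}  # one age per category
    now = time.time()

    # fake=True only prints what would be deleted, without deleting anything
    gallery_jobs = parse_database(app_data, max_age, fake=True)
    parse_data_folders(app_data, gallery_jobs, now, max_age, fake=True)
    parse_upload_folders(upload_folder, now, max_age, fake=True)
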
dgenies.bin.filter_contigs module

class dgenies.bin.filter_contigs.Filter(fasta, index_file, type_f, min_filtered=0, split=False, out_fasta=None, replace_fa=False)[source]

Bases: object

Filter a fasta file: remove contigs that are too small

Parameters
  • fasta (str) – fasta file path

  • index_file (str) – index file path

  • type_f (str) – type of sample (query or target)

  • min_filtered (int) – minimum number of large contigs to allow filtering

  • split (bool) – whether contigs have been split

  • out_fasta (str) – output fasta file path

  • replace_fa (bool) – if True, replace the original fasta file

filter()[source]

Run the contig filtering

Returns

True if success, else False

Return type

bool

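A possible usage sketch, based only on the constructor parameters above (file paths are hypothetical):

    from dgenies.bin.filter_contigs import Filter

    # Hypothetical paths; type_f is "query" or "target" as described above
    contig_filter = Filter(fasta="/tmp/sample.fasta",
                           index_file="/tmp/sample.idx",
                           type_f="query",
                           min_filtered=5,
                           split=False,
                           out_fasta="/tmp/sample_filtered.fasta",
                           replace_fa=False)

    if contig_filter.filter():
        print("Filtering succeeded")
    else:
        print("Filtering failed")
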
dgenies.bin.index module

class dgenies.bin.index.Index[source]

Bases: object

Manage Fasta Index

static load(index_file, merge_splits=False)[source]

Load index

Parameters
  • index_file (str) – index file path

  • merge_splits (bool) – if True, merge split contigs together

Returns

  • [0] sample name

  • [1] contigs order

  • [2] contigs size

  • [3] reversed status for each contig

  • [4] absolute start position for each contig

  • [5] total length of the sample

Return type

(str, list, dict, dict, dict, int)

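An illustration of unpacking the tuple returned by load (the index path is hypothetical):

    from dgenies.bin.index import Index

    # The six returned values, in the order documented above
    name, order, contigs, reversed_c, abs_start, total_len = Index.load(
        "/tmp/query_split.idx", merge_splits=True)

    print("Sample {}: {} contigs, total length {}".format(name, len(order), total_len))
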
static save(index_file, name, contigs, order, reversed_c)[source]

Save index

Parameters
  • index_file (str) – index file path

  • name (str) – sample name

  • contigs (dict) – contigs size

  • order (list) – contigs order

  • reversed_c (dict) – reversed status for each contig

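A matching sketch for save, reusing the structures returned by load (paths are hypothetical):

    from dgenies.bin.index import Index

    # Load an existing index and write a copy of it (hypothetical paths)
    name, order, contigs, reversed_c, abs_start, total_len = Index.load("/tmp/query.idx")
    Index.save("/tmp/query_copy.idx", name, contigs, order, reversed_c)
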
dgenies.bin.index.index_file(fasta_path, fasta_name, out, write_fa=None)[source]

Index fasta file

Parameters
  • fasta_path (str) – fasta file path

  • fasta_name (str) – sample name

  • out (str) – output index file

  • write_fa (str) – file path of the new fasta file to write; None to not save the fasta into a new file

Returns

  • [0] True if success, else False

  • [1] Number of contigs

  • [2] Error message

Return type

(bool, int, str)

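A sketch of calling index_file and unpacking its result (paths and sample name are hypothetical):

    from dgenies.bin.index import index_file

    success, nb_contigs, error = index_file(
        fasta_path="/tmp/sample.fasta",
        fasta_name="my_sample",
        out="/tmp/sample.idx",
        write_fa=None)  # None: do not write the fasta to a new file

    if success:
        print("Indexed {} contigs".format(nb_contigs))
    else:
        print("Indexing failed: {}".format(error))
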
dgenies.bin.local_scheduler module

dgenies.bin.local_scheduler.cleaner()[source]

Exit DRMAA session at program exit

dgenies.bin.local_scheduler.get_prep_scheduled_jobs()[source]

Get list of jobs ready to be prepared (all data is downloaded and parsed)

Returns

list of jobs

Return type

list

dgenies.bin.local_scheduler.get_preparing_jobs_cluster_nb()[source]

Get number of jobs in preparation step (for cluster runs)

Returns

number of jobs

Return type

int

dgenies.bin.local_scheduler.get_preparing_jobs_nb()[source]

Get number of jobs in preparation step (for local runs)

Returns

number of jobs

Return type

int

dgenies.bin.local_scheduler.get_scheduled_cluster_jobs()[source]

Get list of jobs ready to be started (for cluster runs)

Returns

list of jobs

Return type

list

dgenies.bin.local_scheduler.get_scheduled_local_jobs()[source]

Get list of jobs ready to be started (for local runs)

Returns

list of jobs

Return type

list

dgenies.bin.local_scheduler.move_job_to_cluster(id_job)[source]

Change a local job so that it runs on the cluster

Parameters

id_job (str) – job id

dgenies.bin.local_scheduler.parse_args()[source]

Parse command line arguments and define DEBUG and LOG_FILE constants

dgenies.bin.local_scheduler.parse_started_jobs()[source]

Parse all started jobs: check that everything is OK, change job statuses if needed, and look for dead jobs

Returns

(list of ids of jobs started locally, list of ids of jobs started on the cluster)

Return type

(list, list)

dgenies.bin.local_scheduler.parse_uploads_asks()[source]

Parse upload requests: allow new uploads when others have ended, remove expired sessions, …

dgenies.bin.local_scheduler.prepare_job(id_job)[source]

Launch the data preparation step of a job

Parameters

id_job (str) – job id

dgenies.bin.local_scheduler.start_job(id_job, batch_system_type='local')[source]

Start a job (mapping step)

Parameters
  • id_job (str) – job id

  • batch_system_type (str) – local, slurm or sge

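An illustrative sketch of the flow these functions suggest; the actual scheduler loop lives in this module and is more involved (throttling, cluster handling, error recovery). It also assumes the returned job lists contain job ids, which is not guaranteed by the signatures above:

    from dgenies.bin import local_scheduler

    # Assumption: the lists returned below contain job ids usable by start_job
    nb_preparing = local_scheduler.get_preparing_jobs_nb()
    scheduled_local = local_scheduler.get_scheduled_local_jobs()

    for id_job in scheduled_local:
        local_scheduler.start_job(id_job, batch_system_type="local")
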
dgenies.bin.merge_splitted_chrms module

class dgenies.bin.merge_splitted_chrms.Merger(paf_in, paf_out, query_in, query_out, debug=False)[source]

Bases: object

Merge split contigs back together in the PAF file

Parameters
  • paf_in (str) – input PAF file path

  • paf_out (str) – output PAF file path

  • query_in (str) – input query index file path

  • query_out (str) – output query index file path

  • debug (bool) – True to enable debug mode

load_query_index(index)[source]

Load query index

Parameters

index (str) – index file path

Returns

  • [0] contigs length

  • [1] split contigs length

  • [2] sample name

Return type

(dict, dict, str)

merge()[source]

Launch the merge

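A usage sketch combining the constructor and merge() (file paths are hypothetical):

    from dgenies.bin.merge_splitted_chrms import Merger

    # Hypothetical paths: merge split contigs back into full-length ones
    merger = Merger(paf_in="/tmp/map_split.paf",
                    paf_out="/tmp/map.paf",
                    query_in="/tmp/query_split.idx",
                    query_out="/tmp/query.idx",
                    debug=False)
    merger.merge()
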
static merge_paf(paf_in, paf_out, contigs, contigs_split)[source]

Do the PAF merging work

Parameters
  • paf_in (str) – path of input PAF with split contigs

  • paf_out (str) – path of output PAF where split contigs are now merged together

  • contigs (dict) – contigs size

  • contigs_split (dict) – split contigs size

static write_query_index(index, contigs, q_name)[source]

Save new query index

Parameters
  • index (str) – index file path

  • contigs (dict) – contigs size

  • q_name (str) – sample name

dgenies.bin.merge_splitted_chrms.parse_args()[source]

Parse command line arguments

Returns

arguments

Return type

argparse.Namespace

dgenies.bin.sort_paf module

class dgenies.bin.sort_paf.Sorter(input_f, output_f)[source]

Bases: object

Sort PAF file by match size

Parameters
  • input_f (str) – input PAF file path

  • output_f (str) – output PAF file path

sort()[source]

Launch the sort

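A usage sketch with hypothetical paths:

    from dgenies.bin.sort_paf import Sorter

    # Sort a PAF file by match size (hypothetical paths)
    sorter = Sorter(input_f="/tmp/map.paf", output_f="/tmp/map_sorted.paf")
    sorter.sort()
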
dgenies.bin.split_fa module

class dgenies.bin.split_fa.Splitter(input_f, name_f, output_f, size_c=10000000, query_index='query_split.idx', debug=False)[source]

Bases: object

Split large contigs into smaller ones

Parameters
  • input_f (str) – input fasta file path

  • name_f (str) – sample name

  • output_f (str) – output fasta file path

  • size_c (int) – size of split contigs

  • query_index (str) – index file path for query

  • debug (bool) – True to enable debug mode

flush_contig(fasta_str, chr_name, size_c, enc, index_f)[source]

split()[source]

Split contigs into smaller ones

Returns

True if the input Fasta is correct, else False

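A usage sketch with hypothetical paths, using the default 10 Mb split size:

    from dgenies.bin.split_fa import Splitter

    # Hypothetical paths; contigs larger than size_c are split
    splitter = Splitter(input_f="/tmp/query.fasta",
                        name_f="my_query",
                        output_f="/tmp/query_split.fasta",
                        size_c=10000000,
                        query_index="/tmp/query_split.idx")

    if splitter.split():
        print("Fasta split successfully")
    else:
        print("Invalid input Fasta")
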
static split_contig(name, sequence, block_sizes)[source]

static write_contig(name, fasta, o_file)[source]
dgenies.bin.split_fa.parse_args()[source]

Parse command line arguments

Module contents