src/clustal/seq.h File Reference

#include "squid/squid.h"
#include "util.h"

Data Structures

struct  mseq_t
 Structure for storing multiple sequences.

Defines

#define SEQTYPE_UNKNOWN   kOtherSeq
#define SEQTYPE_DNA   kDNA
#define SEQTYPE_RNA   kRNA
#define SEQTYPE_PROTEIN   kAmino
#define AMINOACID_ANY   'X'
#define NUCLEOTIDE_ANY   'N'

Functions

void AliStat (mseq_t *prMSeq, bool bSampling, bool bReportAll)
 Stripped down version of squid's alistat.
void AddSeq (mseq_t **prMSeqDest_p, char *pcSeqName, char *pcSeqRes)
 Creates a new sequence entry and appends it to an existing mseq structure.
void SeqSwap (mseq_t *mseq, int i, int j)
 Swap two sequences in an mseq_t structure.
void DealignMSeq (mseq_t *mseq)
 Dealigns all sequences in mseq structure, updates the sequence length info and sets aligned to FALSE.
const char * SeqTypeToStr (int seqtype)
 convert int-encoded iSeqType to string
int ReadSequences (mseq_t *prMSeq_p, char *pcSeqFile, int iSeqType, int iSeqFmt, int iMaxNumSeq, int iMaxSeqLen)
 reads sequences from file
void NewMSeq (mseq_t **mseq)
 allocate and initialise new mseq_t
void FreeMSeq (mseq_t **mseq)
 Frees an mseq_t and its members and zeros all members.
void CopyMSeq (mseq_t **prMSeqDest_p, mseq_t *prMSeqSrc)
 copies an mseq structure
void LogSqInfo (SQINFO *sqinfo)
 debug output of sqinfo struct
int FindSeqName (char *seqname, mseq_t *mseq)
int WriteAlignment (mseq_t *mseq, const char *aln_outfile, int msafile_format)
 Write alignment to file.
void DealignSeq (char *seq)
 Removes all gap-characters from a sequence.
void ShuffleMSeq (mseq_t *prMSeq)
 Shuffle mseq order.
void SortMSeqByLength (mseq_t *prMSeq, const char cOrder)
 Sort sequences by length.
void JoinMSeqs (mseq_t **prMSeqDest_p, mseq_t *prMSeqToAdd)
 Appends an mseq structure to an already existing one. The filename will be left untouched.
bool SeqsAreAligned (mseq_t *prMSeq)
 Checks if sequences in given mseq structure are aligned. By definition, this is only true if all sequences are of the same length and at least one gap was found.

Define Documentation

#define AMINOACID_ANY   'X'
#define NUCLEOTIDE_ANY   'N'
#define SEQTYPE_DNA   kDNA
#define SEQTYPE_PROTEIN   kAmino
#define SEQTYPE_RNA   kRNA
#define SEQTYPE_UNKNOWN   kOtherSeq

Int-encoded sequence types. These are kept in sync with squid's sequence types and are only defined here for convenience.


Function Documentation

void AddSeq ( mseq_t **  prMSeqDest_p,
char *  pcSeqName,
char *  pcSeqRes 
)

Creates a new sequence entry and appends it to an existing mseq structure.

Parameters:
[out] prMSeqDest_p Already existing and initialised mseq structure
[in] pcSeqName sequence name of the sequence to add
[in] pcSeqRes the actual sequence (residues) to add
Note:
Don't forget to update the alignment and type flags if necessary!

FIXME allow adding of more features
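The append step can be pictured with a small, self-contained sketch. It does not use the real mseq_t: `toy_mseq_t` and `toy_add_seq()` are hypothetical stand-ins that hold only parallel arrays of names and residues (no SQINFO bookkeeping), and the sketch takes a plain pointer rather than AddSeq()'s `mseq_t **`. In line with the note above, any alignment or type flags are left to the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for mseq_t: parallel arrays of names and residues. */
typedef struct {
    int nseqs;
    char **name;
    char **seq;
} toy_mseq_t;

/* Duplicate a C string; returns NULL on allocation failure. */
static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Append copies of name and residues to an existing, initialised structure.
 * Returns 0 on success, -1 on allocation failure (nseqs is then unchanged).
 * Aligned/type flags are NOT updated here; that is the caller's job. */
int toy_add_seq(toy_mseq_t *m, const char *name, const char *residues)
{
    char **new_name = realloc(m->name, (size_t)(m->nseqs + 1) * sizeof *new_name);
    if (new_name == NULL)
        return -1;
    m->name = new_name;

    char **new_seq = realloc(m->seq, (size_t)(m->nseqs + 1) * sizeof *new_seq);
    if (new_seq == NULL)
        return -1;
    m->seq = new_seq;

    char *n = dup_str(name);
    char *s = dup_str(residues);
    if (n == NULL || s == NULL) {
        free(n);
        free(s);
        return -1;
    }
    m->name[m->nseqs] = n;
    m->seq[m->nseqs] = s;
    m->nseqs++;
    return 0;
}
```

Growing the pointer arrays one entry at a time keeps the sketch short; an implementation that appends many sequences might grow them in larger steps instead.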

void AliStat ( mseq_t *  prMSeq,
bool  bSampling,
bool  bReportAll 
)

Stripped down version of squid's alistat.

Parameters:
[in] prMSeq The alignment to analyse
[in] bSampling For many sequences: samples from pool
[in] bReportAll Report identities for all sequence pairs

There is no need to worry about sequence case, because our version of PairwiseIdentity is case-insensitive.

The mseq structure is converted to a squid MSA.

FIXME: code overlaps with WriteAlignment. Factor it out into a function, using the code there (which has more comments) as a template.
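The point about case-insensitivity can be illustrated with a sketch of a pairwise identity score. The definition used here (identical residue columns divided by the shorter ungapped length) and the gap alphabet are assumptions for illustration; `pairwise_identity_nocase()` is a hypothetical name and is not squid's PairwiseIdentity.

```c
#include <ctype.h>
#include <stdbool.h>

/* Assumed gap alphabet for this sketch; squid has its own gap test. */
static bool is_gap(char c)
{
    return c == '-' || c == '.' || c == '_' || c == '~' || c == ' ';
}

/* Case-insensitive identity of two aligned sequences: columns where s1 has a
 * residue equal (ignoring case) to s2, divided by the shorter ungapped length.
 * Returns 0.0 if either sequence contains no residues. */
double pairwise_identity_nocase(const char *s1, const char *s2)
{
    int idents = 0, len1 = 0, len2 = 0;

    for (int x = 0; s1[x] != '\0' && s2[x] != '\0'; x++) {
        if (!is_gap(s1[x])) {
            len1++;
            /* toupper() on both sides makes 'a' and 'A' count as identical */
            if (toupper((unsigned char)s1[x]) == toupper((unsigned char)s2[x]))
                idents++;
        }
        if (!is_gap(s2[x]))
            len2++;
    }
    int shorter = len1 < len2 ? len1 : len2;
    return shorter == 0 ? 0.0 : (double)idents / shorter;
}
```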

void CopyMSeq ( mseq_t **  prMSeqDest_p,
mseq_t *  prMSeqSrc 
)

copies an mseq structure

Parameters:
[out] prMSeqDest_p Copy of mseq structure
[in] prMSeqSrc Source mseq structure to copy
Note:
caller has to free copy by calling FreeMSeq()
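The ownership rule in the note (the caller frees the copy) is easiest to see in a deep-copy sketch. `toy_mseq_t`, `toy_copy()` and `toy_free()` are hypothetical stand-ins, not the real CopyMSeq()/FreeMSeq(); the real structure has more members that would also need copying.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for mseq_t. */
typedef struct {
    int nseqs;
    char **seq;
} toy_mseq_t;

/* Free a toy_mseq_t and all of its strings, then set *m to NULL. */
void toy_free(toy_mseq_t **m)
{
    if (m == NULL || *m == NULL)
        return;
    for (int i = 0; i < (*m)->nseqs; i++)
        free((*m)->seq[i]);
    free((*m)->seq);
    free(*m);
    *m = NULL;
}

/* Deep-copy src into a newly allocated *dest (NULL on failure).
 * The caller owns the copy and must release it with toy_free(). */
void toy_copy(toy_mseq_t **dest, const toy_mseq_t *src)
{
    toy_mseq_t *copy = calloc(1, sizeof *copy);
    *dest = NULL;
    if (copy == NULL)
        return;
    copy->seq = calloc(src->nseqs > 0 ? (size_t)src->nseqs : 1, sizeof *copy->seq);
    if (copy->seq == NULL) {
        free(copy);
        return;
    }
    for (int i = 0; i < src->nseqs; i++) {
        size_t n = strlen(src->seq[i]) + 1;
        copy->seq[i] = malloc(n);
        if (copy->seq[i] == NULL) {
            copy->nseqs = i;   /* free only the strings copied so far */
            toy_free(&copy);
            return;
        }
        memcpy(copy->seq[i], src->seq[i], n);
    }
    copy->nseqs = src->nseqs;
    *dest = copy;
}
```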
void DealignMSeq ( mseq_t *  mseq  ) 

Dealigns all sequences in mseq structure, updates the sequence length info and sets aligned to FALSE.

Parameters:
[out] mseq The mseq structure to dealign
void DealignSeq ( char *  seq  ) 

Removes all gap-characters from a sequence.

Parameters:
[out] seq Sequence to dealign
Note:
seq will not be reallocated
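Removing gaps without reallocating, as the note describes, amounts to an in-place two-pointer compaction: residues are shifted left over the gaps and the string is re-terminated. `dealign_seq()` is a hypothetical name, and the set of gap characters is an assumption for this sketch.

```c
/* Assumed gap alphabet for this sketch. */
static int is_gap_char(char c)
{
    return c == '-' || c == '.' || c == '_' || c == '~' || c == ' ';
}

/* Remove gap characters from seq in place. The buffer is not reallocated;
 * the string just becomes shorter and is re-terminated. */
void dealign_seq(char *seq)
{
    char *dst = seq;
    for (const char *src = seq; *src != '\0'; src++) {
        if (!is_gap_char(*src))
            *dst++ = *src;   /* keep residues, skip gaps */
    }
    *dst = '\0';
}
```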
int FindSeqName ( char *  seqname,
mseq_t *  mseq 
)

Finds the index of the sequence with the given name.
Parameters:
[in] seqname The sequence name to search for
[in] mseq The multiple sequence structure to search in
Returns:
-1 on failure, sequence index of matching name otherwise
Warning:
If a sequence name is used more than once, only the index of the first occurrence is returned.
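The "first match wins" behaviour in the warning falls out naturally from a front-to-back linear search. In this sketch `toy_mseq_t` and `toy_find_seq_name()` are hypothetical, and an exact, case-sensitive comparison is assumed; the documentation above does not state how names are compared.

```c
#include <string.h>

/* Hypothetical stand-in for mseq_t: only the sequence names are needed here. */
typedef struct {
    int nseqs;
    char **name;
} toy_mseq_t;

/* Linear search by exact name. Returns the index of the first match, or -1,
 * so duplicate names always resolve to their first occurrence. */
int toy_find_seq_name(const char *seqname, const toy_mseq_t *m)
{
    for (int i = 0; i < m->nseqs; i++) {
        if (strcmp(m->name[i], seqname) == 0)
            return i;
    }
    return -1;
}
```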
void FreeMSeq ( mseq_t **  mseq  ) 

Frees an mseq_t and its members and zeros all members.

Parameters:
[in] mseq mseq_t to free
Note:
use in conjunction with NewMSeq()
See also:
NewMSeq()
void JoinMSeqs ( mseq_t **  prMSeqDest_p,
mseq_t *  prMSeqToAdd 
)

Appends an mseq structure to an already existing one. The filename will be left untouched.

Parameters:
[out] prMSeqDest_p MSeq structure to append to
[in] prMSeqToAdd MSeq structure to append
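A sketch of the append operation on a simplified structure follows. `toy_mseq_t` and `toy_join()` are hypothetical; the sketch appends deep copies so that the structure being added stays owned by the caller, which is an assumption, since the documentation above does not say whether JoinMSeqs() copies or takes ownership.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for mseq_t. */
typedef struct {
    int nseqs;
    char **seq;
} toy_mseq_t;

/* Append deep copies of all sequences in add to *dest.
 * Returns 0 on success, -1 on allocation failure; add is not modified. */
int toy_join(toy_mseq_t **dest, const toy_mseq_t *add)
{
    toy_mseq_t *d = *dest;
    size_t total = (size_t)d->nseqs + (size_t)add->nseqs;
    char **grown = realloc(d->seq, (total > 0 ? total : 1) * sizeof *grown);
    if (grown == NULL)
        return -1;
    d->seq = grown;
    for (int i = 0; i < add->nseqs; i++) {
        size_t n = strlen(add->seq[i]) + 1;
        char *copy = malloc(n);
        if (copy == NULL)
            return -1;   /* sequences appended so far remain in *dest */
        memcpy(copy, add->seq[i], n);
        d->seq[d->nseqs++] = copy;
    }
    return 0;
}
```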
void LogSqInfo ( SQINFO *  sqinfo  ) 

debug output of sqinfo struct

Parameters:
[in] sqinfo Squid's SQINFO struct for a certain sequence
Note:
useful for debugging only
void NewMSeq ( mseq_t **  prMSeq  ) 

allocate and initialise new mseq_t

Parameters:
[out] prMSeq newly allocated and initialised mseq_t
Note:
caller has to free by calling FreeMSeq()
See also:
FreeMSeq
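The NewMSeq()/FreeMSeq() pairing takes an `mseq_t **` so that the functions can set the caller's pointer. A minimal sketch of that pattern, using hypothetical `toy_new()`/`toy_free()` on a simplified `toy_mseq_t`; resetting the caller's pointer to NULL on free is a choice made in this sketch, not something the documentation above states.

```c
#include <stdlib.h>

/* Hypothetical stand-in for mseq_t. */
typedef struct {
    int nseqs;
    char **seq;
    char *filename;
} toy_mseq_t;

/* Allocate a zero-initialised structure; *m is NULL if allocation fails. */
void toy_new(toy_mseq_t **m)
{
    *m = calloc(1, sizeof **m);
}

/* Free the structure and its members, then set the caller's pointer to NULL
 * so that a stale pointer cannot be used or freed twice. */
void toy_free(toy_mseq_t **m)
{
    if (*m == NULL)
        return;
    for (int i = 0; i < (*m)->nseqs; i++)
        free((*m)->seq[i]);
    free((*m)->seq);
    free((*m)->filename);
    free(*m);
    *m = NULL;
}
```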
int ReadSequences ( mseq_t *  prMSeq,
char *  seqfile,
int  iSeqType,
int  iSeqFmt,
int  iMaxNumSeq,
int  iMaxSeqLen 
)

reads sequences from file

Parameters:
[out] prMSeq Multiple sequence struct. Must be preallocated. FIXME: would make more sense to allocate it here.
[in] seqfile Sequence file name. If '-', sequences will be read from stdin.
[in] iSeqType int-encoded sequence type. Set to SEQTYPE_UNKNOWN for autodetection (guessed from the first sequence).
[in] iMaxNumSeq Return an error if more than iMaxNumSeq sequences have been read
[in] iMaxSeqLen Return an error if a sequence longer than iMaxSeqLen has been read
Returns:
0 on success, -1 on error
Note:
  • Depends heavily on squid
  • Sequence file format will be guessed
  • If supported by squid, gzipped files can be read as well.
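The real function delegates parsing and format guessing to squid; the sketch below only illustrates two documented conventions, the '-' stdin shortcut and the iMaxNumSeq/iMaxSeqLen limits. It reads FASTA only, and `open_seqfile()` and `scan_fasta_limits()` are hypothetical helpers, not part of Clustal Omega or squid.

```c
#include <stdio.h>
#include <string.h>

/* '-' selects stdin, mirroring the documented ReadSequences() convention. */
FILE *open_seqfile(const char *path)
{
    return strcmp(path, "-") == 0 ? stdin : fopen(path, "r");
}

/* Scan FASTA records and enforce the limits. Returns the number of sequences
 * read, or -1 if more than max_nseq sequences or a sequence longer than
 * max_len is found. Simplification: every character on a non-header line
 * (except newline/CR) counts as a residue. */
int scan_fasta_limits(FILE *fp, int max_nseq, int max_len)
{
    int nseq = 0;
    long cur_len = 0;
    int c, at_line_start = 1, in_header = 0;

    while ((c = fgetc(fp)) != EOF) {
        if (at_line_start && c == '>') {
            /* a new record starts: check the previous sequence's length */
            if (nseq > 0 && cur_len > max_len)
                return -1;
            if (++nseq > max_nseq)
                return -1;
            cur_len = 0;
            in_header = 1;
        } else if (c == '\n') {
            in_header = 0;
        } else if (!in_header && c != '\r') {
            cur_len++;
        }
        at_line_start = (c == '\n');
    }
    if (nseq > 0 && cur_len > max_len)   /* check the last record */
        return -1;
    return nseq;
}
```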
bool SeqsAreAligned ( mseq_t *  prMSeq  ) 

Checks if sequences in given mseq structure are aligned. By definition, this is only true if all sequences are of the same length and at least one gap was found.

Parameters:
[in] prMSeq Sequences to check
Returns:
TRUE if sequences are aligned, FALSE if not
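The two-part definition above (equal lengths and at least one gap) translates directly into code. This sketch works on a plain array of strings; `toy_seqs_are_aligned()` and the gap alphabet are assumptions, and returning false for an empty set is a choice made here, not documented behaviour.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed gap alphabet for this sketch. */
static bool is_gap_char(char c)
{
    return c == '-' || c == '.' || c == '_' || c == '~' || c == ' ';
}

/* Aligned by the documented definition: all sequences have the same length
 * and at least one gap character occurs somewhere. */
bool toy_seqs_are_aligned(char **seq, int nseqs)
{
    bool gap_found = false;
    if (nseqs < 1)
        return false;
    size_t len0 = strlen(seq[0]);
    for (int i = 0; i < nseqs; i++) {
        if (strlen(seq[i]) != len0)
            return false;   /* unequal lengths: cannot be aligned */
        for (size_t x = 0; x < len0 && !gap_found; x++)
            if (is_gap_char(seq[i][x]))
                gap_found = true;
    }
    return gap_found;
}
```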
void SeqSwap ( mseq_t *  prMSeq,
int  i,
int  j 
)

Swap two sequences in an mseq_t structure.

Parameters:
[out] prMSeq Multiple sequence struct
[in] i Index of first sequence
[in] j Index of second sequence
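Swapping two sequences only requires exchanging pointers, but every per-sequence array has to be swapped together so that names stay paired with their residues. `toy_mseq_t` and `toy_seq_swap()` are hypothetical; the real mseq_t has further per-sequence members that would need the same treatment.

```c
/* Hypothetical stand-in: names and residues kept in parallel arrays. */
typedef struct {
    int nseqs;
    char **name;
    char **seq;
} toy_mseq_t;

/* Swap entries i and j in every parallel array. Only pointers are
 * exchanged; no sequence data is copied. */
void toy_seq_swap(toy_mseq_t *m, int i, int j)
{
    char *tmp;
    if (i == j)
        return;
    tmp = m->name[i]; m->name[i] = m->name[j]; m->name[j] = tmp;
    tmp = m->seq[i];  m->seq[i]  = m->seq[j];  m->seq[j]  = tmp;
}
```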
const char* SeqTypeToStr ( int  iSeqType  ) 

convert int-encoded iSeqType to string

Parameters:
[in] iSeqType int-encoded sequence type
Returns:
character pointer describing the sequence type
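Because the SEQTYPE_* macros simply alias squid's constants, the conversion can be sketched as a switch that returns static strings. The enum values and the exact strings below are hypothetical stand-ins; the real codes come from squid.h and the real function's output strings may differ.

```c
/* Hypothetical stand-ins for kOtherSeq/kDNA/kRNA/kAmino; the real values
 * are defined in squid.h and may differ. */
enum { TOY_OTHER = 0, TOY_DNA, TOY_RNA, TOY_AMINO };

/* Map an int-encoded sequence type to a static, read-only string.
 * The returned pointer must not be freed. The strings are illustrative. */
const char *toy_seq_type_to_str(int seqtype)
{
    switch (seqtype) {
    case TOY_DNA:   return "DNA";
    case TOY_RNA:   return "RNA";
    case TOY_AMINO: return "Protein";
    default:        return "Unknown";
    }
}
```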
void ShuffleMSeq ( mseq_t *  mseq  ) 

Shuffle mseq order.

Parameters:
[out] mseq mseq structure to shuffle
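A standard way to shuffle an order in place is the Fisher-Yates algorithm, sketched here on an array of sequence pointers. The documentation does not say which algorithm or random number generator ShuffleMSeq() uses; `toy_shuffle()` is hypothetical, and in a real mseq_t every per-sequence array would have to be permuted with the same permutation.

```c
#include <stdlib.h>

/* Fisher-Yates shuffle of n sequence pointers in place. Uses rand(), so seed
 * with srand() first; rand() % k has a slight modulo bias, acceptable here. */
void toy_shuffle(char **seq, int n)
{
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);   /* 0 <= j <= i */
        char *tmp = seq[i];
        seq[i] = seq[j];
        seq[j] = tmp;
    }
}
```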
void SortMSeqByLength ( mseq_t *  prMSeq,
const char  cOrder 
)

Sort sequences by length.

Parameters:
[out] prMSeq mseq to sort by length
[in] cOrder Sorting order. 'd' for descending, 'a' for ascending.
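The 'a'/'d' order flag maps naturally onto two qsort() comparators, since standard C's qsort() cannot pass an extra argument to the comparator. In this sketch `toy_sort_by_length()` is hypothetical, length is taken as strlen() of the stored string (the documentation does not say whether gaps count), and returning -1 for an unknown order character is an assumption. Note that qsort() is not guaranteed to be stable.

```c
#include <stdlib.h>
#include <string.h>

/* Comparators on an array of char *; qsort passes pointers to elements. */
static int cmp_len_asc(const void *a, const void *b)
{
    size_t la = strlen(*(char *const *)a);
    size_t lb = strlen(*(char *const *)b);
    return (la > lb) - (la < lb);
}

static int cmp_len_desc(const void *a, const void *b)
{
    return cmp_len_asc(b, a);
}

/* Sort sequence pointers by length: 'a' ascending, 'd' descending.
 * Returns 0 on success, -1 for an unknown order character. */
int toy_sort_by_length(char **seq, int n, char order)
{
    if (order != 'a' && order != 'd')
        return -1;
    qsort(seq, (size_t)n, sizeof *seq, order == 'a' ? cmp_len_asc : cmp_len_desc);
    return 0;
}
```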
int WriteAlignment ( mseq_t *  mseq,
const char *  pcAlnOutfile,
int  outfmt 
)

Write alignment to file.

Parameters:
[in] mseq The mseq_t struct containing the aligned sequences
[in] pcAlnOutfile The name of the output file
[in] outfmt The alignment output format (defined in squid.h)
Returns:
Non-zero on error
Note:
We create a temporary squid MSA struct in here because we never use it within clustal. We might be better off using the old clustal output routines instead.
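The real routine converts to a squid MSA and can write any squid-supported format. For comparison, a direct writer for a single format needs no intermediate structure; the sketch below hard-codes aligned FASTA with fixed-width line wrapping. `toy_write_fasta()` and its parameters are hypothetical and are not squid's writer.

```c
#include <stdio.h>
#include <string.h>

/* Write names/sequences as (aligned) FASTA, wrapping residues at `width`
 * columns. Returns 0 on success, non-zero on error. */
int toy_write_fasta(FILE *out, char **name, char **seq, int n, int width)
{
    if (width <= 0)
        return 1;
    for (int i = 0; i < n; i++) {
        if (fprintf(out, ">%s\n", name[i]) < 0)
            return 1;
        size_t len = strlen(seq[i]);
        for (size_t x = 0; x < len; x += (size_t)width) {
            /* last chunk may be shorter than width */
            size_t chunk = len - x < (size_t)width ? len - x : (size_t)width;
            if (fwrite(seq[i] + x, 1, chunk, out) != chunk || fputc('\n', out) == EOF)
                return 1;
        }
    }
    return ferror(out) ? 1 : 0;
}
```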
Generated on Fri Aug 31 05:32:52 2012 for Clustal Omega by  doxygen 1.6.3