
 AdvaS function list
 ------------------------------------------------------------------------

 ------------------------------------------------------------------------
   calc_rsv (d, p, q)
 ------------------------------------------------------------------------

   Added in advas.0.0.4

   Calculates the document weight for descriptors including their
   probability of existence.

   In:
	d		list of values indicating the existence (1)
			or non-existence (0) of a descriptor
	p		list of probabilities of existence for the given
			descriptor d
	q		list of probabilities of non-existence for the
			given descriptor d

   Out:
	value higher than 0; the higher the rsv, the more relevant the
	document is with respect to the given descriptors and their
	probabilities. 0 means non-relevant.

 ------------------------------------------------------------------------
   calc_succ_variety (word_list, flag)
 ------------------------------------------------------------------------

   Added in advas.0.0.4

   Calculates the successor variety for a given list of words.

   In:
	word_list	list of words
	flag		1 use word_list
			2 word_list is interpreted as a file name and the
			  list is read from the given file
   Out:
	list (dictionary) that contains successor variety for each single
	letter from "abcdefghijklmnopqrstuvwxyz@' ".
	If the given word list file cannot be accessed, an empty list is 
	returned.
	
 ------------------------------------------------------------------------
   category_get_root_node (tree)
 ------------------------------------------------------------------------
 
   Added in advas.0.1.6
   
   Returns the root node of the given category tree.
   
   In:
	tree		category tree
   Out:
	a node (root node).
	
 ------------------------------------------------------------------------
   category_is_root_node (node)
 ------------------------------------------------------------------------
 
   Added in advas.0.1.6
   
   Checks whether a node is the root node of a category tree.
   
   In:
	node		category node
   Out:
	1 is a root node
	0 is not a root node
	
 ------------------------------------------------------------------------
   category_is_leaf_node (node)
 ------------------------------------------------------------------------
 
   Added in advas.0.1.6
   
   Checks whether a node is a leaf node of a category tree.
   
   In:
	node		category node
   Out:
	1 is a leaf node
	0 is not a leaf node
	
 ------------------------------------------------------------------------
   category_make_tree (category_string)
 ------------------------------------------------------------------------
 
   Added in advas.0.1.6
   
   Converts a category string, e.g. "comp/os/linux", into a category tree.
   
   In:
	category_string	 string representing the category, e.g. 
			 "comp/os/linux"
   Out:
	A category tree that represents the given category.
	
 ------------------------------------------------------------------------
   category_make_node (node_name)
 ------------------------------------------------------------------------
 
   Added in advas.0.1.6
   
   Creates an empty category node that can hold a given category later.
   
   In:
	node_name	category name
   Out:
	A category node with the following structure:
	- name: category name
	- next: list of child nodes
	- up  : upper node (father)
	- root: true if root node
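
   The node structure above and the way category_make_tree chains nodes
   can be pictured roughly as follows. This is only a sketch, not the
   AdvaS code: the dictionary representation, the use of None for a
   missing father, the "/" separator and the names make_node_sketch and
   make_tree_sketch are assumptions made for illustration.

```python
def make_node_sketch(node_name):
    """Build an empty category node with the four documented fields."""
    # Assumption: a plain dict stands in for the real node type.
    return {"name": node_name, "next": [], "up": None, "root": False}


def make_tree_sketch(category_string):
    """Chain one node per category, e.g. comp -> os -> linux."""
    root = None
    parent = None
    for name in category_string.split("/"):
        node = make_node_sketch(name)
        if parent is None:
            node["root"] = True          # first chunk becomes the root node
            root = node
        else:
            node["up"] = parent          # link the child to its father
            parent["next"].append(node)  # and the father to its child
        parent = node
    return root


tree = make_tree_sketch("comp/os/linux")
leaf = tree["next"][0]["next"][0]
print(tree["name"], leaf["name"], leaf["up"]["name"])
```

   In this sketch a leaf node is simply a node whose "next" list is empty.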
	
 ------------------------------------------------------------------------
   category_split_string (category_string)
 ------------------------------------------------------------------------
 
   Added in advas.0.1.6
   
   Splits the given category string into several chunks (categories).
   
   In:
	category_string	 given category description, e.g. "os/linux"
   Out:
	List, each item contains a category ("os", "linux").
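
   As a rough illustration of the splitting step, assuming "/" is the
   only separator and that empty chunks should be dropped (neither is
   confirmed by this list; split_category_sketch is a made-up name):

```python
def split_category_sketch(category_string):
    # Assumption: drop empty chunks, so "/os//linux/" still yields
    # exactly two categories.
    return [chunk for chunk in category_string.split("/") if chunk]


print(split_category_sketch("os/linux"))
```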
	
 ------------------------------------------------------------------------
   caverphone (term)
 ------------------------------------------------------------------------
 
   Added in advas-0.2.2
   
   Returns the pronunciation code using caverphone 2.0.
   
   In:
	term		term to be transformed
   Out:
	caverphone code

 ------------------------------------------------------------------------
   cmp_strings (term1, term2)
 ------------------------------------------------------------------------

   Added in advas.0.0.2

   Compares two strings and returns whether they are equal or, if not,
   which of them comes later in alphabetical order.

   In:
	term1		a string
	term2		another string

   Out:
	0		both strings are equal
	1		term1 follows term2 in alphabetical order
	-1		term2 follows term1 in alphabetical order
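
   The three return values can be reproduced with a few lines of
   Python. This is a sketch under the assumption that plain string
   comparison (by code point, so upper-case letters sort before
   lower-case ones) is an acceptable stand-in for "alphabetical order";
   cmp_strings_sketch is a hypothetical name.

```python
def cmp_strings_sketch(term1, term2):
    # (a > b) - (a < b) is the usual Python 3 replacement for cmp():
    # it yields 1, 0 or -1.
    return (term1 > term2) - (term1 < term2)


print(cmp_strings_sketch("linux", "linux"))
print(cmp_strings_sketch("unix", "linux"))
print(cmp_strings_sketch("linux", "unix"))
```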

 -------------------------------------------------------------------------
   comp_descriptors (request, document)
 -------------------------------------------------------------------------

   Added in advas.0.0.4

   Compares two lists of descriptors. One for the request, the other one
   for the document.

   In:
	request		a list of strings
	document	a list of strings

   Out:
	a float value between 0 and 1. The higher the value, the more
	similar request and document are regarding their descriptors.

 -------------------------------------------------------------------------
   comp_ngrams (term1, term2, size)
 -------------------------------------------------------------------------

   Added in advas.0.0.2

   Compares two terms and returns their degree of equality. For measuring
   the equality, the n-gram method is used.

   In:
	term1		a string
	term2		a second string
	size		size of the n-grams for comparison, must be at
			least two and less than the length of term1 or term2
   Out:
	returns a float value between 0 and 1. The higher the value, the
	higher the degree of equality.
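
   One common way to turn shared n-grams into a value between 0 and 1
   is Dice's coefficient. The sketch below uses it as an assumption;
   this list does not say which measure comp_ngrams applies. The names
   ngrams_of and comp_ngrams_sketch are illustrative only.

```python
def ngrams_of(term, size):
    # All substrings of length `size`, e.g. "linux", 2 -> li in nu ux.
    return [term[i:i + size] for i in range(len(term) - size + 1)]


def comp_ngrams_sketch(term1, term2, size):
    # Assumption: similarity = Dice's coefficient over unique n-grams,
    # 2 * |shared| / (|grams1| + |grams2|).
    grams1 = set(ngrams_of(term1, size))
    grams2 = set(ngrams_of(term2, size))
    if not grams1 or not grams2:
        return 0.0
    shared = grams1 & grams2
    return 2.0 * len(shared) / (len(grams1) + len(grams2))


print(comp_ngrams_sketch("statistics", "statistical", 2))
```

   With bigrams, "statistics" has 7 unique n-grams, "statistical" has 8,
   and 6 are shared, which gives 12 / 15.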

 -------------------------------------------------------------------------
   compact_list (original)
 -------------------------------------------------------------------------

   Added in advas.0.0.6

   Compacts a dictionary. As an example,

   list[first] = 1
   list[second] = 2
   list[First] = 1

   will be reduced to

   list[first] = 2
   list[second] = 2

   Items are equal if there are no differences between the words except
   for the letters being in upper/lower case.

   In:
	original	a dictionary (list[item] = value)

   Out:
	a list of words (dictionary) with pairs list[term]=frequency
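
   The example above can be reproduced with a short sketch. It assumes
   that the lower-case spelling is kept as the surviving key, as the
   example suggests; compact_list_sketch is a hypothetical name.

```python
def compact_list_sketch(original):
    compacted = {}
    for term, value in original.items():
        key = term.lower()  # "First" and "first" share one entry
        compacted[key] = compacted.get(key, 0) + value
    return compacted


result = compact_list_sketch({"first": 1, "second": 2, "First": 1})
print(result)
```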

 -------------------------------------------------------------------------
   convert_dictionary_into_list (original)
 -------------------------------------------------------------------------

   Added in advas.0.0.6

   Converts a dictionary into a simple list.

   In:
	original	a dictionary

   Out:
	A simple list or array holding the keys of the former dictionary.

 -------------------------------------------------------------------------
   convert_list_into_dictionary (original, init_value)
 -------------------------------------------------------------------------

   Added in advas.0.0.6

   Converts a list into a dictionary.

   In:
	original	a list
	init_value	the value each list item should be initialized
			with
   Out:
	a list of words (dictionary) with pairs list[term]=frequency
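
   Both conversions are short in Python. The following sketch shows the
   idea with hypothetical names (dict_to_list_sketch and
   list_to_dict_sketch); it is not taken from the AdvaS sources.

```python
def dict_to_list_sketch(original):
    # Only the keys survive; the values are discarded.
    return list(original.keys())


def list_to_dict_sketch(original, init_value):
    # Every list item becomes a key that starts with init_value.
    return {item: init_value for item in original}


print(dict_to_list_sketch({"linux": 3, "kernel": 1}))
print(list_to_dict_sketch(["linux", "kernel"], 0))
```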

 -------------------------------------------------------------------------
   count_words (words)
 -------------------------------------------------------------------------

   Added in advas.0.0.1

   Count words given in a list. Returns a list of words and how often each
   word appears in the given list.

   In:
	words		a list of words
	
   Out:
	a dictionary (list) with pairs list[word]=appearance 
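
   A comparable result can be obtained with collections.Counter from
   the Python standard library. The sketch below is an illustration
   only; count_words_sketch is a hypothetical name.

```python
from collections import Counter


def count_words_sketch(words):
    # Counter tallies each distinct word; dict() gives a plain
    # dictionary with pairs word -> number of appearances.
    return dict(Counter(words))


print(count_words_sketch(["linux", "kernel", "linux"]))
```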

 -------------------------------------------------------------------------
   get_file_contents (file_name)
 -------------------------------------------------------------------------

   Added in advas-0.1.7

   Opens the given file and returns its content.

   In:
   	file_name	file to be opened

   Out:
   	an array that contains the file content line by line.
	Returns -1 on error.

 -------------------------------------------------------------------------
   get_ngrams (term, size)
 -------------------------------------------------------------------------

   Added in advas.0.0.2

   Returns n-grams of a given term. The size means the number of
   characters the n-gram has.

   In:
	term		a given term, text or string
	size		integer value
   Out:
	list of n-grams with size n

   Notes:
   If the length of the given term is smaller than size, the given term
   is returned instead. If size is smaller than 2, no n-grams can be
   formed and the term is returned instead. An n-gram consists of at
   least two characters.
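
   The rules in the notes can be sketched as follows. Whether the
   fallback returns the bare term or a one-item list is not stated
   here; the sketch assumes a one-item list so that the return type
   stays the same. get_ngrams_sketch is a hypothetical name.

```python
def get_ngrams_sketch(term, size):
    # Fallback cases from the notes: size below 2, or term too short.
    if size < 2 or len(term) < size:
        return [term]  # assumption: wrapped in a list
    # Slide a window of `size` characters over the term.
    return [term[i:i + size] for i in range(len(term) - size + 1)]


print(get_ngrams_sketch("linux", 2))
```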

 -------------------------------------------------------------------------
   idf (documents, word_list)
 -------------------------------------------------------------------------

   Added in advas.0.0.1

   Calculates the inverse document frequency for a given list of terms.

   In:
	documents	integer, number of documents referred to
	word_list	list of words (or dictionary) with pairs
			list[item] = value
   Out:
	a list (dictionary) with pairs list[item]=idf_value
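
   The exact formula is not given in this list. A widely used
   definition is idf = log(N / df), where N is the number of documents
   and df the number of documents that contain the term. The sketch
   below assumes this definition, the natural logarithm, and that the
   values in word_list are document frequencies; idf_sketch is a
   hypothetical name.

```python
import math


def idf_sketch(documents, word_list):
    # Assumption: word_list[item] is the number of documents that
    # contain the item. Items with a count of 0 are skipped to avoid
    # a division by zero.
    return {
        item: math.log(documents / count)
        for item, count in word_list.items()
        if count > 0
    }


values = idf_sketch(4, {"linux": 4, "kernel": 1})
print(values)
```

   A term that occurs in every document gets the value 0, so it does not
   help to tell documents apart.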

 -------------------------------------------------------------------------
   is_comment (line)
 -------------------------------------------------------------------------

   Added in advas-0.1.7

   Checks whether a line of text is a comment.

   In:
   	line		line of text to be analyzed

   Out:
   	returns 1 if the given line is a comment (starts with "#")
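
   A minimal sketch of this check, assuming that 0 is returned for any
   other line and that leading whitespace is not skipped (neither is
   stated above); is_comment_sketch is a hypothetical name.

```python
def is_comment_sketch(line):
    # A comment line starts with "#" in the very first column.
    return 1 if line.startswith("#") else 0


print(is_comment_sketch("# stop words"))
print(is_comment_sketch("linux"))
```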

 -------------------------------------------------------------------------
   is_language (text, stop_list, flag)
 -------------------------------------------------------------------------
 
   Added in advas.0.0.6

   Substituted by is_language_by_keywords in advas.0.1.6.

 -------------------------------------------------------------------------
   is_language_by_keywords (text, stop_list, flag)
 -------------------------------------------------------------------------

   Added in advas.0.1.6
   
   Tries to determine the language the text is written in. Uses a list
   of words for language detection.
   
   In:
	text		text to check
	stop_list	list of words used to determine the language
	flag		0 stop_list is a list, text is a string
			1 stop_list is a file name, text is a string
			2 stop_list is a list, text is a file name
			3 stop_list is a file name, text is a file name
			
   Out:
	float value showing the degree to which the text is in the given
	language. If either the stop_list or the text file cannot be
	accessed, the value 0 is returned.

 -------------------------------------------------------------------------
   is_synonym_of (term1, term2, dictionary_file)
 -------------------------------------------------------------------------

   Added in advas-0.1.7

   Checks whether two terms are synonyms. A synonym is a word with the 
   same meaning as the original term.

   In:
   	term1, term2	  terms to be compared
	dictionary_file	  file that contains the alternatives separated
	                  by a comma.

   Out:
   	1 if both terms have the same meaning, 0 if not

   This function works with OpenThesaurus (plain text version). It
   requires an OpenThesaurus release later than 2003-10-23, and the lines
   to be sorted in alphabetical order.
   
   OpenThesaurus: http://thesaurus.kdenews.org

 -------------------------------------------------------------------------
   kmp_search (text, pattern)
 -------------------------------------------------------------------------

   Added in advas.0.1.5

   Uses the algorithm from Knuth, Morris and Pratt to locate the given
   pattern in the text.

   In:
   	text		text to be used for searching
	pattern		pattern to be looked up

   Out:
   	a list of positions at which the pattern starts in the text
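
   For readers unfamiliar with the algorithm, here is a compact sketch
   of Knuth-Morris-Pratt search. It assumes zero-based positions,
   reports overlapping matches, and returns an empty list for an empty
   pattern; none of these details are confirmed for kmp_search itself.
   kmp_search_sketch is a hypothetical name.

```python
def kmp_search_sketch(text, pattern):
    if not pattern:
        return []
    # failure[i]: length of the longest proper prefix of
    # pattern[:i + 1] that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    positions = []
    k = 0  # number of pattern characters matched so far
    for i, char in enumerate(text):
        while k > 0 and char != pattern[k]:
            k = failure[k - 1]  # fall back without re-reading text
        if char == pattern[k]:
            k += 1
        if k == len(pattern):
            positions.append(i - k + 1)  # start of this match
            k = failure[k - 1]           # allow overlapping matches
    return positions


print(kmp_search_sketch("abababa", "aba"))
```

   The failure table lets the search continue after a mismatch without
   moving backwards in the text, which keeps the running time linear in
   the combined length of text and pattern.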

 -------------------------------------------------------------------------
   kNN (vector_1, vector_2)
 -------------------------------------------------------------------------

   Added in advas.0.1.0

   Uses the k-Nearest Neighbour algorithm to calculate the similarity of
   two vectors - one for a document, the other for a given request.

   In:
   	vector_1	first vector
	vector_2	second vector

   Out:
   	float value showing the similarity of the two vectors.
			
 -------------------------------------------------------------------------
   merge_lists (*lists)
 -------------------------------------------------------------------------

   Added in advas.0.0.1

   Merge given lists of words. The parameter *lists accepts as many
   lists as needed, so more than two lists can be merged at once.

   In:
	lists		a list created by count_words (dictionary with
			pairs list[term]=frequency)
   Out:
	a list of words (dictionary) with pairs list[term]=frequency
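
   A sketch of such a merge, assuming that the frequencies of a term
   are added up across all given lists (the list above does not spell
   this out); merge_lists_sketch is a hypothetical name.

```python
def merge_lists_sketch(*lists):
    merged = {}
    for word_list in lists:
        for term, frequency in word_list.items():
            # Assumption: frequencies of the same term are summed.
            merged[term] = merged.get(term, 0) + frequency
    return merged


print(merge_lists_sketch({"linux": 2}, {"linux": 1, "kernel": 4}, {"shell": 1}))
```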

 -------------------------------------------------------------------------
   merge_lists_idf (*lists)
 -------------------------------------------------------------------------

   Added in advas.0.0.1

   Merge given lists of words for calculating idf. The parameter *lists
   accepts as many lists as needed, so more than two lists can be merged
   at once.

   In:
	lists		a list created by count_words (dictionary with
			pairs list[term]=frequency)
   Out:
	a list of words (dictionary) with pairs list[term]=frequency

 -------------------------------------------------------------------------
   metaphone (term)
 -------------------------------------------------------------------------

   Added in advas.0.0.5

   Calculates the metaphone code (pronunciation) for a given term.

   In:
	term		string

   Out:
	Metaphone code as a string.

 -------------------------------------------------------------------------
   ngram_stemmer (word_list, size, equality)
 -------------------------------------------------------------------------

   Added in advas.0.0.3

   Reduces word_list according to the n-gram stemming method.

   In:
	word_list	a simple list of words
	size		integer, length of n-grams
	equality	float, degree of equality. 
   Out:
	a list of words, already conflated if possible.

   Equality is a value between 0 and 1. The higher the value, the higher
   the degree of equality between two words must be so that they are
   conflated. A recommended value is between 0.8 and 0.9.

 -------------------------------------------------------------------------
   nysiis (term)
 -------------------------------------------------------------------------

   Added in advas-0.1.9

   Returns New York State Identification and Intelligence Algorithm 
   (NYSIIS) phonetic code.
   
   In:
	term		a string (term to be analyzed)
	
   Out:
	A string that contains the nysiis phonetic code.

 -------------------------------------------------------------------------
   phonetic_code (term)
 -------------------------------------------------------------------------

   Added in advas-0.1.9

   Returns the term's phonetic code using different methods.
   
   In:
	term		a string (term to be analyzed)
	
   Out:
	An array that holds the phonetic codes for soundex, metaphone,
	nysiis and caverphone. The keys of the array name the algorithms
	that were used.
	
	array = phonetic_code(term)
	array["soundex"] : soundex code
	array["metaphone"] : metaphone code
	array["nysiis"] : nysiis code
	array["caverphone"] : caverphone code
   
 -------------------------------------------------------------------------
   rank (request, document_list, order)
 -------------------------------------------------------------------------
 
   Added in advas.0.1.3
   
   A simple ranking algorithm for document descriptors.
   
   In:
	request		a list of terms (descriptor)
	document_list	a list of descriptors
	order		0 first list item has the highest value of equality
			1 last list item has the highest value of equality
			
   Out:
	A ranked list of document descriptions. Each item has the 
	following fields:
	- descriptors	document descriptor
	- equality	degree of equality according to the request
	- list_no	index in document_list

   Equality is a value between 0 and 1. The higher the value, the higher 
   the degree of equality between the request and the document descriptor.
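
   The shape of the result can be illustrated with a small sketch. The
   similarity measure used here (shared descriptors divided by all
   distinct descriptors) is an assumption chosen for brevity, not the
   measure rank is documented to use; rank_sketch is a hypothetical
   name.

```python
def rank_sketch(request, document_list, order=0):
    wanted = set(request)
    ranked = []
    for list_no, descriptors in enumerate(document_list):
        found = set(descriptors)
        union = wanted | found
        # Assumption: overlap ratio as the degree of equality (0..1).
        equality = len(wanted & found) / len(union) if union else 0.0
        ranked.append({
            "descriptors": descriptors,
            "equality": equality,
            "list_no": list_no,
        })
    # order 0: best match first; order 1: best match last.
    ranked.sort(key=lambda item: item["equality"], reverse=(order == 0))
    return ranked


docs = [["windows"], ["linux", "kernel"], ["linux", "shell"]]
best_first = rank_sketch(["linux", "kernel"], docs, 0)
print([item["list_no"] for item in best_first])
```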

 -------------------------------------------------------------------------
   remove_items (original, remove)
 -------------------------------------------------------------------------

   Added in advas.0.0.1

   Remove the items from the original list. This function can be used
   for term frequency with a stop list.

   In:
	original	a list (dictionary with pairs list[term]=value)
	remove		a list (dictionary with pairs list[term]=value)
			(list of items to be removed from the original)
   Out:
	a list of words (dictionary) with pairs list[term]=value
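
   A sketch of the removal step, assuming that only the keys of the
   remove list matter and that the original is left unchanged;
   remove_items_sketch is a hypothetical name.

```python
def remove_items_sketch(original, remove):
    # Keep every pair whose term is not a key of the remove list.
    return {term: value for term, value in original.items()
            if term not in remove}


frequencies = {"the": 5, "linux": 2, "kernel": 1}
stop_words = {"the": 1, "a": 1}
print(remove_items_sketch(frequencies, stop_words))
```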

 -------------------------------------------------------------------------
   soundex (term)
 -------------------------------------------------------------------------

   Added in advas.0.0.1

   Returns the soundex code for a string argument.

   In:
	term		a string
   Out:
	The four-letter soundex code for the given term.
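
   For orientation, the sketch below follows the commonly described
   American Soundex rules: keep the first letter, encode the remaining
   consonants as digits, skip repeated codes, and pad with zeros to
   four characters. Whether soundex in AdvaS treats "h" and "w" or
   non-letters the same way is an assumption; soundex_sketch and
   SOUNDEX_CODES are hypothetical names.

```python
SOUNDEX_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def soundex_sketch(term):
    letters = [char for char in term.lower() if char.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = SOUNDEX_CODES.get(letters[0], "")
    for letter in letters[1:]:
        digit = SOUNDEX_CODES.get(letter, "")
        if digit and digit != previous:
            code += digit
        if letter not in "hw":
            # Vowels reset the previous code; "h" and "w" do not, so
            # equal codes separated only by them are still merged.
            previous = digit
    return (code + "000")[:4]  # pad or cut to four characters


print(soundex_sketch("Robert"), soundex_sketch("Rupert"))
```

   Names that sound alike, such as "Robert" and "Rupert", end up with
   the same code.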

 -------------------------------------------------------------------------
   split_line (line)
 -------------------------------------------------------------------------

   Added in advas.0.0.1

   Split a line of text into single words.

   In:
	line		line of text
   Out:
	list of words

 -------------------------------------------------------------------------
   successor_variety_stemmer (term, word_list, flag)
 -------------------------------------------------------------------------

   Added in advas.0.0.4

   Calculates the term's stem according to the successor variety algorithm.
   As a special variant, the peak-and-plateau method is used.

   In:
	term		a word whose stem is to be calculated
	word_list	list of words
	flag		1 use word_list
			2 word_list is interpreted as a file name and the
			  list is read from the given file

   Out:
	a list of stems the given term consists of.
	
 -------------------------------------------------------------------------
   synonym_of (term, dictionary_file)
 -------------------------------------------------------------------------

   Added in advas-0.1.7

   Returns the synonyms of the given term. A synonym is a word with the 
   same meaning as the original term.

   In:
   	term		  word to be looked up
	dictionary_file	  file that contains the alternatives separated
	                  by a comma.

   Out:
   	a list of alternatives

   This function works with OpenThesaurus (plain text version). It
   requires an OpenThesaurus release later than 2003-10-23, and the lines
   to be sorted in alphabetical order.
   
   OpenThesaurus: http://thesaurus.kdenews.org


 -------------------------------------------------------------------------
   table_lookup_stemmer (term, stem_file)
 -------------------------------------------------------------------------

   Added in advas.0.0.2

   Returns the term's stem as given in a stem file.

   In:
	term		a string
	stem_file	filename of the stem file, given as a string
			Each line of the stem file must have the format 
			"term : stem". The stem file must be sorted in 
			alphabetical order.
   Out:
	returns the stem for the given term. If the term does not
	appear in the stem file, or the stem file cannot be accessed,
	an empty string ("") is returned.
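
   Because the stem file is sorted, a lookup can use binary search. The
   sketch below works on a list of lines instead of a file name so that
   it stays self-contained; that change, the use of bisect, and the
   name table_lookup_sketch are assumptions for illustration.

```python
from bisect import bisect_left


def table_lookup_sketch(term, stem_lines):
    # Parse lines of the form "term : stem"; other lines are ignored.
    pairs = [line.split(":", 1) for line in stem_lines if ":" in line]
    terms = [pair[0].strip() for pair in pairs]
    stems = [pair[1].strip() for pair in pairs]
    # Binary search relies on the lines being sorted by term.
    index = bisect_left(terms, term)
    if index < len(terms) and terms[index] == term:
        return stems[index]
    return ""  # term not found


lines = ["running : run", "walked : walk", "walking : walk"]
print(table_lookup_sketch("walked", lines))
```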

 -------------------------------------------------------------------------
   tf (text)
 -------------------------------------------------------------------------

   Added in advas.0.0.1

   Calculates the term frequency for a given text.

   In:
	text		a line of text (string)
   Out:
	a list (dictionary) with pairs list[item]=frequency

 -------------------------------------------------------------------------
   tf_stop (text, stop_list)
 -------------------------------------------------------------------------

   Added in advas.0.0.3

   Calculates the term frequency for a given text and removes the items 
   given in a stop list.

   In:
	text		a line of text (string)
	stop_list	a list (dictionary) with pairs list[item]=value
   Out:
	a list (dictionary) with pairs list[term]=frequency
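
   Term frequency with a stop list can be sketched in a few lines. The
   sketch assumes that words are separated by whitespace and compared
   case-sensitively, which this list does not specify; tf_stop_sketch
   is a hypothetical name.

```python
from collections import Counter


def tf_stop_sketch(text, stop_list):
    # Count every whitespace-separated word that is not a key of the
    # stop list.
    words = [word for word in text.split() if word not in stop_list]
    return dict(Counter(words))


stop_list = {"the": 1, "and": 1}
print(tf_stop_sketch("the cat and the hat and the cat", stop_list))
```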

 ------------------------------------------------------------------------

 Frank Hofmann (2004-11-21) <fh@efho.de>
