Leptonica  1.83.1
Image processing and image analysis suite
sarray2.c File Reference
#include <string.h>
#include "allheaders.h"
#include "array_internal.h"

Go to the source code of this file.

Functions

SARRAYsarraySort (SARRAY *saout, SARRAY *sain, l_int32 sortorder)
 
SARRAYsarraySortByIndex (SARRAY *sain, NUMA *naindex)
 
l_int32 stringCompareLexical (const char *str1, const char *str2)
 
L_ASETl_asetCreateFromSarray (SARRAY *sa)
 
l_ok sarrayRemoveDupsByAset (SARRAY *sas, SARRAY **psad)
 
l_ok sarrayUnionByAset (SARRAY *sa1, SARRAY *sa2, SARRAY **psad)
 
l_ok sarrayIntersectionByAset (SARRAY *sa1, SARRAY *sa2, SARRAY **psad)
 
L_HASHMAPl_hmapCreateFromSarray (SARRAY *sa)
 
l_ok sarrayRemoveDupsByHmap (SARRAY *sas, SARRAY **psad, L_HASHMAP **phmap)
 
l_ok sarrayUnionByHmap (SARRAY *sa1, SARRAY *sa2, SARRAY **psad)
 
l_ok sarrayIntersectionByHmap (SARRAY *sa1, SARRAY *sa2, SARRAY **psad)
 
SARRAYsarrayGenerateIntegers (l_int32 n)
 
l_ok sarrayLookupCSKV (SARRAY *sa, const char *keystring, char **pvalstring)
 

Detailed Description


     Sort
         SARRAY     *sarraySort()
         SARRAY     *sarraySortByIndex()
         l_int32     stringCompareLexical()

     Set operations using aset (rbtree)
         L_ASET     *l_asetCreateFromSarray()
         l_int32     sarrayRemoveDupsByAset()
         l_int32     sarrayUnionByAset()
         l_int32     sarrayIntersectionByAset()

     Hashmap operations
         L_HASHMAP  *l_hmapCreateFromSarray()
         l_int32     sarrayRemoveDupsByHmap()
         l_int32     sarrayUnionByHmap()
         l_int32     sarrayIntersectionByHmap()

     Miscellaneous operations
         SARRAY     *sarrayGenerateIntegers()
         l_int32     sarrayLookupCSKV()


We have two implementations of set operations on an array of strings:

  (1) Using an underlying tree (rbtree).
      This uses a good 64 bit hashing function for the key,
      that is not expected to have hash collisions (and we do
      not test for them).  The tree is built up of the hash
      values, and if the hash is found in the tree, it is
      assumed that the string has already been found.

  (2) Building a hashmap from the keys (hashmap).
      This uses a fast 64 bit hashing function for the key, which
      is then hashed into a hashtable.  Collisions of hashkeys are
      very rare, but the hashtable is designed to allow more than one
      hashitem in a table entry.  The hashitems are put in a list at
      each hashtable entry, which is traversed looking for the key.

Definition in file sarray2.c.

Function Documentation

◆ l_asetCreateFromSarray()

L_ASET* l_asetCreateFromSarray ( SARRAY sa)

l_asetCreateFromSarray()

Parameters
[in]sa
Returns
set using a string hash into a uint64 as the key

Definition at line 224 of file sarray2.c.

Referenced by sarrayIntersectionByAset().

◆ l_hmapCreateFromSarray()

L_HASHMAP* l_hmapCreateFromSarray ( SARRAY sa)

l_hmapCreateFromSarray()

Parameters
[in]sainput sarray
Returns
hmap hashmap, or NULL on error

Definition at line 426 of file sarray2.c.

References l_hashStringToUint64Fast(), L_NOCOPY, sarrayGetCount(), and sarrayGetString().

Referenced by sarrayIntersectionByHmap(), and sarrayRemoveDupsByHmap().

◆ sarrayGenerateIntegers()

SARRAY* sarrayGenerateIntegers ( l_int32  n)

sarrayGenerateIntegers()

Parameters
[in]n
Returns
sa of printed numbers, 1 - n, or NULL on error

Definition at line 603 of file sarray2.c.

References L_COPY, sarrayAddString(), and sarrayCreate().

Referenced by pixaCompareInPdf().

◆ sarrayIntersectionByAset()

l_ok sarrayIntersectionByAset ( SARRAY sa1,
SARRAY sa2,
SARRAY **  psad 
)

sarrayIntersectionByAset()

Parameters
[in]sa1
[in]sa2
[out]psadintersection of the two string arrays
Returns
0 if OK; 1 on error
Notes:
     (1) Algorithm: put the larger sarray into a set, using the string
         hashes as the key values.  Then run through the smaller sarray,
         building an output sarray and a second set from the strings
         in the larger array: if a string is in the first set but
         not in the second, add the string to the output sarray and hash
         it into the second set.  The second set is required to make
         sure only one instance of each string is put into the output sarray.
         This is O(mlogn), {m,n} = sizes of {smaller,larger} input arrays.

Definition at line 369 of file sarray2.c.

References l_asetCreateFromSarray(), sarrayCreate(), and sarrayGetCount().

◆ sarrayIntersectionByHmap()

l_ok sarrayIntersectionByHmap ( SARRAY sa1,
SARRAY sa2,
SARRAY **  psad 
)

sarrayIntersectionByHmap()

Parameters
[in]sa1
[in]sa2
[out]*psadintersection of the array values
Returns
0 if OK; 1 on error

Definition at line 541 of file sarray2.c.

References L_COPY, l_hashStringToUint64Fast(), l_hmapCreateFromSarray(), L_NOCOPY, sarrayAddString(), sarrayCreate(), sarrayDestroy(), sarrayGetCount(), sarrayGetString(), and sarrayRemoveDupsByHmap().

◆ sarrayLookupCSKV()

l_ok sarrayLookupCSKV ( SARRAY sa,
const char *  keystring,
char **  pvalstring 
)

sarrayLookupCSKV()

Parameters
[in]saof strings, each being a comma-separated pair of strings, the first being a key and the second a value
[in]keystringan input string to match with each key in sa
[out]pvalstringthe returned value string corresponding to the input key string, if found; otherwise NULL
Returns
0 if OK, 1 on error
Notes:
     (1) The input sa can have other strings that are not in
         comma-separated key-value format.  These will be ignored.
     (2) This returns a copy of the first value string in sa whose
         key string matches the input keystring.
     (3) White space is not ignored; all white space before the ','
         is used for the keystring in matching.  This allows the
         key and val strings to have white space (e.g., multiple words).

Definition at line 642 of file sarray2.c.

References L_NOCOPY, sarrayCreate(), sarrayGetCount(), and sarrayGetString().

◆ sarrayRemoveDupsByAset()

l_ok sarrayRemoveDupsByAset ( SARRAY sas,
SARRAY **  psad 
)

sarrayRemoveDupsByAset()

Parameters
[in]sas
[out]psadwith duplicates removed
Returns
0 if OK; 1 on error
Notes:
     (1) This is O(nlogn), considerably slower than
         sarrayRemoveDupsByHmap() for large string arrays.
     (2) The key for each string is a 64-bit hash.
     (3) Build a set, using hashed strings as keys.  As the set is
         built, first do a find; if not found, add the key to the
         set and add the string to the output sarray.

Definition at line 266 of file sarray2.c.

Referenced by sarrayUnionByAset().

◆ sarrayRemoveDupsByHmap()

l_ok sarrayRemoveDupsByHmap ( SARRAY sas,
SARRAY **  psad,
L_HASHMAP **  phmap 
)

sarrayRemoveDupsByHmap()

Parameters
[in]sas
[out]psadhash set of unique values
[out]phmap[optional] hashmap used for lookup
Returns
0 if OK; 1 on error

Definition at line 458 of file sarray2.c.

References L_Hashmap::hashtab, L_COPY, l_hmapCreateFromSarray(), L_INSERT, L_Hashitem::next, sarrayAddString(), sarrayCreate(), sarrayGetString(), L_Hashmap::tabsize, and L_Hashitem::val.

Referenced by sarrayIntersectionByHmap(), and sarrayUnionByHmap().

◆ sarraySort()

SARRAY* sarraySort ( SARRAY saout,
SARRAY sain,
l_int32  sortorder 
)

sarraySort()

Parameters
[in]saoutoutput sarray; can be NULL or equal to sain
[in]saininput sarray
[in]sortorderL_SORT_INCREASING or L_SORT_DECREASING
Returns
saout output sarray, sorted by ascii value, or NULL on error
Notes:
     (1) Set saout = sain for in-place; otherwise, set naout = NULL.
     (2) Shell sort, modified from K&R, 2nd edition, p.62.
         Slow but simple O(n logn) sort.

Definition at line 98 of file sarray2.c.

References Sarray::array, L_SORT_DECREASING, L_SORT_INCREASING, sarrayCopy(), sarrayGetCount(), and stringCompareLexical().

Referenced by getSortedPathnamesInDirectory().

◆ sarraySortByIndex()

SARRAY* sarraySortByIndex ( SARRAY sain,
NUMA naindex 
)

sarraySortByIndex()

Parameters
[in]sain
[in]naindexna that maps from the new sarray to the input sarray
Returns
saout sorted, or NULL on error

Definition at line 146 of file sarray2.c.

References L_COPY, L_INSERT, numaGetIValue(), sarrayAddString(), sarrayCreate(), sarrayGetCount(), and sarrayGetString().

◆ sarrayUnionByAset()

l_ok sarrayUnionByAset ( SARRAY sa1,
SARRAY sa2,
SARRAY **  psad 
)

sarrayUnionByAset()

Parameters
[in]sa1
[in]sa2
[out]psadunion of the two string arrays
Returns
0 if OK; 1 on error
Notes:
     (1) Duplicates are removed from the concatenation of the two arrays.
     (2) The key for each string is a 64-bit hash.
     (2) Algorithm: Concatenate the two sarrays.  Then build a set,
         using hashed strings as keys.  As the set is built, first do
         a find; if not found, add the key to the set and add the string
         to the output sarray.  This is O(nlogn).

Definition at line 320 of file sarray2.c.

References sarrayCopy(), sarrayDestroy(), sarrayJoin(), and sarrayRemoveDupsByAset().

◆ sarrayUnionByHmap()

l_ok sarrayUnionByHmap ( SARRAY sa1,
SARRAY sa2,
SARRAY **  psad 
)

sarrayUnionByHmap()

Parameters
[in]sa1
[in]sa2
[out]*psadunion of the array values
Returns
0 if OK; 1 on error

Definition at line 507 of file sarray2.c.

References sarrayCopy(), sarrayDestroy(), sarrayJoin(), and sarrayRemoveDupsByHmap().

◆ stringCompareLexical()

l_int32 stringCompareLexical ( const char *  str1,
const char *  str2 
)

stringCompareLexical()

Parameters
[in]str1
[in]str2
Returns
1 if str1 > str2 lexically; 0 otherwise
Notes:
     (1) If the lexical values are identical, return a 0, to
         indicate that no swapping is required to sort the strings.

Definition at line 184 of file sarray2.c.

Referenced by sarraySort().