|
||||||||||
PREV NEXT | FRAMES NO FRAMES |
See:
Description
Packages | |
it.unimi.dsi.fastutil | |
it.unimi.dsi.fastutil.booleans | Provides type-specific classes for boolean elements or keys. |
it.unimi.dsi.fastutil.bytes | Provides type-specific classes for byte elements or keys. |
it.unimi.dsi.fastutil.chars | Provides type-specific classes for character elements or keys. |
it.unimi.dsi.fastutil.doubles | Provides type-specific classes for double elements or keys. |
it.unimi.dsi.fastutil.floats | Provides type-specific classes for float elements or keys. |
it.unimi.dsi.fastutil.ints | Provides type-specific classes for integer elements or keys. |
it.unimi.dsi.fastutil.longs | Provides type-specific classes for long elements or keys. |
it.unimi.dsi.fastutil.objects | Provides type-specific classes for object elements or keys. |
it.unimi.dsi.fastutil.shorts | Provides type-specific classes for short elements or keys. |
Provides type-specific maps, sets and lists with a small memory footprint and much faster access and insertion. It is free software distributed under the GNU Lesser General Public License.
Warning:
As of 3.0, the introduction of the new type-specific lists required a tweak in the naming scheme: the type-specific version of
Collection.remove(Object)
for integers is now namedIntCollection.rem(int)
. The only really unpleasant effect is that you must userem(int)
on variables of typeIntCollection
that are not of typeIntSet
(asIntSet
reinstatesremove(int)
in its right place)—for instance,IntList
. We are sorry for this inconvenience, but there was no other way to work around the problem.The huge number of classes, moreover, required a division in subpackages. Each subpackage contains classes with a given type of keys or elements.
As of 2.60, there was a major overhaul of iterators. Now iterators must be skippable, so previous implementation of type-specific iterator interfaces (e.g.,
IntIterator
) will not work. However, new abstract classes allow to build iterator with ease by providing for free the skipping logic, and many useful static methods in Iterators allow to generate type-specific iterators wrapping standard iterators, arrays, etc. Moreover, the name of abstract classes always start with Abstract (so what was an IntAbstractComparator is now anAbstractIntComparator
). Please read the section about iterators and comparators.As of 2.52, the package name has changed from it.unimi.dsi.fastUtil to it.unimi.dsi.fastutil. Also the RPM and the jar file name have changed, so you can get a smooth transition by leaving both of them around for a while. However, you should modify your sources and erase the old package and jar file as soon as possible.
The classes of this package specialise the most useful HashSet
, HashMap
, LinkedHashSet
, LinkedHashMap
, TreeSet
, TreeMap
, IdentityHashMap
and ArrayList
classes to
versions that accept a specific kind of key or value. Moreover, the
techniques used in the available implementations are quite different:
open-addressing hash tables, threaded AVL trees, threaded red-black
trees and exclusive-or lists. Class names adhere to the general
pattern
for collections, and
for maps. Presently, possible values for
collectiontype are OpenHashSet
,
LinkedOpenHashSet
, AVLTreeSet
,
RBTreeSet
and ArrayList
, whereas possible
values for maptype are OpenHashMap
,
LinkedOpenHashMap
, AVLTreeMap
and
RBTreeMap
(of course, there are also corresponding interfaces,
which can be obtained in the obvious way).
By "type" here I mean a capitalised primitive type, Object
or Reference
. In the latter case, we
are treating objects, but their equality is established by reference
equality (that is, without invoking equals()
), similarly
to IdentityHashMap
. Of course, reference-based
classes are significantly faster.
Thus, an IntOpenHashSet
stores
integers efficiently, whereas a Long2IntAVLTreeMap
does the same for maps from
longs to integers (but the map will be sorted, tree based, and
balanced using the AVL criterion). A LongLinkedOpenHashSet
stores longs in a hash
table, but provides a predictable iteration order (the insertion
order) and access to first/last elements of the order. A Reference2ReferenceOpenHashMap
is similar to an
IdentityHashMap
.
Since there are eight primitive types in Java, and we support
reference-based containers, we get 796 (!) classes (some nonsensical
classes, such as Boolean2BooleanAVLTreeMap
, are not
generated). Many classes are generated just to mimic the hierarchy of
java.util
so to redistribute common code in a similar way.
The huge number of classes required a suitable division in
subpackages (more than anything else, to avoid crashing browsers with
a preposterous package summary). Each subpackage is characterised by
the type of elements or keys: thus, for instance IntSet
belongs to it.unimi.dsi.fastutil.ints
(the plural is required, as
int
is a keyword and cannot be used in a package
name), as well as Int2ReferenceRBTreeMap
. Note that all classes for non-primitive elements and keys are
gathered in it.unimi.dsi.fastutil.objects
.
All classes are not synchronised. If multiple threads access one of these classes concurrently, and at least one of the threads modifies it, it must be synchronised externally. Iterators will behave unpredictably in the presence of concurrent modifications. Reads, however, can be carried out concurrently.
Reference-based classes violate the Map
contract. They intentionally compare objects by reference, and do
not use the equals()
method. They should be used only
when reference-based equality is desired (for instance, if all
objects involved are canonised, as it happens with interned strings).
Linked classes do not implement completely the SortedMap
interface. They provide methods to get the
first and last element in iteration order, but any submap or subset
method will cause an UnsupportedOperationException
.
(this may change in future versions).
All maps, sets and lists in fastutil
implement their standard
counterpart interface (e.g., Map
for maps). Thus, they
can be just plugged into existing code, using the standard access methods
(of course, any attempt to use the wrong type for keys or values will
produce a ClassCastException
). However, they also provide
(whenever possible) many polymorphic versions of the most used methods that
lessen the tedious "type juggling" that is well known to Java
programmers. In doing so, they implement more stringent interfaces that
extend the standard ones (e.g., Int2IntSortedMap
or IntListIterator
).
The new interfaces add some very natural methods. Moreover, whenever possible, the object returned is type-specific, or even implements a more powerful interface.
Due to some limitations of Java (you cannot override covariantly the return value of an interface, i.e., with a method returning a more specific value), however, sometimes these features are available only by means of type casting.
More in detail:
fastutil
type you
would expect (e.g., the keys of an Int2LongSortedMap
are an IntSortedSet
and the values are a LongCollection
), but you must explicitly cast the
objects returned by keySet()
and values()
to the appropriate type.
fastutil
type you would expect, too. The standard
constructors return a SortedMap
or a SortedSet
, respectively, which can be safely cast to
the right type-specific interface. However, if you use extended
interfaces and keys are not objects, then there are new methods accepting
primitive keys and returning a type-specific sorted structure directly
(see, e.g., Int2IntSortedMap
).
iterator()
can be cast to the obvious type (e.g., CharIterator
for a CharSet
), too. There is also an
additional method returning directly a type-specific iterator
(see, e.g., intIterator()
). See below for more information on
fastutil
iterators.
fastutil
return (possibly
type-specific) bidirectional
iterators. This means that you can move back and forth among
entries, keys or values. Again, you must manually cast the
result of a call to iterator()
to BidirectionalIterator
or, even
better, to a type-specific bidirectional iterator such as
IntBidirectionalIterator
. Note that
sometimes the return value is explicitly marked to be castable
to an even more powerful type-specific list iterator.
iterator(from)
which creates a type-specific BidirectionalIterator
starting from a given
element of the domain (not necessarily in the set). See, for instance,
IntSortedSet.iterator(int)
. The method is
implemented by all type-specific sorted sets and subsets.
new ObjectOpenHashSet( new String[] { "foo", "bar" } )
There are a few quirks, however, that you should be aware of:
get()
,
put()
and remove()
methods that return a primitive
type cannot, of course, rely on returning null
to
denote the absence of a certain pair. Rather, they return a
default return value, which is set to 0 cast to the
return type (false
for booleans) at creation, but
can be changed using the defaultReturnValue()
method (see, e.g., Int2IntMap
). Note that changing the
default return value does not change anything about the data
structure; it is just a way to return a reasonably meaningful
result—it can be changed at any time. As a commodity,
even maps returning objects can use
defaultReturnValue()
(of course, in this case the
default return value is initialised to null
). A
submap or subset has an independent default return value (which
however is initialised to the default return value of the
originator).get()
and remove()
methods do not admit
polymorphic versions, as Java does not allow return-value
polymorphism. Rather, the extended interfaces introduce new
methods of the form getvaluetype()
and
removevaluetype()
. Similar problems
occur with firstKey()
, lastKey()
,
indexOf()
and so on.nexttype()
returning directly a primitive type.
And, of course, you have a type-specific version of previous()
.
Collection.toArray()
has a polymorphic version accepting a type-specific array, but there are
also explicitly typed methods
tokeytypeArray()
.IntCollection
to be named IntCollection.rem(int)
. At the risk
of creating some confusion, IntSet.remove(int)
is reintroduced
in the IntSet
interface, so
the only really unpleasant effect is that you must use rem(int)
on
variables of type IntCollection
that are not of type
IntSet
—for instance,
IntList
.
Comparator
that
require specifying both a type-specific comparator and an object-based
one; this is necessary as the type-specific version must implement Comparator
. However, to simplify the creation of type-specific
comparators there are abstract type-specific comparator classes that
implement an object-based comparator wrapping the (abstract)
type-specific one; thus, if you need to create a type-specific comparator
you just have to inherit from those classes and define the type-specific
method.
fastutil
provides type-specific iterators and
comparators. The interface of a fastutil
iterator is
slightly more powerful than that of a java.util
iterator, as
it contains a skip()
method that allows to skip over a list of elements (an
analogous
method is provided for bidirectional iterators). For objects (even
those managed by reference), the extended interface is named ObjectIterator
; it is the return type, for
instance, of ObjectCollection.objectIterator()
.
fastutil
provides also classes and methods that makes it
easy to create type-specific iterators. There are abstract versions of
each type-specific iterator and comparator that implement in the
obvious way some of the methods (see, e.g., AbstractIntIterator
or AbstractIntComparator
).
A plethora of useful static methods is also provided by Iterators
: among other things, you can
wrap arrays
and standard iterators in type-specific iterators, generate them giving an
interval of elements to be returned, concatenate
them or pour
them into a set.
fastutil
provides a wide range of abstract classes, to
help in implementing its iterfaces. They take care, for instance, of
providing wrappers for non-type-specific method calls, so that you have to
write just the (usually simpler) type-specific version. As we already
remarked, this is true even of
iterators and comparators.
The main reason behind fastutil
is performance, both in
time and in space. The relevant methods of type-specific hash maps and sets
are something like 2 to 10 times faster than those of the standard
classes. Note that performance of hash-based classes on object keys is
usually worse (from a few percent to doubled time) than that of
java.util
, because fastutil
classes do not cache hash
codes. Of course, you can try to get more speed from hash tables using a
small load factor (say, 1/2).
For tree-based classes you have two choices: AVL and red-black trees. The essential difference is that AVL trees are more balanced (their height is at most 1.44 log n), whereas red-black trees have faster deletions (but their height is at most 2 log n). So on small trees red-black trees could be faster, but on very large sets AVL trees will shine. In general, AVL trees have slightly slower updates but faster searches; however, on very large collections the smaller height may lead in fact to faster updates, too.
fastutil
reduces enormously the creation and collection of
objects. First of all, if you use the polymorphic methods and iterators no
wrapper objects have to be created. Moreover, since fastutil
uses open-addressing hashing techniques, creation and garbage collection of
hash-table entries are avoided (but tables have to be rehashed whenever
they are filled beyond the load factor). The major reduction of the number
of objects around has a definite (but very difficult to measure) impact on
the whole application (as garbage collection runs proportionally to the
number of alive objects).
Whenever possible, fastutil
tries to gain some speed by
checking for faster interfaces: for instance, the various set-theoretic
methods addAll()
, retainAll()
, ecc. check whether
their arguments are type-specific and use faster iterators and accessors
accordingly.
Since deletions in hash tables are handled simply by tagging, they are very fast per se, but they tend to slow down subsequent accesses (with respect to a table with no deleted entries). In highly dynamical situations, where entries are continuously created and deleted, unsuccessful searches may take linear time (as all entries must be probed).
A partial solution to this problem (which has no known complete
solution if you use open addressing with double
hashing—cfr. Knuth's section on hashing in the third volume of
The Art of Computer Programming) is to call the
rehash()
method, which will try to rebuild the table
remapping all keys. There are also trim()
methods that
will reduce the table size if possible.
In other words, if your application requires inextricably interleaved
insertions, deletions and queries open-addressing hash-table
implementations (and in particular fastutil
classes) are not
the right choice.
Note, however, that fastutil
implements a special
optimisation, usually not found elsewhere, that speeds up probes for
recently deleted entries. More details can be found in the documentation of
the Hash
interface.
The case of linked tables is even more problematic: the deletion of an item requires a linear probe of the links until the item is found, and thus it has potentially linear cost (however, this is not true if the deletion is performed by means of an iterator, or if you delete the last element).
To avoid memory waste, (unlinked) hash tables in
fastutil
keep no additional information about elements
(such as a list of keys). In particular, this means that enumerations
are always linear in the size of the table (rather than in the number
of keys). Usually, this would imply slower iterators. Nonetheless, the
iterator code includes a single, tight loop; moreover, it is possible
to avoid the creation of wrappers. These two facts make in practise
fastutil
iterators faster than java.util
's.
The memory footprint for a table with n keys is exactly the memory required for the related types times n, plus a overhead of n bytes to store the state of each entry. The absence of wrappers around primitive types can reduce space occupancy by several times (this applies even more to serialised data, e.g., when you save such a data structure in a file). These figures can greatly vary with your virtual machine, JVM versions, CPU etc.
More precisely, when you ask for a map that will hold n
elements with load factor 0 < f ≤ 1, p entries are
allocated, where p is first prime in Hash.PRIMES
larger than
n / f. Primes in Hash.PRIMES
are roughly multiplicatively spaced by
21/16, so you lose on average about 2% with respect to
n / f.
When the table is filled up beyond the load factor, it is rehashed to a
larger size. The growth is controlled by the growth factor, which
can be set at any time. By default, the table size is doubled (for more
information, see IntOpenHashSet
), but
you can trade speed for memory occupancy by setting a slower growth rate.
In the case of linked hash tables, there is an additional vector of p integers that is used to store link information. Each element records the next and previous element indices exclusive-or'd together. As a result, linked tables provide bidirectional iterators without having to store two pointers per entry (however, iterators starting from a given element require a linear probe to be initialised, unless the element is the last one).
Since hash codes are not cached, equality on objects is checked
first by checking equality of their hashCode()
, and then using equals()
. This turns out to increase slightly the performance, as
many classes (including String
) cache their hash
codes; in any case, the speed cannot reach that of java.util
's
hash classes, which cache hash codes.
The balanced trees implementation is also very parsimonious.
fastutil
is based on the excellent (and unbelievably well
documented) code contained in Ben Pfaff's GNU libavl, which describes in
detail how to handle balanced trees with threads. Thus, the
overhead per entry is two pointers and one integer, which compares well to
three pointers plus one boolean of the standard tree maps. The trick is
that we use the integer bit by bit, so we consume two bits to store thread
information, plus one or two bits to handle balancing. As a result, we get
bidirectional iterators in constant space and amortised constant time
without having to store references to parent nodes.
It should be mentioned that all tree-based classes have a fixed overhead for some arrays that are used as stacks to simulate recursion; in particular, we need 48 booleans for AVL trees and 64 pointers plus 64 booleans for red-black trees.
Suppose you want to store a sorted map from longs to integers. The first step is to define a variable of the right interface, and assign it a new tree map (say, of the AVL type):
Long2IntSortedMap m = new Long2IntAVLTreeMap();
Now we can easily modify and access its content:
m.put( 1, 5 ); m.put( 2, 6 ); m.put( 3, 7 ); m.put( 1000000000L, 10 ); m.get( 1 ); // This method call will return 5 m.get( 4 ); // This method call will return 0
We can also try to change the default return value:
m.defaultReturnValue( -1 ); m.get( 4 ); // This method call will return -1
By suitable type casting, we can obtain a very powerful iterator:
LongListIterator i = (LongListIterator)((LongSortedSet)m.keySet()).iterator(); // Now we sum all keys long s = 0; while( i.hasNext() ) s += i.nextLong();
If one just needs a type-specific iterator, there is a special method that avoids casting:
LongIterator i = ((LongSortedSet)m.keySet()).longIterator(); // Now we sum all keys long s = 0; while( i.hasNext() ) s += i.nextLong();
We now generate a head map, and iterate bidirectionally over it starting from a given point:
// This map contains only keys smaller than 4 Long2IntSortedMap m1 = m.headMap( 4 ); // This iterator is positioned between 2 and 3 LongBidirectionalIterator t = ((LongSortedSet)m1.keySet()).iterator( 2 ); t.previous(); // This method call will return 2 (t.next() would return 3)
Linked maps are very flexible data structures which can be used to implement, for instance, queues whose content can be probed efficiently:
// This map remembers insertion order (note that we are using the array-based constructor) IntSortedSet s = new IntLinkedOpenHashSet( new int[] { 4, 3, 2, 1 } ); s.firstInt(); // This method call will return 4 s.lastInt(); // This method call will return 1 s.contains(5); // This method will return false IntBidirectionalIterator i = s.iterator( s.lastInt() ); // We could even cast it to a list iterator i.previous(); // This method call will return 1 i.previous(); // This method call will return 2 s.remove(s.lastInt()); // This will remove the last element in constant time
Finally, we play with iterators. It is easy to create iterators over intervals or over arrays, and combine them:
IntIterator i = Iterators.fromTo( 0, 10 ); // This iterator will return 0, 1, ..., 9. int a[] = new int[] { 5, 1, 9 }; IntIterator j = Iterators.wrap( a ); // This iterator will return 5, 1, 9. IntIterator k = Iterators.concat( new IntIterator[] { i , j } ); // This iterator will return 0, 1, ..., 9, 5, 1, 9.
|
||||||||||
PREV NEXT | FRAMES NO FRAMES |