-
What is the Distributed Checksum Clearinghouse or DCC?
- See the main DCC man page
as well as the
DCC web pages
and their mirror
-
Do the fuzzy checksums ignore "personalizations"?
- Yes, they ignore many so-called "personalizations".
-
How much bandwidth, disk space, and computing does the DCC require?
- The UDP packets used by a DCC client to obtain the checksum totals
from a DCC server for a mail message generally use less bandwidth than
the DNS queries required to receive the same message.
A DCC client needs very little disk space.
Bulk messages are usually logged by DCC clients.
On systems receiving a lot of mail, the mechanisms for automatically
creating new log directories every minute, day, or hour
can keep any single log directory from becoming too large.
See the dccm
and
dccproc
man pages.
As of mid-2003,
about 40 MBytes/day are exchanged between pairs of servers
participating in the network to which dcc.dcc-servers.net belongs.
Each server has 3 or 4 peers.
The resulting database is about 300 MBytes.
However, while dbclean is deleting old checksums,
there are three copies of the database.
The DCC clients and server do not need many CPU cycles,
but the daily executions of dbclean
on a system with a DCC server
require a computer with at least 512 MBytes of memory and work better with more.
DCC servers used by clients handling 100,000 or more messages per day
need to be larger.
Each additional 100,000 messages/day need about 100 MBytes of disk space
and system memory, given the default expiration of 7 days used by
dbclean -e.
On systems with more than 128 MBytes of main memory,
dccd should be configured with the
--with-db-memory
setting mentioned in the installation instructions.
In mid-2003, a DCC server prefers at least 512 MByte of RAM.
Systems dealing with more than 1,000,000 messages per day and keeping
checksums for as long as a week,
the dbclean -e default value,
should have dccd configured with the
--enable-big-db
setting mentioned in the installation instructions.
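The rules of thumb in this answer can be turned into a rough calculator. The following is a minimal sketch and not part of the DCC. The function name, the assumption that the figures scale linearly, and the use of the mid-2003 database size of about 300 MBytes as a baseline are all illustrative assumptions.

```python
def estimate_dccd_needs(messages_per_day, base_db_mbytes=300):
    """Rough dccd sizing from the rules of thumb in this answer.

    Assumptions (not from the DCC source): linear scaling, and the
    mid-2003 database size of about 300 MBytes as the baseline.
    """
    # About 100 MBytes of disk space and memory per 100,000
    # messages/day, given the default 7-day expiration of dbclean -e.
    extra_mbytes = 100 * messages_per_day / 100_000
    db_mbytes = base_db_mbytes + extra_mbytes
    return {
        "extra_mbytes": extra_mbytes,
        "database_mbytes": db_mbytes,
        # There are three copies of the database while dbclean runs.
        "peak_disk_mbytes": 3 * db_mbytes,
        # More than 1,000,000 messages/day calls for --enable-big-db.
        "needs_big_db": messages_per_day > 1_000_000,
    }


print(estimate_dccd_needs(200_000))
```

The result is only a planning aid. Real database sizes depend on the mix of mail and on the flooding peers.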
-
What happens to my mail if the DCC crashes?
- When in doubt or trouble, the DCC clients including
dccproc and dccm
deliver mail. They wait only a little while for a DCC server
to answer before giving up. They then avoid asking a server for a while
to avoid slowing down mail.
If the DCC sendmail interface or milter program, dccm, crashes,
the default parameters in misc/dcc.m4
for the sendmail.cf Xdcc line
tell sendmail to wait only about 30 seconds before
giving up and delivering the mail.
The DCC client code keeps track of the speeds of the
servers it knows about, and uses the fastest or closest.
Every hour or so it re-resolves A records
and checks the speeds of the servers it
is not using. When the current server stops working or gets significantly
slower, the client code switches to a better server.
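The switching behavior described above can be illustrated with a short sketch. This is not the DCC client code. The function name and the slower_factor threshold for "significantly slower" are hypothetical, chosen only to show why a client stays with its current server unless there is a clear reason to move.

```python
def pick_server(current, rtts, slower_factor=1.5):
    """Return the server to use next.

    rtts maps a server name to its measured round trip time in
    seconds, or to None if the server is not answering.
    slower_factor is a hypothetical threshold for "significantly
    slower"; the real client uses its own rules.
    """
    alive = {name: rtt for name, rtt in rtts.items() if rtt is not None}
    if not alive:
        return None  # nothing answers; the client delivers mail anyway
    best = min(alive, key=alive.get)
    if current not in alive:
        return best  # the current server stopped working
    if alive[current] > slower_factor * alive[best]:
        return best  # the current server is significantly slower
    return current  # otherwise stay put to avoid frequent switching


print(pick_server("a", {"a": 0.10, "b": 0.09}))
print(pick_server("a", {"a": None, "b": 0.09}))
```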
-
How do I mark spam without rejecting it?
- Unless given thresholds at which to reject mail,
dccm
and
dccproc do not reject mail.
When dccm is given a threshold by setting DCCM_REJECT_AT in
dcc_conf in the DCC home directory,
DCCM_ARGS can also be set to "-a IGNORE"
so that spam is marked but not rejected.
-
Why doesn't the man command find the man pages?
- The nroff source, formatted nroff output, and HTML versions of the
man pages are in the top-level source directory.
Formatted or nroff source is installed by default somewhere in /usr/man
depending on the target system.
It may be necessary to add /usr/man to the MANPATH environment variable.
Even with that, SunOS 5.7 sometimes has trouble finding them unless
man -F is used.
-
Must sendmail be used with the DCC?
- While the sendmail milter interface, dccm
and the DCC program interface or dccifd
are the most efficient ways to report and check DCC checksums,
dccproc is also commonly used.
-
How can the DCC be used with qmail?
- There are comments about using dccproc with
qmail
in the
DCC mailing list archives
including Chris Shenton's
message.
See also
Chris Shenton's
DCC, qmail, and gnus page.
-
Can the DCC be used with smtpd?
- Yes, dccproc can be used with Obtuse's smtpd.
Dave Lugo has contributed a shell script to the
smtpd-sd project
which can be used to do DCC checking prior to the end of the SMTP
DATA command.
-
Can the DCC be used with Exim?
- There are comments about using dccproc with
Exim
in the
DCC mailing list archives
including these messages:
-
Can the DCC be used with SpamAssassin or other spam filters?
- The DCC can be used with
SpamAssassin as
well as other spam and virus filters.
Note that it is more efficient to arrange to use a DCC client daemon
such as dccm to mark passing mail and check
X-DCC header lines in the filter than to start and run
dccproc on each message.
Some commercial virus and spam filters include DCC clients that
query public DCC servers or DCC servers operated by the filter vendor
and that "flood" or exchange bulk mail checksums with public servers.
-
How can the DCC be used with mail user agents?
- Dccproc can be used with any mail user
agent that can check mail headers.
For example, WD Baseley sent a
note
to the DCC
mailing list
on how to configure Eudora to
act on X-DCC header lines.
Bharat Mediratta has developed DeepSix for people using mail user agents
on UNIX boxes connected to remote servers such as corporate Exchange servers.
See his
project on Sourceforge
as well as his
announcement
in the DCC mailing list.
-
Must I have the root password to use the DCC?
- No, the procmail or sendmail .forward DCC user programs
can be installed in an individual ~/bin directory.
Then cdcc
can create a private map file used with
dccproc -h dir
or
dccproc -m dir/map.
Also see the DCC installation
instructions.
-
Why don't the public DCC servers work? Do I need a client-ID?
- The public DCC servers accept requests from clients using the
anonymous client-ID.
Incorrectly configured firewalls often cause problems.
Traceroute can be used to send UDP packets to test for interfering firewalls.
See the answer to the firewall question below.
-
Which ports do I need to open in my firewall?
- DCC traffic is like DNS traffic. You should treat port 6277
like port 53.
Allow outgoing packets to distant UDP port 6277 and incoming packets
from distant UDP port 6277.
If you run a DCC server, open incoming connections to local TCP port 6277
and outgoing connections to distant TCP port 6277.
If `dccproc` fails or the command `cdcc info` says no DCC servers
are answering, you may need to adjust your firewall.
See also the discussion of Cisco ACLs at
http://www.rhyolite.com/anti-spam/dcc/firewall.html.
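The policy in this answer can be written down as a small predicate, which may help when translating it into rules for a particular firewall. This is a sketch of the policy as stated here and not a firewall configuration. The function and parameter names are hypothetical.

```python
DCC_PORT = 6277


def dcc_packet_allowed(proto, direction, local_port, remote_port,
                       runs_server=False):
    """Apply the firewall policy described in this answer.

    direction is "out" or "in" relative to the local system.
    """
    if proto == "udp":
        # Client traffic: requests go to, and answers come from,
        # distant UDP port 6277, much like DNS and port 53.
        return remote_port == DCC_PORT
    if proto == "tcp" and runs_server:
        if direction == "in":
            return local_port == DCC_PORT   # connections to local port 6277
        if direction == "out":
            return remote_port == DCC_PORT  # connections to distant port 6277
    return False


print(dcc_packet_allowed("udp", "out", 40000, 6277))
print(dcc_packet_allowed("tcp", "in", 6277, 40000))
```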
-
Why does the dccd database
grow without bound?
- Dbclean should be run about once a day
with a script like misc/cron-dccd.
An entry like misc/crontab can be put into
the crontab file for the user that runs dccd,
such as /var/spool/cron/crontabs/root for Solaris.
-
The dccd database is corrupt. What should I do?
- Dbclean -R
will usually repair a broken
DCC server database.
However,
if your server is "flooding" or exchanging checksums with other servers,
it is often quicker to stop the DCC server,
delete the
dcc_db and
dcc_db.hash files,
run
dbclean -N to create
empty database files,
and then restart dccd with the
libexec/start-dccd script.
When dccd starts, it will notice that the database has been purged
and ask its flooding peers to rewind and retransmit all of their bulk
checksums.
-
Why did building the DCC fail with a complaint about
"Resource temporarily unavailable"?
- The most common cause of this problem is the same as that of the
next question,
or bugs in the target platform's fcntl() locking on NFS file systems.
If the DCC home directory will not be NFS mounted,
it is probably sufficient to run make a second time.
-
Why do my DCC clients including
cdcc and dccproc
complain about "Resource temporarily unavailable"?
- The most common cause of such messages is holding a lock on
the white list file with an editor.
However, perhaps your operating system has bugs in its implementation of
fcntl
file locking, particularly for the
DCC client map file when it is on
an NFS file system.
If so, try configuring, compiling, and installing with the
--with-bad-locks
setting mentioned in the installation instructions.
-
Why does dccifd or dccm complain about
"thread_create() failed: 11, try again"?
- The most common cause of "thread_create() failed: 11, try again"
error messages from dccm
and dccifd
is a limit that is too small on the maximum number of processes allowed
to the UID running the dccm or dccifd process.
The "maxproc" limit should be a dozen or so larger than the sum of
the queue sizes of dccm or dccifd (or both if both are running).
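As a worked example of that arithmetic, the sketch below adds a margin of twelve to the sum of the queue sizes. The function name and the queue sizes of 200 are hypothetical; substitute the queue sizes actually configured for dccm and dccifd.

```python
def suggested_maxproc(queue_sizes, margin=12):
    """Sum the dccm and dccifd queue sizes and add "a dozen or so"."""
    return sum(queue_sizes) + margin


# Hypothetical queue sizes for dccm and dccifd running together.
print(suggested_maxproc([200, 200]))  # 412
```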
-
Why doesn't my DCC client pick my local DCC server?
- The DCC clients including dccm
and dccproc pick the nearest and fastest
server in the list kept in the /var/lib/dcc/map
file.
DCC servers not in that list will not be used.
That list can be viewed with the
cdcc info
or
cdcc RTT operations.
Add to the list with
cdcc add
or cdcc load.
A nearby server that seems slower than a more distant server will
not be chosen.
Note that the anonymous
user delay set with dccd -u
is intended to make a server appear slow to "freeloaders."
The "RTT +/-" value that can be used with
the cdcc add
and cdcc load
operations can be used to force DCC clients to prefer or avoid servers
except when absolutely necessary.
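The effect of an "RTT +/-" value can be pictured as a bias applied to the measured round trip time before servers are compared. The sketch below assumes the adjustment is simply added to the measured value in milliseconds. That is an assumption made for illustration, and the names are hypothetical; see the cdcc man page for the exact semantics.

```python
def effective_rtt(measured_rtt_ms, adjustment_ms=0):
    """Measured RTT plus an operator-supplied bias (assumed additive)."""
    return measured_rtt_ms + adjustment_ms


def preferred(servers):
    """servers: list of (name, measured_rtt_ms, adjustment_ms) tuples."""
    return min(servers, key=lambda s: effective_rtt(s[1], s[2]))[0]


# Without a bias the faster distant server wins.
print(preferred([("local", 80, 0), ("distant", 40, 0)]))
# A negative adjustment makes the local server look faster.
print(preferred([("local", 80, -100), ("distant", 40, 0)]))
```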
-
If I have a server-ID, do I need a DCC client-ID, or vice versa?
- DCC server and client-IDs
serve distinct purposes.
Servers require server-IDs to identify each other in the floods of checksums
they exchange and to recognize authorized users of powerful
cdcc operations such as stop.
DCC servers require client-IDs to identify paying clients that should
be given quicker service than anonymous clients, to refuse reports from
anonymous clients, or to refuse even to answer queries from anonymous
clients.
-
Why does my DCC server complain about
"rejected server-IDs" among flooded checksum reports?
- Redundant paths among DCC servers exchanging
or flooding reports of checksums would cause duplicate entries in
each server's database without a mechanism that depends on every DCC server
having a unique server-ID.
Parts of that mechanism detect two servers claiming a single server-ID
and server-IDs that are not listed in the local
/var/lib/dcc/ids file.
Reports supposedly from unknown servers are rejected or ignored by the DCC
server.
The ID of every server in the network must be in the file,
usually without its real password.
The sample ids file in the DCC source
is a good start for
a new DCC server in the network to which dcc.dcc-servers.net belongs.
A current copy of that file is also in the online copies of the source
including that at
Rhyolite
Software.
At least one server in every network of DCC servers should use
an ids file without any extra entries to detect rogue server-ID assignments.
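The rejection of reports from unknown servers amounts to a membership test against the IDs in the local ids file. The following sketch shows only that idea. It is not dccd code, it does not parse a real ids file, and the server-IDs in the example are made up.

```python
def filter_flooded_reports(reports, known_server_ids):
    """Split flooded reports into accepted and rejected lists.

    reports: iterable of (server_id, checksum) pairs.
    known_server_ids: set of IDs listed in the local ids file.
    """
    accepted, rejected = [], []
    for server_id, checksum in reports:
        if server_id in known_server_ids:
            accepted.append((server_id, checksum))
        else:
            # Reports supposedly from unknown servers are not stored.
            rejected.append((server_id, checksum))
    return accepted, rejected


known = {1001, 1002}  # made-up server-IDs
reports = [(1001, "aaaa"), (1999, "bbbb"), (1002, "cccc")]
print(filter_flooded_reports(reports, known))
```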
-
Why does my server refuse to accept more than
20 operations per second?
- A common cause of such problems is one of the DCC server's
defenses against denial of service attacks.
A DCC server cannot know anything about anonymous clients,
or clients using client-ID 1 or without a client-ID and matching password
from the /var/lib/dcc/ids file.
As far as your server can know, an anonymous client sending many
operations is run by an unhappy sender of unsolicited bulk mail trying
to flood your server with a denial of service attack.
It is easy to tell your client its ID with the
cdcc add
or load operations.
The DCCD_RL_FREE
and DCCD_RL_ALL_FREE
parameters mentioned in the installation instructions
control the limits.
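A rate limit of this kind can be pictured with a simple per-second counter for each anonymous client. The sketch below is a generic fixed-window limiter and not the algorithm used by dccd. The class name is hypothetical, and the limit of 20 is taken from the question. The real limits are set by the DCCD_RL_FREE and DCCD_RL_ALL_FREE parameters.

```python
from collections import defaultdict


class AnonymousRateLimiter:
    """Generic fixed-window limiter; not the dccd implementation."""

    def __init__(self, max_ops_per_sec=20):
        self.max_ops = max_ops_per_sec
        self.counts = defaultdict(int)  # (client, second) -> operations seen

    def allow(self, client_addr, now):
        """Return True if this operation fits in the current second."""
        key = (client_addr, int(now))
        if self.counts[key] >= self.max_ops:
            return False
        self.counts[key] += 1
        return True


limiter = AnonymousRateLimiter()
results = [limiter.allow("192.0.2.1", 100.0) for _ in range(25)]
print(results.count(True), results.count(False))
```

A client that identifies itself with a client-ID and matching password would bypass a limiter like this one, which is why telling the client its ID resolves the problem.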
-
How do I keep strangers from using my DCC server?
- See the dccd -Q
and dccd -u options.
-
How can I determine why dccm reported
a message as spam or with a recipient count of "MANY"?
- Dccm is usually configured to log mail with recipient counts greater
than the -t ,log-thold,
as well as mail with some conflicts among
white list entries.
Each log file contains a single message, its checksums, its disposition,
and other information as described in the
dccm man page.
See also the dblist -C command.
-
How can I see what checksums my server has heard from its clients?
- The dblist -Hv
command displays the contents of the database.
Look for records with your
server-ID
with dblist -I.
-
Why is mail from my favorite mailing list marked with an
X-DCC header line that says it is spam?
- Sources of solicited bulk mail including mailing lists to which
you have subscribed should usually be in your DCC client
white list
so that they receive no X-DCC header lines.
-
Why are some checksums missing from my X-DCC header lines?
- If the DCC client was not able to compute a checksum for a message,
it will not ask the server about that checksum and the checksum will
not appear in the X-DCC header.
For example, if dccproc is not told and
cannot figure out the IP address of the source of the message,
that checksum will be missing.
The Fuz1 and Fuz2 checksums cannot be computed for
messages that are too small, and so will be missing for them.
A checksum will also be missing if the DCC server is configured to not count
it.
-
How do I maintain client
white
lists?
- The overall procedure includes monitoring bulk mail in the
log directories specified with
dccproc -l,
dccm -l,
and
dccm -U,
and adding entries to white list files.
The global
dccm white list file
specified with dccm -w
and the white lists specified with
dccproc -w are easily maintained
with ordinary text editors.
Note that some text editors including versions of vi
lock their files.
Dccm and dccproc are unable to read white list files while they are locked.
White lists specified with dccm -U
are easily maintained with ordinary text editors by the system administrator.
However, it is often better to let individual users deal with their
own white lists.
The DCC source includes sample CGI scripts
to let individual end-users monitor their private logs of bulk mail
and their individual white lists.
See the README file in that directory.
-
Can I use wild cards or regular expressions in DCC
white
lists?
- No, regular expressions cannot be used,
because DCC client and server white lists are converted to lists of checksums.
The same basic idea is used for DCC client white lists
as for the DCC protocol.
A DCC client computes the checksums for a message, and then looks
for those checksums in the local white list.
Depending on the values associated with those checksums,
the DCC client asks a DCC server about them.
There would also be portability difficulties in including regular
expressions in DCC clients.
In other words, consider the complications of bundling
procmail with the DCC code.
To use regular expressions with the DCC, consider procmail.
Procmail is included with many UNIX-like systems.
See also the
Procmail Homepage.
DCC clients can be configured to white- or blacklist
using so-called "substitute" headers.
See dccproc -S or
dccm -S.
It is also possible to use sendmail access_db file entries to
white- or blacklist based on portions of SMTP envelope and
client IP addresses.
For example, an access_db file line of "From:example.com OK"
can be used to tell dccm to white-list all mail from SMTP clients
in the example.com domain.
See the -O argument to the
misc/hackmc script.
-
How do I white-list mail from a legitimate
bulk mailer using its name or SMTP headers such as Mailing-List or the
Habeas SWE headers?
- Start by determining an envelope value or SMTP header that distinguishes
the bulk mail from a sample message or DCC log file.
The name of the sending computer is the mail_host value in
dccm log files.
If the distinguishing header or envelope value is not among the main
DCC white list values,
then a "substitute" value must be used.
An "ok substitute ..." line must be added to the white list file
and the DCC client program must be told with
dccproc -S or
dccm -S.
There are example white list entries in the sample
/var/lib/dcc/whiteclnt file.
"Habeas SWE" is or is part of a trademark of
Habeas, Inc.,
3045 Park Blvd., Palo Alto, CA 94306
-
Do I need both server and client
white
lists?
- The dccd whitelist file is
not as useful as the client white lists used by
dccproc whiteclnt
and
dccm whiteclnt files.
Entries in a DCC server's white list apply to all clients that use
that server,
including clients in other organizations if permitted.
Thus, only very global values are appropriate for server white lists.
Common entries in server white lists include the 127.1 IP address,
the IP address ranges of the SMTP servers of the organization
running the server, and well known, unimpeachable mailing lists
such as CERT's.
Client white lists apply only to the stream of mail handled by the client.
Dccm white lists apply
to the mail received by the associated sendmail process.
Distinct organizations and individual users can have very different
notions of what bulk mail is solicited and what other mail is always
unsolicited bulk mail.
-
When the white list file
used by dccm
or dccproc
is changed,
what must be done to tell the software the change?
- The DCC clients notice when their whiteclnt files
as well as included files change and automatically rebuild the corresponding
.dccw hash table files.
Changes to the dccd whitelist
are not effective until after dbclean is run.
Note that some text editors including versions of vi
lock their files.
Dccm and dccproc are unable to read white list files while they are locked.
-
Why do legitimate mail messages have
X-DCC header lines that say they are "bulk"?
- There are several possible causes of such problems.
The first and most obvious is that the mail is solicited bulk mail
and that the source needs to be added to your
white list.
Another possible reason is that your individual legitimate mail messages
have not been marked as spam because their Body or Fuz1
checksum counts are small, but that the IP address or other checksum
counts are large.
The IP address checksum count, for example, is the total of all reports
of addressees for that checksum.
That total is independent of the other checksums, and so counts
all reports for all messages with that source IP address.
A source of legitimate mail that has sent a message that was reported
as spam by one of its recipients will often have the totals
for the checksums of its IP address, From header, and
other values be MANY.
This is why it usually does not make sense to reject mail based on what the
DCC server reports for the IP address, From header, and other values that
are not unique to the message.
Only the last Received header line, the Message-ID line, and body checksums
can be expected to be unique and sometimes not the Message-ID
and Received header lines.
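The advice above can be sketched as a filter that looks only at the checksums specific to a message. This is an illustration and not DCC code. The function name, the threshold of 50, and the dictionary of counts are hypothetical, and no real X-DCC header line is parsed.

```python
MANY = "many"


def looks_bulk(counts, threshold, message_specific=("Body", "Fuz1", "Fuz2")):
    """Decide from body-related checksum counts only.

    counts maps a checksum name to an integer count or to MANY.
    Counts for the IP address, From header, and similar values are
    deliberately ignored because they are not unique to the message.
    """
    for name in message_specific:
        count = counts.get(name)
        if count is None:
            continue  # checksum missing, e.g. the message was too small
        if count == MANY or count >= threshold:
            return True
    return False


legit = {"IP": MANY, "From": MANY, "Body": 1, "Fuz1": 2}
print(looks_bulk(legit, threshold=50))  # False
```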
-
Why is legitimate mail from someone using qmail
marked as spam?
- A common cause for that and similar complaints involves
null or missing Message-ID header lines.
Spam often lacks Message-ID lines or has a null or "<>" ID,
so rejecting mail with null or missing Message-IDs can be an
effective filter.
DCC clients treat missing Message-ID lines as if they were present but null.
The sample whitecommon
white list file in the DCC source
includes the line:
many message-id <>
Some Mail Transfer Agents violate section 3.6.4 of RFC 2822 and
do not include Message-ID header lines in mail they send,
including some combinations of qmail and
"sendmail -bs" acting as the originating MTA,
and qmail by itself when it generates a non-delivery message or "bounce."
Solutions to this problem include removing that line from your
white lists
or adding lines specifying the From or envelope
from values of senders of legitimate mail lacking Message-ID header lines.
-
Are IP address blocks
in white lists used by
dccproc?
- Yes, dccproc can white-list mail
by the IP address of the immediately
preceding SMTP client,
but only if it knows that IP address.
Unless the dccproc -a
or dccproc -R
options are used, dccproc does not know the IP address.
-
Why is dccproc ignoring
env_from white list
entries?
- DCC checksums are of the entire header line or envelope value.
An entry in the white list file for jsmith@example.com
will have no effect on mail with an envelope value of
"J.Smith" <jsmith@example.com>.
The file must contain "J.Smith" <jsmith@example.com>.
Another common cause for this problem is implied by the fact that
for an env_from white list entry
to have any effect, dccproc must be able to find the envelope value
in the message in a Return-Path header or -f must be
used.
If your mail delivery agent does not add a Return-Path header
and you do not use
dccproc -f,
then dccproc cannot know about
white or blacklist entries for envelope return addresses.
Note also that dccproc has no white list by default and
that dccproc -w
must be used.
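Because a checksum covers the entire value, a white list lookup is an exact match. The sketch below shows the idea with MD5 from the Python standard library as a stand-in for the DCC checksum computation. The helper name and the sample envelope values are illustrative only.

```python
import hashlib


def checksum(value):
    # MD5 is only a stand-in here. The point is that the whole value
    # is hashed, so any difference in the value changes the checksum.
    return hashlib.md5(value.encode("utf-8")).hexdigest()


# A white list holding the checksum of the bare address only.
whitelist = {checksum("jsmith@example.com")}

print(checksum("jsmith@example.com") in whitelist)              # True
print(checksum('"J.Smith" <jsmith@example.com>') in whitelist)  # False
```

The same reasoning explains why wild cards and regular expressions cannot work: a lookup compares checksums and not the text they were computed from.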
-
Why is the DCC server ignoring
env_from
white list entries?
- Common causes of this problem include sendmail access_db file entries
and blacklisting entries in the DCC client
white list.
Entries in the sendmail access_db or the
dccproc or
dccm
whitelist override the DCC server's advice.
Note also that it is common for a DCC client to be configured to use
the current nearest of several DCC servers.
If one of the DCC servers does not have the entry in its white list,
the DCC client will occasionally not benefit from it.
-
Can
dccproc -t many be used to report
spam or as a spam trap?
- Yes, see the examples in the
dccproc man page.
-
What if I make a mistake with
dccproc -t many
and report legitimate mail as spam?
- It is possible to delete checksums from the distributed DCC
database with the
cdcc delck
operation.
However, it is not worth the trouble.
Unless the same (as far as the fuzzy checksums are concerned) message
is sent again, no one is likely to notice the mistake before the
reports of the message's checksums expire from the DCC servers'
databases for lack of repetition.
-
Can the sendmail "spamfriend" mechanism tell
dccm to not check mail sent to some addresses?
- Sendmail decisions to accept, reject, or discard mail are largely
independent of the decisions made by dccm.
The DCC equivalent is to add
env_to entries to the
dccm white list.
See the sample whiteclnt file in the
DCC source.
However, if your sendmail.cf file sets the
dcc_notspam macro while processing the
envelope, then the message will be white-listed.
This is related to the dcc_isspam macro
used by sendmail.cf modified by misc/hackmc -R
to tell dccm to report blacklisted messages as spam to the DCC server.
-
Can spam detected by a DNS blacklist be reported to
a DCC server as spam?
- Yes, for example, see the misc/dccdnsbl.m4
sendmail FEATURE file in the DCC source.
-
Can unauthorized attempts to relay mail be reported
to a DCC server as spam?
- Yes, for example, see the misc/hackmc -R
script in the DCC source.
-
How can I avoid polluting databases of DCC servers with
checksums of my mail that is not spam?
- Reports of checksums with
white list
entries in your server's database
are not flooded to its peers.
The checksums of messages white-listed with entries in local
dccm or dccproc
white lists are not reported to DCC servers.
It is good to add entries to DCC server and client
white lists
for localhost, your IP address blocks, and your domains if
you know that none of your users will ever send spam.
However, in the common mode in which the DCC is used, no
checksums of mail are pollution.
Checksums of genuinely private mail will have target counts of
1 or a small number, and so will not be flooded by your server to
other servers.
Strangers will not see your private mail and so will not be able
to ask any DCC server about the checksums of your private mail.
On the other hand, the DCC functions best by collecting reports
of the receipt of bulk mail as soon as possible.
That implies that it is generally desirable
to send reports of all mail to a DCC server.
The DCC flooding protocol does not send checksums with counts
below a DCC server's bulk threshold to
other servers.
-
How many flooding peers does my DCC server need?
- A DCC server in a network of many servers should have at least three
flooding peers to ensure that the failure of a single server or network
link cannot partition the network.
Limiting the number of peers of any server to four or perhaps
a few more ensures that no single server is critical to the network.
To minimize the distances in the network, four peers
per server seem necessary.
An organization with more than one server can be viewed as a single
server by other organizations, with its servers flooding each other
and external peers spread among its servers.
This protects the network should the organization suffer large scale problems
while protecting the organization from single points of failure.
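Whether a planned set of flooding peers tolerates the loss of any one server can be checked with a small graph search. The sketch below is a planning aid and not part of the DCC. It assumes the peerings are listed symmetrically, it checks server failures only and not link failures, and the server names are made up.

```python
from collections import deque


def connected(graph, removed=None):
    """True if all servers other than `removed` can still reach each other."""
    nodes = [n for n in graph if n != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        node = queue.popleft()
        for peer in graph[node]:
            if peer != removed and peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return len(seen) == len(nodes)


def survives_any_single_failure(graph):
    """True if no single server failure partitions the network."""
    return all(connected(graph, removed=n) for n in graph)


ring = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(survives_any_single_failure(ring))   # True
print(survives_any_single_failure(chain))  # False
```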
-
Do I need to tell the operators of other DCC servers
the password for controlling my server to turn on flooding?
- No, you do not need to and generally should not tell other DCC server
operators the passwords for controlling your server with
the cdcc command.
Every inter-server flood of checksums is authorized by lines in
each server's /var/lib/dcc/flod file
and authenticated by the password associated with the
passwd-ID in those lines.
The passwd-ID is a server-ID
defined in the /var/lib/dcc/ids file
that should generally be used only to authenticate floods of checksums.
-
How can I figure out why flooding is not working?
- Many DCC server problems can be diagnosed by turning
on one or more of the tracing modes in the server with the
cdcc trace operation
or by restarting the server with
dccd -T.
The cdcc flood list
operation displays the current flooding peers of a DCC server.
Counts of checksum reports sent and received to and from
a single peer can be displayed with
cdcc "flood stats ID".
The positions in the local database of outgoing streams of checksums
are displayed by the start of dblist -Hv.
-
Why didn't the RTT reported by the
cdcc info operation
change when my network topology changed?
- The RTT or round trip time is an average value.
Changes in network topology, server load, and so forth are not
immediately reflected in the RTT to avoid switching DCC servers
too frequently.
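The damping effect of an averaged RTT can be seen with an exponential moving average. The averaging method and the weight of 0.125 are assumptions chosen for illustration; this answer says only that the RTT is an average value.

```python
def smoothed_rtt(previous_avg, sample, alpha=0.125):
    """Exponential moving average; alpha is a hypothetical weight."""
    return (1 - alpha) * previous_avg + alpha * sample


# The path suddenly gets slower: 50 ms before, 200 ms afterwards.
avg = 0.050
for sample in [0.200] * 5:
    avg = smoothed_rtt(avg, sample)
    print(round(avg, 4))
```

After five samples the average has moved only part of the way toward the new value, which is why a change in topology is not reflected immediately.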
-
When my clients are configured to use SOCKS, they do not
realize immediately when a server is down.
- When configured to use SOCKS, DCC clients cannot "connect"
to a server and so do not receive ICMP errors and must wait for
timeouts to know the server is not answering.
This document describes DCC version 1.2.16.