En:YaCy-Tor
This is a translation of the German how-to. The German original might be more up to date.
Note: This how-to is divided into two parts. Please complete part 1 first before starting with part 2.
Thread about Whitelisting feature:
Information
ToDos
To be added...
Goal
The goal is to build an independent YaCy network that indexes Tor hidden services only. Normal Internet sites must not be indexed in this network. There is also another YaCy network that indexes both Tor hidden services and normal web sites.
Help
Should you have questions or need help, go to the English YaCy forum
Part 1 - Configuring Tor and Privoxy
Please install Tor and Privoxy first. The installation depends on your operating system. Read the OS specific manual.
Configuring Tor
It's sufficient to run Tor as a client, even though we are going to set up a hidden service later on. The Tor package comes fully configured to run as a client out of the box. Nevertheless, you should edit your Tor configuration file (e.g. /etc/tor/torrc) to increase system security.
First of all look for "SocksPort", which defaults to 9050:
SocksPort 9050
Remember this port number.
If you connect to Tor from the same system only, prevent other IPs from connecting by binding Tor to localhost:
SocksListenAddress 127.0.0.1:9050
Additionally, you should restrict access to the SOCKS port:
SocksPolicy accept 127.0.0.0/8
SocksPolicy reject *
ORPort, ORListenAddress, DirPort and DirListenAddress only need to be set if you run Tor as a server.
ControlPort only needs to be set if you run a control application.
Keep logging to a minimum, otherwise sensitive information will end up in the logs. The following setting writes only minimal information to syslog:
Log notice syslog
(Apparently Log notice has to be set, otherwise Tor won't start properly. The configuration may vary between operating systems.)
If you want to be extra careful, you can optionally set
ExitPolicy reject *:*
and
BandwidthRate 50 KB
This limits the damage in case of a misconfiguration by restricting connections and reducing traffic.
Here is the configuration as a whole (it depends on your OS; this one is for Linux):
ExitPolicy reject *:*
User tor
Group tor
PIDFile /var/run/tor.pid
SocksPort 9050
SocksListenAddress 127.0.0.1:9050
SocksPolicy accept 127.0.0.0/8
SocksPolicy reject *
Log notice syslog
DataDirectory /var/lib/tor/data
# ControlPort 9051
# RunAsDaemon 1 # has to be set depending on os
BandwidthRate 50 KB
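The two SOCKS restrictions above are easy to get wrong when editing by hand. The following is a minimal sketch of a check for them, not an official Tor tool. The helper names parse_torrc and socks_exposure_problems are hypothetical, and the parser assumes a simplified torrc syntax (one option per line, everything after # is a comment).

```python
import ipaddress


def parse_torrc(text):
    """Parse torrc-style text into a list of (option, value) pairs.

    Simplifying assumption: one option per line, '#' starts a comment.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        option, _, value = line.partition(" ")
        entries.append((option, value.strip()))
    return entries


def socks_exposure_problems(entries):
    """Return warnings if the SOCKS settings differ from this how-to."""
    problems = []
    options = {}
    for option, value in entries:
        options.setdefault(option.lower(), []).append(value)

    # The how-to binds the SOCKS listener to localhost.
    if "sockslistenaddress" not in options:
        problems.append("SocksListenAddress is not set")
    for address in options.get("sockslistenaddress", []):
        host = address.rsplit(":", 1)[0]
        try:
            loopback = ipaddress.ip_address(host).is_loopback
        except ValueError:
            loopback = False  # not a plain IPv4 literal, treat as suspicious
        if not loopback:
            problems.append("SocksListenAddress is not loopback: " + address)

    # The last SocksPolicy line should reject everything else.
    policies = options.get("sockspolicy", [])
    if not policies or policies[-1] != "reject *":
        problems.append("SocksPolicy does not end with 'reject *'")
    return problems
```

Calling socks_exposure_problems(parse_torrc(open("/etc/tor/torrc").read())) should return an empty list for the configuration shown above; the path is the example path from this how-to and may differ on your system.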
Configuring Privoxy
The following how-to assumes you will use Privoxy for Tor only.
Edit Privoxy's configuration file (e.g. /etc/privoxy/config). Check or edit the following settings.
Don't log every requested page. You only need startup and error messages. It is probably best not to log anything at all:
debug 0
Make sure that only localhost is allowed to connect and that Privoxy listens on port 8118:
listen-address 127.0.0.1:8118
Privoxy's filtering should be switched off, since Privoxy merely acts as a proxy between YaCy and Tor. You can also switch off remote toggling:
toggle 0
enable-remote-toggle 0
enable-remote-http-toggle 0
You may disable editing of filters and rules, too:
enable-edit-actions 0
Most importantly, forward all connections to Tor's SOCKS port (9050). Don't forget the dot at the end of the line:
forward-socks4a / 127.0.0.1:9050 .
forwarded-connect-retries should be slightly increased to make connections more reliable. I recommend 2 or 3:
forwarded-connect-retries 2
This is a listing of all settings (it depends on your OS; this one is for Linux):
confdir /etc/privoxy
logdir /var/log/privoxy
actionsfile standard
actionsfile default
actionsfile user
filterfile default.filter
logfile privoxy.log
debug 0
# debug 1 # make sure this line is commented out!
listen-address 127.0.0.1:8118
toggle 0
enable-remote-toggle 0
enable-remote-http-toggle 0
enable-edit-actions 0
buffer-limit 4096
forward-socks4a / 127.0.0.1:9050 .
forwarded-connect-retries 2
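The trailing dot of the forward-socks4a line and a leftover debug line are the two details most easily missed here. As a rough sketch (the function names are hypothetical and the parser assumes one setting per line with # starting a comment), the settings discussed above could be checked like this:

```python
def read_privoxy_settings(text):
    """Return (key, value) pairs from Privoxy-style config text.

    Simplifying assumption: one setting per line, '#' starts a comment.
    """
    settings = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            key, _, value = line.partition(" ")
            settings.append((key, value.strip()))
    return settings


def check_privoxy_settings(settings, tor_socks="127.0.0.1:9050"):
    """Return warnings for the settings this how-to cares about."""
    values = {}
    for key, value in settings:
        values.setdefault(key, []).append(value)

    warnings = []
    # Expect: forward-socks4a / <tor address> .   (the dot is mandatory)
    forwards = [value.split() for value in values.get("forward-socks4a", [])]
    if ["/", tor_socks, "."] not in forwards:
        warnings.append("no 'forward-socks4a / %s .' line found" % tor_socks)
    # Any debug value other than 0 means that something is being logged.
    if any(value != "0" for value in values.get("debug", [])):
        warnings.append("logging is enabled (debug is not 0)")
    # Privoxy should only accept connections from the local machine.
    for address in values.get("listen-address", []):
        if not address.startswith("127.0.0.1:"):
            warnings.append("listen-address is not local: " + address)
    return warnings
```

For the listing above, check_privoxy_settings(read_privoxy_settings(text)) should return an empty list.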
Check configuration
Before you start to configure YaCy, test the configuration of Tor and Privoxy to make sure everything works. Give Tor some time to connect to the Tor network. Then start your browser, configure it to use a proxy with proxyhost localhost and proxyport 8118, and visit an onion URL.
When you are able to connect to an onion URL successfully, continue with part 2 of the how-to. If you are having trouble, check your configuration files and reread the documentation of Tor and privoxy carefully.
Don't forget to remove the proxy settings from your browser-configuration.
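If you prefer a script to a browser for this test, the same check can be approximated with Python's standard library. This is only a sketch: the function names are made up for this example, the default host and port are the Privoxy values used in this how-to, and the generous timeout is an assumption to allow for slow Tor circuits.

```python
import http.client
import urllib.request


def privoxy_opener(host="localhost", port=8118):
    """Build a URL opener that sends HTTP requests through Privoxy."""
    proxy = "http://%s:%d" % (host, port)
    handler = urllib.request.ProxyHandler({"http": proxy})
    return urllib.request.build_opener(handler)


def can_reach(url, opener, timeout=120):
    """Return True if the URL answers with a 2xx or 3xx status."""
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, proxy errors and HTTP errors.
        return False
```

A call such as can_reach("http://1a2b3c4d5e6f7g89.onion:8181/", privoxy_opener()) uses the placeholder onion address from this how-to; replace it with an address you know to be online.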
Shut down Tor. Modify the Tor configuration file and add an entry that publishes YaCy as a hidden service, e.g.:
HiddenServiceDir /var/lib/tor/yacy/
HiddenServicePort 8181 127.0.0.1:8181
Port 8181 is the YaCy port we will use later.
After restarting Tor you will find a file named hostname in the directory HiddenServiceDir. The hostname in this file (e.g. 1a2b3c4d5e6f7g89.onion) will be needed later.
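The hostname is needed later for the YaCy settings staticIP and seedURL. As a small sketch, assuming the hostname file contains nothing but the onion address, the two values could be derived as follows (both function names are hypothetical):

```python
from pathlib import Path


def read_onion_hostname(hidden_service_dir):
    """Read the 'hostname' file that Tor writes into HiddenServiceDir."""
    return (Path(hidden_service_dir) / "hostname").read_text()


def yacy_onion_settings(hostname_text, port=8181):
    """Derive the YaCy settings that depend on the onion hostname."""
    hostname = hostname_text.strip()
    if not hostname.endswith(".onion"):
        raise ValueError("not an onion hostname: %r" % hostname)
    return {
        "staticIP": hostname,
        "seedURL": "http://%s:%d/seed.txt" % (hostname, port),
    }
```

For example, yacy_onion_settings(read_onion_hostname("/var/lib/tor/yacy/")) uses the directory from the HiddenServiceDir example above. Reading that directory usually requires the permissions of the Tor user.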
Part 2 - Configuring YaCy
Preamble
First of all, there are several ways to modify YaCy's configuration. One is to edit the file yacy.init, another is to edit httpProxy.conf directly. It's up to you which way you choose.
It's recommended to download an up-to-date version of YaCy and to modify yacy.init before starting it for the first time. This ensures that YaCy has not yet contacted other peers or built an index. The settings in yacy.init are written to DATA/SETTINGS/httpProxy.conf on the first startup.
There are also several ways to modify superseed.txt. Here, too, I will describe an unusual way, which prevents superseed.txt from being overwritten during updates.
The recommended edits are optimal for my configuration. If you use another configuration, make sure you know what you are doing.
Under no circumstances should you try to modify a YaCy installation that has already been started, since there are several undocumented traps that will cause YaCy to contact public YaCy clusters and distribute onion URLs.
Ok, let's start. First change into the YaCy directory. All following pathnames are relative to the YaCy directory.
Modifying the configuration files
Now we will modify yacy.init. Only the settings that have to be changed are listed.
First we have to set the port on which YaCy will be reachable. It differs from the normal YaCy port.
port = 8181
Then we need to set another location of the net definition files since the standard ones will be overwritten with every update.
network.unit.definition = ../yacy.network.unit.tor
network.group.definition = ../yacy.network.group.tor
Automatic updates should work, but they haven't been tested sufficiently yet. Until we can be sure they won't destroy anything, we had better disable them:
update.process = manual
It's also important to replace the blacklist with a whitelist, so that only the domains in the list are indexed, instead of all domains that are not in the list. This way we make sure that only hidden services are indexed, since they are identified by the onion domain. The whitelist itself will be configured later.
BlackLists.class=de.lulabad.blacklist.advancedWhiteURLPattern
Now we make sure YaCy only will contact the Internet through privoxy:
remoteProxyUse=true
remoteProxyHost=localhost
remoteProxyPort=8118
Since DNS resolution only returns local network addresses here, we have to empty the list of addresses that bypass the proxy. Otherwise YaCy would try to connect to those sites directly, without the proxy, and would not be able to reach them:
remoteProxyNoProxy=
The following settings make the seedfile available in the Tor network:
seedUploadMethod=File
seedFilePath=htroot/seed.txt
seedURL=http://1a2b3c4d5e6f7g89.onion:8181/seed.txt
Now we give our YaCy a freely selectable name:
peerName=TorYaCy
YaCy needs to run in debug mode to handle local addresses (as used by Tor) correctly:
yacyDebugMode=true
To be able to make a connection, YaCy needs to be told from which hostname (domain) it is reachable:
staticIP=1a2b3c4d5e6f7g89.onion
Should you want YaCy to open a browser window, just skip the following option. Otherwise set:
browserPopUpTrigger=false
Since the Tor network is not the fastest, we set all timeouts to high values:
clientTimeout=90000
crawler.clientTimeout=90000
proxy.clientTimeout=90000
indexControl.timeout = 180000
indexDistribution.timeout = 180000
indexTransfer.timeout = 360000
The following options are very important: they make sure that our peer won't contact any public clusters, only other Tor-YaCy peers:
CRDistOn = false
CRDist1Target =
For security reasons it is also important that YaCy's proxy isn't reachable from the Tor network. The following setting assumes that YaCy runs on the same computer as Tor. Connections arriving through the hidden service then come from 127.0.0.1, so localhost must not be an allowed proxy client. As a consequence, you have to enter the machine's LAN address (for example 192.168.1.2) as the proxy server in your browser instead of localhost:
proxyClient=192.168.*,10.*
Finally, we set several options to increase anonymity in the Tor network:
proxy.sendViaHeader=false
proxy.sendXForwardedForHeader=false
useYacyReferer=false
useYacyReferer__pro = false
Optionally, the following options restrict the maximum file size (here about 10 MB) and reduce the proxy cache to a minimum (here 4 MB), because the sites you browse are stored in that cache:
crawler.ftp.maxFileSize=10000000
crawler.http.maxFileSize=10000000
proxyCacheSize=4
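Because so many keys have to be changed, applying them by script can help avoid a forgotten entry. The following sketch rewrites key=value lines in the text of yacy.init and appends keys that are missing. The function name apply_overrides is hypothetical, and it assumes that yacy.init uses plain key=value lines with # for comments.

```python
def apply_overrides(init_text, overrides):
    """Return init_text with the given keys set to the given values.

    Existing 'key = value' lines are rewritten in place; keys that do
    not occur in the text are appended at the end.
    """
    seen = set()
    output = []
    for line in init_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in overrides:
                line = "%s=%s" % (key, overrides[key])
                seen.add(key)
        output.append(line)
    for key, value in overrides.items():
        if key not in seen:
            output.append("%s=%s" % (key, value))
    return "\n".join(output) + "\n"
```

A call could look like apply_overrides(text, {"port": "8181", "remoteProxyUse": "true", "yacyDebugMode": "true"}), with the full set of keys from this section in the dictionary. As the preamble says, run it before YaCy is started for the first time.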
Activate Whitelist
Thread about Whitelisting feature:
YaCy supports only a blacklist by default, so you have to download [1] (or higher) and copy it to libx. After that, the filter class configured earlier is available.
Sorry, but this Whitelist can't be used at this moment:
Now we just have to make an entry to only index .onion sites:
- Create the subfolder DATA
- Create the sub-subfolder LISTS (DATA/LISTS)
- Create the file url.default.black (DATA/LISTS/url.default.black) with the following content:
*.onion/.*
A possible workaround is to put a filtering proxy in front of YaCy that accepts only *.onion domains.
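The core of such a filtering proxy is a host check. The following is only a sketch of that check, not of a complete proxy and not of YaCy's own list matching. The function name is_onion_url is hypothetical.

```python
from urllib.parse import urlsplit


def is_onion_url(url):
    """Return True if the URL's host name ends in the .onion top-level domain.

    Only the host part is inspected, so 'http://example.com/page.onion'
    is rejected.
    """
    host = (urlsplit(url).hostname or "").rstrip(".")
    labels = host.split(".")
    return len(labels) >= 2 and labels[-1] == "onion" and all(labels)
```

A proxy built around this check would forward a request only if is_onion_url(url) is true and answer everything else with an error.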
Defining the YaCy-Tor-network
YaCy is now able to define separate networks; see Netzdefinition (network definition).
The current definitions can be downloaded from [2] and [3].
yacy.network.group.tor is empty and yacy.network.unit.tor has the following content:
network.unit.name = torworld
network.unit.description = Yacy network for TOR https://www.torproject.org/
network.unit.domain = any
network.unit.search.time = 4
network.unit.dhtredundancy.junior = 1
network.unit.dhtredundancy.senior = 3
network.unit.bootstrap.seedlist0 = http://byi4akelnxrz5tab.onion:8081/seed.txt
network.unit.bootstrap.seedlist1 = http://pah22f4rpnz4hoyn.onion:8084/seed.txt
network.unit.bootstrap.seedlist2 = http://zxbagwypsfbicebv.onion:8091/seed.txt
network.unit.update.location0 = http://yacy.net/Download.html
network.unit.update.location1 = http://latest.yacy.de
network.unit.update.location2 = http://www.findenstattsuchen.info/YaCy/latest/index.php
network.unit.protocol.control = uncontrolled
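To see which seed lists a unit file will bootstrap from, the file can be read as simple key = value pairs. This is a minimal sketch under that assumption. Both function names are made up for this example and it does not reproduce YaCy's own parser.

```python
def parse_properties(text):
    """Parse 'key = value' lines into a dict, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if separator:
            props[key.strip()] = value.strip()
    return props


def bootstrap_seedlists(props):
    """Return the seed list URLs ordered by their numeric suffix."""
    prefix = "network.unit.bootstrap.seedlist"
    numbered = []
    for key, value in props.items():
        suffix = key[len(prefix):]
        if key.startswith(prefix) and suffix.isdigit():
            numbered.append((int(suffix), value))
    return [url for _, url in sorted(numbered)]
```

Applied to the unit file above, bootstrap_seedlists(parse_properties(text)) returns the three onion seed URLs in the order seedlist0, seedlist1, seedlist2.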
Starting YaCy
Now you may start YaCy. Watch the log file and perhaps the network graph; other Tor-YaCy peers should show up within minutes. Public IP addresses should not appear in the log file. Error messages caused by the seed files may appear in the beginning and can be ignored as soon as the first other Tor-YaCy peers are found.
Visit http://localhost:8181 and set an admin password when you start YaCy for the first time.
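Checking the log for public IP addresses by eye is tedious. The following sketch scans log lines for IPv4 addresses and reports those that are globally routable. The function name is hypothetical, no particular YaCy log format is assumed, and IPv6 addresses are not covered.

```python
import ipaddress
import re

# Candidate dotted quads; invalid ones (e.g. 999.1.1.1) are filtered below.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def public_addresses(log_lines):
    """Return (line number, address) pairs for globally routable IPv4 addresses."""
    found = []
    for number, line in enumerate(log_lines, start=1):
        for candidate in IPV4_PATTERN.findall(line):
            try:
                address = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # looked like an address but is not a valid one
            if address.is_global:
                found.append((number, candidate))
    return found
```

With a log file opened as a list of lines, an empty result from public_addresses(lines) is what you want to see. Any hit deserves a closer look at the configuration.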
Using YaCy
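Enter proxyhost localhost and proxyport 8181 into your browser configuration (if you restricted proxyClient to LAN addresses as described above, use the machine's LAN address, e.g. 192.168.1.2, instead of localhost). Now you should be able to visit Tor hidden services using YaCy.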
Notes
Tor is a slow and sometimes unstable system, and it can take a while until the YaCy peers find each other and exchange data. Be patient.
- Some Tor pages have to be reloaded several times
- DHT transfer to other Tor-YaCy peers should work (untested)
- RankingDistribution to other Tor-YaCy peers should work (untested)
- The status of a peer depends on the quality of the connection. Don't be surprised if you are principal and some minutes later you are junior.
- Don't index normal Internet sites with Tor-YaCy. That would destroy the Tor-only index. Other Tor peers may offer filter rules that block access to the normal Internet. Use them!
- The German version of this article can be found at the Tor hidden wiki. If you modify this article, please update it there, too: http://6sxoyfb3h2nvok2d.onion/tor/TorYaCyHowto
Security Hints
- logging should be disabled in all programs, including YaCy, or the log files should be placed on a ramdisk
- YaCy's HTCACHE should be placed on a ramdisk
- crawls should be run locally and not distributed; don't forget to restrict the crawl filter to onion domains
- wiki and blog should be used carefully
- public bookmarks shouldn't be used
- browser cache and browser history should be deactivated
- paranoid people can install YaCy on an encrypted filesystem
Seeds for Tor-YaCy
Please only post seed files that are reliably available. These seed files can be added to the unit file.
- http://byi4akelnxrz5tab.onion:8081/seed.txt
- http://pah22f4rpnz4hoyn.onion:8084/seed.txt
- http://zxbagwypsfbicebv.onion:8091/seed.txt
If your Tor-YaCy is up and running, please post the URL to your seed file here (the one from seedURL).
Demo-peers
Beware, maybe they are down.
- http://pah22f4rpnz4hoyn.onion:8084/
- http://2vkpt4pelhbu7yfw.onion:8181/
- http://k3ttdasxnbvtbfwm.onion:8181/
- http://iwhfgshjfhygilti.onion:8181/
Similar YaCy networks
Please always join already existing networks whenever possible. Tor's resources are limited and should be spared. In addition it is hard enough to connect enough servers to build a stable Tor-YaCy-network.
freeworld
This is not a dedicated network. These are ordinary YaCy servers that are additionally offered as hidden services in the Tor network.
They do not crawl Tor hidden services, but they let Tor users query a YaCy server directly and anonymously.
Servers:
Please announce your server in tor network, too:
anonymworld
Goals
anonymworld is a network that indexes the normal web in addition to Tor hidden services and is reachable only through Tor.
Differences
Compared with the procedure described above, the following changes have to be made:
- No whitelist: All whitelist/blacklist entries as well as the additional Jar-file are not necessary.
- Independent network: There are special unit and group files.
Important: These are only the differences from the general description!
Unit/Group-Files
The current definitions can be downloaded from [4] and [5].
yacy.network.group.tor is empty and yacy.network.unit.tor has the following content:
network.unit.name = anonymworld
network.unit.description = Yacy network for TOR https://www.torproject.org/ indexing whole world
network.unit.domain = any
network.unit.search.time = 4
network.unit.dhtredundancy.junior = 1
network.unit.dhtredundancy.senior = 3
network.unit.bootstrap.seedlist0 = http://jarwf7lglg3lbujb.onion:8086/seed.txt
Demo Peers
Beware, peers might be down.
Seed files
Please only add seed files that are reliably available.
External links
- https://www.awxcnx.de/tor-i2p-proxy-en.htm - Proxy page for using and testing tor network (hidden services) without any own tor installation
- http://tor.eff.org/download.html.de -- downloading Tor and installation hints for several operating-systems
- http://rabe.supersized.org/archives/683-Mini-DNS-Server-fuer-Tor.html -- hints on the dns-problem and solving it (german)
- http://www.suma-lab.de:8080/Wiki.html?page=yacy-tor -- another description for installing yacy-tor (german)
- core.onion - Build decentral search engine network to index hidden services
Documentation for newbies in the Tor world
- http://6sxoyfb3h2nvok2d.onion/tor/ -- Hidden-Wiki with a list of Hidden Services
- http://6sxoyfb3h2nvok2d.onion/tor/TorYaCyHowto -- How-To at Hidden Wiki for using Tor and YaCy
A German version of this page also exists.