Dev:RoadmapArchive

From YaCyWiki

Implemented and ready for the next release

Bug in kelondroRecords$Node.getValues()

  • Bugfix for Index Restore Bug [1]

robots.txt

  • implementation of a robots.txt parser
  • caching of loaded robots.txt files according to the proxy cache rules
  • control of remote crawls with robots.txt
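The parser item above can be illustrated with a minimal sketch. It assumes a simplified subset of robots.txt (only `User-agent` and `Disallow` lines, plain prefix matching, no `Allow` lines or wildcards), and the names `RobotsTxtSketch`, `parse` and `isAllowed` are hypothetical; this is not YaCy's actual parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of a minimal robots.txt parser; not YaCy's actual implementation.
// Only "User-agent" and "Disallow" lines are understood, matched by simple path prefix.
class RobotsTxtSketch {

    /** Returns the Disallow prefixes that apply to the given crawler agent. */
    static List<String> parse(String robotsTxt, String agent) {
        List<String> specific = new ArrayList<>(); // rules from groups naming this agent
        List<String> generic = new ArrayList<>();  // rules from "User-agent: *" groups
        boolean inSpecific = false, inGeneric = false, sawSpecific = false;
        boolean lastWasAgent = false;
        String agentLower = agent.toLowerCase(Locale.ROOT);
        for (String rawLine : robotsTxt.split("\\r?\\n")) {
            int hash = rawLine.indexOf('#');              // strip comments
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            int colon = line.indexOf(':');
            if (colon < 0) continue;                      // blank or malformed line
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // consecutive User-agent lines share one group; otherwise a new group starts
                if (!lastWasAgent) { inSpecific = false; inGeneric = false; }
                String v = value.toLowerCase(Locale.ROOT);
                if (v.equals("*")) {
                    inGeneric = true;
                } else if (!v.isEmpty() && agentLower.contains(v)) {
                    inSpecific = true;
                    sawSpecific = true;
                }
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                // an empty Disallow value means "allow everything", so it adds no rule
                if (field.equals("disallow") && !value.isEmpty()) {
                    if (inSpecific) specific.add(value);
                    if (inGeneric) generic.add(value);
                }
            }
        }
        // a group that names the agent replaces the "*" group entirely
        return sawSpecific ? specific : generic;
    }

    /** A path is allowed unless it starts with one of the disallowed prefixes. */
    static boolean isAllowed(List<String> disallowed, String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }
}
```

Because a group naming the crawler replaces the `*` group, a crawler identifying as `yacy` would in this sketch ignore the generic rules whenever a `yacy` group exists.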


Better Caching

  • new cache-control menu
  • better default values for caches
  • better Node-Cache
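A common way to keep a node cache bounded is least-recently-used (LRU) eviction. The sketch below assumes that strategy; it is not taken from YaCy's code, and `NodeCacheSketch` and its `maxEntries` parameter are hypothetical. It relies on `java.util.LinkedHashMap` in access order, whose `removeEldestEntry` hook drops the least recently used entry once the limit is exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache for database nodes; an illustration, not YaCy's Node-Cache.
class NodeCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    NodeCacheSketch(int maxEntries) {
        // accessOrder = true: iteration runs from least to most recently accessed entry
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // called after each put: evict the least recently used node once over capacity
        return size() > maxEntries;
    }
}
```

With a capacity of 2, reading `a` before inserting `c` makes `b` the eldest entry, so `b` is the one evicted.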

DDoS-Prevention

Extensive crawling combined with remote crawling can lead to DDoS-like behavior, which must be prevented:

  • load balancing between different crawl target domains
  • if load cannot be balanced, ensure forced pauses between loads
  • apply the same balancing to remote crawls
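The balancing rules above can be sketched as a scheduler that keeps one URL queue per host, rotates between hosts, and returns nothing when every host was contacted too recently, so the caller must pause. The class name, the per-host queues and the `minDelayMillis` parameter are assumptions for illustration; YaCy's actual crawl balancer may work differently.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-domain crawl balancer; a sketch, not YaCy's implementation.
class CrawlBalancerSketch {
    private final long minDelayMillis;
    // one FIFO queue per host; map order decides which host is tried first
    private final Map<String, ArrayDeque<String>> queues = new LinkedHashMap<>();
    // time of the most recent load per host
    private final Map<String, Long> lastAccess = new HashMap<>();

    CrawlBalancerSketch(long minDelayMillis) { this.minDelayMillis = minDelayMillis; }

    void push(String host, String url) {
        queues.computeIfAbsent(host, h -> new ArrayDeque<>()).add(url);
    }

    /** Next URL whose host may be loaded at time 'now', or null if all hosts need a pause. */
    String pop(long now) {
        String chosen = null;
        for (String host : queues.keySet()) {
            Long last = lastAccess.get(host);
            if (last == null || now - last >= minDelayMillis) { chosen = host; break; }
        }
        if (chosen == null) return null;        // load cannot be balanced: forced pause
        ArrayDeque<String> q = queues.remove(chosen);
        String url = q.poll();
        if (!q.isEmpty()) queues.put(chosen, q); // re-append: round-robin between hosts
        lastAccess.put(chosen, now);
        return url;
    }

    /** Milliseconds to wait until at least one queued host may be contacted again. */
    long waitTime(long now) {
        if (queues.isEmpty()) return 0;
        long wait = Long.MAX_VALUE;
        for (String host : queues.keySet()) {
            Long last = lastAccess.get(host);
            long w = (last == null) ? 0 : Math.max(0, last + minDelayMillis - now);
            wait = Math.min(wait, w);
        }
        return wait;
    }
}
```

A crawler loop would call `pop` and, when it returns `null`, sleep for `waitTime` before trying again; feeding remote crawl requests into the same balancer would cover the last item.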

Borg-0300

  • Bugfix for URLs with ':80'
  • added a smaller indexmonitor.gif, saving 9 kB
  • proxyCache, proxyCacheSize can be changed under 'Proxy Indexing'
    • paths are now absolute
  • changed cache cleanup in plasmaHTCache
    • added new HashSet filesInUse
    • added new function getFreeSize()
    • added change from Hermes
  • accelerated Blacklist_p.java: big lists are handled more than 25x faster
  • sorted config list (Config_p.html)
  • CacheAdmin_p.java
    • sorted directory/file list
    • responseHeader.db is no longer listed
  • code improvements (constants, finals, StringBuffer, cleanup, Properties) in
    serverLog.java, yacy.java, ProxyIndexingMonitor_p.java, plasmaCrawlNURL.java,
    plasmaURLPattern.java, Blacklist_p.java, Config_p.html, Network.java,
    CacheAdmin_p.java, hello.java, dir.java, welcome.java, kelondroMap.java,
    plasmaWordIndex.java, Status.java, yacyClient.java, yacySeed.java, yacySearch.java
  • other small changes (?)

I hope this is correct. If not, please change it or delete this sentence.
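The plasmaHTCache cleanup change (a `filesInUse` set plus `getFreeSize()`) suggests a strategy like the following: delete the oldest cached files until the cache fits its size limit, but never delete a file that is currently in use. This is a hedged reconstruction; the class `CacheCleanupSketch`, the `CachedFile` holder, the `cleanup` method and the oldest-first order are assumptions, and `getFreeSize()` only mirrors the changelog name, its behavior here is assumed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a size-bounded cache cleanup that skips files in use.
class CacheCleanupSketch {
    static final class CachedFile {
        final String path;
        final long size;
        final long lastModified;
        CachedFile(String path, long size, long lastModified) {
            this.path = path;
            this.size = size;
            this.lastModified = lastModified;
        }
    }

    final Set<String> filesInUse = new HashSet<>(); // paths currently being read or written
    final List<CachedFile> files = new ArrayList<>();
    final long maxSize;

    CacheCleanupSketch(long maxSize) { this.maxSize = maxSize; }

    long currentSize() {
        long s = 0;
        for (CachedFile f : files) s += f.size;
        return s;
    }

    /** Space left before the cache limit is reached (assumed meaning). */
    long getFreeSize() { return Math.max(0, maxSize - currentSize()); }

    /** Removes the oldest files not in use until the cache fits; returns the removed paths. */
    List<String> cleanup() {
        List<CachedFile> byAge = new ArrayList<>(files);
        byAge.sort(Comparator.comparingLong((CachedFile f) -> f.lastModified));
        List<String> removed = new ArrayList<>();
        long size = currentSize();
        for (CachedFile f : byAge) {
            if (size <= maxSize) break;
            if (filesInUse.contains(f.path)) continue; // never delete a file in use
            files.remove(f);
            size -= f.size;
            removed.add(f.path);
        }
        return removed;
    }
}
```

If the oldest file is in use, the next oldest is deleted instead, so cleanup can proceed without interrupting a running transfer.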

Low012

  • implementation of missing tags in yacyWiki


Theli

  • Bugfixes
    • Extending autoconf to support servers with multiple network cards correctly
    • Correcting ProxyerrorMsg BASE URL
    • Avoiding erroneous reloading of robots.txt
    • transferURL: URL list index was not incremented properly
    • Bugfix for wrong entry count on loader queue page
    • Bugfix for wrong entry count on indexer queue page
    • Bugfix for Seed-Upload failure via ftp
    • Bugfix for NullPointerException during hello.respond
    • Bugfix for NullPointerException during plasmaSwitchboard.processResourceStack
    • Bugfix for NullPointerException in serverAbstractSwitch.setConfig
    • Bugfix for urlEntry.url() == null problem
    • Displaying a proxy error page instead of a blank page if the server closed the connection before YaCy was able to receive the HTTP response line
    • Crawler start URLs are now also added to the errorURL-DB if an error occurs on such a URL
    • Changing the proxy User-Agent string as discussed in thread http://www.yacy-forum.de/viewtopic.php?p=8183#8183
    • Bugfix for ProxyAccess logger: the URL was accidentally logged without its parameters
    • Bugfix for Crawler Redirection Bug
    • Bugfix for "Crawler data will not be removed from htcache if content parsing failed"
    • ConsoleOutErrHandler can now be configured to ignore ctrl chars
    • Bugfix for "binary zeros on the page: Index Creation: Indexing Queue"
    • Transfer-Encoding header was accidentally stored into cache by the proxy
  • Maintenance
    • Upgrading PDF Parser to PDFBox V. 0.7.2
    • Upgrading Jsch Lib to Jsch V. 0.1.21
    • Restructuring serverCore/httpd/httpHeader classes to improve code reusability
    • IfesL: suppressing many stack traces
    • adding isLoggable function to serverLog
    • harmonizing proxy exception handling
    • indexer now gets the mimeType from the parsed document instead of the responseHeader (this is especially necessary if the mimeType has to be detected by the MimeType parser)
    • Logging an error message to the log output if the proxy cannot send an error message to the user
    • Moving Logging directory to DATA/LOG
    • Renaming Logger function names to reflect the proper Java Logging API Loglevels
    • Better synchronization of plasmaHTCache methods
    • Pausing Crawlers if there is not enough space on disk
    • Trying to solve the "Too many open files" bug by adding many stream/reader close calls
    • Adding detection of broken connections to Port Forwarding feature
    • Making proxyAccessLogging configurable via yacy.logging
    • Printing date and system name on the proxy error page
    • Normalizing the crawler start URL before crawling starts and before the crawler follows a redirection URL
    • Using the configured timeout also when httpc establishes a connection


  • New functionality
    • Robots.txt support
    • Content-Encoding GZIP support for http post requests on index transfer/distribution
    • ICAP support
      • Implementation of an embedded ICAP server (experimental state)
      • YaCy allows other proxies to use the indexing service via ICAP response modification requests
    • Index Transfer (allows transferring the whole peer index to a remote peer)
    • Extending httpFileHandler to allow forcing of URL redirection after http post requests
    • Adding templateCache to httpFileHandler
    • Adding blacklist support for https requests
    • Adding blacklist support for the crawler
    • URLs pointing to a server with a private IP address are no longer indexed
    • Adding functionality to delete entries from Indexing and Crawler Queue
    • Adding functionality to clear the whole indexing queue
    • IndexCreateIndexingQueue_p.html now also shows indexing jobs that are currently in process
    • Unsupported MimeTypes and file extensions are no longer put into the indexer queue by the cachemanager (to reduce unnecessary IO)
    • proxy now also performs the malformed-URL and blacklist checks for HTTP HEAD requests
    • proxy now also performs the malformed-URL check for HTTP POST requests
    • proxy now supports the X-Forwarded-For Header
    • Indexing queue now displays total size of enqueued content in kb
    • Remembering Crawler-isPaused setting by storing status into config file
    • Making ACCEPT_LANGUAGE configurable for the crawler
    • Splitting the status page into a private and a publicly accessible part
    • Adding Queue overview to status page
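The rule that URLs on servers with a private IP address are no longer indexed can be sketched with the standard `java.net.InetAddress` predicates. Which ranges YaCy actually treats as private is not stated here; the selection below (site-local, loopback, link-local, wildcard), the class name and the choice to reject unresolvable hosts are assumptions.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical pre-indexing check; not YaCy's actual implementation.
class PrivateAddressFilterSketch {

    /** True if the address lies in a range that should not be indexed. */
    static boolean isPrivate(InetAddress addr) {
        return addr.isSiteLocalAddress()   // 10/8, 172.16/12, 192.168/16
            || addr.isLoopbackAddress()    // 127/8, ::1
            || addr.isLinkLocalAddress()   // 169.254/16, fe80::/10
            || addr.isAnyLocalAddress();   // 0.0.0.0, ::
    }

    /** Resolves the host (no DNS query for IP literals) and decides whether it may be indexed. */
    static boolean mayIndex(String host) {
        try {
            return !isPrivate(InetAddress.getByName(host));
        } catch (UnknownHostException e) {
            return false; // assumption: hosts that cannot be resolved are not indexed either
        }
    }
}
```

Checking the resolved address rather than the host name also catches public-looking names that resolve into a private network.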

allo

  • userDB Interface
  • ProxyUsers from userDB instead of a single user
  • proxyAuth by userDB
  • proxyAuth by IP (10-minute cache)
  • userDB for Adminuser
  • userDB for Fileshare
  • Templates: function to translate included files
  • User.html
    • logout function for userDB authentication
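"proxyAuth by IP (10-minute cache)" can be read as: once a client IP has authenticated, further proxy requests from that IP are accepted without credentials for ten minutes. A sketch under that reading follows; the class and method names are hypothetical, and the real userDB code may differ, for example in whether each request refreshes the timeout.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical IP-based authentication cache; not the actual userDB code.
class IpAuthCacheSketch {
    static final long TTL_MILLIS = 10 * 60 * 1000L; // 10 minutes
    private final Map<String, Long> authenticatedAt = new HashMap<>();

    /** Remember that a user logged in successfully from this IP at time 'now'. */
    synchronized void recordLogin(String ip, long now) {
        authenticatedAt.put(ip, now);
    }

    /** True while the last successful login from this IP is younger than 10 minutes. */
    synchronized boolean isAuthenticated(String ip, long now) {
        Long t = authenticatedAt.get(ip);
        if (t == null) return false;
        if (now - t >= TTL_MILLIS) {   // expired: forget the entry, require a new login
            authenticatedAt.remove(ip);
            return false;
        }
        return true;
    }
}
```

Timestamps are passed in explicitly so the expiry logic can be tested; a proxy would pass `System.currentTimeMillis()`.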


Orbiter

  • no caching during iterations
  • Database performance: read-only mode (allows more performant caching), read-at-once/full-load for RWI access (usable for search), usage of buffered RA files; all of this is obsolete if the database is replaced by the new indexCollections
  • "1-2-3" - Settings for first startup in online-menu instead of status page
  • a virtual file system implementation that shall prevent the OS from running out of file handles and shall be able to store many kelondroRecord structures in one file using kelondroRA: obsolete
  • implementation of "real" database tables using all the different data structures that kelondro offers:
    • tables shall be extensible (add columns)
    • it shall be possible to add indexes to table columns
    • all files that belong to a table shall be stored in a kelondroVFS so that there is always only one file per table
    • one table shall consist of many tablespaces so that very large tables are possible for file systems where only 2GB are available for one file

This is partly realized with the kelondroFlexTable and the kelondroRow configuration method.
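The tablespace requirement in the table list above can be illustrated by mapping a record number to a (file, offset) pair so that no single file grows beyond a 2 GB limit. This is a sketch only: fixed-size records, the `.tsN` file naming and all names are assumptions, and it does not reflect how kelondroFlexTable actually stores data.

```java
// Hypothetical addressing scheme for a table split into several tablespace files.
class TablespaceAddressSketch {
    // largest size addressable with a signed 32-bit offset (the usual "2 GB" limit)
    static final long MAX_FILE_BYTES = Integer.MAX_VALUE;
    final int recordSize;
    final long recordsPerFile;

    TablespaceAddressSketch(int recordSize) {
        this.recordSize = recordSize;
        // only whole records per file, so no record is split across two files
        this.recordsPerFile = MAX_FILE_BYTES / recordSize;
    }

    /** Which tablespace file holds the record. */
    int fileIndex(long recordNumber) { return (int) (recordNumber / recordsPerFile); }

    /** Byte offset of the record inside its tablespace file. */
    long offset(long recordNumber) { return (recordNumber % recordsPerFile) * recordSize; }

    /** Hypothetical file name of the tablespace that holds the record. */
    String fileName(String table, long recordNumber) { return table + ".ts" + fileIndex(recordNumber); }
}
```

With 100-byte records, each file holds 21474836 records; record 21474836 is the first record of the second file at offset 0.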