VMware Image

  • Linux VMware Player image (security++, simplicity++)


To improve the search results (which are not in a very good order most of the time), it might be useful to introduce clustering, as you can see on -- MovGP0 16:29, 4. Mai 2006 (CEST)

Link Farm Crawling

To fight link farms full of spam, it might be sensible to crawl such pages with a link depth of about 1-2, collect all of the linked domains, and greylist them. The user should then get a web page with links to those pages, so that the user can check the result when unsure whether a specific link is really spam. Afterwards, the user can decide to remove some domains from this list. The rest will be added to the blacklist. If a domain occurs about 1 + floor(sqrt(NumberOfPeers)) times in a blacklist, the site might get blocked within the whole YaCy network -- MovGP0 16:29, 4. Mai 2006 (CEST)

    • Blocking within the whole network is not possible. We have no control over what a peer owner does. But we can send a News message, which could serve as a hint from our peer to other peer owners. But if there is more than one page moderation per day, it is too much work for the other peer owners ...
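
As a rough illustration of the threshold proposed above, here is a minimal Python sketch. It assumes that "occurs in a blacklist" means "appears in the blacklists of that many distinct peers", and, following the reply, it only produces a list of suggestions rather than enforcing anything. The function names and the input format (one list of domains per peer) are hypothetical and not part of YaCy.

```python
import math
from collections import Counter


def block_threshold(number_of_peers: int) -> int:
    """Return 1 + floor(sqrt(NumberOfPeers)), the proposed occurrence threshold."""
    if number_of_peers < 0:
        raise ValueError("number_of_peers must not be negative")
    # math.isqrt computes the integer floor of the square root exactly.
    return 1 + math.isqrt(number_of_peers)


def suggest_network_blocks(peer_blacklists, number_of_peers):
    """Return the domains that reach the threshold (hypothetical helper).

    peer_blacklists: one iterable of domain names per peer (assumed format).
    """
    counts = Counter()
    for blacklist in peer_blacklists:
        # A domain counts at most once per peer, even if listed repeatedly.
        counts.update(set(blacklist))
    threshold = block_threshold(number_of_peers)
    return sorted(domain for domain, n in counts.items() if n >= threshold)
```

With 100 known peers the threshold would be 11, so a domain would need to show up on 11 peers' blacklists before it is suggested to the rest of the network.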

Feedback to rate search-result quality

  • At the end of a page with search results, I would like to be able to give "you" feedback, so that I can say whether YaCy found my page or my information, and perhaps where I finally found my information or which page is not yet part of our index. I think this could be a good way to improve the quality of YaCy... --GoogleFan 14:51, 2. Jun 2006 (CEST)
    • There is no "you"; YaCy is decentralized. Your peer can of course give feedback to other peers.--Allo 15:31, 4. Jun 2006 (CEST)
      • What about Seeks? --Ktplulo 19:06, 10. Mär. 2012 (CET)

More from this page

  • Just show a few results per domain and a link/button "more from this site", so that if I try to find information about a company/site (e.g. microsoft), the results aren't flooded with results from their own site. Helpful if I do some research and don't want to get all the marketing crap.--Neo@NHNG 14:58, 15. Feb 2008 (CET)
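
The idea could be sketched roughly as follows in Python. This is only an illustration under assumptions: results are taken to be a ranked list of URLs, "domain" is approximated by the URL's host name, and `cap_per_domain` and `max_per_domain` are hypothetical names, not existing YaCy settings.

```python
from urllib.parse import urlsplit


def cap_per_domain(result_urls, max_per_domain=2):
    """Split ranked results into those shown and those held back per host.

    Returns (shown, hidden), where hidden maps a host name to the URLs
    that would sit behind a "more from this site" link.
    """
    shown = []
    hidden = {}
    seen = {}
    for url in result_urls:
        # The host name stands in for "domain" here (an assumption).
        host = urlsplit(url).hostname or ""
        seen[host] = seen.get(host, 0) + 1
        if seen[host] <= max_per_domain:
            shown.append(url)
        else:
            hidden.setdefault(host, []).append(url)
    return shown, hidden
```

The original ranking order is preserved in `shown`; only the surplus results of each host are moved out of the main list.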

External Blacklists

RDF support

  • RDF-Storage based on the Jena Framework.
    If the crawler finds an RDF file (which means .rdf, .owl, and .foaf files) or RDF markup within an XHTML file, the content of this RDF should get copied into a distributed Jena-based semantic storage (AFAIK Jena is not meant to support distributed computing/querying, so you might need to develop your own storage). It should also be possible to run global SPARQL queries on this storage. There is also the need for a timeout, so that semantic queries won't take too many resources.
    This is also interesting when wanting to offer RSS 1.0 and RSS 1.1 support.
    Note that I think this wish is a realistic goal for version 3.x. RSS 0.9, RSS 0.91, and RSS 0.92 should not get supported, because they are not compatible with RDF.
    -- MovGP0 15:39, 4. Mai 2006 (CEST)
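
The first step, recognizing candidate RDF files by extension, could look roughly like this in Python. It is a sketch only: it covers just the three extensions named above, ignores RDF markup embedded in XHTML and the storage and SPARQL parts entirely, and `looks_like_rdf_file` is a hypothetical helper, not an existing crawler function.

```python
import posixpath
from urllib.parse import urlsplit

# Extensions named in the proposal above.
RDF_EXTENSIONS = {".rdf", ".owl", ".foaf"}


def looks_like_rdf_file(url: str) -> bool:
    """Return True if the URL path ends in one of the RDF file extensions."""
    # urlsplit drops the query string and fragment from the path component.
    path = urlsplit(url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension in RDF_EXTENSIONS
```

A real crawler would probably also need to look at the served content type, since RDF is not always published under these extensions.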