This is a server application that collects messages from various sources, including Twitter. The server contains a search index and a peer-to-peer index sharing interface.
If you would like to be anonymous when searching, want to archive Tweets or messages about specific topics, or are looking for a tool to create statistics about Tweet topics, then you may consider loklak. With loklak you can:
If you want to create an alternative Twitter search portal, the only official way is to use the Twitter API to retrieve Tweets. But that interface needs an OAuth account, and it makes your search portal completely dependent on Twitter's goodwill. The alternative is to scrape the Tweets from the Twitter HTML search result pages, but Twitter may still lock out your IP address. To circumvent this, you need many clients accessing Twitter to scrape search results. This makes it necessary to create a distributed peer-to-peer network of Twitter scrapers which can all organize, store and index Tweets. This solution was created with loklak.
Best of all, we made this generic enough to integrate different microblogging services, so it may become the incubator for an independent short-message or Twitter-like platform.
Search portals consist of many components. The most prominent are content harvesters that acquire searchable content, a search index that provides fast and efficient access to the data, and a search front-end containing the user webpages and result display servlets.
Most search portals differ in how they display search results but have almost the same back-end to create the search index. We want to support the creation of message/Twitter search portals, but the necessary and most generic part needs to be coded only once, even if we want several or even many different search front-ends.
So it is up to you to create a message search portal, but the hard part has already been done by us. A front-end may even be available right away (e.g. you can just use Kibana).
Collected messages are written to two storage targets: an Elasticsearch search index and a backup and transfer dump.
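The two storage targets can be illustrated with a minimal sketch. This is not loklak's actual code: the function name `store_message`, the plain dictionary standing in for the Elasticsearch index, and the one-JSON-object-per-line dump format are all assumptions made for illustration.

```python
import json
from typing import Dict, TextIO


def store_message(message: dict, index: Dict[str, dict], dump: TextIO) -> bool:
    """Write one message to both targets (hypothetical helper).

    `index` is a dict standing in for the search index; `dump` is any
    writable text stream that receives one JSON object per line.
    Returns False if the message id is already known, so nothing is
    written twice.
    """
    message_id = message["id"]
    if message_id in index:
        return False
    index[message_id] = message                              # search target
    dump.write(json.dumps(message, sort_keys=True) + "\n")   # backup/transfer target
    return True
```

Writing the dump as an append-only stream keeps it easy to hand to another peer, while the index answers searches; a real deployment would replace the dictionary with calls to the search index.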
To fit as much as possible into the 140 characters of a Tweet, links are shortened in all Tweets. At first users relied on independent shortener services, but now Twitter shortens even pre-shortened links again. We remove the shortening of almost all links in a Tweet and reveal the original URL the user attached to it. This is very important when archiving Tweets because shorteners may not be available in the future. It also gives you another privacy advantage because the shortener services cannot track you for their own purposes.
Loklak can even de-shorten multiply shortened links recursively.
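Recursive de-shortening amounts to following redirects until none remain. The sketch below shows one way this could look; it is not loklak's implementation. The names `http_redirect_target` and `unshorten`, the hop limit, and the use of HEAD requests are assumptions.

```python
import http.client
from typing import Callable, Optional
from urllib.parse import urljoin, urlsplit


def http_redirect_target(url: str, timeout: float = 5.0) -> Optional[str]:
    """Return the target of a single HTTP redirect, or None if there is none."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)  # headers only, the body is not needed
        response = conn.getresponse()
        if response.status in (301, 302, 303, 307, 308):
            location = response.getheader("Location")
            # Location may be relative, so resolve it against the current URL.
            return urljoin(url, location) if location else None
        return None
    finally:
        conn.close()


def unshorten(url: str,
              resolve: Callable[[str], Optional[str]] = http_redirect_target,
              max_hops: int = 10) -> str:
    """Follow redirects hop by hop and return the last URL reached."""
    seen = {url}
    for _ in range(max_hops):
        target = resolve(url)
        if target is None or target in seen:  # final URL, or a redirect loop
            break
        seen.add(target)
        url = target
    return url
```

Because the resolver is passed in as a parameter, the chain-following logic can be tried without network access, for example with `unshorten("http://t.co/a", resolve=table.get)` where `table` is a dictionary mapping short URLs to their targets.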
Anonymity is provided with different methods:
The loklak server does not record client IP addresses when a search is done, so loklak does not record or log the IP addresses of searchers either.
You can run loklak yourself, which gives you complete control over what is logged (no IP addresses are logged anyway).
If you do not want to use loklak.org and also don't want to scrape data from Twitter, you can still search in your own index and feed this index with import files from other loklak peers. (Look out for the 'Dumps' menu item in the top right corner.)
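Feeding your own index from import files can be pictured as a simple merge. The sketch below is an illustration only: it assumes a dump holds one JSON object per line with an `id` field, and `import_dump` is a hypothetical helper, not a loklak function. The actual dump format may differ.

```python
import json
from typing import Dict, Iterable


def import_dump(lines: Iterable[str], index: Dict[str, dict]) -> int:
    """Merge messages from a dump into a local index (hypothetical helper).

    `lines` yields one JSON object per line; `index` is a dict standing in
    for the local search index. Returns the number of new messages added.
    """
    added = 0
    for line in lines:
        line = line.strip()
        if not line:              # tolerate blank lines in the dump
            continue
        message = json.loads(line)
        if message["id"] not in index:   # skip messages we already have
            index[message["id"]] = message
            added += 1
    return added
```

Since an open text file is itself an iterable of lines, a file obtained from another peer could be passed in directly, e.g. `import_dump(open(path, encoding="utf-8"), index)`.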
Loklak instances can be connected to each other. If you download loklak and run it unchanged, it connects to loklak.org by default as a back-end peer. You can change this if you want to. This is how connected peers work: