
Shiretoko 2 May, 2009

Posted by aronzak in Mozilla Firefox, Uncategorized, web.

For some reason, the search box in Mozilla Firefox 3.0.8 stopped working, so I installed the new version of Firefox, codenamed “Shiretoko”, which looks good. Named after a peninsula at the tip of the northern Japanese island of Hokkaido, Shiretoko was formerly called version 3.1, and is now version 3.5. The browser uses version 1.9.1 of the Gecko rendering engine, which is faster and supports HTML5. With the improvements in Gecko, Shiretoko gets an impressive score of 93% on the Acid3 test (especially when compared to the paltry 20% that IE 8 gets).

The inclusion of “private browsing”, irreverently dubbed “porn mode”, is touted as a significant feature, but is essentially useless, as existing built-in tools and extensions like CookieSafe do a better job of protecting privacy.

The tab bar now has a plus button at the end, which opens a new tab when clicked. While this may be helpful to some users, and makes Firefox look more like other notable browsers, it is similarly useless for advanced users, who are fine with middle-clicking links and using Ctrl+T. The tab bar can be hidden when only one tab is open by going to Edit > Preferences (Tools > Options in Windows) and unchecking “Always show the tab bar” under the “Tabs” tab.

Internet content filtering in Linux 27 January, 2009

Posted by aronzak in Linux, Mozilla Firefox, web.

For some reason, you want to use content filtering software. You’ve probably heard of, or used, a few tools made specifically for Windows. Here are some solutions for Linux:

1. Dansguardian

Dansguardian works best with the Squid proxy. It does some advanced URL filtering, using slightly more complicated techniques than a plain blacklist: URLs containing two listed words in combination are blocked, as are media such as picture and video URLs containing listed words. This is fairly clever, but has some problems. It has sometimes blocked Google for me, because URLs with random strings of characters (used to track which search results are clicked on) can happen to match.

Content filtering is done with words and phrases that carry scores, positive or negative. Sites then get a total score based on their content. Interestingly, a phrase can lower the score as well as raise it. For example, the word ‘breast’ is bad, but ‘breast cancer’ is a good phrase. In theory, this should limit the amount of overblocking. Dansguardian also filters against anonymous web proxies, which could otherwise be used to bypass filtering.

Sites are blocked if their score is over the ‘naughtiness’ threshold as defined in /etc/dansguardian/dansguardianf1.conf.
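
The scoring idea can be sketched in a few lines of Python. This is only my illustration of weighted phrase scoring, not Dansguardian's actual code: the weights, the score_page and is_blocked functions and the limit of 50 are all made up for the example.

```python
# Toy weighted-phrase scorer. The weights and the limit are invented for
# illustration; they are not taken from Dansguardian's phrase lists.
WEIGHTS = {
    "breast": 40,          # 'bad' word: raises the score
    "breast cancer": -60,  # 'good' phrase: lowers the score
    "casino": 30,
}
NAUGHTINESS_LIMIT = 50  # hypothetical threshold


def score_page(text, weights=WEIGHTS):
    """Sum weight * occurrences for every listed phrase found in the text."""
    lowered = text.lower()
    return sum(weight * lowered.count(phrase) for phrase, weight in weights.items())


def is_blocked(text, limit=NAUGHTINESS_LIMIT):
    """A page is blocked when its total score is over the limit."""
    return score_page(text) > limit


if __name__ == "__main__":
    print(score_page("Breast cancer screening information"))
    print(is_blocked("breast breast casino"))
```

Because ‘breast cancer’ carries a larger negative weight than ‘breast’ carries a positive one, a medical page ends up below zero, while a page that only repeats the bad words goes over the limit.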

Dansguardian is meant to be used in a public sector network, such as a school or library. By default, it blocks many downloads that could contain viruses or filtering circumvention software. In a home setup, this is just irritating. To stop this, blank out the configuration file that controls blocking of files based on their extensions:

echo "" > /etc/dansguardian/lists/bannedextensionlist

Installation instructions

Dansguardian needs to be used with a web proxy. The installation of Dansguardian itself is fairly easy, with Dansguardian filtering on one port. Getting it to filter the entire connection is more complicated.

1 Install the squid and dansguardian packages.

apt-get install dansguardian squid

2 Edit the /etc/dansguardian/dansguardian.conf file and remove the line that says “UNCONFIGURED”.
3 Start the dansguardian daemon.

/etc/init.d/dansguardian start

4 In Firefox, open Edit -> Preferences -> “Advanced” tab -> “Network” tab -> “Settings” button. Set it to manual proxy configuration. Put 127.0.0.1 (localhost) in the IP address box and 8080 in the port box.
5 Try to connect to google.com to verify you can still connect to the internet.
Finally, check that the filter is working by checking that the network traffic is being logged. Open the file /var/log/dansguardian/access.log.

cat /var/log/dansguardian/access.log

There should be an entry there saying google.com. If the file doesn’t exist, something isn’t set up right.
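
If you would rather not read through the log by eye, the same check can be sketched in Python. The log_mentions function is just a helper I made up for this; it does a plain substring search and assumes nothing about the log format.

```python
import os


def log_mentions(log_path, needle):
    """Return True if any line of the log file contains `needle`."""
    if not os.path.exists(log_path):
        return False  # no log at all: the filter is probably not set up right
    with open(log_path, encoding="utf-8", errors="replace") as log:
        return any(needle in line for line in log)


if __name__ == "__main__":
    # Path from the step above; run this after loading google.com in Firefox.
    print(log_mentions("/var/log/dansguardian/access.log", "google.com"))
```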

2. Willow Content Filter

Willow adopts the novel concept of using Bayesian analysis to filter the web. Bayesian analysis is currently used by most spam filters. The concept is that you have a ‘good’ and a ‘bad’ sample, and the filter estimates ‘spamminess’ as a percentage match to the samples. Unfortunately, spammers attempt to make this difficult by inserting ‘normal’ text into spam. Unlike spammers, most adult website owners are probably broadly supportive of efforts to more effectively filter the internet, as evidenced by the existence of voluntary labelling efforts.

In theory, Bayesian web filtering should work better than the more rigid score-based system, with less overblocking or underblocking. Not only does it use words in the title, meta tags and body of a page, but it also analyses the structure of the page itself. The system may also run faster than Dansguardian.
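
To give a feel for the Bayesian approach, here is a minimal naive Bayes sketch in Python. It is not Willow's code: the TinyBayes class and the training sentences are invented for illustration, and it only looks at words, not at page structure.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyBayes:
    """Two-class naive Bayes over word counts with add-one smoothing."""

    def __init__(self):
        self.counts = {"good": Counter(), "bad": Counter()}
        self.docs = {"good": 0, "bad": 0}

    def train(self, label, text):
        self.counts[label].update(tokens(text))
        self.docs[label] += 1

    def _log_likelihood(self, label, words):
        vocab = set(self.counts["good"]) | set(self.counts["bad"])
        total = sum(self.counts[label].values())
        prior = math.log(self.docs[label] / sum(self.docs.values()))
        return prior + sum(
            math.log((self.counts[label][w] + 1) / (total + len(vocab)))
            for w in words
        )

    def badness(self, text):
        """Probability (0..1) that the text belongs to the 'bad' sample."""
        words = tokens(text)
        good = self._log_likelihood("good", words)
        bad = self._log_likelihood("bad", words)
        return 1 / (1 + math.exp(good - bad))


if __name__ == "__main__":
    f = TinyBayes()
    # Both classes need at least one training sample before scoring.
    f.train("good", "weather forecast news sport results")
    f.train("bad", "casino poker betting jackpot")
    print(f.badness("poker jackpot tonight"))
```

The result is a percentage-style match rather than a fixed score, which is why the quality of the good and bad samples matters so much.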

One issue this creates is that samples of content must be provided, of both the good and the bad kind. Currently, these are not encrypted or obfuscated. This creates some potential legal and moral hurdles in distributing and using Willow.

Installation instructions

1. Download willow

2. Extract it to /var

3. Edit /var/willow/willow.conf and remove ‘exefilter’

4. You might need to install some more software. Try installing python-profiler and python-central

5. That should be it. Run /var/willow/willow.py --config=/var/willow/willow.conf

6. Set up Firefox as above, to port 8000 (or whatever it is set to in /var/willow/willow.conf)

If that doesn’t work, edit the configuration some more. Unfortunately there doesn’t seem to be much support for willow right now.

3. Firefox Extensions

While not as effective, ProCon is extremely easy to install. FoxFilter is another filtering extension, but I find it a little slower and clunkier. If Firefox is your only browser, this is an easy option.

4. MintNanny

Linux Mint has introduced a novel way to prevent domains from being accessed by redirecting the request to 0.0.0.0 by modifying the /etc/hosts file. This is a neat approach as it does not require any software to be set up, but, unless you are going to try and subscribe to a domain blacklist, it is relatively ineffective. Most web routers will give you this kind of simple filtering anyway.

If you want to give this a go, you don’t need the MintNanny frontend. Just get in there and edit /etc/hosts yourself. You can even redirect to another IP. Just add in a site you really don’t want to see, as in the example.

209.85.171.100          microsoft.com   microsoft

The IP here is one for google.com. Neat. You could also put in the IP of your own server.
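
If you have more than a couple of domains, building the entries by hand gets tedious. Here is a rough Python sketch that appends entries to the text of a hosts file. The block_entry and add_blocks functions are my own invention, and the sketch only returns the new text instead of writing /etc/hosts, since that needs root.

```python
def block_entry(domain, ip="0.0.0.0"):
    """Build one hosts-file line that points `domain` (and www.domain) at `ip`."""
    return f"{ip}\t{domain}\twww.{domain}"


def add_blocks(hosts_text, domains, ip="0.0.0.0"):
    """Return hosts_text with an entry appended for each domain not already listed."""
    lines = hosts_text.splitlines()
    listed = set()
    for line in lines:
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        listed.update(fields[1:])  # everything after the IP is a hostname
    for domain in domains:
        if domain not in listed:
            lines.append(block_entry(domain, ip))
            listed.update((domain, f"www.{domain}"))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    current = "127.0.0.1\tlocalhost\n"
    print(add_blocks(current, ["microsoft.com"]), end="")
```

The www alias is added because the hosts file only matches exact names, so blocking a bare domain does not cover its subdomains.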


So there you have it. If you are setting up a non-home network, you’ll probably want to filter transparently. This is complicated to set up and involves editing the configuration of iptables or your proxy. Good luck. Below are some extras that you might want to use if you do use Dansguardian or Willow.

You may want to edit the page that Dansguardian shows when a resource is blocked to give you the full reason. This can be quite long, so I stuck it in a box.

	<font color=red>
	<b>-CATEGORIES-</b>
	</font>
	<font color=black>
	<br><br>
	<form action="/html/tags/html_form_tag_action.cfm" method="post">Full Reason:<br />
		<textarea style="width:500px;height:100px;background-color:#FF9900;">
		-REASONLOGGED-
		</textarea><br />
	</form>
	<br><br>
	</font>

The logs Dansguardian gives contain a whole lot of sometimes irrelevant information. Below is a script to process a log file in the tab format, so that it is easier to read. In the future, it would be nice to work on a way of adding this to a database, and sorting it into domains.

#!/bin/bash
# Author: Aronzak
# License: GPL
# A script to process Dansguardian log files

# Use tab formatted access.log
# 1	Date Time
# 2
# 3	IP
# 4	URL
# 5	Full Denied report
# 6	GET/POST
# 7	File size
# 8	Score
# 9	Short Denied report
# 10	1
# 11	HTML error code
# 12	Type
# Edit the following variable to add/remove wanted fields in processed log:

DESIREDFIELDS="1,4,8,9"
LOG=/var/log/dansguardian/access.log
DEST=/var/log/dansguardian

# Keep only the wanted fields from every request
cut -f "$DESIREDFIELDS" "$LOG" > "$DEST/full"
# Same again, but only for requests that were denied
grep DENIED "$LOG" | cut -f "$DESIREDFIELDS" > "$DEST/denied"
# Print "bad" if there are denied entries that were not there on the last run
if [ -f "$DEST/old" ]; then
	diff "$DEST/denied" "$DEST/old" > "$DEST/diff"
	if grep -q '^<' "$DEST/diff"; then
		echo "bad"
	fi
fi
cp "$DEST/denied" "$DEST/old"
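
As for getting the log into a database and sorting it by domain, here is a rough Python sketch using SQLite. It assumes the tab format with the field order listed in the script comments (1 date and time, 4 URL, 8 score) and treats any line containing DENIED as a blocked request, the same way the grep does. The table layout and the load_log and denied_by_domain functions are my own invention.

```python
import sqlite3
from urllib.parse import urlsplit


def load_log(lines, db):
    """Insert tab-separated log lines into an SQLite table."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS requests "
        "(stamp TEXT, domain TEXT, url TEXT, score TEXT, denied INTEGER)"
    )
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed or short lines
        stamp, url, score = fields[0], fields[3], fields[7]
        domain = urlsplit(url).hostname or ""  # empty if the URL has no scheme
        denied = 1 if "DENIED" in line else 0  # same test as the grep above
        db.execute(
            "INSERT INTO requests VALUES (?, ?, ?, ?, ?)",
            (stamp, domain, url, score, denied),
        )
    db.commit()


def denied_by_domain(db):
    """Return (domain, count) pairs for denied requests, most frequent first."""
    return db.execute(
        "SELECT domain, COUNT(*) FROM requests WHERE denied = 1 "
        "GROUP BY domain ORDER BY COUNT(*) DESC, domain"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # use a file path to keep the data
    with open("/var/log/dansguardian/access.log", errors="replace") as log:
        load_log(log, connection)
    for domain, count in denied_by_domain(connection):
        print(count, domain)
```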

Willow has a minimal page displayed when a resource is blocked. I changed this to be more like the Dansguardian page. The page that is displayed is set in three places: urlfilter.py, domainfilter.py and contentfilter.py.

DEFAULTMSG = ('<html><head><title>Content Filtered</title></head>'
              '<body bgcolor=#FFFFFF><center>'
              '<table border=0 cellspacing=0 cellpadding=2 height=540 width=700>'
              '<tr>'
              '	<td colspan=2 bgcolor=#FEA700 height=100 align=center>'
              '	<font face=arial,helvetica size=6>'
              '	<b>Access has been Denied!</b>'
              '	</td>'
              '</tr><tr>'
              '	<td align=center valign=bottom width=150 bgcolor=#B0C4DE><font size=1 >'
              '	<a href="http://www.digitallumber.net/software/willow/" target="_blank">Willow Content Filter</a>'
              '	</td>'
              '	<td width=550 bgcolor=#FFFFFF align=center valign=center><font size=4>'
              '	Access has been denied.<br><br><br><br>'
              '	The content of the resource requested has been determined to be inappropriate<br><br>'
              '	If you have any queries contact your ICT Coordinator or Network Manager.'
              '	<br><br><br><br><br></font></td></tr></table></body></html>')

You might want to back up the files before editing them. Have fun.

A look at Mozilla Snowl 17 January, 2009

Posted by aronzak in Mozilla Firefox.

Mozilla Snowl is a new experimental Firefox add-on that acts as a more advanced feed reader. It can display RSS feeds and also Twitter messages. I’ve never gotten into microblogging, but it’s an interesting concept. Here’s how Snowl works as an RSS/Atom aggregator.

Snowl has three modes: List, Stream and River.

River mode shows a list of messages in a page view. It shows the title, a small snippet of text and the author.


List mode shows messages like an email client. It allows a news article to be displayed in full by double-clicking the message. There doesn’t seem, however, to be any way to open links in a tab.


Stream mode displays a sidebar with the latest messages at the top, with nice icons.


Snowl has a lot of nice concepts, but it seems to be intended more for Twitter than for web feeds. The small amount of text in river mode, the small size of the frame for viewing articles in list mode and the ability to select people, rather than just feeds, all point to this. It would be good if Snowl were more customisable, to better suit needs one way or another.

1. It would be good to be able to select how much text is displayed in river mode.

2. It would be even better to have a feature to expand the text, like in the old Isohunt.

3. It would also be good to be able to customise the number of messages that are kept. I’m not really interested in anything in a feed if I don’t read it after a day. It would be good to limit the age of messages, as well as the number that display in stream mode.

4. It would look nice if icons are displayed in river mode.

5. It would be good to be able to use something like Greasemonkey to alter the way in which Google News appears.

6. It would be good if river mode could support the formatting in Google News, rather than just displaying it as text, which repeats the title.

Otherwise, Snowl is looking good.