Archive for the ‘Free’ category

Autoglosser2 released

February 2nd, 2018

During 2009-11 I wrote the Bangor Autoglosser to gloss the Bangor ESRC corpora of multilingual (Welsh, Spanish, English) conversational text. I’ve now written a new version, Autoglosser2, that focusses on Welsh written text, and outputs CorCenCC tags as well as Bangor-type glosses. Speed has been greatly increased too, from 1,000 to 22,000 glosses/minute. You can test it online, but for detailed work it’s better to download and install locally. There’s also a detailed manual available. Lots of work to do on it still, but it’s pretty robust, and gives reasonably good results.

Aaron Swartz RIP

January 14th, 2013

This is an extremely sad event. I have written the following to the President of MIT, Rafael Reif:

I am a UK citizen, but I am writing to express my disgust and horror over the
way MIT has behaved in the matter of the death of Aaron Swartz.

While there is some debate over the methods Swartz used, there can be none
over his motives – they were based on a desire to ensure that knowledge is
available to all, without artificial barriers of price or being part of a
privileged elite.

I would have thought that an institution like MIT would have recognised and,
if not acquiesced in, at least not opposed this concept. Doesn’t the very
word “university” share the same Latin root as “universal”?

JSTOR, to its credit, decided to adopt a low-key approach to this “crime”,
but, to its eternal shame, MIT did not – it is an institution which appears to
have no conception of the meaning of the words “ethical” or “proportional”.

Non-US citizens like myself can only shake our heads in disbelief over the way
the US patent and copyright circus is poisoning the idea of “doing the right

What a sad day for American letters – that a university of MIT’s pedigree
should let its moral compass go so completely adrift.

Platform and browser

October 21st, 2010

I’ve just finished a project for the Psychology Department at Bangor University, which involved logging various pieces of data on participants as they used the web interface to the survey. One of the most interesting aspects from my point of view was the platform and browser of the participant.

Out of the 834 participants in this sample, 770 (92%) were using Internet Explorer or Firefox, with a 71/29% split between these two. In the “other” category were Chrome (4%), Safari (2%), and Opera (less than 1%).

Browser numbers

Non-Microsoft platforms were noticeable by their paucity – 2% for the Mac, and 1% for Linux. Of the Windows flavours, XP was the most numerous (37%), followed by Vista (33%), and then 7 (26%) – there were even a couple of instances of 2000!
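The post doesn’t say how browser and platform were identified, but a common approach is to classify the User-Agent header sent with each request. The following is only a minimal sketch on that assumption: the function name `classify_user_agent` and its substring rules are illustrative, not the code used in the project, and they only cover the browsers and platforms mentioned above.

```python
def classify_user_agent(ua):
    """Return a (browser, platform) pair guessed from a User-Agent string.

    Hypothetical helper: simple substring checks, not a full parser.
    """
    # Order matters: Chrome's string also contains "Safari", and some
    # older Opera strings contain "MSIE", so test the more specific first.
    if "Opera" in ua:
        browser = "Opera"
    elif "MSIE" in ua:
        browser = "Internet Explorer"
    elif "Firefox" in ua:
        browser = "Firefox"
    elif "Chrome" in ua:
        browser = "Chrome"
    elif "Safari" in ua:
        browser = "Safari"
    else:
        browser = "Other"

    # Windows reports its version as an "NT" number in the User-Agent.
    windows_versions = {
        "Windows NT 5.0": "Windows 2000",
        "Windows NT 5.1": "Windows XP",
        "Windows NT 6.0": "Windows Vista",
        "Windows NT 6.1": "Windows 7",
    }
    platform = "Other"
    for token, name in windows_versions.items():
        if token in ua:
            platform = name
            break
    else:
        if "Macintosh" in ua:
            platform = "Mac"
        elif "Linux" in ua:
            platform = "Linux"
        elif "Windows" in ua:
            platform = "Windows (other)"
    return browser, platform


ie_on_xp = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)"
print(classify_user_agent(ie_on_xp))
```

Counting the pairs returned for each logged request (for example with `collections.Counter`) would then give percentage breakdowns of the kind quoted above.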

It could be argued that the sample of participants in a consumer survey may be slightly skewed, and not totally random, but these figures are a useful corrective to the figures for “power users” (itself a skewed sample) which we may be more used to seeing reported on IT-oriented sites.

Costing the segmenter

June 8th, 2010

Just for interest, I registered the Swahili verb segmenter at Ohloh. This is quite a clever setup, because they analyse the various bits of code in the repo and come up with a nice set of tables on the analysis page. The number of code lines comes out at around 2,400 (not counting comment lines or blank lines, which I like to strew liberally around my stuff, so that I have at least a chance of remembering what it’s supposed to do!), and on the little widget I’ve added to the site, it concludes that the segmenter would cost around $27,000 (£19,000) to write from scratch.

However, this is of course a little optimistic. Firstly, the Basic COCOMO model they use tends to overestimate the value of small projects. Secondly, about 38% of the code consists of the very nice CSS stuff from Blueprint – not really “mine”. Lastly, the average salary is assumed to be $55,000 per year, which is probably too high for around here – I think between $35,000 (£24,000) and $45,000 (£31,000) would be more realistic. So if we put the lower figure in, and assume only 62% of the code is my own, and take a bit off for being a small project, that comes to around £9,000 ($13,000), which is probably a closer reflection of the monetary value.
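For anyone curious where the headline figure comes from, here is a rough sketch of the Basic COCOMO calculation. It assumes the standard “organic mode” coefficients from Boehm’s model (effort in person-months = 2.4 × KLOC^1.05) and that cost is simply effort in years times salary; whether Ohloh uses exactly these settings is my assumption, and the function name `basic_cocomo_cost` is made up for illustration.

```python
def basic_cocomo_cost(code_lines, annual_salary, a=2.4, b=1.05):
    """Estimate effort and cost with Basic COCOMO.

    a and b default to the organic-mode coefficients; that these match
    Ohloh's settings is an assumption, not something the site confirms here.
    """
    kloc = code_lines / 1000.0            # thousands of lines of code
    person_months = a * kloc ** b         # Basic COCOMO effort equation
    cost = person_months / 12.0 * annual_salary
    return person_months, cost


# Figures from the post: about 2,400 code lines, $55,000 average salary.
months, cost = basic_cocomo_cost(2400, 55000)
print(f"{months:.1f} person-months, about ${cost:,.0f}")
```

With those inputs the sketch gives roughly six person-months and a little over $27,000, in line with the widget’s figure. It does not try to reproduce the adjusted £9,000 estimate, which rests on the more informal adjustments described above.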

It’s still quite a significant amount – the equivalent, assuming the lower salary figure, of around 4.5 months of work. Since the actual coding time was probably only about half that, it suggests that around 50% of “standard” costs go on things like overheads, meetings, etc. Perhaps that’s another argument for the free software development model – more emphasis on the code than on the organisational framework for it.
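The months-of-work figure is just the adjusted cost divided by the lower salary, scaled to months. A quick check, using only the numbers quoted above (the variable names are mine):

```python
adjusted_cost_usd = 13_000   # adjusted estimate from the post
lower_salary_usd = 35_000    # lower annual salary figure from the post

# Cost expressed as months of work at the lower salary.
equivalent_months = adjusted_cost_usd / lower_salary_usd * 12

# The post puts actual coding time at about half of that.
coding_months = equivalent_months / 2

print(f"equivalent: {equivalent_months:.1f} months, coding: {coding_months:.1f} months")
```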