LaTeX and tonemics

July 29th, 2007 by donnek

A number of the people I’ve been working with on software for Welsh have academic papers available for download, and when you look at them you can see that they were typeset with LaTeX, the macro package built on Donald Knuth’s TeX typesetting system.

So I decided it was time that I got at least a nodding acquaintance with these systems. The key feature of TeX is a programmable markup system, and it does seem as if once you know the details, you can do virtually anything with it. I won’t be in that category for some years, so I’m using the excellent Kile to ease my entry. There are other frontends available (eg Texmaker, LyX).

The best way of checking something out is to try doing something real-world with it, so I decided to go back to some papers I’d written a looooong time ago and see how these might come out in LaTeX. These were on African linguistics, and since most African languages are tone-languages, having access to diacritics that will allow you to represent these is essential.

A bit of reading around led me to TIPA, a package developed by Rei Fukui at the University of Tokyo, which is aimed at allowing all the symbols of the International Phonetic Alphabet to be represented in LaTeX. It’s a wonderful piece of work, and comes with a very extensive manual. The best thing about it is that it provides symbols for representing up to 5 levels of tone.
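To see what TIPA’s tone letters look like out of the box, a minimal test file along the following lines should do. It is only a sketch: it assumes a TeX installation that includes the tipa package, and the particular tones chosen are arbitrary.

```latex
\documentclass{article}
\usepackage[tone]{tipa} % the tone option enables \tone{...}; assumes tipa is installed
\begin{document}
% The five level tones, from highest (55) to lowest (11),
% drawn with TIPA's default right-hand stem bar
\tone{55} \tone{44} \tone{33} \tone{22} \tone{11}

% Two contour tones: a fall (51) and a rise (15)
\tone{51} \tone{15}
\end{document}
```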

By default, these are set up to use a right-hand stem bar for the tone; this seems to be the convention in most work on Asian tone-languages.

However, most work on African tone-languages tends not to use this system, probably because glides are a lot less frequent, and because the relative rather than absolute pitch-level seems to be more important in systematising tonal phenomena there. So ideally I needed some way to suppress the display of the tone stem bar.

Professor Fukui was kind enough to give me the magic formula to do this, and I reproduce it here in case it might be of benefit to someone else. All you need to do, after invoking the TIPA tone module as usual in the document preamble, is to add a couple of extra lines, so that the preamble now looks like this:

\documentclass[a4paper,10pt]{article}
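\usepackage[tone]{tipa}  % the tone option provides \tone{...}
\usepackage{tipx}
\makeatletter  % lets us refer to TIPA's internal @-macros
% Replace the stem bar with a zero-width rule as tall and deep as TIPA glyph '277,
% so the vertical space is kept but nothing visible is drawn
\renewcommand\@tonestembar{\setbox0\hbox{\tipaencoding \char'277}\hbox{\vrule height \ht0 depth \dp0 width 0pt}} % no stem
\makeatother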

With this in place, the tone letters now come out without the stem bar.

Perfect. That means that you can then (for example) use:

mpf\'umu [ \tone{44} \tone{22} ]

to produce mpfúmu (the Koongo word for “chief” in Hazel Carter’s orthography), followed by the standard indication of its pitch-contour in square brackets.
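Putting the preamble and the example together, a complete file you can compile as a test might look like this sketch. It adds nothing beyond the lines already given except the document wrapper, and it assumes both the tipa and tipx packages are installed.

```latex
\documentclass[a4paper,10pt]{article}
\usepackage[tone]{tipa} % tone option: provides \tone{...}
\usepackage{tipx}
\makeatletter
% Stemless tone letters: the stem bar becomes an invisible zero-width rule
\renewcommand\@tonestembar{\setbox0\hbox{\tipaencoding \char'277}\hbox{\vrule height \ht0 depth \dp0 width 0pt}} % no stem
\makeatother
\begin{document}
% Koongo "chief": pitch level 4 then level 2 on TIPA's 1 (low) to 5 (high) scale
mpf\'umu [ \tone{44} \tone{22} ]
\end{document}
```

Compiling it with latex or pdflatex should give mpfúmu followed by the two tone letters, now without stems.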

Iriver T60 has a 1,000 file limit?

July 14th, 2007 by donnek

My second boy was going on a long trip, and he wanted a media-player that played off a standard battery rather than one of the rechargeable ones that you need to find an electricity source for. It also had to play ogg files, since that’s what our CD collection here is ripped into. Cowon, which is very supportive of Linux, used to do an excellent little player that ran on an AA battery (the G3), but the one I had had only 256MB capacity, and the line seems to be discontinued now. So we finally settled on the Iriver T60, and even though it has 4GB of storage, the player is not much bigger than the AAA battery that powers it. Very nice indeed, although the menu system isn’t as good as the G3’s.

I started loading on his favourite CDs from our music server, and ripped a Spanish course and put that on too. I just connected the player to the PC, and dragged and dropped. Our format is Artist, then Album, then Track. I put on about 10 CDs that evening, and tested them – all played fine.

Next morning, when I continued the job, any new CDs I dragged and dropped wouldn’t play … Hmm – the others still did. Cue much head-scratching. I eventually moved the tracks up a level, so there were now just two levels (Album, Track) instead of three. That worked, and I thought the problem was solved, but no – add a few more and they wouldn’t play either.

Hmm. I’m using USB Mass Storage (UMS or Mass Storage Class, MSC). But this thing also uses Microsoft’s wacky MTP (Media Transfer Protocol). If it comes with that as default (the default can be changed via a firmware update), maybe it wants the joy of attachment to a legacy OS? So I fired up my wheezy old XP install, and added a few tracks using the Iriver software (not very good, btw). They seemed to play all right, so I dragged and dropped from XP (ie using UMS). They played too.

OK, back to Linux. I added a few more tracks there, and they seemed to be OK too. Added a lot more, and oops – some of these don’t play. There was steam coming out of my ears at this point, after spending most of the day on this wretched little contraption. So I sat and went through every single one of the 76 CDs I’d transferred there, and it was only the last 8 that wouldn’t play. Looking at the system menu, I noticed that the number of tracks listed was a suspiciously round 1,000.

Does this mean that the T60 can’t play more than 1,000 tracks, which seems a bit on the cramped side for a 4GB player? It might be just about OK for mp3s, which tend to be around 5MB apiece (4GB holds only about 800 of those), but oggs tend to be smaller (say 4MB), and at that size 4GB holds about 1,000 tracks, so the limit would start to bite.

Unfortunately (or fortunately for my sanity), the boy had to depart, and I have had no opportunity to test this further. But when he comes back, I’ll do a complete firmware upgrade and start again.

In the meantime, T60 buyers using OSS might like to be aware that one or more of these may apply:

  1. the T60 doesn’t like being connected UMS style to a Linux box;
  2. the T60 doesn’t like more than two levels of directory;
  3. the T60 doesn’t like to hold more than 1,000 files.

openSUSE howto on Subversion

June 18th, 2007 by donnek

I’ve added a page to the openSUSE wiki about Subversion. This grew out of trying to get my various projects organised a little more logically after getting my new PC.

openSUSE howto on Arduino

June 11th, 2007 by donnek

I’ve added a page to the openSUSE wiki about Arduino, a little programmable board which can be used as the basis for experimenting with computer-controlled stuff (eg the sort of custom musical instruments that Recursive Dog were using at LAC2007). I haven’t had much time to do more than get the board running, but the future beckons ….

Linux Audio Conference 2007

March 30th, 2007 by donnek

I haven’t posted for a while, so I’m going to cheat and backdate this one. The reason was LAC 2007. I’ve been saying apologetically for a couple of years now that multimedia in Linux is about 18 months behind other OSs in having something that is really useable on the desktop. After LAC 2007, I realised that I had to stop saying this, because multimedia in Linux is available here and now, and offers the same freedom, openness and low cost that we are already familiar with in Linux.

So the posting shortage has been due to my exploration of just a few of the amazing tools now available in Linux for composing music (and also video). LAC 2007 was a real eye-opener, and I personally am very grateful to the organisers and presenters for the wealth of information they presented. One of the most attractive aspects was the time set aside for hands-on tutorials and workshops – there’s nothing like actually using an app to get a feel for it.

I also need to say a special thanks to Stefan Kersten for helping me get SuperCollider running on my laptop. I had tried in vain several times after seeing Simon Blackmore demoing it at a Bloc seminar, but Stefan sorted it out in a couple of minutes!

It was the first time I had been to Berlin, and it was wonderful to wander around the city, now being substantially reshaped and rebuilt after German reunification. It was strange to consider that as little as 20 years ago I might have been arrested as a spy for walking down some of the streets I visited. The Technische Universitaet was a wonderful setting for the event, and the Lichthof in particular, a sort of atrium around which the various activities were clustered, has to be seen to be fully appreciated. There were also a number of concerts and sound installations to enjoy. The published proceedings of the conference (a snip at 8€) give a great overview of all the stuff that is going on behind the scenes on Linux audio – worth reading.

Highlights of the conference for me were:

  1. Hartmut Noack on his Linux Audio Workstation, one of which I am using to type this, with some very pleasant ambient music running in the background (I have to say it is “very pleasant”, because it is self-composed!)
  2. Rui Nuno Capela on his lightweight Qtractor sequencer.
  3. Michael Bohle and friends on JAD (Jacklab Audio Distribution), their audio version of openSUSE, adding another dimension to the best Linux distro out there.
  4. Steven Yi on blue, his attractive frontend for Csound. Steven’s homepage contains some delightful examples of compositions using blue/Csound.
  5. The Recursive Dog crew for their hands-on demo of the Arduino board, and the musical instruments they have built.
  6. Sergio Luque for his walkthrough of the composition process with SuperCollider, explaining how one of his own works was put together.
  7. The Canorus team demoing their music score editor – some great ideas, although unfortunately the software itself isn’t really useable yet.
  8. Richard Spindler on his Open Movie Editor, which appears to be a great leap forward for the average desktop user.

Perhaps the most important aspect of Linux audio is the fact that at a stroke it lowers the threshold for participation in creative audio work. Open any magazine like Sound on Sound or Computer Music and you’ll see lots of reviews of various pieces of software which are marketed as giving you the edge in audio production. Some of them certainly look very striking, with non-standard GUIs the order of the day. The prices may not be all that bad either, until you start thinking about needing to use two or five or eight of these apps. Then the cost starts to mount. Plus, of course, you are dependent on the whim and fortunes of individual companies, both for your OS and for your apps.

For many of the young people who want to get involved in music-making, of whatever type, this sort of expenditure is a gamble. Do you buy something cheap, and find you’ve wasted your £40 because it is very limited in what you can do with it, or do you pay a lot more and find you can’t use it properly? There’s little you can do about hardware costs (although the relative cost of keyboards, mics and guitars is now pretty low, and secondhand prices are even better), but software costs can certainly be slashed if you use Linux. Even if you buy a supported audio distro like Studio To Go or 64 Studio, the range of software you get still makes this a bargain compared to the proprietary solutions being offered.

So Linux is a tremendous opportunity for young people who don’t have much spare cash, and don’t mind experimenting with the sort of stuff which you won’t see covered in the newsstand mags, but which now gives shrinkwrap software a good run for its money.

I’ll be returning to this in future posts ….

Qt Linguist – nul points!

March 16th, 2007 by donnek

I’ve just finished a translation of QCad, the 2D CAD program (see this page for a couple of screenshots). It’s a Qt program, so it uses ts-files instead of the gettext po-files. Qt comes with its own app (Qt Linguist) to deal with these.

The main “benefits” of ts-files are that they group messages into sections, which is supposed to make it easier to track down where your message should be showing up, and they use XML. But in my view, they are the most abominable things ever to be let loose on localisation.

Because the messages are split into sections, if you use the same message in multiple areas of your app, you will get it appearing in multiple sections of the ts-file, and … yes, that’s right … it has to be translated multiple times. Qt Linguist has a section that lists previous translations of the message (from that file only!), and you can press Ctrl + a number to copy one into the message, and then Ctrl + Enter to mark the message as “complete”. So for each repeated message you need two extra keystrokes before you can move on to the next using Ctrl + Shift + L. On a shortish file, this doesn’t matter, but when you have one that is longer than about 800 messages, it starts to grate. Oh, and you’ve just discovered you misspelt a word in one of the messages you were happily copying into these duplicate messages … Now you have to find each one separately using Ctrl + F, and correct them. (I know you could do it via some sort of grep, if you are handy with grepping, but I wouldn’t trust it not to make changes where it shouldn’t, leading to even more edits.)

And XML … Well, once you get over about 1,000 items (and apps like Scribus have considerably more than that), you can just feel Linguist chomping its way through all that excess text – it can take 2-3 seconds just to go to the next item in a Find list. My biggest gripe about XML (apart from the one about not using it to hold data when there are far more efficient apps to do that – they’re called relational databases) is that it’s so wordy. For configuration files, and portable formats, it can be OK, but there are some types of data where the XML tags end up being probably 70% of the filesize (the tei-files for dictd are an example – see this page).

So this is another case where the free software architects got it right with gettext – old-fashioned maybe (no XML), but very practical.

Meddaliadur

March 15th, 2007 by donnek

When I put up the first version of Eurfa, I had the idea of doing a directory of programs and apps that were available in Welsh. I’ve now published the initial version of Meddaliadur, which is a start on this task. It lists a handful of programs, with a short description, a link to the website, license and cost details, information about who did the translation and where it can be got, and (last but not least) a few screenshots.

The idea is to show that there are quite a few pieces of software in Welsh, and the number is growing. This may attract some people to the apps themselves, and it might encourage others to think about making their apps available in Welsh too.

What I’ll be doing is splitting out the various programs that form part of the KDE and GNOME desktops, and then adding any other programs that I know have been translated. So that means, for instance, that there will be a Games section, with separate pages for KSokoban, Kolf, and so on. This will give a much better idea of the range of software (especially free software) already in Welsh.

Software in Welsh on other platforms (eg Microsoft Windows, Apple Mac OS X, Solaris, etc) will also be included, since the aim is to give a reasonable overview of all that is available.

The pages at present are simple HTML, but that would become unmanageable as the number of programs grows, so I need to move it to a database-backed system. I’ll take that opportunity to add things like the ability to leave comments about particular programs, and perhaps a space for beginners’ tutorials or howtos on the programs.

The old "turn it on" trick …

March 9th, 2007 by donnek

I was installing various pieces of music software on my laptop last night – things like Pure Data, Csound, SuperCollider – in preparation for the Linux Audio Conference at the end of this month (at least I would know something about what people will be speaking about!). I went to play an mp3 file of the output from a pd patch, and Amarok complained that it couldn’t play it.

Ah, I haven’t set the soundcard up on this new install of openSUSE 10.2. OK, I do that, but Amarok says it still can’t play the mp3 file.

Of course, this is crappy proprietary mp3, so I need to uninstall the default xinelib and install libxine. OK, Amarok now plays the thing – I can see the visualiser moving – but no sound.

Hmm. Cue an hour or two of trying different drivers for the soundcard, googling, fiddling about with alsamixer, etc. Still no luck.

This morning, I turned on the laptop and tried again. No sound. All the software seems to be working … surely … it couldn’t be … no … what about pressing the volume up button on the keyboard? Aargh! Sound! Who muted the damned thing??!!

Moral – never underestimate your own stupidity!

Don’t forget your READMEs!

March 7th, 2007 by donnek

Klebran is now listed on the OpenOffice.org site as one of the free grammar checkers. This thread on the relevant discussion list was quite interesting, because it shows that you need to check your README files every so often, and update them in line with the way the project develops. In this case, my failure to update the README led Dewi to an incorrect conclusion. I checked back over my email archive (going back to 2003) to see what actually happened.

A tagged wordlist is a prerequisite for any grammar-checker, and when I began considering a GPL Welsh one in early 2004, I decided to add meanings as well, since there was then no GPL dictionary available for Welsh. A number of lists – Jim Killock’s 2003 aspell list, the Crubadán web-crawler Welsh corpus, the UWB Cronfa Electroneg (which has fallen off the web, but is still available here, although the downloads don’t seem to work), and the UWB myspell list – fed into this dictionary project, in the sense that they provided “control” lists of frequently occurring words. I had to make a start somewhere, so in the event I took a list of the 5,000 most common words in Crubadán that also occurred in the myspell list, and began checking these and adding tags and meanings.

The 1.2 revision of the README that Dewi referred to dates from that point (March 2005). However, almost immediately (April 2005, according to my emails), I decided not to pursue this approach. Instead, once the “5,000 most common words” were done, I started inputting words from multiple everyday sources (books, magazines, already-completed translations) as I met them.

Why did I do this? Reading the emails, one reason for the change of plan seems to have been uncertainty about the myspell license – in the UWB README, the license is specified as GPL, but condition 3 appeared to me to be incompatible with that (I’m not from the FSF, so I can’t be sure, but it does look odd). But another reason was dissatisfaction with the verb-expansion rules giving inflected forms (I won’t go into details here, but I can give chapter and verse if anyone is that interested). I therefore decided to ignore the myspell-generated verb forms, and began a “clean-room” implementation of my own, where the abstraction rules behind the generated forms would be open to scrutiny. That led to Konjugator in July 2005, which was a necessary detour, and by the beginning of 2006, as Kevin Scannell said in his response to Dewi’s post, work on the dictionary was progressing well – the first version of Eurfa was released in April 2006. The dictionary in turn, as he said, is now the basis for Klebran, giving a nice example of how tools like this can build on each other!

It is therefore wholly untrue to say that my dictionary “incorporated” the myspell list, and the README should really have been updated in April or May 2005 to reflect that (revision 1.2 was left untouched for more than 23 months, in contrast to other parts of the repository!). It’s trivial to show this – download lexicon-cy.txt, and try to find a couple of less common non-mutated, non-inflected words that are in the myspell list – you will likely get no hit (I have just done this, for instance, with “perffeithiadwy” and “syndrom”, to take two at random). This is because the myspell list contains far more items than the 13,000 citation forms I have in my dictionary so far.

An additional point, of course, is that the myspell list cannot provide the meanings and POS information in the dictionary, since it does not include them in the first place. So what Kevin Scannell said originally in his first post was perfectly accurate – this work was indeed done “from scratch”, because (sadly) no such material is available in Welsh under a free license, even though it would be a tremendous help to the Welsh language to have it so.

The entries in my dictionary are each tagged, because it’s useful to be able to give some indication of their provenance. The “5,000 most common words” work (tagged as wl1 – working list 1 – in the dictionary) equates to 3,852 entries (the reduction is due to data cleaning), which is a mere 31% of the total citation forms in the current version of Eurfa.

So the moral of the story is: don’t forget your READMEs! If they are intended to give an overview of the project, keep them up-to-date, and commit revisions as you go. I’ve now revised the one for Gramadóir-cy, to ensure there is no future misunderstanding.

Klebran released at last!

February 16th, 2007 by donnek

I’ve finally announced the release of Klebran. In October last year I was trying to think of a way of using the Welsh port of Gramadóir in some sort of GUI, because I just find it really difficult to use something at the terminal only. I think it must be because it makes me think about things serially, whereas I prefer to get more of a multi-faceted view of things. Anyway, I couldn’t find any tame C++ programmers, so I began to wonder about doing something in PHP. Hmm, probably won’t work, I thought – but I wrote a page to print output from Gramadóir to the browser, and (to my surprise) within a day I had something working.

That was the easy bit – the last 4 months have been down to improving it, and there are still lots of things that are not done “properly”. I’m sure the code could be improved, and I still think I’m not using AJAX properly! I originally used the --api switch on Gramadóir, but Kevin Scannell pointed out that using the --xml switch would give me a lot more of the info I use to do the neat mouseovers giving you meanings (from Eurfa) and part-of-speech info. The Elixir grammar-checking backend for Sonnet in KDE4 will use the --api output, though.

There are still improvements to be made in the Gramadóir port, particularly as regards disambiguation. For example, if you get a verb-form that is tagged as two separate parts-of-speech (eg gweler, newch), both forms will appear in the Gramadóir output, and the Klebran code will just ignore them as it does its regexing, so you get a space where the word should be!

But at least it’s a start – it’s surprising how many typos Klebran picks up even in text that has been eyeball-checked a couple of times, and I am pretty good at spotting typos. The next step is to combine Klebran with an importer for PO-files, so that I can take completed files and check they are using “standard” words, aren’t producing obvious typos, and so on. I’ll probably use the import part of Kartouche for that, suitably amended.