French business name puns

11 June 2010

It takes a long while living in France to be “in on the joke” when it comes to business name puns (at least it did for me!). They are pervasive. Some are kind of cute, like the national clothing store “Kiabi” (Qui habille ?); a lot are rather silly. In all cases, they provide a fun way to learn French! Here are a few examples I’ve run across recently:

  • O’net Sécurité
    This is a security services company. Get it? O’net = honnête. I guess this’d be something like “Honest Abe’s Security” in English! (On a recent visit to the Louvre museum I noticed that O’net has the contract for security services; to their credit, they were very helpful and kind to my mom who uses a wheelchair.)
  • ProdHyg
    This one is a “cleaning products and services” company, a heavy-duty, chemical kind of cleaning outfit. Pro+(d)+Hyg(iène), i.e. “the Cleanliness Pros”. It is pronounced like “prodige” (prodigy)! I really like the incongruity of the soft sound and the harsh spelling!
  • CréaTif
    Virtually every hair salon has “tif” in the name (slang for ‘hair’). These can get very, ahem, creative!

My favorite, though, is AinPecc, a cleaning service in the department (or county) of “Ain”. It is pronounced like “impecc”, short for “impeccable” (“spotless” in English).

Let me know if you discover any good ones!

“Much Better Case” scenario: alignment of well-formatted HTML text

13 May 2010

OK, this is maybe not the absolute “best case”, but it’s probably getting close. I found an interesting reference translation for my database, in well-formatted, i.e. simple, HTML (no tables, etc.). For 4,000 French words, total paragraph alignment time was 10 minutes, which works out to 24,000 words per hour. Roughly, we’re in the 20-30K words per hour range.

And again, I’m talking here about aligning EXISTING or FOUND translations. The translations you do yourself are already perfectly aligned if you follow my Step 1: Translate in a table!

Manual translation alignment times for PDF files

6 May 2010

I just manually aligned a 7,000-word French-to-English translation from two PDF files. This is about the worst-case scenario for alignment, because you first have to pre-process the PDF texts by copying and pasting them into Word and then searching for and replacing all the extraneous paragraph markers. Then, after creating a translation table as previously described, you have to go through and sort out all the tables, footnotes, etc., which are inevitably not aligned, as well as the paragraphs proper, which might be correctly aligned about 50% of the time. The rest require copy/pasting, table cell deletion, line insertion, etc. to align correctly.
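For anyone who would rather script the paragraph-marker cleanup than do it with search and replace in Word, here is a minimal Python sketch. It is not what I actually used. It rests on one loud assumption: a paragraph really ends only where a line finishes with sentence-final punctuation, and every other line break is PDF debris. The function name `merge_pdf_lines` is made up for the illustration.

```python
import re

# Assumption (heuristic, not a rule): a line ending in . ! or ?
# (optionally followed by a closing quote or bracket) ends a paragraph.
SENTENCE_END = re.compile(r'[.!?]["”»)]?$')


def merge_pdf_lines(text):
    """Join lines broken by PDF copy/paste into whole paragraphs."""
    paragraphs = []
    current = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line always closes the paragraph in progress.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(line)
        if SENTENCE_END.search(line):
            paragraphs.append(" ".join(current))
            current = []
    if current:
        # Keep trailing text that never reached sentence-final punctuation.
        paragraphs.append(" ".join(current))
    return paragraphs


sample = "This agreement is made\nbetween the two parties.\nIt enters into force\non signature."
for paragraph in merge_pdf_lines(sample):
    print(paragraph)
```

The heuristic breaks a paragraph too early whenever a line happens to end on a full stop mid-paragraph, and it glues headings to the text that follows them, so the result still needs a quick visual check before building the translation table.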

Anyway, all this took about an hour and a quarter. So what is the cost per word… Hmm, let me see. Very roughly, I would say this costs between $0.005 and $0.01 per word, and as stated, this is about the worst-case scenario possible for manual alignment. I would expect that aligning a translation available in Word and properly/similarly formatted would cost less than a quarter of that.
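To make the arithmetic explicit, here is a small Python sketch that turns the figures above (7,000 words, 75 minutes, $0.005 to $0.01 per word) into a throughput and the hourly rate those per-word costs imply. The function names are only for illustration, and the hourly figures are derived from the numbers in this post, not quoted rates.

```python
def words_per_hour(words, minutes):
    """Alignment throughput in words per hour."""
    return words * 60 / minutes


def implied_hourly_rate(cost_per_word, words, minutes):
    """Hourly rate implied by a per-word cost for a job of this size."""
    return cost_per_word * words * 60 / minutes


words, minutes = 7000, 75  # figures from this post
print(f"{words_per_hour(words, minutes):.0f} words per hour")
for cost in (0.005, 0.01):
    rate = implied_hourly_rate(cost, words, minutes)
    print(f"${cost}/word implies about ${rate:.0f}/hour")
```

Run as written, this gives 5,600 words per hour, and the per-word range corresponds to roughly $28 to $56 per hour of alignment work.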

I hope to post findings on a “best case” soon!

Accurate word count

3 May 2010

When billing, I think it’s important to provide the client with an accurate word count, meaning numbers, proper names, etc. are not included in the count.

Unfortunately, MS Word’s word count function counts pretty much everything, including numbers. That’s why I’m developing a PHP script to tally up these “non translation” elements. It’s pretty easy to extract and count numbers using regular expressions, but proper names are a bit tricky, because obviously not all capitalized words are proper nouns. It’s coming along though and I hope to post something soon (if you know of an existing tool or script, please let me know!).
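To give an idea of the approach, here is a minimal sketch. It is in Python rather than PHP, and it is not the script I’m developing. The proper-name rule is a deliberately crude assumption: any capitalized word that does not start a sentence is flagged as a candidate. The names `TOKEN` and `count_tokens` are invented for the example.

```python
import re

# One pattern, three kinds of token. Whitespace and the comments inside
# the pattern are ignored because of re.VERBOSE.
TOKEN = re.compile(r"""
    (?P<number>\d+(?:[.,]\d+)*)               # 7,000 or 0.005 or 2010
  | (?P<word>[^\W\d_]+(?:['’-][^\W\d_]+)*)    # letters, accented ones included
  | (?P<stop>[.!?])                           # sentence-ending punctuation
""", re.VERBOSE)


def count_tokens(text):
    """Count words, numbers and proper-name candidates in plain text."""
    counts = {"words": 0, "numbers": 0, "proper_candidates": 0}
    sentence_start = True
    for match in TOKEN.finditer(text):
        kind = match.lastgroup
        if kind == "stop":
            sentence_start = True
            continue
        if kind == "number":
            counts["numbers"] += 1
        else:
            counts["words"] += 1
            # Assumption: a capital letter away from the start of a
            # sentence suggests a proper name. This is only a guess.
            if match.group()[0].isupper() and not sentence_start:
                counts["proper_candidates"] += 1
        sentence_start = False
    return counts


sample = ("Jules Verne wrote it in 1864. "
          "The translation by Project Gutenberg has 7,000 words.")
print(count_tokens(sample))
```

On the sample sentence, “Verne”, “Project” and “Gutenberg” are flagged but “Jules” is not, because it opens the sentence. In the other direction, the English pronoun “I” and capitalized German nouns would be flagged wrongly. The candidates are therefore a list to review by hand, not a number to subtract blindly.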

The most cumbersome part is getting PHP to “read” the text, which basically means saving the Word doc as a text file that PHP can read. It’s an extra step, but an important one, I feel, to avoid misunderstandings with the client. The question remains, though, whether and how to bill these “non translation” elements, as they do take time to incorporate into the translation.

Step 1 - Translate in a Table

7 January 2010

Everybody translates this way anyway, don’t they? That is unless they have some fancy software like Trados. Well, just in case, here’s what you’ll need to do:

  • Open your source document (Word or OpenOffice format, hopefully)
  • Select text
  • Click on table button, which will wrap your text in a one-column table in which each paragraph has become a row
  • Select the column, then click “insert column to left” (or right, depending on how you want your language columns arranged; be sure to always keep the same arrangement, no matter which direction you are translating, otherwise your database will suffer greatly later!)

Now you have a two-column table to do your translation in. This is a nice way to work for two reasons. First, and most important, your table can be easily inserted later into your translation database. Second, in Word at least, the formatting is maintained in the added column, so when you’re finished with the translation you just have to delete the source language column (after making sure to save the two-column table in a safe place, e.g. in a folder called “Add to database”!) and then do “table to text”, and presto! Your translated document is formatted identically to the source language file. (Note that when numbering is used, with two columns, numbering will get funny, but once you delete the source column, it reverts back to the desired, original numbering. Again, that’s true for Word; I’m not sure about OO.)

So there you have it. These translation tables are what you will later insert into your MySQL database, after some minor transformations using Word macros to be provided here later. And of course if you have reference translations (or old translations of your own) that you want to prepare for insertion to the database, you can certainly guess at how to combine the two language files into one table, with a few table manipulations here and there to ensure perfect alignment between paragraphs–but more about this later.
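In the meantime, to show why row-by-row alignment matters, here is a rough Python sketch of what “inserting a table into a database” amounts to. It is not the Word-macro and MySQL route I will describe later. As loudly stated assumptions: Python’s built-in `sqlite3` stands in for MySQL, and the table name `segments`, its columns, and the two function names are all hypothetical.

```python
import sqlite3


def load_aligned_pairs(conn, pairs):
    """Store (source, target) paragraph pairs, one table row per pair."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS segments ("
        "id INTEGER PRIMARY KEY, source TEXT NOT NULL, target TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO segments (source, target) VALUES (?, ?)", pairs
    )
    conn.commit()


def search(conn, term):
    """Return every aligned pair whose source or target contains term."""
    pattern = f"%{term}%"
    cursor = conn.execute(
        "SELECT source, target FROM segments "
        "WHERE source LIKE ? OR target LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return cursor.fetchall()


# Each tuple is one row of the two-column translation table.
pairs = [
    ("Le présent accord entre en vigueur à la signature.",
     "This agreement enters into force upon signature."),
    ("Chaque partie désigne un représentant.",
     "Each party shall appoint a representative."),
]
conn = sqlite3.connect(":memory:")  # in-memory database, for the demo only
load_aligned_pairs(conn, pairs)
for source, target in search(conn, "accord"):
    print(source, "=>", target)
```

Because each row holds a paragraph and its translation side by side, a search in either language brings back both. That only works if the column order never changes from one table to the next.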

Finally, note that I will always refer to two languages and two-column tables, but of course this figure could be 3 or more, depending on how big a polyglot you are and whether you have documents in 3 or more languages.

Happy translating (and tables) to you!

Image PDFs

4 January 2010

Scanned image PDFs (i.e. where text can be neither copied nor converted, except with OCR software) can be a nightmare for translators!

Besides the obvious problem of having to fully recreate the document (formatting, tables, etc.), you cannot create a translation table from it, and thus neither translator nor client can benefit in the future from all the hard work put into the translation. Plus, in longer documents, the translation will certainly be less consistent due to the impossibility of searching the text within the translation to see how term X was translated above (e.g. am I using the same term on page 35 that I used on page 5? It’s hard to be sure when working between a paper or “image file” source and the translation “Word” file.)

Clients, avoid creating such documents! Translators, avoid accepting such files… if you can!

Searchable corpus online and working

27 December 2009

As an example of what you can download and/or create yourself, here is my searchable corpus made up of publicly available texts. A large chunk is the French-to-English translation of Jules Verne’s “Journey to the Center of the Earth” (thank you, Project Gutenberg), but there are also international treaties and a number of European and Canadian government documents. Have a look and let me know what you think.

Go to page “Search in sample corpus”…