Sunday Spam: bagels, lox and smoked salmon

In belated honour of my breakfast in New York, Sunday July 8.

Baby Loss and the Pain Olympics
Warning for baby loss discussion.

I really have to question why seeing someone else processing their emotions is her pet peeve.

Do I believe a miscarriage and neonatal death is the same thing — of course not. If they were the same thing, they would share the same term. But just because I see them as apples and oranges doesn’t mean that I don’t also see them as fruit. They are both loss.

The deadly scandal in the building trade

Readers would not guess from the “national conversation” that the construction industry is sitting on a story as grave in its implications as the phone-hacking affair – graver I will argue. You are unlikely to have heard mention of it for a simple and disreputable reason: the victims are working-class men rather than celebrities… The construction companies could not be clearer that men who try to enforce minimum safety standards are their enemies. The files included formal letters notifying a company that a worker was the official safety rep on a site as evidence against him.

On Technical Entitlement

By most measures, I should have technical entitlement in spades… [and yet] I am very intimidated by the technically entitled.

You know the type. The one who was soldering when she was 6. The one who raises his hand to answer every question–and occasionally try to correct the professor. The one who scoffs at anyone who had a score below the median on that data structures exam (“idiots!”). The one who introduces himself by sharing his StackOverflow score.

Puzzling outcomes in A/B testing

A fun upcoming KDD 2012 paper out of Microsoft, “Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained” (PDF), has a lot of great insights into A/B testing and real issues you hit with A/B testing. It’s a light and easy read, definitely worthwhile.

Selected excerpts:

We present … puzzling outcomes of controlled experiments that we analyzed deeply to understand and explain … [requiring] months to properly analyze and get to the often surprising root cause … It [was] not uncommon to see experiments that impact annual revenue by millions of dollars … Reversing a single incorrect decision based on the results of an experiment can fund a whole team of analysts.

When Bing had a bug in an experiment, which resulted in very poor results being shown to users, two key organizational metrics improved significantly: distinct queries per user went up over 10%, and revenue per user went up over 30%! …. Degrading algorithmic results shown on a search engine result page gives users an obviously worse search experience but causes users to click more on ads, whose relative relevance increases, which increases short-term revenue … [This shows] it’s critical to understand that long-term goals do not always align with short-term metrics.

Angels & Demons

One of the various Longform collections, and like many of them, a crime piece:

On June 4, 1989, the bodies of Jo, Michelle and Christe were found floating in Tampa Bay. This is the story of the murders, their aftermath, and the handful of people who kept faith amid the unthinkable.

On Leaving Academia

As almost everybody knows at this point, I have resigned my position at the University of New Mexico. Effective this July, I am working for Google, in their Cambridge (MA) offices.

Countless people, from my friends to my (former) dean have asked “Why? Why give up an excellent [some say ‘cushy’] tenured faculty position for the grind of corporate life?”

Honestly, the reasons are myriad and complex, and some of them are purely personal. But I wanted to lay out some of them that speak to larger trends at UNM, in New Mexico, in academia, and in the US in general. I haven’t made this move lightly, and I think it’s an important cautionary note to make: the factors that have made academia less appealing to me recently will also impact other professors.

Ethics, Culture, & Policy: Commercial surrogacy in India: A $2 billion industry

Since its legalization in 2002, commercial surrogacy in India has grown into a multimillion-dollar industry, drawing couples from around the world. IVF procedures in the unregulated Indian clinics generally cost a fraction of what they would in Europe or the U.S., with surrogacy as little as one-tenth the price. Mainstream press reports in English-language publications occasionally devote a line or two to the ethical implications of using poor women as surrogates, but with few exceptions, these women’s voices have not been heard.

Sociologist Amrita Pande of the University of Cape Town set out to speak directly with the “workers” to see how they are affected by such “work.”

More falsehoods programmers believe about time

Noah Sussman has Falsehoods programmers believe about time, including:

All of these assumptions are wrong

  1. There are always 24 hours in a day.
  2. Months have either 30 or 31 days.
  3. Years have 365 days.
  4. February is always 28 days long.
  5. Any 24-hour period will always begin and end in the same day (or week, or month).

As is usual with these kinds of things, he’s only scratching the surface (even though there’s a lot more than in that excerpt). Andrew and I came up with several more already, on the subject of timezones:

  1. All timezones are vertical lines around the globe evenly spaced at 15-degree intervals.
  2. All timezones are a whole number of hours offset from UTC.
  3. All timezones are no more than 12 hours offset from UTC.
  4. Two cities within some sufficiently small distance must be in the same timezone.
  5. Two cities with the same longitude must be in the same timezone.
  6. A city further to the east of another city must have a time ahead of or equal to the more western city.
  7. There will only be one timezone within any political boundary.
  8. Within a sufficiently large political boundary, there will be different timezones.
  9. Timezone designations like ‘EST’ are unambiguous.*
  10. Daylight savings shifts occur on the same day around the globe.
  11. Or at least within a hemisphere.
  12. Or at least within a continent.
  13. Or at least within a nation.
  14. Daylight savings shifts occur on predictable dates announced ‘sufficiently far’ in advance that there can be an exhaustive listing of them accurate for the next couple of decades.
  15. Well, at least the next few years.
  16. OK, surely at least this month?

* Both Australia and the United States call their east coast timezone this in winter, and guess what: it’s never the same time in New York as it is in Sydney, and the daylight savings status is seldom the same either. (If you’ve seen Australians call it ‘AEST’, well, yes, we do. Sometimes.)

Useful LaTeX packages: linguistic examples

This is the conclusion of a short series of entries on LaTeX packages I found useful while preparing the examination copy of my PhD thesis.

Today’s entry is a package for displaying linguistic examples (ie, samples of text which you then want to discuss and analyse).  The LaTeX for Linguists Home Page is a good general resource for linguists and computational linguists using LaTeX. I discuss gb4e here because I had to do some messing around to get it to display example numbers the way I want (and the way my supervisor wanted: he likes in-text references to look like “example (4.1)” rather than “example 4.1”), and to get it to work with cleveref, and no one seems to have written that up to my knowledge.

gb4e

gb4e is a linguistic examples package.

\usepackage{gb4e}

Input looks like:

\begin{exe}
\ex This is an example sentence\label{example}
\ex This is another example sentence.
\end{exe}

This is a cleveref reference to \cref{example}.
This is a normal reference to example (\ref{example}).

You can mark sentences with * and ? and so on:

\begin{exe}
\ex[*] {This is an sentence ungrammatical.}
\ex[?] {This is an questionably grammatical sentence.}
\end{exe}

You can do sub-examples:

\begin{exe}
\ex This is an example.
\ex
\begin{xlist}
\ex This is a sub-example.
\ex This is another sub-example.
\end{xlist}
\end{exe}

A few things to do to make gb4e play really nicely. First, some cleveref config. gb4e doesn’t yet automatically tell cleveref how to refer to examples, so you need to tell it that the term is “example”. Second, if you want brackets around the number (“example (1.1)” rather than “example 1.1”), you need to tell it to use them:

% tell cleveref to use the word "example" to refer to examples,
% and to put example numbers in brackets
\crefname{xnumi}{example}{examples}
\creflabelformat{xnumi}{(#2#1#3)}
\crefname{xnumii}{example}{examples}
\creflabelformat{xnumii}{(#2#1#3)}
\crefname{xnumiii}{example}{examples}
\creflabelformat{xnumiii}{(#2#1#3)}
\crefname{xnumiv}{example}{examples}
\creflabelformat{xnumiv}{(#2#1#3)}

Also, by default, the gb4e numbering does not reset in chapters. That is, your examples will be numbered (1), (2), (3) etc right through a thesis. You probably want more like (1.1), (1.2), (2.1), (2.2), ie chapter.number. Change to this with the following in your preamble:

% Store the old chapter command so that
% our redefinition can still refer to it
\let\oldchapter\chapter
% Redefine the chapter command so that it resets the
% 'exx' counter that gb4e uses on every new chapter.
\renewcommand{\chapter}{\setcounter{exx}{0}\oldchapter}

% Redefine how example numbers are shown so that they are
% chapter number dot example number
\renewcommand{\thexnumi}{\thechapter.\arabic{xnumi}}

You could also get it to reset in sections by replacing \chapter and \thechapter with \section and \thesection in the above.

Thanks to the TeX Stack Exchange community for their help with this. See Section based linguistic example numbering with brackets for more information.

Useful LaTeX packages: within document references

This is part of a short series of entries on LaTeX packages I found useful while preparing the examination copy of my PhD thesis.

Today’s entry is packages relevant to preparing within document references. These are both fairly new to me, although not absolutely new.

hyperref

This package turns cross-references and bibliography references into clickable links in your output PDF (at least if you generate it with xelatex or pdflatex), without you having to do anything other than use the usual \ref (or cleveref’s \cref), \cite and so on commands.

\usepackage{hyperref}

You will probably want to modify its choice of colours to something more subtle:

\usepackage[citecolor=blue,%
    filecolor=black,%
    linkcolor=blue,%
    % Generates page numbers in your bibliography, ie will
    % list all the pages where you referred to that entry.
    pagebackref=true,%
    colorlinks=true,%
    urlcolor=blue]{hyperref}

Use black if you want the links the same colour as your text.

One note with hyperref: generally it should be the last package you load. There are occasional exceptions (cleveref, for one, has to be loaded after it); see Which packages should be loaded after hyperref instead of before?

cleveref

cleveref is a LaTeX package that automatically remembers how you refer to things. So instead of:

see chapter \ref{chapref}

you use the \cref command:

see \cref{chapref}

It handles multiple references nicely too:

see \cref{chapref,anotherchapref}

will generate output along the lines of “see chapters 1 and 2”.

Use

\Cref{refname}

to generate capitalised text, eg “Chapter 1” rather than “chapter 1”.

To use it:

\usepackage{cleveref}

It shortens the word “equation” to “eq.” by default. If you don’t like that, then:

\usepackage[noabbrev]{cleveref}

For some packages that don’t yet tell cleveref how to refer to their counters, you will get output like “see ?? 1” rather than “see example 1”. You use the \crefname command in the preamble to tell it what word to use for each unknown counter. Examples of \crefname will be shown tomorrow for gb4e.
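As a rough sketch of how the two packages above fit together in one document: cleveref is one of the exceptions to the “hyperref last” rule, since it needs to be loaded after hyperref. The document class and the label names (chap:intro, chap:method) are just placeholders for illustration, and you will need to run LaTeX twice for the references to resolve.

```latex
\documentclass{report}

% ... your other packages go here ...
\usepackage{hyperref}            % load late, after most other packages
\usepackage[noabbrev]{cleveref}  % exception: cleveref goes after hyperref

\begin{document}

\chapter{Introduction}\label{chap:intro}   % placeholder label name
Some text.

\chapter{Method}\label{chap:method}        % placeholder label name
% \Cref capitalises the word it generates, \cref does not,
% and a comma-separated list is turned into a plural reference.
\Cref{chap:intro} sets the scene; see \cref{chap:intro,chap:method}.

\end{document}
```

With hyperref loaded, the generated references should also be clickable links in the PDF.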

Useful LaTeX packages: tables and figures

This is part of a short series of entries on LaTeX packages I found useful while preparing the examination copy of my PhD thesis.

Today’s entry is packages relevant to preparing tables or figures. Again, some are pretty widely known and some aren’t.

rotating

If you have a big table or figure that should be rotated sideways onto its own page:

\usepackage{rotating}

And then you can replace the table and figure environments with:

\begin{sidewaystable}
%Giant table goes here
\end{sidewaystable}
\begin{sidewaysfigure}
%Giant figure goes here
\end{sidewaysfigure}

dcolumn

The dcolumn package produces tabular columns that are perfectly aligned on a decimal point (ie all the decimal points in that column are exactly underneath each other), which is usually how you want to display decimal numbers.

\usepackage{dcolumn}

% create a new column type, d, which takes the . out of numbers, replacing the .
% with a \cdot and aligning on it.
\newcolumntype{d}[1]{D{.}{\cdot}{#1}}

Now that you have defined the column type, you can use d in the tabular environment, where the numeric argument is the number of figures to expect after the decimal point. You don’t have to use exactly that number of figures in every entry; it only determines how much room is left.

% a tabular environment with two columns: 1 and 3 figures after the decimal point
\begin{tabular}{d{1}d{3}}
1.6 & 1.657 \\
2.0 & 6.563 \\
7 & 6.26 \\
\end{tabular}

One annoying aspect of this package is that for the headers of that column, which probably aren’t numbers, you will need to use \multicolumn to get them to display nicely.

% a tabular environment with two columns: 1 and 3 figures after the decimal point
\begin{tabular}{d{1}d{3}}
\multicolumn{1}{c}{Heading 1} & \multicolumn{1}{c}{Heading 2} \\
1.6 & 1.657 \\
2.0 & 6.563 \\
7 & 6.26 \\
\end{tabular}

You can mix the d column type with the usual l, r and p column types.
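For instance, a small self-contained sketch of a table that mixes an l column, a d column and an r column might look like the following. The row contents are invented placeholders, and only the d column needs the \multicolumn treatment for its heading.

```latex
\documentclass{article}
\usepackage{dcolumn}

% the d column type as defined above: align on the decimal point
% and print it as a \cdot
\newcolumntype{d}[1]{D{.}{\cdot}{#1}}

\begin{document}

% l = left-aligned text, d{2} = decimal-aligned with room for
% 2 figures after the point, r = right-aligned
\begin{tabular}{l d{2} r}
Item & \multicolumn{1}{c}{Weight} & Count \\
\hline
A & 1.5   & 3  \\
B & 12.25 & 10 \\
\end{tabular}

\end{document}
```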

threeparttable

You can’t use \footnote in a floating table. This is one of several packages that allow table footnotes in various ways.

\usepackage{threeparttable}

threeparttable doesn’t cause tables to float on its own, so you usually want to wrap it in a table environment:

\begin{table}

\begin{threeparttable}

% Normal bits of your table go here, and use \tnote{a} and
% \tnote{b} and so on to generate a note mark

\begin{tablenotes}
\item General note
\item General note 2
\item[a] Note for mark a
\item[b] Note for mark b
\end{tablenotes}

\end{threeparttable}

\caption{Caption goes here}
\end{table}

Unfortunately you need to generate the a, b, c (or whatever) numbering manually.

The general (unmarked) note entries are useful for things like “Bold entries are highest in the column”, so that they don’t need to go in the caption.
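To make that concrete, here is a minimal sketch of a complete document using threeparttable. The systems and scores in the table are invented placeholders, not real results. Note that the marks in the table body are made with \tnote{a}, while inside the tablenotes environment each note is an \item, with the mark given as the optional argument.

```latex
\documentclass{article}
\usepackage{threeparttable}

\begin{document}

\begin{table}
  \centering
  \begin{threeparttable}
    % Placeholder rows: names and numbers are made up for illustration.
    \begin{tabular}{lrr}
      System            & Precision     & Recall        \\
      \hline
      Baseline\tnote{a} & 0.50          & 0.40          \\
      Proposed\tnote{b} & \textbf{0.60} & \textbf{0.55} \\
    \end{tabular}
    \begin{tablenotes}
      \item Bold entries are highest in the column.
      \item[a] Note for mark a.
      \item[b] Note for mark b.
    \end{tablenotes}
  \end{threeparttable}
  \caption{Caption goes here}
\end{table}

\end{document}
```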

Useful LaTeX packages: bibliography

I’m going to post a short series of entries on LaTeX packages I found useful while preparing the examination copy of my PhD thesis. Largely this is just so that there’s a reference if my wiki page goes away, but also because I think many people use LaTeX the way I use it, that is, I got wedded to a bunch of packages 10 years ago and never really looked around for more recent stuff.

Today’s entry is a pretty slow start: the bibliography packages I used are pretty standard.

natbib

This is one of the most sophisticated and widely used packages for Harvard-style references (ie, “(Surname, Year)” rather than “[1]” style references).

\usepackage[round]{natbib}
\bibliographystyle{plainnat}

Inside your text use \citep for a reference in parentheses “(Surname, Year)”, and \citet for an in-text reference “Surname (Year)”. It’s important to note that the plain \cite command is equivalent to \citet, which you may not expect.

You can use \citeauthor to get just “Surname” and \citeyear to get just “Year”.
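Putting those commands together, a minimal sketch of a document might look like this. The citation key smith2010 and the bibliography file thesis.bib are hypothetical; substitute your own. You need the usual latex, bibtex, latex, latex sequence before the citations resolve.

```latex
\documentclass{article}
\usepackage[round]{natbib}
\bibliographystyle{plainnat}

\begin{document}

% smith2010 is a made-up key, assumed to exist in thesis.bib
This effect is well known \citep{smith2010}.   % parenthetical: (Surname, Year)

\citet{smith2010} describes the effect.        % in-text: Surname (Year)

It was first described by \citeauthor{smith2010}
in \citeyear{smith2010}.                        % just Surname, just Year

\bibliography{thesis}                           % hypothetical thesis.bib

\end{document}
```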

bibentry

This is a useful add-on to natbib, which allows you to insert full bibliography entries into the body of your text. This is useful in the declaration portion of a thesis (where you say something like “this thesis incorporates revised versions of the following published articles”).

\usepackage{bibentry}
\nobibliography*

Then later on when you want to insert a full bibliography entry into the middle of your text:

\bibentry{citationkey}

Product review: Shoeboxed

Update February 2017: this service is now known as Squirrel Street, and their smallest monthly pricing is significantly higher than it was in 2012. However much of the review still applies.

Original review:

I’ve been using Shoeboxed now for long enough to review it, I think.

Problem: as with every adult household, we have lots of incoming documents like bills and super statements and similar, and the high initial overhead on deciding whether and where to store them, plus re-sorting them later and so on has never been something we’ve been on top of. Come tax time, in particular, we were usually opening piles of envelopes and hoping for the best.

In 2007 or 2008 we started scanning and shredding a lot of things, but that still left going through and labelling the scans as a problem, plus when I went on maternity leave in 2010 we didn’t have access to a sheet-feed scanner anymore and got behind and never caught up. Back to the “giant unsorted pile of paper” solution.

There are a few services that accept mail on behalf of people and send scans (Pass the Post, Keeping You Posted) but these tend to be quite expensive if you want them to handle all your mail, and also there’s still a time-critical decision step (scan it or send it to me). They tend to be aimed at travellers or businesses. The paper pile was annoying enough, though, that every few months I hit the search engines, and eventually I lit on Shoeboxed.

What Shoeboxed does:

  1. accepts documents either sent by mail (not one at a time, many in a big envelope) to a US or AU postal address, or uploaded
  2. scans the physical document if any
  3. does data entry for the major data within (for bills, say, the sender and the total)
  4. makes them available after logging in on their website
  5. makes them available over an API to other services like bookkeeping websites

What Shoeboxed doesn’t do:

  1. directly accept individual physical mail on your behalf (they do have a service where you can get online receipts sent to them, I haven’t used it)
  2. full OCR of the scanned documents

There’s a very very limited Free plan involving uploading (not mailing) up to 5 documents a month for OCR plus unlimited uploads if you do your own data entry. The next plan up in Australia, which we’re on, is $20 a month, and includes all the features I listed.

Impressions:

  1. overall, it pretty much does what we want: gets paper out of our house and into an easily searchable online form with scans available
  2. because it isn’t fully OCRed I still have to go through non-bills in order to note what they are, eg, a mail from childcare could be a fee change or a newsletter or a note about illness and if I need to find it in a year I’d have to search on the name and look through them all
  3. the processing speed on the Lite plan (contents of envelopes appear on the website in 3–5 days) has been a bit annoying on occasion, I’ve found myself scanning really time-critical documents and uploading them
  4. the processing speed on uploaded scans is great, the data entry is usually done within the hour
  5. the usage reporting doesn’t incorporate the bonus scans one gets by doing things like signing up for an annual plan, or answering demographic surveys. Very annoying!

For our needs, it’s definitely an improvement over our home-rolled solution. We’re scrambling to get 250 documents to them before our annual purchase bonus expires.

Connecting a Debian/Ubuntu server to the Macquarie University OneNetAnywhere VPN

I realise that this is a rather specific problem, but hopefully the links I provide here will be useful for anyone wanting to access a PPTP VPN themselves.

I have to say that this is one of those entries more likely to be useful if you ever have this specific problem (eg, you came here via a search engine query for “argh pptp mppe errors argh argh argh”) and less for a casual reader. Apologies loyal fans!

On being X-ish

Now that I have described how I graduated into Generation X, I have a secret to confess: I’m starting to think that that might not be entirely wrong.

Let’s stick to cohort effects here, since it’s supposed to be a cohort term. And I should add that this is all very trivial stuff, I’m focussing on media, pop culture and technology experiences.

One of the major temptations of identifying as Generation Y had to do with pop culture. My teenage years were just past the wave of slackers and grunge and Seattle. I probably heard Nirvana’s music during Kurt Cobain’s lifetime, but I didn’t know of them as a thing until about a year after he died. I’ve never even seen Reality Bites, but Ethan Hawke and Winona Ryder are both 10 years older than I am, and their movies weren’t about my cohort.

I am, frankly, Spice Girls age: not the pre-teen thrilled girls waving things to be signed, but the teenagers who actually paid for the albums with their own money. (I didn’t, for reference. We were a Garbage family.) Britney Spears was born in the same year as me, and her biggest year career-wise was my first year of university. And obviously, when the term “Generation Y” was coined, the stereotypes of late university/early career certainly fit my friends better than the Generation X tags with managerial aspirations. The return of cool people listening to cheesy pop: Y-ish. So that was where I felt I fell. (In case anyone I knew at high school drops by: I realise I wasn’t cool. But you may have been, and don’t think I didn’t notice you danced to the Spice Girls.)

But then, there’s certainly a few small societal boundaries between me and people who were born in 1986. (I have a sister born in 1986, and thinking about the five years between us is often telling.) Starting at a global level, I was reading Tony Judt’s Postwar recently (recommended, I’ll come back to it here at some point), and I was struck because I remember 1989.

To be fair, that’s more important if one lives in Europe, which I never have, but most of my first detailed memories of newsworthy events have to do with the revolutions of 1989 and the 1990 Gulf War. I remember the USSR, again, from the perspective of a young child who was growing up in Australia, but still. I can read the science fiction people smirk about now, the fiction with the USA and USSR facing off in 2150, and remember, a little bit, what that was actually about. This is, well, frankly, more than a little X-ish.

While we’re talking about defining events, I recall that quite a lot of people talked about the children who won’t remember 9/11. (And by children, I now mean 15 year olds, of course.) Obviously this is more important in the USA, perhaps a little like the European children (by which I mean 25 year olds) who don’t remember 1989 in Europe. I obviously remember 2001, and moreover remember the geopolitical situation in the years before it quite vividly too, and that latter is again, more than a touch X-ish.

Turning to technology, which is fairly defining for me, we’ll start with Douglas Adams:

Anything that is in the world when you’re born is normal and ordinary and is just a natural part of the way the world works. Anything that’s invented between when you’re fifteen and thirty-five is new and exciting and revolutionary and you can probably get a career in it. Anything invented after you’re thirty-five is against the natural order of things.

Leaving aside the age effect where shortly everything cool will be against the natural order of things, it’s noticeable to me that the Web and email and so on fall in the “can probably get a career in it” bracket for me. Well, obviously not truly (the first version of the SMTP specification, which still more or less describes how email works today, was published in 1982), but my late teenage years were exactly the years when suddenly a lot of Australian consumers were on the ‘net. Hotmail was founded when I was 15 and I got an address there the following year. (icekween@, the address has been gone since 1999 and I’ve never used that handle since, partly because even in 98/99 it was always taken. But, actually, for a 16 year old’s user name I still think that was fairly OK considering some of the alternatives.)

In short, it was all happening in prime “get a career in it” time for me, and not coincidentally I am at the tail end of the huge boom in computer science enrolments and graduates that came to a giant sudden stop about two years after I finished. Frankly, X-ish. My youngest sister and her friends didn’t get excited about how they were going to become IT managers and have luxury yachts as a matter of course. (Well, partly age and partly not being jerks, there.) It’s a lot harder to get the “just a natural part of the way the world works” people excited about it.

Diagnosis: tailing X.

Name fields and UI

The AdaCamp Melbourne application form began with two fields: Your Name and Your Email. Seems fair enough! An unanticipated problem a few people have had with the forms is that they have entered “Your Given Name” and “Your Surname” instead, presumably trained to do this by umpteen million sites that want data entered that way. This leaves us with no email for them.

I don’t think the solution is to go with the flow: it buys into Falsehoods Programmers Believe About Names and the only thing it would get AdaCamp is the more-or-less correct alphabetising of the attendee list. (Only more-or-less correct: not only do given-surname name orderings vary among two-or-more-name cultures, so does the sort key, see eg Wikipedia’s manual instructing editors to sort Thai people by their given name.) But since we have no need to alphabetise the attendee list, a single name field is fine.

The best solution, I think, is to perform email address validation, which has its own problems (eg many validators use “is there a dot in the domain part?” which annoys the lucky people who have an email account at a top level domain no end) but gives us what we really need: a way to contact applicants!