Citation delusions: "The most influential paper Gerard Salton never wrote"

In trying to finalise my PhD revisions, I am giving some background on text categorisation.

Extremely briefly, the problem of text categorisation is this: you have a document and some (usually pre-defined, unless you’re clustering) categories. For example, the categories might be news and editorial. Or academic article, newspaper article and blog entry. The choice of categories is application dependent.

Then you have a document you wish to assign to a category. Is it news, or editorial? The typical way of doing this is to assemble a set of training examples: pre-assigned news and editorial pieces. Then you measure the similarity of your new document to the pre-assigned collections, and whichever category it is most like is your document’s category. You might notice that I have not here defined “measure the similarity” and “most like”: that’s often the research question. How can you represent the collections efficiently so that they can be compared against new documents? What are good measures of similarity?

A fairly common way to picture this is (for historical reasons, as we’ll see) a vector. For each word in the vocabulary (the vocabulary typically being the set of terms used in every document in the training examples, although sometimes you might try to smooth the morphology out or similar), you construct a numerical representation. Say the vocabulary is no-good, bad, rotten, and a document reads “no-good no-good bad”: you might describe it as the vector (2, 1, 0), showing two uses of the first vocabulary item, one of the second and none of the third. (Again, whether you count vocabulary items, or weight them in various ways, is a research question. You may also notice that this counting-of-occurrences model is a “bag of words” approach, that is, it does not distinguish between “bad rotten” and “rotten bad” even though in language word order and syntactic structure are meaningful. It’s possible to transform the vectors so that this orthogonality of individual words does not hold.)
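As a rough illustration of the counting model, here is a minimal Python sketch using the toy vocabulary above. The function names, the two category names and their training text are invented for the example, and cosine similarity is only one possible choice of similarity measure, not the answer to the research question.

```python
from collections import Counter
import math

# Toy vocabulary from the example above.
VOCABULARY = ["no-good", "bad", "rotten"]


def count_vector(text, vocabulary=VOCABULARY):
    """Count how often each vocabulary term occurs in a whitespace-split text."""
    counts = Counter(text.split())
    return [counts[term] for term in vocabulary]


def cosine_similarity(a, b):
    """One common similarity measure: the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def closest_category(document, categories):
    """Return the category whose training text is most similar to the document."""
    doc_vec = count_vector(document)
    return max(
        categories,
        key=lambda name: cosine_similarity(doc_vec, count_vector(categories[name])),
    )


# Hypothetical training collections, one string of text per category.
CATEGORIES = {
    "grumbling": "no-good no-good bad bad",
    "spoiled": "rotten rotten bad",
}

print(count_vector("no-good no-good bad"))                       # [2, 1, 0]
print(count_vector("bad rotten") == count_vector("rotten bad"))  # True: word order is lost
print(closest_category("no-good bad", CATEGORIES))               # grumbling
```

The second line of output is the “bag of words” property in action: both orderings produce the same vector.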

For reasons that I won’t go into here, I am trying to discuss this model briefly in my PhD thesis — actually, more briefly than I did above — and therefore looking to cite the originator of the idea. I started coming across citations in other papers that looked something like: “Gerard Salton [and others] (1975). A vector space model for information retrieval.” Sounds good. It’s got the key words in it, and quite a few citations!

I like to sight before citing though, which means I found this interesting paper:

David Dubin (2004). The Most Influential Paper Gerard Salton Never Wrote, Library Trends 52(4):748–764.

Gerard Salton is often credited with developing the vector space model (VSM) for information retrieval (IR). Citations to Salton give the impression that the VSM must have been articulated as an IR model sometime between 1970 and 1975. However, the VSM as it is understood today evolved over a longer time period than is usually acknowledged, and an articulation of the model and its assumptions did not appear in print until several years after those assumptions had been criticized and alternative models proposed. An often cited overview paper titled “A Vector Space Model for Information Retrieval” (alleged to have been published in 1975) does not exist, and citations to it represent a confusion of two 1975 articles, neither of which were overviews of the VSM as a model of information retrieval. Until the late 1970s, Salton did not present vector spaces as models of IR generally but rather as models of specific computations. Citations to the phantom paper reflect an apparently widely held misconception that the operational features and explanatory devices now associated with the VSM must have been introduced at the same time it was first proposed as an IR model.

Naturally such a subtle treatment of the history of the model is not great for my immediate purposes: I need That One Citation! (As best I can tell from Dubin, if I have to pick one it should be G. Salton (1979). Mathematics and information retrieval. Journal of Documentation, 35(1), 1–29.) But it’s fun to come across the analysis of an idea in this form.

Update: if you want a reasonable overview of text classification/topic classification/topic assignment, the survey of choice seems to be Fabrizio Sebastiani (2002). Machine learning in automated text categorization, ACM Computing Surveys, 34(1):1–47. You know, modulo 11 years now.

Kin: the director's commentary

Some additional or rephrased thoughts on kinship, now that I’m not working to a deadline, nor as tired.

Family of origin. There’s luck and privilege involved here, both in having a family of origin you want to have a relationship with, and that wants to have a relationship with you. When there’s choice about it, it’s most often only available to legal adults, at least when it comes to a relationship with one’s parents. I choose to have strong, central relationships with my family of origin; this is active work, but work I can only do due to luck and privilege.

Family of choice. There’s some more freedom here, but not everyone goes into life or even into adulthood with the skills, time, energy, mindset etc to form strong bonds with unrelated people. I’ve reached 30 without really doing enough work here.

Children. In talking about the legal relationship, I’m assuming a fair bit more luck and privilege: that is, reproductive control. In that case, one is not legally required to have children in the same way that children must have parents/guardians. In either case, one’s relationship with one’s children is in an interesting place as something like family of origin (traditionally endorsed, legally recognised[1], tightly societally scripted) and like something else: a very intimate relationship with someone you don’t know at all and who will change quite a lot, very quickly. And in some circumstances, one can choose this relationship without any input from the other party (unlike being assigned your parents, in which one usually has no choice, or being assigned your siblings, in which neither sibling typically has any choice). It’s a very odd thing.

So for me my relationship with my child is somewhat outside the “family of origin”/”family of choice” narratives that I’d at least had a long time to consider. It’s not the only relationship one could have that has these features, but it’s the major one I’ve had, and I don’t know that I can write well about it until the story has played out more.

[1] Some luck and privilege here!

Kin: unchosen family as chosen project

Week 1 of the Alphabet Sufficiency: K.

Here’s a story about a ‘k’ word: my mother is a teacher, and once a child dobbed on another child for using “the k word”, and my mother asked that child “what k wor—?” and stopped herself too late. Yeah, that k word.

It’s not explicitly against the rules of this project to discuss the project, so let me note that my first choice for the ‘a’ word was going to be adulthood, until I realised I could stuff kinship and adulthood into the one piece and leave the ‘a’ field wide open for acid, acne (and/)or alcohol.

There are a lot of paths to adulthood, as I had cause to reflect on a few years ago, reading Kate Crawford’s Adult Themes: Rewriting the Rules of Adulthood, questioning the association of Australian adulthood with owning a home, having a stable job, and having children. I am pretty sure that at the time I’d had an apartment mortgaged for several years, and was pregnant. I was staving off adulthood with a PhD enrolment. See also Yatima:

None of which has anything to do with Kate [Crawford] except that she takes the set of prejudices and preoccupations I associate with people of Keith [Windschuttle]’s generation: real estate, marriage, children and so on; and deconstructs them as inadequate and meretricious cultural markers for adulthood. She is especially wry on the punitive economic structure of Australian society. It has become very, very difficult for young people to buy property, but in a home-ownership-obsessed society renters are considered sort of frivolous. Psych! Kate argues for replacing these shallow rites of passage – the excruciating wedding, the adjustable-rate mortgage – with a far more nuanced appreciation of modern adult lives, where for example your jati might take the place of a nuclear family.

It’s a terrific book, and it made me think pretty hard about how deeply I absorbed old-fashioned Australian prejudices without even realizing that I had done so. I loathed Sydney’s consensus reality while I lived there, but as soon as I got to San Francisco I got married, bought a house and squeezed out a couple of kids.

There are a lot of paths to adulthood, and I also chose that one, or it chose me, or I didn’t choose and that’s the one you end up with when you don’t choose.

I have that not-uncommon affliction of never having had an enormous amount to do with youth culture and thus feeling like I was about 30 from the time I turned 18. (I spent enough time at 16 and 17 in licensed 18+ venues that I can’t claim it any earlier.) Early in my PhD, so when I was 25 or 26, a slightly younger fellow student waxed lyrical about the joys of postponed adulthood: “I still feel like a kid, really!” I left home when I was 17, established an actual household at 19 (the economic incentives to form incredibly young de facto marriages via the same pressures that create sharehouses are an interesting footnote to modern marriage and partnership), and last took money from my parents that wasn’t a present some time before my 21st birthday. At the time of the conversation I was party to a mortgage, and I think also a marriage of the “solemnly swear” type. I felt nothing like a kid at that time, and I certainly haven’t recaptured it since.

Dominant narratives might not be necessarily easy to live as such, but because I tick enough boxes (being straight is coming to mind, in particular) it is a reliable groove and doesn’t leave too many bruises. Lucky me, indeed.

I often appear to people to be a more driven person than I in fact am, because I work quite hard. From the outside, it’s easy to mistake working hard for being goal-driven. In fact I tend to find a plausible project, to date almost always chosen by someone else, and then carve a niche for myself in there and thrive, or not. The one main life project that I have chosen is kin itself: the creation and maintenance of family.

There are more ways to do this than to be related to people, or to have children, but again, I am going with the dominant narrative. My phone allows me to have people “bookmarked” for easier dialling. Those people are my husband, each of my parents, and my two sisters. I didn’t even realise that that said anything about me for about four weeks. I ring my parents on Sundays, which is when my parents both rang their mothers when they were alive. The tradition goes at least one generation further back on my mother’s side and quite possibly two.

I still make it sound rather like I am following a default path here, but adhering to a tradition is still work, and for once I’ve planned it, or rather, chosen it. But where the real decision comes in for me is being a parent. I have a kid (not so much squeezed out as hauled out with tongs under considerable protest, as it happens). This is a scary process while one prepares for it: how do you get ready to have new family? It’s not the family the law encourages you to have, it’s not chosen family, it’s brand new manufactured random family. And then you have to teach them to, among other things, care about your perspective, and your ability to feel pain, and your desire to sleep.

So this is my big, meaningful work, as someone to date better at the work part than the meaningful part. It’s not a very creative choice, and I don’t like it to be asserted as normative, but here it is. When I set up speed dial, I set up my family of origin, and when I planned for the future, I had a child. If I had to choose family, I don’t know where I’d start, but I’ve chosen to work on family.

The Alphabet Sufficiency

I was a little bit surprised, to be honest, when my writing tips (such as they are) resulted in the rapid creation of the Alphabet Supremacy, a year long writing project, for which Jonathan Lange and Bice Dibley will each write around 30000 words:

Here’s how it’s going to work:

  • Week 1, jml picks a word starting with ‘A’, we both write something
  • Week 2, Bice picks an ‘A’, we both write something
  • Repeat 24 more times, skipping ‘X’ because no one likes xylophones and because we want Christmas off

I was also frankly jealous because I love extended written conversations (even as loose a conversation as this). But I also wasn’t invited to play, dammit, and in any case a year-long commitment is not the thing for me right now. If the amount of personal change and variability of energy levels I experienced in 2012 continues I will be living in a leper colony on the Moon by December 2013.

However, Martin Pool has kindly agreed to a modified version, which I call the Alphabet Sufficiency. The fundamental idea is the same: once a week, one of us nominates a topic for a pre-chosen letter, and both of us write 600+ words on that topic, and then the other chooses a topic the following week. However, given the whole Moon situation, it will be a six week project, not a year-long one. We have therefore modified the rules as follows:

  • six letters have been chosen by an arcane and complicated process (see end of post), those letters being K, M, A, V, F and I (note that while X was not selected, it was eligible)
  • letters won’t repeat for two weeks: Mary will choose the K, A and F topics and Martin the M, V and I topics, but both authors will write a post for each of the six topics
  • words already chosen by the Alphabet Supremacy up to that point are not allowed as topics, although with some good faith allowance for race conditions (and as a humble copycat project, we impose no restrictions on the Alphabet Supremacy re-using our topics)
  • the first post is due before Thursday February 14 midnight Sydney time (as in 0000, so the midnight between Wednesday and Thursday!), and then weekly thereafter for 6 weeks

The ‘k’ word will be ‘kin’.

$ python
>>> import random, string
>>> random.sample(string.ascii_lowercase, 6)
['k', 'm', 'a', 'v', 'f', 'i']
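For the curious, the modified rules above can be sketched in Python as well. This is only an illustration: the year 2013 is my assumption (the rules give only the day and month), each date stands for the midnight at the start of that day in Sydney, and the function and constant names are made up for the example.

```python
from datetime import date, timedelta

LETTERS = ["K", "M", "A", "V", "F", "I"]   # in the order they were drawn
CHOOSERS = ["Mary", "Martin"]              # Mary picks first, then they alternate
FIRST_DEADLINE = date(2013, 2, 14)         # assumption: the year is 2013


def schedule(letters=LETTERS, first_deadline=FIRST_DEADLINE):
    """Return (letter, topic chooser, deadline) for each week of the project."""
    rows = []
    for week, letter in enumerate(letters):
        chooser = CHOOSERS[week % 2]
        deadline = first_deadline + timedelta(weeks=week)
        rows.append((letter, chooser, deadline))
    return rows


for letter, chooser, deadline in schedule():
    print(f"{letter}: {chooser} chooses, posts due before {deadline.isoformat()} 00:00")
```

Both authors write a post for every row; the chooser only decides the topic.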

Why my phone is silent during LCA talks

I don’t especially like Tasker’s interface, but setting one’s phone to silent is nice enough to bust it out, so I thought I’d explain how I do this during linux.conf.au.

A bit of background: Tasker is an Android application (not free in either sense of the word) that does things to your phone when certain conditions (called contexts) are true. For example it could change the wallpaper (task) when you have unread text messages (context). I have, for example, Tasker tasks that turn my phone to silent between 10:30pm and 7:30am local time, and that run an rsync backup (which copies the contents of my phone to my home server, ie backs it up) every time it is both on power and connected to my home wireless network.

Tasker somewhat trades between UI simplicity and power in favour of power (although even then I think there are better possible UIs for it). You can generally find specific apps that do individual Tasker-like things (for example, I would not be surprised if there was a ‘Silent at Night’ app), but Tasker lets you specify a wide variety of contexts and tasks.

First: the LCA calendar iCal is in my Google calendar, so it’s available to Tasker through its Calendar contexts. So that’s prior to setting this up.

The basic setup would be this:

  1. Go into Tasker.
  2. Add a Context (called eg ‘LCA activities’), select ‘State’, ‘App’, ‘Calendar Entry’.
  3. In Calendar Entry, go down to Calendar, press the search icon, select your LCA calendar.
  4. Press the tick.
  5. Now it will prompt you for the task, which is silencing your phone. Select ‘New Task’. Name the task (‘Silence’): it might be useful for other contexts!
  6. Press + to add an action. Select ‘Audio Settings’ and then ‘Silent Mode’. Turn ‘Mode’ to ‘On’. Leave ‘If’ alone. Press tick to approve the action and then tick to approve the task.

After this teeny (ahem) amount of work you now have a Tasker task that silences your phone during any event on the LCA calendar.

Fine print

My setup is a bit more complicated than this because I thought ‘wait, I want my phone to ring during meals’. This is a pain in the neck to do.

I added a second Context (long hold on the existing context), another Calendar Entry, also on the LCA calendar, but I also searched for location, selected ‘MCC Foyer’ (which is where the morning and afternoon teas are) and selected the Not tickbox, to make it a negative context. The total effect is that when there’s an event in the LCA calendar AND when there’s not an event in the LCA calendar that is in MCC Foyer, the task triggers. But that’s quite a bit nastier.
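If the combination of a positive and a negative context is hard to picture, here is the same logic as a small Python sketch. This is not Tasker’s API or file format, just a model of the rule; the Event class, the function name and the sample times and rooms are invented for the illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    """A simplified calendar entry (hypothetical, not a Tasker or Android type)."""
    start: datetime
    end: datetime
    location: str = ""


def should_silence(now, lca_events, meal_location="MCC Foyer"):
    """Silent when some LCA event is on AND no current LCA event is in the foyer."""
    active = [e for e in lca_events if e.start <= now < e.end]
    in_any_event = bool(active)                                      # first context
    in_foyer_event = any(meal_location in e.location for e in active)
    return in_any_event and not in_foyer_event                       # second context is negated


# Made-up sample entries: morning tea in the foyer, then a talk.
events = [
    Event(datetime(2013, 1, 29, 10, 0), datetime(2013, 1, 29, 10, 40), "MCC Foyer"),
    Event(datetime(2013, 1, 29, 10, 40), datetime(2013, 1, 29, 11, 25), "MCC6"),
]

print(should_silence(datetime(2013, 1, 29, 10, 10), events))  # False: morning tea, phone rings
print(should_silence(datetime(2013, 1, 29, 10, 50), events))  # True: in a talk
print(should_silence(datetime(2013, 1, 29, 12, 0), events))   # False: nothing on
```

One consequence of the rule as written: if a talk ever overlapped with a foyer event, the phone would ring during the talk too.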

It can end up being easier to have a calendar that amounts to a ‘Do Not Disturb’ calendar, which isn’t ideal. Some people do something like “silence during anything in my work[/personal] calendar that’s marked busy”, etc etc, which would be longer lived than my LCA recipe. BUT at least my LCA recipe buys us silence for this conference!

Why my phone is silent during LCA talks by Mary Gardiner is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Teach me py.test (Haecksen miniconf, Tuesday)

I didn’t manage to go speaking-free for LCA2013 after all, because I have volunteered to help out my roommate Brianna Laugher with the py.test presentation in the Haecksen miniconf.

The plan is that we will do “Teach me py.test” along the lines of Steve Holden’s “Teach Me Twisted” session at PyCon 2008 (see Catherine Devlin’s report). The idea of the session is that I (genuinely new to py.test, although not to either Python or to unit testing in general) will hook my laptop up to a projector and learn how to write tests in py.test, with Brianna teaching me.

We have pulled some of the business logic out of Zookeepr into this git repository in preparation for the talk at 16:05 in MCC6. I am not sure how much we will cover in 25 minutes, presumably not a lot, but it should be an interesting experiment in presentation style.

Cooking notes: mud cake

I made a birthday cake for my son yesterday.

Turntable!

The basic recipe was this mud cake recipe from Taste. Modifications:

  • we’re a three person household, so I cut the recipe in half, give or take (about 2/5ths, mostly, because I used 100g of Lindt 70% Dark chocolate)
  • I used cream, not sour cream. I am intrigued by sour cream in a chocolate cake I have to say, but Andrew doesn’t like sour cream and generally I didn’t want to mess with the sweet flavour of a kid’s birthday cake more than I was already doing by using this recipe

Since we have an espresso machine in the house I also pulled my second or third ever espresso (actually, a very long black at that volume!) which undoubtedly was terrible. I drink coffee very rarely; but of course I didn’t have to taste it directly, I just had to not get coffee grounds in the cake. The coffee taste went well in the cake and means not having to use a liqueur for a bit of zing.

Then we established an ancient ritual of our culture:

Cake tasting

I cooked it in a $3 train mould from Kmart. All my love to you silicone cake trays: you’re a pain to get into the oven but a cinch to extract cooked cakes from. The “icing” is molten milk chocolate and the “decorations”, such as they are, are of a train turntable. I’m hoping to decorate cakes with a little more forethought as he gets older.

I cooked it for 20 minutes in a hot oven and another 5 minutes as the oven cooled and that was about right. If you like mudcakes slightly wet, as I do, 15 would probably work (for pieces of cake this size). The train pieces are a bit short: I wasn’t sure how full to fill the moulds when it had self-raising flour in it and tried about 2/3, when I really should have gone nearly to the top.

While he got very excited about the sprinkles, my son actually ate half the undecorated engine with far more enthusiasm. It’s not really a cake for a toddler’s palate (and he totally has one, if everything in the world was made of butter icing he couldn’t be happier) but he didn’t seem put off.

Fun at LCA 2013: my picks for Thursday and Friday

Thursday

I rarely go to LCA’s tutorials, but really, after years of not having to worry too much about distributed version control systems due to having in-house technical support from my husband, a (now former) Bazaar developer, it’s probably time that I came to grips with git. Hence Git For Ages 4 And Up (Michael Schwern) is tempting, hopefully it’s OK for those of us who do use terms like “directed acyclic graph”. This does mean missing Wiggle while you work (Neil Brown) though: apparently you can’t be a git beginner whilst being interested in newfangled patching algorithms.

After lunch The IPocalypse – 20 months later (Geoff Huston) calls to me: it’s the sequel to his LCA 2011 keynote, which is the one that stood out to me. (Well, and Mark Pesce’s, yes, but funnily enough his actual content largely passed me by.) All that doom and gloom, and now what? Has IPv6 cost us our Internet?

A Tridge talk (Building a free software telemetry radio system) is an even more obvious pick than a Matthew Wilcox talk. (Although why did we put that particular talk up against Buffer Bloat? Tridge is going to talk about TCP performance issues.)

In the afternoon Keith Packard has a new passion (Teaching Robotics and Embedded Computing with Legos and Arduino) and then Ristretto: run-time types for JavaScript (Shane Stephens) sounds alarming. In a good way.

Friday

It might also be a two-tute LCA, with Beyond Alt Text: What Every Project Should Know About Accessibility (Denise Paolucci) up first. BUT NovaProva, or How I Did Six Impossible Things Before LCA (Gregory Banks) is the good crack (“NovaProva implements true reflection in C/C++”???), so… difficult!

After lunch, Asheesh Laroia’s Quantitative community management is closer to what I do but I am also curious about The real story behind Wayland and X (Daniel Stone). In the final session, probably Building Persona: federated and privacy-sensitive identity for the Web depending on how my conference energy is going.

And then where?

I’m headed back to the USA in March for PyCon, and I’m looking forward to having way (waaaaaay) less commitments than I did at Wikimania 2012, and therefore being able to catch more of the talks. And not dragging myself to my hotel room at 4pm to order crème brûlée room service because I am too tired to figure out how to work the lifts. (It was good crème brûlée though!) The Ada Initiative will probably be running some non-talk activities though, so it won’t be wall-to-wall talks. And then a second return to the USA for AdaCamp SF. And that really might be enough for one year, but if not, there’s always Kiwicon.

Fun at LCA 2013: my picks for Tuesday and Wednesday

I’m currently regarding LCA 2013 as my last LCA for a while. Never say never: LCA 2014 bids came in from Sydney (so, local to me) and Perth (where I’ve never been and would like to go). But I first went to LCA in 2001 and then later went to 2004 and since 2007 I’ve been to LCA every year, except for 2010 and that only because I had a baby in the middle of the conference.

LCA used to be my main way of reconnecting with open source while I was working on my PhD. But now I work for the Ada Initiative and open source (and open stuff) events are a big part of my job. While I have more time and energy for conferences I am attending them for very different reasons now and the lure of the new is getting strong.

Because my volunteer time is diminishing, LCA 2013 is definitely the last LCA in which I will have had significant input into the program (Michael Davies and I are co-chairs of the conference program this year, as we were for 2010). So, it’s something of a farewell tour for me and I’m looking forward to the program we worked so hard putting together.

Monday

… actually my non-LCA-ing family is still in town Monday, so I’ll probably go to Bdale Garbee’s keynote and then hang out with them. Off to a great start here, I know.

Tuesday

Radia Perlman’s keynote is the keynote I am most looking forward to this year. Following that several of my peeps are giving Haecksen talks before lunch:

  • Feminism, anarchism and FOSS – Sky Croeser
  • Overcoming imposter syndrome – Denise Paolucci
  • Security – Joh Pirie-Clarke

People may be especially interested in the Imposter Syndrome talk, Imposter Syndrome being the feeling that you’ve achieved your current position or status totally fraudulently and are going to be discovered any second and publicly humiliated. It’s very common among people who are in quite critical fields (like academia). Denise was among our Imposter Syndrome facilitators for AdaCamp DC.

I am not sure after lunch, but Web Animations: unifying CSS Transitions, CSS Animations, and SVG (Shane Stephens) is a definite contender. In the afternoon The Horrible History of Web Development (Daniel Nadasi) sounds interesting (although it’s the kind of talk where an abstract would be really useful in determining whether I want to go) but so do What we can learn from Erlang (Tim McNamara) and Concurrent Programming is not so difficult (Daniel Bryan).

Wednesday

Trinity: A Linux kernel fuzz tester (and then some) (Dave Jones) is very tempting in the first slot, but I think I will go to Think, Create & Critique Design (Andy Fitzsimon) because I want to “speak” design semiotics a little bit better and have for a long time. Talking to graphic designers is actually part of my job.

In the second slot I am not entirely sure, but probably Open Source and Open Data for Humanitarian Response with OpenStreetMap (Kate Chapman) since I periodically dabble in OpenStreetMap.

After lunch my pick is definitely Free and open source software and activism (Sky Croeser). I’ve been following Sky’s activism and research since the EFA lamb roast fun and met her at AdaCamp Melbourne. I want to hear what she has to say about (h)ac(k)tivism.

Not as sure about the following slot (in a moment of mischief, we put the DSD’s talk right after Sky’s, but I’m not especially interested) but the biggest contender is The future of non-volatile memory (Matthew Wilcox) because he usually is one of the highlights of the LCA lower-level technical talks.

The first slot after afternoon tea I am not committing, but it does contain Pia’s grand scheme Geeks rule over kings – the Distributed Democracy. After that I think Copyright’s Dark Clouds: Optus v NRL (Ben Powell) is required: it isn’t LCA without emerging feeling distinctly gloomy about the current state of the intellectual property framework.

Proving Australian citizenship when parents are born after 1986

Update January 2018: the Department of Home Affairs provides an Evidence of Australian Citizenship document to those whose claim to Australian citizenship is not one of the more obvious routes.

I know people who’ve had trouble with this, and won’t tell their specific stories, but a note now that I am applying for a passport for my son.

Australian citizenship is no longer by right of birth: if you were born on or after 20 August 1986, there are various more complicated ways it is acquired. One is being born in Australia and having at least one parent who is an Australian citizen or permanent resident at the time of your birth. Therefore, my son’s passport application has this section:

Please mark which of these documents you will provide at interview (you must present the original):
to prove that the child is an Australian citizen or to prove that one of the child’s parents was either an Australian citizen or a permanent resident of Australia at the time of the child’s birth.

  • The child’s Australian passport issued on/after 01/01/2000 and valid at least two years
  • One parent’s full Australian birth certificate (parent born prior to 20/08/86)
  • One parent’s Australian passport (issued on/after 20/08/86, valid two years)
  • One parent’s Australian citizenship certificate
  • One parent’s Australian permanent resident status

Both my husband and I were born in Australia before 1986, and in addition we happened to hold Australian passports before our son was born in any case, so we have a surfeit of evidence that will satisfy them that he’s a citizen.

But people born in, say, September 1986 are turning 27 this year, and therefore there’s lots of parents and soon-to-be parents whose Australian birth certificates are not sufficient proof of their own Australian citizenship, let alone their child’s status. So it’s very easy to imagine a situation existing right now where someone will need to show up with a grandparent’s birth certificate in addition to a parent’s and their own, and so on. By 2040 or earlier a great-grandparent may be required if no one in the family has held passports in intervening generations (great-grandparent born before August 1986, has a child at age 18+ in 2004, a grandchild in 2022 and a great-grandchild in 2040).
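The generation arithmetic can be sketched in a few lines of Python. This is a deliberate simplification and not the Department’s actual process: it assumes birth certificates are the only evidence anyone has (no passports, citizenship certificates or permanent residency records), and the function name and sample dates are invented for the illustration.

```python
from datetime import date

# People born in Australia before this date are citizens by birth alone.
CUTOFF = date(1986, 8, 20)


def certificates_needed(birth_dates):
    """Count the birth certificates needed, starting with the applicant's own.

    birth_dates runs from the applicant back through one ancestor per
    generation. Counting stops at the first person born before the cutoff.
    Returns None if the chain never reaches such a person.
    """
    needed = 0
    for born in birth_dates:
        needed += 1
        if born < CUTOFF:
            return needed
    return None


# The 2040 scenario: great-grandchild, grandchild, child, great-grandparent.
chain = [date(2040, 9, 1), date(2022, 9, 1), date(2004, 9, 1), date(1986, 7, 1)]
print(certificates_needed(chain))        # 4: own, parent's, grandparent's, great-grandparent's
print(certificates_needed(chain[1:]))    # 3: the grandparent case
print(date(2004, 9, 1).year - 1986)      # 18: the great-grandparent's age in 2004
```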

The easiest way around this for citizen-parents seems to be making sure one holds an Australian passport — because it stands on its own, unlike post-1986 birth certificates — before a child’s birth, which is not really foremost in one’s mind at the time. Oh, and be sure to keep it in a safe place until you or any of your children need evidence of their citizenship (usually but not always when they first need a passport themselves), because without it they’ll be back in the same documentation pickle. An increasing number of Australian-born people are going to have to go through the prior process of assembling potentially burdensome proof of citizenship involving either a string of ancestral birth certificates*, or a bunch of evidence of Australian residence on their 10th birthday (see Table B in documenting citizenship). I’d have trouble now proving my own Australian residence at age 10, frankly.

Australia is far from alone in not awarding citizenship by right of birth alone, so I assume there’s either a lot of people around the world who struggle to get passports, or other countries have processes that are more mature and less reliant on finding an ancestor who was unconditionally a citizen.

* Or worse, trying to get hold of 40 year old evidence of someone’s permanent residency, which I suspect is not as available for purchase later on as birth certificates are.