Discussion: Phonology archives

From Joe Pater

I came across an interesting blog post the other day discussing the practice of posting conference papers to arXiv in NLP and machine learning before they have been reviewed. It includes some data from a poll on how people use it in each discipline – machine learning people tend to post earlier in the publication cycle, perhaps due to an influential call for a new publishing model by Yann LeCun of deep learning fame, and perhaps due to a greater fear of being scooped.

This got me thinking again about archives in our discipline. I came of academic age at the time that ROA was launched, and it was fantastic as a grad student to have access to the latest research in the framework I was using, and to be able to share my own work so easily. As I’ve told Alan Prince already, we’re hugely in his debt for having established that archive, and we also owe a huge thanks to Eric Baković and others for all their work on it, as we do to Michal Starke and others at LingBuzz.

It’s clear, though, that in contrast with the situation in computer science, use of archives is on the decline in phonology. I post to them only sporadically myself, generally only making time to keep my own web page updated. In contrast to when ROA was founded, the preservation function of an archive is less required; most of us have archives serving this purpose at our own institutions (see e.g. John McCarthy’s ScholarWorks archive), and who knows, maybe a document hosted on a google drive will last longer than one on a university site. I find the google drive alternative particularly convenient because it’s so easy to update a paper. And this brings up the main issue in my mind for posting to archives early in the publication cycle: if you have your paper in multiple places, you need to update multiple copies, each with considerably more hassle than a google drive.

Preservation is only one function of these archives, and it’s far less important than another: dissemination. For dissemination, one’s own webpage, or institutional archive, is not a viable alternative. The main impetus for Phonolist was to facilitate dissemination for papers that weren’t being posted to the archives, and it seemed that the added functionality of optional blog discussion of papers would make it attractive for that purpose. I’ve been somewhat surprised to see that people haven’t been using it much for that (most of the papers we advertise are reposts from LingBuzz and ROA).

Phonolist currently lacks any indexing functionality (besides searches), and this is one way that it could be improved to better serve the cause of dissemination. This will likely be an upcoming addition, along with a community .bibtex file.

The question I’d like to bring up for discussion is whether people perceive the need for a general phonology archive, and if so, what it should look like. ROA is limited to OT and its affiliates, and LingBuzz has technical issues that have made it frustrating to use, and I’ve heard that it’s unlikely to be improved. My limited experience with academia.edu and researchgate has been negative. I thought an easy fix might be to start using http://cogprints.org, but in response to my inquiry about it, Stevan Harnad said “CogPrints has no long-term support and I would say it’s obsolete (though I’m still keeping it up).” More generally, I’d be interested to hear people’s thoughts about how they use the existing archives, and why they don’t use them.


33 Comments on “Discussion: Phonology archives”

  1. Institutional repositories are the natural and optimal deposit site for research output (most of which is institutional anyway). The rest is just a matter of exporting, importing and harvesting (which will eventually all be done automatically).

    Institutions need to mandate deposit (immediately upon acceptance for publication).

    Vincent-Lamarre, Philippe, Boivin, Jade, Gargouri, Yassine, Larivière, Vincent and Harnad, Stevan (2016) Estimating Open Access Mandate Effectiveness: The MELIBEA Score. Journal of the Association for Information Science and Technology (JASIST) 67 (in press) http://eprints.soton.ac.uk/370203/

    • Thanks for this Stevan. One remaining question is how best to organize dissemination and discussion, assuming that google, facebook and twitter aren’t doing it all for us now. And one could also imagine a role for a central archive for pre-accepted work, especially for a community like ours that does have a tradition already of citing and discussing unpublished work.

      • Yes, providing open access to the pre-refereeing drafts would be very welcome in most disciplines — although many disciplines have a tradition of not publicizing unrefereed work (and in some areas of clinical medicine, for example, it could even represent a risk to public health!).

        But remember that most disciplines are not yet even providing open access to their refereed, accepted final drafts! That’s why I’m stressing the urgent need for mandates from all institutions and funders (in all disciplines, of course).

        Once all researchers in all disciplines are routinely making their refereed final drafts OA immediately upon acceptance, the rest — everything you can dream of: discoverability, preservation, central harvesting, preprint posting, CC-BY and fair-gold OA — will all come with the territory.

        But until then, I think all these other desiderata are premature…

        • I can now see the importance of the institutional archives to the OA movement (I should have gotten this before). Tending to my ScholarWorks page is certainly now on my own to-do list for the summer, and presumably we could do a push in the Department of Linguistics and our nascent Institute for Cognitive Science to get all of our pages up to date…

          • There’s a big difference between a “policy” (which this is) and a “mandate” (which this is not) — as we’ve found at UC, and as has been found at Harvard and MIT where similar policies are in place. It’s not a mandate in two respects. First, and most importantly, there’s no penalty for faculty who don’t participate. Second, and not insignificantly, faculty can opt out of the policy (= obtain a waiver) for any given article. Having the waiver option is important for getting faculty buy-in for such policies, but the fact that it’s a policy and not a mandate (= no penalty for non-participation) is the real problem. We’re currently busy at UC trying to come up with ways to increase compliance with the policy, and the only ideas rising to the top are those that make it very, very, very easy for faculty to fold it into already-existent workflows.

          • Right – compliance must be an issue in these sorts of things. It seems this could be rolled into our Annual Faculty Report process, which is now on-line, without too much problem (but some faculty can’t even handle the on-line report!). All of this has been very informative – I’ll be looking into what I can do locally.

  2. Joe, I’d be interested in your thoughts on why posting one’s own papers on one’s own website is not an adequate model for dissemination.

    As for myself, I go to ROA mainly for the classics that are posted there, and somewhat alternate between using the library and authors’ websites for getting papers.

    • Because we can’t rely on people to come to our websites to find our work. We all use each other’s websites, but we don’t visit them systematically. I’d like to know when you have new work out Robert, but I don’t always. We can send each other e-mails, but that’s also not systematic, and doesn’t reach as many people as an e-mail list. People are using facebook and twitter for this, but again, I’d love to see all new work in phonology coming out of one place. I’m talking both about unpublished and published work – it’s possible that different considerations apply for each.

  3. Google Scholar has an Updates function that shows you a feed of papers relevant to your research–once you have a profile set up. I was concerned that their algorithm would be too narrow in what it considers relevant, but in my observation, it combs the field pretty widely. I’ve gotten news about papers that people upload to their personal websites (sometimes the same day), and of course journal papers show up as well. In order for one’s paper to show up in the feed, of course, one has to make sure that the page is regularly indexed by Google, which some institutions’ IT departments do not ensure by default. But it was through Google Scholar that I discovered that NYU’s Faculty Digital Archive materials get combed by Google’s bots.

    To speak to the more general question of whether we need a specific archive for phonology, I’d vote for “no”. I think it is more useful to have a general linguistics archive, one that is not tied to any specific field or theory. LingBuzz serves that function already, so perhaps what we need is a good mirror for it.

    • Perhaps using Google Scholar updates and keeping our institutional sites updated is the answer to keeping abreast of one another’s work…

      I think the technical issues with LingBuzz go beyond its just being down from time to time. It’s hard for me to imagine it as a long term solution to the archive niche, if there is such a niche. It looks to me like something that could be done a lot better relatively easily given current technology, and though it’s gotten a lot of use, it’s hardly been universally adopted, especially outside of a particular branch of syntax. But I agree on the breadth point – in fact, I’d like to see phonology and linguistics in the even broader cogsci context, which is why cogprints looked attractive to me. Unfortunately I don’t think there’d be much interest in an archive that distributed pre-review stuff in say cognitive psychology, and I’m beginning to see that the post-review part of an archive may best be done institution by institution.

      • I’ve just realized that there is a role for distributing unpublished work in cog psych – this could help to diminish the role of publication bias (see relatedly the recent New Yorker piece on Bialystok’s work).

      • @Joe: Like Robert, I too was wondering why you didn’t think that authors posting papers to their own web sites made for adequate dissemination, so thanks for the clarification (above). I think the difference in perspective is because I tend to follow work by content (keyword) more than by author, so, doing various types of searches — which generally seem to find work on author pages pretty well — has typically worked for me.

        @Maria: Which means that your tip about setting up Google Scholar for updates looks extremely useful. I will try this out right away.

  4. Again a few thoughts:

    1. Not all universities have institutional repositories yet. My university, Keio, most likely does not, and it is one of the “best” private universities in Japan. I am curious to know what the availability situation is outside of the US schools.

    2. Concerning Joe’s proposal, I would vote for “yes”. On LingBuzz, phonology papers tend to be buried among syntax and semantics papers, and very “syntax-y” papers can be tagged “phonological” if they deal with, for example, Spell-Out. And as somebody who does more “phonetic-y” work, I find that LingBuzz is definitely NOT the place to distribute such papers.

    • Is there a place to distribute more phonetic-y work? Does the labphon community have any sort of e-mail list or website for dissemination, beyond the conference and the journal?

      • I know that they have a Google+ page, but it is not very active…I am actually very curious if anybody knows other venues for the labphon community.

        • There’s a significant, but not huge, representation of that community on our mailing list. I for one, would be very happy to get more labphon stuff on here.

    • Shigeto – there is apparently a big Institutional Repository movement in Japan. You should look into it.

  5. I wonder whether Robert’s question and Maria’s first comment are related — perhaps the younger, more tech-savvy generation is totally fine with posting papers to their personal websites precisely because they’ve figured out (semi-)automated ways to find out about such posted work.

    As for Maria’s second comment, I’m (honestly, earnestly) curious how “linguistics” is obviously the right level at which to pitch a repository. If it’s not to be “tied to a specific field or theory”, isn’t “linguistics” a field? (And isn’t “generative linguistics”, which LingBuzz purports to aim at, a theory?) It seems to me that there are many communities of researchers that a repository could serve; “generative linguists” is one, “linguists” is another, “phonologists” is another, “optimality theorists” is another, “all comers” yet another, etc. I see no particular reason (beyond personal preferences) to privilege any one over any other.

    • Self-categorization is tricky business. Years ago, I had a conversation with another linguist (syntactician, if memory serves) who contrasted OT with “generative phonology”. My attempts to pin down what “generative” meant led nowhere. On the opposite side of the border, I once suggested a paper to a student, thinking it was directly relevant to the topic of the student’s research, and the reply was “but it’s not OT”.

      I would like to see less balkanization in linguistics and specifically in phonology, and the best way to serve this would be to have a maximally inclusive place for posting papers. LingBuzz is at the moment the Facebook of linguistics–we do not need a Google Plus. We cannot make syntacticians come to phonology talks, but they cannot avoid seeing the papers in the general feed on LingBuzz.

      Joe, I do not think there is an online archive for phonetics papers. My impression is that there isn’t as much of a tradition for distributing papers this way in the more lab-driven fields because they have historically relied on journal publications for dissemination. Many such researchers wouldn’t post their PDFs on their websites or lab pages, either–whether it is because they overinterpret copyright transfers or for some other reason, I do not know. Even now, talking to some of my more lab-based colleagues, it’s apparent that LingBuzz is a baffling phenomenon to them. One can only hope that the push for Open Access will change the academic culture to a point where *not* posting one’s paper and supplementary materials will be baffling.

      • It seems like we’re in a good position to influence our colleagues in neighboring disciplines where they don’t even have the tradition of putting everything on their own websites. Clearly some cognitive psychologists do this, but I do get the sense that there may be greater reluctance. One piece of data that you can throw their way: the policy at Cognitive Science is that authors do have the right to post a copy on their website. Elliott Moreton also got them to say that we could post to one archive, which led to us trying to decide between LingBuzz and ROA. I think we picked ROA, but then got too busy with other things to follow through with the posting.

      • The “OT is not generative” line was one that was being aggressively promulgated around the turn of the century, particularly by John Frampton at Northeastern (who was co-editor of TRL at the time, so that was helpful). The “but it’s not OT” thing is not atypical of students finding their feet in the field; replace “OT” in that phrase with virtually anything and I think you’ll summarize a conversation being had among students at a bar somewhere in the world right now. 🙂

        But seriously: I also favor less balkanization in linguistics; I’m just not convinced that a more inclusive repository will lead to this desirable outcome, nor that a less inclusive repository will (necessarily) lead to further balkanization. We have very inclusive scholarly societies, conferences, and journals, and yet the balkanization is there. Why should an inclusive repository make a difference?

        Let me be clear about why I’m asking this, lest it seem like I’m just being ornery. My own ideal outcome in all of this would be to have a new and better system for scholarly communication, one that encompasses the whole field of linguistics and that serves as a model for other whole fields. But I’ve become convinced that such an outcome can only be attained if several different models emerge from different subfields, such that the best ideas that serve the specific needs of each of those subfields can be identified and implemented in an eventual, more inclusive system.

        • In the spirit of letting the best ideas rise to the top, we should probably be looking at the semantics archive – my semantics colleagues tell me it’s great (I’m looking at it for the first time now, and notice that it bridges linguistics and philosophy, which is interesting):


      • I started working with “more engineering-minded” phoneticians, and they are really against making non-final drafts available to other people. One of them told me that that is the norm in engineering, because paper contents are “easy to steal”. If we make our paper available with a detailed report of the methodology, some other lab can just do the same experiment and publish it before us. That does not explain why they don’t make pre-publication versions publicly available after acceptance.

        • Right – the same worry about being scooped also holds in psycholinguistics. It’s funny – the reverse holds in machine learning, and in linguistics – get the paper out there, even unpublished, so you aren’t scooped! I guess it’s because we cite unpublished work.

  6. Thanks, Joe, for initiating this discussion — and also to you and Gaja for Phonolist, which is a great resource. I’m particularly glad to see that you’re thinking of further ways in which Phonolist can possibly be useful to this community. (May you succeed where I failed with phonoloblog!)

    For what it’s worth (and I communicated some of this privately with Joe and Gaja before this post went up): it’s been my experience as both a user and an administrator of ROA (for ~20 years) that archive/repository usage varies according to factors that have less to do with the Inherent Goodness of Scholarly Communication than one might think. Needless to say, I hope, this is all just my perspective on the past; that perspective may be at variance with the actual past, and/or with the potential future.

    In the beginning (1993), ROA was the only game in linguistics-town, at a time when the theory that it was designed to promulgate was taking hold in the community that it was designed to serve. The vast majority of us thus posted there, and looked there to see what was happening. But within just a few years (after I began as administrator, in 1996), I think that three interdependent forces began to discourage broad participation, especially among younger scholars.

    The first force was an older guard of scholars, who chided us for primarily citing unpublished work. Never mind the fact that many of these same scholars had long shared unpublished manuscripts amongst themselves (in smaller circles, by necessity), and that many of them had assumed their (deserved) stature with conference proceedings papers and other book chapters that were largely unreviewed; it was still enough to make some folks think twice about posting to ROA, if not to think twice about citing ROA work.

    The second force involved scare tactics from publishers. People started to take down their ROA posts, or to not post to ROA in the first place, for fear of not being published. In most cases the fear was actually unfounded, but fear is fear, unfounded or not: even if Journal X does not forbid online dissemination of a (non-final) version of a paper, the fact that I’ve heard that Journal Y does forbid this makes me worry about whether I can publish my paper with Journal X if I’ve posted a version of that paper on ROA.

    The third force was something that I never expected, and that I am still surprised by when I hear it: there are younger scholars who assume that ROA is only a resource for finding work posted by others, not a resource for posting their own work. There are several variants of this; the two I’m most familiar with are (1) “only polished work makes it on ROA” and (2) “only work from established scholars makes it on ROA”.

    More recently, further disengagement with ROA has been the result of a variety of other factors. The most prominent among these have been the following two:

    (1) What counts as OT. The first line on the front page of ROA changed in late 2012 from “a distribution point for research in Optimality Theory” to “… for research in Optimality Theory and its conceptual affiliates”, but apparently too late for those “conceptual affiliates” to actually feel affiliated.

    (2) Hacking/updating. ROA was hacked in 2011, which led to a long-overdue update. The downtime was significant, and many folks who might have been inclined to continue posting there turned elsewhere (like LingBuzz, which also stopped cross-posting from ROA at that time).

    In short, it’s a fickle world out there, and I think there are many obstacles to getting a (sub)field-specific repository to function well for the (sub)field. Unless there’s some stable functionality and some serious value-added, I’m afraid that any attempt to start a new repository will suffer the same fickle-ishness that ROA has.

  7. Thanks Eric. All of this is getting me back to the view that setting up a shared bibtex file that is searchable online may be enough for dissemination. The idea is that since we have alternative places to physically archive material, we just need ways to index it and find it. And even if this turns out to not supply any added functionality beyond what google et al. provide, the shared bibtex file would have independent usefulness in reference managers.

    One note on your recent factors: As one person working on conceptual affiliates of OT, I can say that my decisions to post or not to ROA were never influenced by whether the work was OT, HG, or HS or whatever. At first I think it was that I shared Robert’s view that people would see it on my webpage, and when I realized that wasn’t true, it’s come down more to simply not making the time, especially now that I can do some of the dissemination here.

    • The “making the time” problem is unfortunately one that applies across the board, from participation in repositories to compliance with Open Access policies. And I’m not claiming to be an innocent here, either.

      If the general conclusion to this thread is what you say here (searchable shared bibtex file, or some such), it will be good to get some librarians involved in the implementation (if not earlier in the discussion itself). They know how to do all this, and to do it well.
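The searchable shared BibTeX file floated in this thread could be prototyped in a few lines. The following is only a minimal sketch, assuming a simple file where each field sits on its own line in `name = {value}` form and each entry closes with a `}` at the start of a line. The sample entries, the function names `parse_bibtex` and `search`, and the field layout are all invented for illustration; a real community file would call for a full BibTeX parser.

```python
import re

# Hypothetical sample of a shared file; these entries are invented placeholders.
SAMPLE_BIB = """
@article{doe2015harmony,
  author = {Doe, Jane},
  title = {Vowel harmony in a constraint-based grammar},
  year = {2015},
  keywords = {phonology, harmony}
}

@unpublished{roe2016stress,
  author = {Roe, Richard},
  title = {Learning stress patterns},
  year = {2016},
  keywords = {phonology, learning}
}
"""

# Assumption: an entry ends with a closing brace at the start of a line.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*?)\n\}", re.DOTALL)
# Assumption: one field per line, value wrapped in a single pair of braces.
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{(.*?)\}\s*,?\s*$", re.MULTILINE)


def parse_bibtex(text):
    """Return a list of dicts, one per entry, with 'type', 'key' and fields."""
    entries = []
    for entry_type, key, body in ENTRY_RE.findall(text):
        fields = {name.lower(): value.strip()
                  for name, value in FIELD_RE.findall(body)}
        entries.append({"type": entry_type.lower(), "key": key, **fields})
    return entries


def search(entries, query):
    """Case-insensitive substring search over every field of every entry."""
    q = query.lower()
    return [e for e in entries
            if any(q in str(value).lower() for value in e.values())]


if __name__ == "__main__":
    entries = parse_bibtex(SAMPLE_BIB)
    for hit in search(entries, "harmony"):
        print(hit["key"], "-", hit["title"])
```

Because the entries stay in plain BibTeX, the same file could be loaded directly into a reference manager, which is the independent usefulness mentioned above; the search function is only the thin layer that an online index would add.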

  8. Thanks for the advice – I’m actually going to talk to the open access people tomorrow at the library, and will ask them who to consult with about this sort of thing too. I was having the same thought about archives – why should we build these things from scratch when they are the experts?

    The bibtex project is something I want to do in any case. I’m currently leaning towards not doing any archive thing, but if there’s something easy and seemingly useful that the library people suggest, then I might be swayed in the other direction (especially if future visitors here express more enthusiasm for a new phonology archive).

    On the balkanization sub-thread – I have a feeling, unfortunately, that lingbuzz and ROA lie on opposite sides of one fault line in our field. If we did set something up, it’d be ideal if it could be done in some way that straddled those communities, or was more inclusive in some other direction (e.g. phonetic-y stuff).

  9. > it’d be ideal if it could be done in some way that straddled those communities, or was more inclusive in some other direction (e.g. phonetic-y stuff).

    I can’t agree more about this statement, especially as somebody who cares about both phonetics and phonology.

  10. I had a very productive meeting with the Open Access / Open Education people in my library yesterday. I’ll be working on getting my ScholarWorks page up-to-date this summer, and hopefully those of everyone else in the department, and maybe we can get some of the broader Cognitive Science community on-board too.

    I’m also going to start working with them in establishing an “Open Working Papers”. This could serve as the back end for a vast range of future initiatives in archiving / open publishing. Papers will get a DOI, and be permanently archived. Authors will be able to upload and update papers themselves. Implementation should be trivial, and they see no problem with hosting thousands of papers.

  11. As if on cue, David Pesetsky just posted to Facebook about an outage on LingBuzz, and asked whether they should start plotting a revolution again…
