My new branching validation is holding up against the test cases I’ve given it, so this week I started refactoring the hierarchy of the Python folder and organizing my files. After fiddling around with __init__.py files, I created a new SurveyMan package in the python folder with subpackages survey, examples, and test. All of the files previously contained in the survey directory in the python folder are now in one of the subpackages. The survey subpackage contains the main components for building the survey (the survey_representation and survey_exception modules).
Emma finished writing the JSON validator and a new schema, so I will be testing my current JSON against that this week. She suggested I alter the SurveyMan __init__.py file to include code to copy the contents of the resources folder for testing against the schema. I’m working on writing a script to call the jsonize() method of some of my (correct) survey programs and validate the output against the new schema. Once we figure out what to do about the JSON, and assuming no other alterations are needed, I’ll start thinking about how to make the SurveyMan package installable.
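The validation script will probably look something like this. This is only a sketch: the names (jsonize, the schema keys) are placeholders, and a real run would use the jsonschema package for a full check, whereas this stand-in only verifies required top-level keys.

```python
import json

# Hypothetical sketch of the validation script. The real version would
# call jsonize() on each example survey and run jsonschema.validate()
# against Emma's schema file; here, a toy schema and a hand-rolled
# required-key check stand in for both.

def check_required_keys(survey_json, schema):
    """Return the schema-required keys missing from the survey JSON."""
    parsed = json.loads(survey_json)
    required = schema.get("required", [])
    return [key for key in required if key not in parsed]

# toy schema and survey output standing in for the real files
schema = {"required": ["survey", "breakoff"]}
output = json.dumps({"survey": [], "breakoff": True})
missing = check_required_keys(output, schema)
print(missing)  # []
```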
I added a few updates and refactorings to my design document, as well as a better description of what motivated the design and why the package is necessary. I’ll try to put up a copy later.
This week, I got a lot of miscellaneous things done. I was stuck on some other stuff for a while, so I added docstrings to each object and function. Emma and Emery decided that for a final paper, I should be turning in some sort of 3-page design document, so I began working on that as well. Right now, I just have a description of each of the objects, as well as a class diagram for each with some arrows that are supposed to show which classes contain instances of the others. I’m not sure whether I need to include my SurveyExceptions class or my idGenerator class; I haven’t added anything for them yet. In addition, I’m not sure how high-level or low-level to make my descriptions yet; I can’t really add much more detail without making it a lot longer than 3 pages.
Diagram from my draft design document
Regarding the confusion with the branch-one, branch-all, branch-none checks last week, Emma directed me to her blog post regarding the branching paradigms for blocks and subblocks. I considered moving the branch paradigm checks out of the Block object, but in the end, I kept it there and rewrote the check so that it recursively iterates down to the lowest level of blocks contained in a Block, determines how many branch questions they have, and returns “branch-one”, “branch-all”, “branch-none”, or “bad-branch”. The branch paradigm of each block’s subblocks is then checked against how many branch questions the block contains at each level, to ensure that the proper combinations of “branch-one”, “branch-all”, etc. are maintained. This seems to work based on the few tests I’ve given it, but I need to test it a bit further just to be sure. It’s probably not the most elegant solution, but given the design I have right now I can’t think of a less complicated one.
Assuming my branch validation checks are correct, the only issue left to address is the JSON. Emma is working on the parser, so hopefully we can figure out what the problems are soon, and then my modules should (hopefully) be completely functional.
During the past few days, I accomplished a lot in terms of testing/debugging my branching validation checks. I created a Constraint test suite which I am using to test that surveys with broken branching throw the appropriate exceptions when their jsonize() functions are called. I created some additional example surveys which branch backward and do not follow the branch-one, branch-all, or branch-none policy. Testing with these surveys led me to discover a ton of bugs/typos within my validation check functions, as well as in other functions. I haven’t completely finished testing the validation methods, but what I’ve tested so far (backwards branching and invalid branch numbers) seems to work, at least for top level blocks. However, I just realized that my check for number of branch questions per block doesn’t check subblocks. I’m not sure how this should be handled for questions contained in subblocks (for example, if block 1 contains a branch question and block 1.1 also contains a branch question, does this count as 2 branch questions in block 1?) Depending on how to interpret this, I may need to rethink where/how I check the number of branch questions in each block. Right now, I have a Block method which iterates over the questions contained within that block and checks if they are branch questions, which is called for each block in the survey from the Survey object. I’m not sure if this is the best way to set things up.
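The two interpretations boil down to a shallow count versus a deep count. The names and classes below are hypothetical stand-ins, not the actual Block methods, but they make the choice concrete: under the deep policy, block 1’s count would include block 1.1’s branch question; under the shallow one, it wouldn’t.

```python
# Hypothetical stand-ins for the real survey objects:

class Question:
    def __init__(self, branching=False):
        self.branching = branching

class Block:
    def __init__(self, questions=(), subblocks=()):
        self.questions = list(questions)
        self.subblocks = list(subblocks)

def branch_questions_shallow(block):
    """Count branch questions directly in this block, ignoring subblocks."""
    return sum(1 for q in block.questions if q.branching)

def branch_questions_deep(block):
    """Count branch questions in this block and, recursively, its subblocks."""
    return branch_questions_shallow(block) + sum(
        branch_questions_deep(sub) for sub in block.subblocks)

# block 1.1 has a branch question, and so does block 1 itself
block_1_1 = Block(questions=[Question(branching=True)])
block_1 = Block(questions=[Question(branching=True)], subblocks=[block_1_1])
print(branch_questions_shallow(block_1))  # 1
print(branch_questions_deep(block_1))     # 2
```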
Additionally, I removed all of the functions deemed unnecessary at the last meeting, which makes the module a lot shorter and cleaner looking. Once Emma finishes the JSON parser, I can hopefully determine whether my JSON is valid or how I need to change it; it still isn’t working with the HTML tests, even though it looks valid compared to the schema.
I’ve run into some design flaws regarding how components are identified and removed from the survey. Emma and I agreed that the current component ids should remain hidden from the user and be used only for internal representation, but the issue of how to identify components when trying to remove or find them in the survey remains. Over the weekend, I changed all of the removal methods for each component so that they took an object pointer as an argument, rather than the id of the component being removed, under the assumption that this would be more intuitive than keeping track of arbitrary component ids. However, this poses an issue if the user only has a pointer to the top level survey object (for example, in my test suites, the example surveys are created externally and returned as a single survey object; the test module has no pointers to their inner components). This method also poses an issue if, for example, components are created and added to the survey using list comprehensions, and the user does not retain a pointer to each individual component.
After discussing the above issues with Emery and Emma, we have determined that the functionality for removing questions is unnecessary, and can be removed. My initial assumption of the library’s usage was that a user may want to create several versions of a survey, in which certain questions, blocks, or other components may be added, removed, or changed. We decided that this is no longer the case, and that the library does not need to allow for the alteration of surveys. It exists for the user to program a single survey and obtain the corresponding JSON representation of that survey; any different surveys can be programmed separately, and there is no need to be able to load and alter an existing survey. Given these assumptions, I will focus my testing on everything else and ignore the removal methods (I’ll probably delete them eventually, as long as I’m sure they’re no longer necessary).
In the last couple of days, I started working on testing my blocking implementation more thoroughly by writing a testing module called BlockTests. I decided to just use the unittest module rather than py.test (which Presley suggested), since it seemed easy enough to use and is already built in. My test module creates an instance of each of my current sample surveys (I only have 3 right now, one of which is just a dummy survey with no meaningful content), then runs a unit test called TestBlockNumber which recursively counts the number of blocks and subblocks in the surveys, asserting that there are as many as there should be. To aid with this, I created a new function in the Block class to return a list of the block’s subblocks, if it contains any, and corrected some small errors I noticed in the class.
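The structure of that test is roughly the following. The classes and counts here are simplified placeholders; the real BlockTests module builds the three sample surveys and asserts against their actual block counts.

```python
import unittest

# Minimal stand-in for the real Block class; note the None default to
# avoid sharing one list across instances.
class Block:
    def __init__(self, subblocks=None):
        self.subblocks = subblocks or []

def count_blocks(blocks):
    """Recursively count blocks plus all nested subblocks."""
    return sum(1 + count_blocks(b.subblocks) for b in blocks)

class TestBlockNumber(unittest.TestCase):
    def test_nested_count(self):
        # 2 top-level blocks, 3 subblocks at various depths
        survey_blocks = [Block([Block(), Block([Block()])]), Block()]
        self.assertEqual(count_blocks(survey_blocks), 5)
```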
My next objective was to test that the addition and removal of blocks using each of the available functions worked properly, but I am currently rethinking some of these functions and the structure of the survey. Currently, the user can remove a block either by its index in the survey (if it is a top level block) or by its block id. Since the block ids generated by my representation are hard to keep track of and don’t really convey the survey structure at all, I thought it would be easier to just pass the reference to the block object as an argument, rather than the block’s id. Alternatively, Emma suggested that a block could have a reference to its parent (in the case of a top-level block, the survey, or for a subblock, its parent block), and that a remove function could be defined in the block itself to remove itself from its parent’s block or content list. This seems like the best course of action in terms of simplicity for the user, but I’m going to need to rewrite/rethink the way I currently have things set up.
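The parent-pointer idea would look roughly like this. This is a sketch of the suggestion, not the actual code; the names are hypothetical, and the real version would also need a case for top-level blocks whose parent is the survey.

```python
# Sketch of the parent-pointer approach (hypothetical names):

class Block:
    def __init__(self):
        self.parent = None     # set when this block is added somewhere
        self.contents = []     # questions and subblocks

    def add_block(self, block):
        block.parent = self    # the child remembers its container
        self.contents.append(block)

    def remove(self):
        """Remove this block from its parent's content list."""
        if self.parent is not None:
            self.parent.contents.remove(self)
            self.parent = None
```

The user never needs a block id this way; holding the block object itself is enough to delete it.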
Once I refactor and verify that my blocking (as well as the other components of my survey) is working correctly, I will probably start addressing branching validation more thoroughly. In addition, in previous meetings, we discussed implementing exchangeable blocks in such a way that the underlying JSON representation is not altered (e.g. no new columns are needed). However, this is not a priority right now, at least not until I verify that everything else is working. Another task that would be useful to address at some point is to create docstrings for each of the objects/methods, but obviously having the survey representation implemented and tested is more pressing right now.
I haven’t posted a blog entry in a while, due mostly to spring break and to excessive amounts of midterms; after this Monday, hopefully things will die down and I’ll be able to focus more on the current issues rather than 3 exams.
Before break, I met with Presley to do a review of the survey representation code currently up on Github. We discussed with Emma the issue of whether the library should output JSON or a CSV, and determined that spitting out the JSON directly was bypassing a large portion of the SurveyMan Java backend (not sure of the terminology), and that it might be better to generate some different intermediate form to be passed to the Java. I’m just leaving it as is for now until we figure out something better. Emma created an issue before I left for break regarding calling the java program directly from the python, but we need to fix the issues I’ve been having with SurveyMan and Windows before I can do this.
Presley also suggested that I create an untested branch to push to, but Emma said it was probably unnecessary, since only tested/working changes would be pulled into the main project from my fork anyway. I had been holding off on pushing anything until I figured out how to create an untested branch, but after discussing it with Emma I just committed and pushed my changes to the master branch.
My most recent commits include adding an exceptions class which includes exceptions for bad branching and referencing questions/options/blocks that don’t exist. I implemented a first pass at a validate method for the survey that checks if the survey has all the blocks referenced in the constraints; I haven’t implemented checks for backwards branching yet. I made a few changes in places where I had originally been printing out error messages, throwing exceptions instead. I hope to get started next week on the test module to determine that all of this actually works correctly.
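In shape, the exceptions module and the first-pass check look like the sketch below. The class and function names here are illustrative, not the actual SurveyExceptions definitions.

```python
# Illustrative sketch of the exceptions + first-pass validation:

class SurveyException(Exception):
    """Base class for survey validation errors."""

class InvalidBranchException(SurveyException):
    """Raised for bad branching, e.g. branching to an earlier block."""

class NoSuchBlockException(SurveyException):
    """Raised when a constraint references a block not in the survey."""

def validate_constraints(survey_block_ids, constraint_targets):
    """First pass: every block referenced by a constraint must exist."""
    for target in constraint_targets:
        if target not in survey_block_ids:
            raise NoSuchBlockException("no block with id %s" % target)
```

Raising these instead of printing error messages means the test module can assert on them directly (e.g. with assertRaises).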
I also created another sample survey based on a survey that I found in data/samples. This was meant to demonstrate subblocks, since my original survey did not make use of them. However, I haven’t really done anything with it besides just print out the survey structure and eyeball it to make sure it looks right. Again, I should probably create more samples with different properties.
Last week, I did a bit of planning regarding how best to implement the test module. I had originally been working with unittest.py, but during our meeting on Friday, Presley mentioned another module called py.test which she said worked well and was simple to use. I’ll have to look into it and see which one is better for our needs. In terms of testing, I realized that I need more sample surveys that make better use of all the features (i.e. subblocks and more intricate branching). The one I have now doesn’t have any subblocking. In addition, I thought it would be a good idea to create a module full of Survey Exceptions which would be thrown if the user tries to create an invalid survey with invalid blocking and branching. Emma suggested that I add a validation check at some point before the JSON is created to make sure that all of the survey components are valid; I have determined that the best place to put this check is in the top level Survey object, in the form of a function called validate() which is called in the JSON method. This should check for invalid blocking or branching and throw appropriate exceptions if there are issues; my test module should test that these exceptions are thrown for deliberately invalid surveys. I thought about creating checks that would throw exceptions as soon as an invalid branch is created, but in order to determine whether branches are invalid, access to the entire list of blocks is needed; since this is stored in the Survey object, it is easiest to check these things at the Survey level rather than at the Question, Block, or Constraint level. This is all still in progress, and I hope to get more done on it this week. I am meeting with Presley tomorrow to go over the current Python code and discuss what needs to be done/how best to do it.
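The validate-before-jsonize flow can be sketched as follows. This is a minimal stand-in, assuming a much-simplified Survey that only tracks top-level block ids and branch constraints; the real class carries far more state, and the real jsonize() emits the actual schema JSON.

```python
import json

class InvalidSurveyException(Exception):
    pass

class Survey:
    def __init__(self, blocks, constraints):
        self.blocks = blocks            # ordered top-level block ids
        self.constraints = constraints  # (source_block, target_block) pairs

    def validate(self):
        """Check every branch target exists and lies after its source."""
        order = {b: i for i, b in enumerate(self.blocks)}
        for src, dst in self.constraints:
            if dst not in order:
                raise InvalidSurveyException("unknown block %s" % dst)
            if order[dst] <= order[src]:
                raise InvalidSurveyException(
                    "backwards branch %s -> %s" % (src, dst))

    def jsonize(self):
        self.validate()   # throw before emitting any JSON
        return json.dumps({"survey": self.blocks})
```

Because validate() has the whole block list in hand, both the missing-block and backwards-branch checks live in one place, which is the reason for doing this at the Survey level.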
Emma and I also talked about whether it would be better to have the Python output a CSV representation of the survey rather than just spit out JSON, since the CSV is more like the “bytecode” of the survey than the JSON is (in that the JSON is just for transmitting the information, and can’t really be written by hand). CSVs are also easier to validate than the JSON. If I am to add this functionality, I’ll probably just add some methods similar to the jsonize methods to produce a CSV (I won’t get rid of the jsonize methods unless it’s clear that we don’t need them).
Since last week, I got the jsonized survey (with blocking, not with branching) to pass the HTML tests, and the resulting survey HTML appears to be working correctly. I then made an attempt at adding branching, but am still working on getting it to produce the correct JSON for the branchmap. My original thoughts, based on a discussion between Emma and Presley, were to make a Constraint object, which takes a question as a parameter and allows the user to specify branches from the question’s options (based on their id or index). The survey would then contain a list of constraints, along with the top level block list. However, the JSON schema structure suggests that the branch map (Constraint) JSON is meant to be generated within the question, as a question property (rather than a top-level survey property). I am dealing with this by creating a question attribute to reference the constraint when a constraint is constructed for that question; questions with no constraints won’t have this attribute, since it is created and assigned within the constraint constructor. When producing the Question JSON, I check whether it has a constraint attribute, and add the appropriate BranchMap JSON if it does.
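The attribute trick can be sketched like this. The names (branchMap, question_json) are placeholders based on my reading of the schema, not the actual SurveyMan code.

```python
# Sketch of attaching the branch map to the question (hypothetical names):

class Question:
    def __init__(self, qid):
        self.qid = qid
        # note: no self.constraint here -- the attribute only exists
        # once a Constraint has been built for this question

class Constraint:
    def __init__(self, question, branch_map):
        self.branch_map = branch_map   # option id -> destination block id
        question.constraint = self     # attach ourselves to the question

def question_json(question):
    """Emit the question JSON, adding a branch map only when one exists."""
    out = {"id": question.qid}
    if hasattr(question, "constraint"):
        out["branchMap"] = question.constraint.branch_map
    return out
```

The hasattr check is what lets non-branching questions stay free of any branch-map key in the output.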
The current code on my GitHub repo includes an attempt at a branching survey representation, but I’m not sure that the branching JSON is correct. It passes my validators, but it is no longer passing the HTML tests. I will have to try to fix this.
In terms of milestones, I accomplished a portion of what I wanted to get done for the end of this week, but I got less of the testing done than I would have liked; I haven’t really started on any automated tests yet. Until I get more testing material, I figured that it would make more sense to focus on the branching. Unfortunately, I won’t be able to meet with Emma and Presley today because of travel issues for track (which came up unexpectedly), so hopefully I can get back on track when I get home on Sunday and be a bit more productive next week; I’ve been struggling with other classes’ homework.
At this point, Emma and I have established a few milestones for me to accomplish over the next few weeks. The first priority this week was to create a sample survey based on a short MTurk questionnaire (https://github.com/etosch/SurveyMan/blob/master/data/Ipierotis.csv), generate a JSON file from the survey object, and check that the outputted JSON agrees with the schema. My example survey does not have any branching yet; I will start implementing that as soon as I determine that everything else functions properly. While creating the survey, I ran into a few bugs in my survey objects, mainly that adding options to one question would somehow add them to all subsequent questions. Although I fixed this by making an option list a required argument when creating a question, I’m not entirely sure why it was happening. I had some difficulty validating the outputted JSON against the schema, due to a few syntax errors in the schema which I have since resolved (and the fact that I was pulling from the wrong branch). At this point, it appears that the outputted JSON from the example survey is valid, and the next step is to make sure it produces the correct HTML and that everything looks and functions properly. Emma sent me a few HTML tests, and I’ve been copying in the JSON for my example survey to try to verify that it works with the HTML. So far, I haven’t been able to make it work.
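For what it’s worth, there is a classic Python pitfall that produces exactly this symptom, whether or not it was the actual cause here: a mutable default argument is evaluated once and shared across every call. A minimal reproduction (with a hypothetical Question stand-in, not my actual class):

```python
class Question:
    def __init__(self, text, options=[]):      # BUG: one list shared by all calls
        self.text = text
        self.options = options

q1 = Question("first")
q2 = Question("second")
q1.options.append("yes")
print(q2.options)  # ['yes'] -- q2 sees q1's option

class FixedQuestion:
    def __init__(self, text, options=None):    # default rebuilt per call
        self.text = text
        self.options = [] if options is None else list(options)
```

Requiring the option list as an argument sidesteps the problem the same way the None-default idiom does, which would explain why the fix worked.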
For next Friday, the goal is to have the branching implemented, as well as some tests for blocking and branching; basically, I should have a first pass at a Python survey library done. I’m not sure how much I’ll get done over the weekend, since I’ll be away, but hopefully if I can look at the HTML stuff today, I’ll be in good shape to get the required things done for next week.