Question Sets and All Paths

I was nerd-sniped with a question at work the other day. The set up was like this, imagine a survey where all of the questions have yes-no answers. Some of the answers are terminating, so if you answer No to a particular question, the questions stop. But some if you reach that part of the tree will keep going.

The ask was a formula to articulate the total number of potential unique answer sets. Which I was not able to figure out, I did however write python code to generate all of the unique sets. So here is that function, where I create a networkx directed graph in a particular format, and then find all of the potential start to end paths in that graph. You just add a dummy begin node, and end nodes at the terminating locations and the final question. You then just search for all of the paths from the begin to end nodes.

import networkx as nx

def question_paths(totq,term):
    '''
    totq -- positive integer, total number of questions
    term -- list of integers, where ints are questions that terminate
    '''
    nodes = [f'Q{i}{r}' for i in range(totq) for r in ['N','Y']]
    edges = []
    for i in range(totq-1):
        edges.append([f'Q{i}Y',f'Q{i+1}N'])
        edges.append([f'Q{i}Y',f'Q{i+1}Y'])
        if i not in term:
            edges.append([f'Q{i}N',f'Q{i+1}N'])
            edges.append([f'Q{i}N',f'Q{i+1}Y'])
    # adding in begin/end
    nodes += ['Begin','End']
    edges += [['Begin','Q0N'],['Begin','Q0Y']]
    for t in term:
        edges.append([f'Q{t}N','End'])
    edges += [[f'Q{totq-1}N','End'],[f'Q{totq-1}Y','End']]
    # Now making graph
    G = nx.DiGraph()
    G.add_nodes_from(nodes)
    G.add_edges_from(edges)
    # Getting all paths
    paths = []
    for p in nx.all_simple_paths(G,source='Begin',target='End'):
        nicer = [v[-1] for v in p[1:-1]]
        paths.append(nicer)
    return paths

And now we can check out where all paths are terminating, you only get further up the tree if you answer all yes for prior questions.

>>> question_paths(3,[0,1,2])
[['N'],
 ['Y', 'N'],
 ['Y', 'Y', 'N'],
 ['Y', 'Y', 'Y']]

So in that scenario we have four potential answer sets. For all binary, we have the usual 2^3 number of paths:

>>> question_paths(3,[])
[['N', 'N', 'N'],
 ['N', 'N', 'Y'],
 ['N', 'Y', 'N'],
 ['N', 'Y', 'Y'],
 ['Y', 'N', 'N'],
 ['Y', 'N', 'Y'],
 ['Y', 'Y', 'N'],
 ['Y', 'Y', 'Y']]

And then you can do a mixture, here just question 2 terminates, but 1/3 are binary:

>>> question_paths(3,[1])
[['N', 'N'],
 ['N', 'Y', 'N'],
 ['N', 'Y', 'Y'],
 ['Y', 'N'],
 ['Y', 'Y', 'N'],
 ['Y', 'Y', 'Y']]

One of the things doing this exercise taught me is that it matters where the terminating nodes are in the tree. Earlier terminating nodes results in fewer potential paths.

# can different depending on where the terminators are
len(question_paths(10,[1,5])) # 274
len(question_paths(10,[6,8])) # 448
len(question_paths(10,[0,1])) # 258
len(question_paths(10,[8,9])) # 768

So the best in terms of a formula for the total number of paths I could figure out was 2^(non-terminating questions) <= paths <= 2^(questions) (which is not a very good bound!) I was trying to figure out a product formula but was unable (any suggestions let me know!)

This reminds me of a bit of product advice from Jennifer Pahlka, you should have eliminating questions earlier in the form. So instead of filling out 100 questions and at the end being denied for TANF, you ask questions that are the most likely to eliminate people first.

It works the same for total complexity of your application. So asking the terminating questions earlier reduces the potential number of permutations you ultimately have in your data as well. Good for the user and good for the developers.

Leave a comment

2 Comments

  1. Yash Gupta's avatar

    Yash Gupta

     /  November 11, 2024

    I think I have an answer here: https://news.ycombinator.com/item?id=42026048

    Reply

Leave a reply to apwheele Cancel reply