Decision-making strategies in the face of conflicting or uncertain sensory input have been successfully described in many different species. Here we analyze large behavioral datasets of larval zebrafish engaged in a ‘coherent dot’ optomotor assay. We find that animal performance is bimodal and can be separated into two ‘states’: an engaged state, in which performance is high and fish consistently turn in the direction of the coherent motion, and a disengaged state, in which performance drops to chance. A simple hidden Markov model (HMM) is sufficient to model the transitions between these states and fits our experimental data well. This addition can be incorporated into an existing drift-diffusion model (DDM) framework that has previously been used to model perceptual decision making in larval zebrafish. Further, we leverage the large behavioral datasets to fit a mixture model of performance distributions and extract two latent variables, which we term ‘focus’ and ‘competence’. Whereas ‘competence’ quantifies performance while the fish is in the engaged state, ‘focus’ captures the relative duration for which each animal persists in the engaged state. We show that ‘focus’ may be largely inherited from the parents, while ‘competence’ is more likely to be influenced by environmental context. This quantitative framework for analyzing decision making can be used to screen genetic perturbations for their impact on these two aspects of performance, and may help to identify a genetic basis and a neural mechanism for attention that extend across organisms.
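To make the two-state picture concrete, the following is a minimal simulation sketch, not the fitting procedure used in this work. It assumes a two-state Markov chain (engaged/disengaged) in which engaged fish turn correctly with probability `competence` and disengaged fish perform at chance (0.5). The function names and all parameter values are hypothetical and chosen only for illustration. Under these assumptions, the long-run fraction of trials spent engaged plays the role of ‘focus’, and expected overall accuracy is focus × competence + (1 − focus) × 0.5.

```python
import random


def simulate_fish(n_trials, p_stay_engaged, p_stay_disengaged, competence, seed=0):
    """Simulate turn outcomes from a two-state (engaged/disengaged) Markov chain.

    All parameter values are hypothetical; none are taken from the study.
    Returns two lists of booleans: the hidden state per trial (True = engaged)
    and whether the turn on that trial was correct.
    """
    rng = random.Random(seed)
    engaged = True  # arbitrary starting state (an assumption)
    states, correct = [], []
    for _ in range(n_trials):
        # Emission: engaged fish are correct with probability `competence`;
        # disengaged fish perform at chance.
        p_correct = competence if engaged else 0.5
        states.append(engaged)
        correct.append(rng.random() < p_correct)
        # Transition: stay in the current state or switch to the other one.
        p_stay = p_stay_engaged if engaged else p_stay_disengaged
        if rng.random() >= p_stay:
            engaged = not engaged
    return states, correct


def stationary_focus(p_stay_engaged, p_stay_disengaged):
    """Long-run fraction of trials spent engaged (the 'focus' analogue here)."""
    leave_engaged = 1.0 - p_stay_engaged
    leave_disengaged = 1.0 - p_stay_disengaged
    return leave_disengaged / (leave_engaged + leave_disengaged)


def expected_accuracy(focus, competence):
    """Overall accuracy as a mixture of engaged and chance-level performance."""
    return focus * competence + (1.0 - focus) * 0.5


if __name__ == "__main__":
    states, correct = simulate_fish(100_000, 0.9, 0.8, competence=0.9, seed=1)
    focus = stationary_focus(0.9, 0.8)
    print("empirical focus:   ", sum(states) / len(states))
    print("stationary focus:  ", focus)
    print("empirical accuracy:", sum(correct) / len(correct))
    print("expected accuracy: ", expected_accuracy(focus, 0.9))
```

In this sketch, two animals with the same `competence` can show very different overall accuracy purely because of different transition probabilities, which is the reason the two quantities are worth separating.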