First International Conference on Advances in Computer-Human Interaction

Spatial auditory interface for an embedded communication device in a car

Jaka Sodnik, Saso Tomazic
University of Ljubljana, Slovenia
[email protected]

Christina Dicke, Mark Billinghurst
HIT Lab NZ, New Zealand
[email protected]

Abstract

In this paper we evaluate the safety of the driver when using an embedded communication device while driving. As a part of our research, four different tasks were performed with the device in order to evaluate the efficiency and safety of the drivers under three different conditions: one visual and two different auditory conditions. In the visual condition, various menu items were shown on a small LCD screen attached to the dashboard. In the auditory conditions, the same menu items were presented with spatial sounds distributed on a virtual ring around the user's head. The same custom-made interaction device, attached to the steering wheel, was used in all three conditions, enabling simple and safe interaction with the device while driving. The auditory interface proved to be as fast as the visual one, while at the same time enabling significantly safer driving and higher user satisfaction. The measured workload also appeared to be lower when using the auditory interfaces.

1. Introduction

A car is no longer used merely for traveling and getting from one place to another, but more and more as an office-on-the-go. Nowadays, cars are being equipped with powerful new computers functioning as navigation systems, music players, DVD players, communication devices, etc. In order to make use of all that functionality, a great amount of user attention is required. A typical interaction with such a device causes a significant amount of distraction from the driver's primary occupation - driving. Distraction is not only caused by physical stimuli through the sensory apparatus, but also by various cognitive sources, such as thought or emotional arousal [1][2]. Distraction from the primary task, i.e. driving the car, can reduce the driver's safety by degrading vehicle control (speed maintenance, lane keeping, etc.) and object or event detection [3]. Apart from visual (eyes-off-the-road), auditory and cognitive distraction (mind-off-the-road), mechanical causes can also lead to distraction. When reaching for objects inside the vehicle or otherwise shifting out of their normal sitting position, drivers can degrade their ability to react to various unexpected anomalies on the road [3][4].

With this in mind, the sound channel could be used as an alternative option for driver-vehicle interaction. Speech synthesis systems are often used with various navigation devices, and speech recognition systems with mobile phones in cars. Sometimes they are combined with small screens on the dashboard.

In our study we used two auditory interfaces of different complexity to operate an embedded communication device while attending to a driving task. We reduced the mechanically and visually distracting events, so that we could focus on the influence of the secondary tasks of varying complexity (conducted with an auditory interface) on the primary driving task. We used spoken menu items to build the auditory interface, as they have proven to be very effective [5][6]. We also compared the auditory interface to a classic visual interface consisting of a small screen.

2. Related work

The auditory menu used in our experiment was based on a number of spatial sounds placed on a virtual ring around the user's head. The items on the ring represented all current options at the specific level of the hierarchical menu.

The principle of hierarchical menu navigation with spatial sound was also used by Crispien et al. [7]. They designed an interface aligning both non-speech and speech audio cues in a ring rotating around the user's head. The items in the ring were manipulated by using 3D-pointing, hand gestures and speech recognition.

Similar spatialised auditory icons localized in the horizontal plane were also used by Brewster [8]. The user selected an arbitrary auditory icon with a hand gesture, which triggered the corresponding event.
The Nomadic Radio was developed as a spatial audio framework for a wearable audio platform [9]. It included a system for notification about current events: incoming e-mails, messages, calendar entries, etc. The items of the menu were positioned around the listener's head in this case as well. The input interaction was based on voice commands and tactile feedback.

The examples given in this section also use spatial sound for the interaction with various devices. However, so far no such interface has been tested or evaluated in a mobile environment (e.g. while driving a car or a simulator) and compared to a purely visual interface.

0-7695-30876-9/08 $25.00 © 2008 IEEE   DOI 10.1109/ACHI.2008.38

3. User study

The main goal of our user study was to evaluate the effectiveness of the acoustic interface in the interaction with a communication device in a car. The communication device had the functionality of a mobile phone (it enabled making phone calls and sending text messages) as well as of an entertainment system (it also enabled listening to music, viewing pictures, etc.). We were interested in the use of such a device while driving. For safety reasons, a car simulator was used instead of a real vehicle. The interaction with the device was based on a special custom-made interaction device attached to the steering wheel, so that it could be used safely while driving. The car simulator, the device itself and the interaction device are described in detail in the following chapters.

Two different interfaces were compared in the user study, both of which represented the same hierarchical menu structure of the device. In the acoustic interface, all menu items were presented with spatial sounds coming from different pre-fixed positions in the simulator. Other sounds, such as the car engine, environment noise, etc., were non-spatial and were played through all speakers as background noise. In the visual interface, all items of the menu were shown on a small LCD screen attached to the dashboard of the car. The evaluation of the two interfaces was made by observing the drivers while they were driving and performing different tasks with the communication device. The main parameters of the evaluation were:

• efficiency of the individual interface (the time required to finish an individual task)
• safety of the driving (penalty points were given for unsafe driving)
• perceived workload (reported by the drivers)
• overall satisfaction of the test subjects (expressed through the modified Questionnaire for User Interface Satisfaction - QUIS)

We expected the acoustic interface to be much safer than the visual one, since all interaction was based only on the acoustic channel. The visual channel could therefore be used for driving only, enabling much less distraction of the drivers. On the other hand, the time required to finish the different tasks was expected to be shorter when using visual interaction, since the visual communication channel offers a much greater bandwidth, and therefore more information can be perceived at a certain time.

4. Experiment design

4.1. Car simulator

The experiment took place in a visualization room equipped with a large projection screen (2.4 m x 1.8 m) and a 7.1 surround sound system (Creative GigaWorks S750). All sounds used in the experiment were played with a Creative Sound Blaster X-Fi ExtremeMusic sound card, and the Creative OpenAL sound library was used for spatial sound positioning [10]. OpenAL enables easy positioning of virtual sound sources in 3D space using the CMSS-3D surround sound technology on the X-Fi sound card [11].

Figure 1. The car simulator consisting of a big projection screen, a steering wheel and a small LCD screen.

CMSS-3D creates eight individual sound channels using a multi-channel upmix process. A multiple-speaker configuration (7.1) was used instead of headphones in order to enable drivers to also perceive the co-occurring auditory events (car engine, braking, environment noise, etc.). The speakers in the simulator were positioned according to the Dolby recommendations for 7.1 systems.
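The paper does not include positioning code, so the mapping from menu-ring azimuths to OpenAL coordinates below is only an illustrative sketch (the helper name `ring_position` is ours, not the study's). It relies on one documented OpenAL convention: the coordinate system is right-handed, with the default listener facing down the negative Z axis, +X to the right and +Y up.

```python
import math

def ring_position(azimuth_deg, radius=1.0):
    """Place a menu item on the virtual ring around the listener.

    azimuth_deg: 0 = directly in front of the driver (the selected
    item); positive angles move the item clockwise (to the right).
    Returns an (x, y, z) tuple in OpenAL's right-handed coordinates,
    where the listener faces -Z by default.
    """
    az = math.radians(azimuth_deg)
    x = radius * math.sin(az)    # to the right of the listener
    y = 0.0                      # the ring lies in the horizontal plane
    z = -radius * math.cos(az)   # "in front" is the negative Z axis
    return (x, y, z)

# The selected (front) item sits on the -Z axis:
# ring_position(0) -> (0.0, 0.0, -1.0)
```

In the experiment such coordinates would be handed to OpenAL's source position attribute and rendered over the 7.1 loudspeaker setup via CMSS-3D.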
The listener was positioned in the sweet spot in order to ensure accurate sound localization.

The "Swiss-Stroll" track of the RACER car simulation software, version 2.1 [12], was projected on the screen. The simulator was controlled with a Logitech MOMO Racing steering wheel, and automatic gear changing was applied. The same type of car (Peugeot 307) was used throughout the entire experiment. The experiment was performed in New Zealand, and the car was therefore equipped for driving on the left-hand side of the road. Although no formal validation of the car simulator was performed, we believe a very good approximation of a real driving task was achieved by using a big-screen projection, surround sound and a steering wheel with force feedback.

The communication device used in the experiment was operated through a hierarchical multi-level menu. A simplified version of a NOKIA Series 60 mobile phone menu was modified in order to have a maximum of six items at each menu level. The reason for this was our assumption that more than six items presented with simultaneous spatial sounds could not be perceived clearly.

4.2. Visual interface

The visual interaction was based on a small LCD screen (12 cm x 15 cm) attached to the dashboard, where it could be seen easily while driving. The different items of the menu were presented in large white fonts on a black background. The selected item was highlighted with a light green bar. When a specific item was selected, new submenu items were shown or, in the case of moving back in the menu structure, the previous items were loaded again.

4.3. Acoustic interface

In the two acoustic interfaces, the different items of the menu were presented with spatial sounds and played to the driver through the speakers in the simulator. The spatial sounds were placed on a virtual ring around the driver's head. Each individual item was therefore represented with a sound at a certain position. The driver could navigate the menu by rotating the virtual ring with the sounds in either direction (i.e. left or right). The sound source located directly in front of the user represented the selected item (equivalent to the highlighted row in a visual menu). All the sound sources in the ring were always distributed equally in order to achieve the maximum possible spatial angle between them. For example, if there were three items in the current menu, the spatial angle between the individual items was 120°; if there were six items in the menu, the angle was 60°, etc. The listener (the driver) was positioned slightly to the front of the centre of the ring (closer to the front items). Due to this fact, the central front source, the one representing the selected menu item, was perceived as the loudest one.

The sound sources were spoken words - the menu items recorded by a female native English speaker. The signal-to-noise ratio of the signals was approximately 50 dB. A gentle background melody was assigned to each individual branch of the menu. The melody started as soon as the user left the main menu and entered one of the submenus. The central pitch of the melody changed according to the current depth of the user in the submenu. Each time the user moved to a lower level of the menu, the pitch was lowered, and vice versa. The background melody helped the users to be aware of their absolute position in the menu.

4.4. Interaction device

With both types of interfaces, the interaction with the communication device was performed with the help of a custom-made device consisting of a small scrolling wheel and two buttons. All three parts of the device were attached to the steering wheel, so that it could be used safely while driving. The scrolling wheel was used to navigate between all the available items at a certain level of the menu.

Figure 2. The interaction device consisting of a scrolling wheel and two buttons (left and right).

When used with the visual interface, the scrolling wheel would move the selection bar up and down in the menu. In the case of the acoustic menu, the wheel would turn the virtual ring with the sound sources in one of the two possible directions (i.e. left or right).
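The even spacing and ring rotation described above can be condensed into a small sketch (function names are ours; this illustrates the geometry, it is not the study's code): with n items the inter-item angle is 360°/n, and each click of the scrolling wheel turns the ring by exactly that angle, so that some item always occupies the front (selected) slot.

```python
def ring_azimuths(n_items, selected_index=0):
    """Azimuth (in degrees, 0 = directly in front) of every item at
    the current menu level, with the selected item rotated into the
    front slot. Equal spacing maximizes the angle between items."""
    step = 360.0 / n_items
    return [((i - selected_index) * step) % 360.0 for i in range(n_items)]

def scroll(selected_index, n_items, direction):
    """One click of the wheel (direction = +1 or -1) turns the ring
    by one inter-item angle, so exactly one item is always selected."""
    return (selected_index + direction) % n_items

# Three items sit 120 degrees apart, six items 60 degrees apart:
# ring_azimuths(3) -> [0.0, 120.0, 240.0]
```

Rotating the ring rather than moving a cursor keeps the selected item at the perceptually loudest front position, which matches the behavior described in Section 4.3.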
In this case, the angle of each individual turn was always the angle between two neighboring items in the acoustic menu, so that one item was always selected. The two buttons were used to either confirm the selection or move back (i.e. upwards) within the hierarchy.

4.5. Experiment conditions

Three different experiment conditions were created. The first two conditions were based on the two interfaces described in the previous sections:

• condition V: the interaction was based on the visual interface
• condition A: the interaction was based on the acoustic interface with multiple simultaneous sounds

The third condition (A1) was also based on the acoustic interface. In this case, however, just one sound was played at a time. In condition A, up to six sound sources were played at different spatial positions, and one of the sources represented the selected menu item. In condition A1, just one sound source was played at a time. Also in this case the sound source was spatially positioned, in order to be easily separated from all other sounds (engine noise, traffic, environment noise, etc.).

We expected the interface with multiple simultaneous sounds to be more efficient and faster than the one with just one sound played at a time. By comparing the A and A1 conditions, we wanted to check whether the capacity of the acoustic channel could be increased, and the selection or search time shortened, with the use of multiple sounds.

4.6. Experiment procedure

A total of 23 test subjects participated in the experiment. Approximately half of them were more experienced with driving on the left-hand side and half of them on the right-hand side. They all reported normal sight and hearing. Before performing the experiment, all test subjects were asked to fill out a questionnaire on their age, sex, driving experience, and hearing and sight disabilities. After a short demo of both interfaces and the interaction device, the test subjects were allowed a 5-minute test drive in the simulator in order to get familiar with the steering wheel, pedals, road conditions, etc.

After the demo, 18 test subjects were asked to perform four different tasks while driving:

1. Changing the active profile of the device - PRF
2. Making a call to a specific person - CAL
3. Deleting a specific image from the device - IMG
4. Playing a specific song - SNG

The tasks were performed three times (i.e. once for each experiment condition). A 15-minute break was assigned after each condition, and the test subjects were also asked to fill out the NASA TLX workload questionnaire and the QUIS test. In order to eliminate learning effects between the different interfaces, three groups of six participants were formed. Each group performed the tasks with a different order of the conditions:

1. group: V, A, A1
2. group: A1, A, V
3. group: A, V, A1

In all three conditions, the test subjects were asked to drive the car safely and perform the tasks as fast as possible. Each task was read to the test subjects loudly and clearly. For each interface, the tasks were given to the test subjects in a random order. The successful completion of an individual task was signaled with the message "Task completed" (a sign on the screen in the visual menu and a recorded spoken message in the auditory menu). The duration times of the tasks and the average speeds of the drivers were logged automatically. The entire experiment was recorded with a digital video camera, and a post-analysis of the driving was performed in order to evaluate the safety of each individual test subject's driving.

The remaining 5 test subjects served as a control group and were asked to just drive the car without performing any tasks.

5. Results

In the tasks performed by the 18 test subjects, four parameters or variables were evaluated:

• task completion times
• driving anomalies
• NASA TLX workload questionnaire [13]
• QUIS test [14]

The main results and interpretations are summarized in the following four subchapters.

5.1. Task completion times

The time required to finish each individual task was measured and logged automatically. The timer was started when the initial command "Please start now!" was read to the test subject, and turned off automatically when the task was concluded successfully.

The analysis of variance (ANOVA) test compared the results of the tasks and showed no significant difference between the three conditions:

• F_PRF(2, 51) = 0.358, p = 0.701
• F_CAL(2, 50) = 0.550, p = 0.581
• F_IMG(2, 51) = 1.213, p = 0.306
• F_SNG(2, 50) = 0.211, p = 0.811

The mean values of the task completion times are shown in Table 1:

Table 1. Mean task completion times (M) and standard deviations (SD) in seconds

Condition | M_PRF | SD_PRF | M_CAL | SD_CAL
V         | 17.83 | 12.83  | 32.39 | 14.38
A         | 19.83 | 11.47  | 37.94 | 33.78
A1        | 16.72 | 8.87   | 29.12 | 23.63

Condition | M_IMG | SD_IMG | M_SNG | SD_SNG
V         | 31.50 | 21.36  | 37.90 | 27.31
A         | 37.17 | 27.18  | 33.38 | 20.81
A1        | 26.33 | 10.58  | 38.17 | 25.61

Table 2 shows the average task completion times of all tasks under the individual conditions:

Table 2. Average task completion times of all tasks

Condition | Time / s
V         | 17.83
A         | 19.83
A1        | 16.72

We believe that the reason for the non-significantly different results in all three conditions lies in the fact that the same interaction device was used in all cases. The test subjects were already used to watching a screen while driving. On the other hand, we expected the task completion times in condition A to be shorter than those in condition A1. In condition A, multiple simultaneous sounds were used, and the information flow should therefore have been greater. However, the majority of the test subjects reported that condition A was too complicated, as it contained too many sounds for them to be able to perceive all of them at a certain moment. They reported condition A1, with just one sound played at a time, to be more effective and easier to follow while driving.

5.2. Driving anomalies

The entire experiment was recorded with a digital video camera, and the recordings were used for evaluating the driving performance. The car simulation program also enabled automatic logging of the driving speeds, crashes, etc. All drivers (the 18 drivers performing the different tasks plus the control group consisting of 5 test subjects) were evaluated for each individual task. They were given the following penalty points for anomalies in driving:

• 1 penalty point: unsafe driving (slight winding on the road or slowing down unexpectedly and unnecessarily)
• 2 penalty points: extreme winding on the road and driving on the road shoulders
• 5 penalty points: causing an accident or crashing the car

The penalty points for each task were then summed up, and the three conditions were compared again. The mean driving penalty points are shown in Table 3:

Table 3. Mean driving penalty points (M) and standard deviations (SD) for the tasks

Condition | M_PRF | SD_PRF | M_CAL | SD_CAL
V         | 2.13  | 2.58   | 3.80  | 3.32
A         | 0.86  | 0.86   | 1.13  | 1.59
A1        | 0.87  | 0.99   | 1.07  | 1.68

Condition | M_IMG | SD_IMG | M_SNG | SD_SNG
V         | 4.20  | 5.22   | 3.67  | 4.30
A         | 0.67  | 0.62   | 1.07  | 1.33
A1        | 1.00  | 1.66   | 1.07  | 1.43

Figure 3 shows the average penalty points for all three conditions and the control group.
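The between-condition comparisons reported in this section are one-way ANOVAs. As a reminder of what the F(df_between, df_within) values denote, a minimal pure-Python version is sketched below (the sample values are invented for illustration, not measurements from the study); a p-value would then be taken from the F distribution, e.g. via scipy.stats.f.sf.

```python
def one_way_anova(*groups):
    """One-way ANOVA over independent groups.
    Returns (F, df_between, df_within)."""
    k = len(groups)                          # number of conditions
    n = sum(len(g) for g in groups)          # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

# Hypothetical penalty points for conditions V, A and A1 (made-up):
f_stat, df_b, df_w = one_way_anova([4, 2, 5], [1, 0, 1], [1, 2, 0])
```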
Figure 3. The average number of penalty points for all four conditions

The ANOVA test showed significantly different results for the tasks CAL, IMG and SNG, and non-significantly different results for the PRF task:

• F_PRF(2, 41) = 2.795, p = 0.073
• F_CAL(2, 41) = 6.493, p = 0.004
• F_IMG(2, 41) = 5.479, p = 0.008
• F_SNG(2, 41) = 4.395, p = 0.019

The control group, consisting of five test subjects who were asked to just drive the car as safely as possible, scored an average of 0.8 penalty points.

The results presented above show significantly fewer driving anomalies and much greater safety when the auditory interfaces were used. The two auditory interfaces were compared with a post-hoc T-test (0.05 limit on the familywise error rate), and no significant difference in the results could be reported. Again, no advantage of condition A over condition A1 could be found.

The average driving speed was logged automatically by the driving simulator. Only the average speed of each individual test subject under each individual condition was recorded, not the speed for each task separately. The average speeds under the three conditions were:

• V: 32 km/h
• A: 59 km/h
• A1: 55 km/h
• Control group: 60 km/h

There is almost no difference in the average speed between the two auditory conditions (A and A1); however, the speed of the test subjects in the visual condition (V) is approximately 25 km/h lower. We believe the difference reflects the great amount of cognitive workload in the visual condition, since the drivers had to concentrate on the road and on the screen simultaneously.

5.3. NASA TLX workload test

The TLX workload test reports the overall workload perceived by the test subjects under the different conditions. It is based on a subjective questionnaire divided into six subscales: mental demand, physical demand, temporal demand, performance, effort level and frustration level. The final score for each condition is a weighted average of the ratings on the six subscales. The results of the test subjects showed a significant difference between the three conditions: F(2, 321) = 15.386, p < 0.001. The post-hoc T-test showed a significant difference in the workload between conditions V and A (p = 0.001) and between conditions V and A1 (p < 0.001), but no significant difference between the two auditory conditions (p = 0.053).

The reported results of the test subjects also reflect a high level of cognitive workload when operating a visual menu, since it takes away the concentration which is mandatory for safe driving. The test subjects found the use of the auditory menus while driving easier and safer, and they also reported a lower perceived workload.
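The TLX scoring just described, a weighted average over the six subscales, can be made concrete with a short sketch. In the standard NASA TLX procedure each subscale's weight is the number of times it is chosen in the 15 pairwise comparisons, so the weights sum to 15; the ratings and weights below are invented for illustration, not data from this study.

```python
def tlx_score(ratings, weights):
    """Overall NASA TLX workload score: the weighted average of the
    six subscale ratings (each 0-100). In the standard procedure the
    weights come from 15 pairwise comparisons and sum to 15."""
    total_weight = sum(weights.values())
    return sum(ratings[s] * weights[s] for s in ratings) / total_weight

# Invented example values (not measurements from this experiment):
ratings = {"mental": 60, "physical": 20, "temporal": 55,
           "performance": 30, "effort": 50, "frustration": 40}
weights = {"mental": 4, "physical": 1, "temporal": 3,
           "performance": 2, "effort": 3, "frustration": 2}

score = tlx_score(ratings, weights)  # (60*4 + 20*1 + 55*3 + ...) / 15
```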
The post-hoc T-test • F&R: F(2,51) = 2.495, p = 0.093; showed a significant difference in the workload between conditions V and A (p = 0.001), between conditions V and • O: F(2,51) = 1.073, p = 0.350; 74 • E: F(2,51) = 2.146, p = 0.127; The driving performance evaluation showed increased safety and a significant reduction in the distraction of the • R: F(2,51) = 1.529, p = 0.226; driver when the auditory interfaces were used. There was Figure 4 shows the average scores of individual inter- approximately a 60% difference in the penalty points be- faces: tween the visual and the auditory conditions. The average speed in the auditory conditions was approximately 25 km/h higher and therefore almost the same as the average speed of the control group. This most probably reflects that fact that the drivers felt more confident because they were not distracted by the information on the screen and were thus capable to pay attention to the road. The variations in the driving speed were also significantly smaller in the auditory conditions. The results of the TLX workload test indicate that the users felt less physical and temporal demand when inter- acting with the auditory interfaces. They felt a high level of satisfaction and were confident about their performance. The use of the auditory interfaces made them feel more se- cure and less stressed than the use of the visual interface. 7. Design recommendations and conclusions Figure 4. The average scores of individual QUIS factors Our experiment offers some useful design recommenda- tions for embedded communication systems in cars. The auditory interface with spoken commands proved to be very The results show that, in general, the users were satisfied effective and as fast as the visual interface. Our test subjects with the auditory interfaces. The users found the auditory reported the lack of feedback on the current location in the interfaces more wonderful than terrible, easy to use, satis- acoustic menu. 
They complained about occasionally getting fying and adequate. On the other hand, the users did not lost and having to move back to the main menu to restart the find them significantly more stimulating or flexible than the task. The background music with a changing central pitch visual interface. As regards the learning required to use the turned out to be a good solution as it helped the user to iden- interfaces, the users reported all interfaces to be equally dif- tify the individual submenus at any given time; however, it ficult to learn to operate, to explore new features by trial and should perhaps be upgraded with a few spoken feedback op- error, and also to remember names and commands. tions. For example, the option ”current location” could read all the previously selected commands and inform the user 6. Discussion on his or her current location. Multiple simultaneous sounds did not prove to have any The main goal of this study was the evaluation of an advantages when compared to a single sound source or acoustic interface as a substitute for the traditional visual menu item played at a time. The perception of multiple interface (V) of an in-vehicle display. The four main vari- sounds while driving seems to be almost impossible and dis- ables measured in the experiment were task completion turbing. The best results in the experiment were achieved in time, driving performance, workload and user satisfaction. the auditory condition with just one sound source played at We did not find any significant difference in the task a time. completion times. We believe the reason for this lies in the The visual interface turned out to be very unsafe and dis- fact that the same interaction device was used in all three turbing for the drivers. Although the LCD screen was at- conditions. 
We find the result that prove the auditory and tached to the dashboard where it could be seen easily when visual interfaces were equally fast very encouraging, since driving, a high number of driving penalty points still calls an entirely new interface was compared to a well-know and for a better solution. A head-up display developed by the widely used visual interface. On the other hand, we ex- BMW might turn out to be a better option for the visual pected condition A to be faster than condition A1 due to interface; however, some further evaluations are still neces- multiple simultaneous sounds and a larger information flow. sary [15]. That was not the case, since the majority of the test subjects The interaction device is also very important for the found condition A too difficult to understand while driving. safety of the driver. Our solution with the scroll wheel and 75 two buttons turned out to be very practical and easy to use Factors in Computing Systems, vol. 5, no. 1, 2003, pp. while driving a car. The test subjects found it safe to use 473-480. since they could maintain both hands on the steering wheel at all times. [9] N. Sawhney and C. Schmandt, ”Nomadic radio: speech As this was only a pilot study, further research has to be & audio interaction for contextual messaging in nomadic done on comparing the auditory interfaces to novel visual environments,” ACM Transactions on Computer-Human interfaces, for example a head-up display or a speech inter- Interaction, vol. 7, no. 3, 2000, pp. 353-383. face. In addition, a more realistic and demanding driving [10] Openal, From: http://www.openal.org/, 2007. scenario should be tested, such as a major street in an urban environment or driving under different weather conditions. [11] Creative Knowledgebase, From: http://us.creative.com/ support/kb/, 2007. References [12] RACER, From: http://www.racer.nl/, 2006. [1] F. 
Bents, ”Driver Distraction In- [13] NASA TLX for Windows, From: ternet Forum,” From: http://www- http://www.nrl.navy.mil/aic/ide/ NASATLX.php, 2006. nrd.nhtsa.dot.gov/departments/nrd-13/driver- [14] QUIS, About the QUIS version 7.0. From: distraction/AskTheExperts.htm#CurrentExpertQuestions, http://www.lap.umd.edu/ quis/, 2006. 2000. [15] BMW, From: http://www.bmw.com, 2007. [2] M.A. Pettitt, G.E. Burnett, ”Defining driver distrac- tion,” Proc. of World Congress on Intelligent Transport Systems, San Francisco, USA, 2005. [3] L. Tijerina, ”Issues in the Evaluation of Driver Distraction Associated with In-Vehicle Informa- tion and Telecommunications Systems,” From: http://www-nrd.nhtsa.dot.gov/departments/nrd- 13/driver-distraction/PDF/3.PDF, 2000. [4] T.A. Ranney, E. Mazzae, E. Garrot, R. Good- man, ”NHTSA Driver Distraction Research: Past, Present, and Future,” From: http://www- nrd.nhtsa.dot.gov/departments/nrd-13/driver- distraction/PDF/233.PDF, 2000. [5] B. N. Walker, A. Nance and J. Lindsay, ”Spearcons: Speech-based Earcons Improve Navigation Performance in Auditory Menus,” Proc. of the International Confer- ence on Auditory Display (ICAD 2006), London, Eng- land, 2006, pp. 63-68. [6] P. Lucas, ”An evaluation of the communicative ability of auditory icons and earcons,” Proc. of the Second In- ternational Conference on Auditory Display, Santa Fe, USA, 1994, pp. 121-128. [7] K. Crispien, K. Fellbaum, A. Savidis, C. Stephanidis, ”A 3D-Auditory Environment for Hierarchical Naviga- tion in Non-visual Interaction,” Proc. of the 3rd Inter- national Conference on Audio Display (ICAD ’96), Palo Alto, USA, 1996, pp. 18-21. [8] S. Brewster, J. Lumsden, M. Bell, M. Hall, M. Tasker, ”Multimodal ’Eyes-Free’ Interaction Techniques for Wearable Devices,” SIGCHI conference on Human 76