The smart city is perhaps the ultimate application of the Internet of Things (IoT) and connected technologies: a symphony of sensors generating data that can vastly improve the way cities manage assets, resources, and services, and make life smoother for their citizens as well. As cyber-physical-social systems (CPSS), smart cities harness tremendous amounts of data from citizens, infrastructure, and buildings and analyze it to improve the operation of everything from traffic and transit systems to utilities and water to crime detection and emergency services, and even to aid future planning.

However, the massive collection of data on citizen behavior raises concerns about privacy. Sensitive information, such as financial data, medical data, and location tracking, must be secured before it’s integrated for analysis. This results in a tradeoff between privacy and accuracy, in which preserving privacy limits the efficiency and effectiveness of smart city systems: once users’ spatial-temporal contexts are secured, it becomes harder to fully execute data analysis and prediction after data fusion.

To address this challenge, a team of IEEE researchers recently proposed a novel privacy-aware data fusion and prediction approach for smart cities that promises to provide accurate contextual data without the need to use the personal information of citizens. According to the team, the solution is especially useful for smart city service decision-making, such as traffic scheduling and intelligent tourism planning, based on multi-source industrial data where users’ sensitive information is fragmented across different platforms.

“The major contribution of this paper is that we provide a locality-sensitive hashing-based data integration solution in which the close data points in original data space are still close after hashing,” said Qi Lianyong, professor at Qufu Normal University in Jining, China. “This way, we can evaluate whether two points are close or not based on the hash indexes of the two points, without revealing the sensitive information of the two points.”
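To make the idea in the quote concrete, here is a minimal Python sketch of one common locality-sensitive hashing family, random-hyperplane (sign) hashing. The article does not say which hash family or parameters the researchers used, so the function names (`make_hyperplanes`, `lsh_index`, `hamming`), the bit count, and the sample vectors are illustrative assumptions, not the team's implementation.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_index(vector, hyperplanes):
    """Hash a vector to a bit string: one bit per hyperplane, set by the side it falls on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vector, plane))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)

def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical data points (for example, normalized service-quality readings).
planes = make_hyperplanes(dim=3, n_bits=16, seed=1)
point = [0.9, 0.4, 0.1]
near = [0.88, 0.42, 0.12]   # small perturbation of `point`
far = [-0.7, 0.1, 0.9]      # points in a very different direction

# Only the bit strings are compared; the raw vectors never need to be shared.
print("near:", hamming(lsh_index(point, planes), lsh_index(near, planes)))
print("far: ", hamming(lsh_index(point, planes), lsh_index(far, planes)))
```

With this hash family, vectors separated by a small angle tend to agree on most bits, so comparing indices gives a rough closeness test without exposing the original values.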

Figure: An example of multisource smart city data fusion with a three-layer user-edge-cloud architecture.

The proposal is based on a three-layer architecture composed of the user layer (data generation), the edge layer (removal of privacy-sensitive information and data filtering), and the cloud layer (data fusion, analysis, and prediction). Privacy-aware work begins at the edge layer, where edge servers use locality-sensitive hashing to convert user-service quality data from the user layer, which contains sensitive spatial-temporal contexts, into a smaller set of item indices. These indices are the only data sent to the cloud center, which protects privacy and has the added benefit of a lighter data load and, in turn, shorter processing time.
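The edge-layer step can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the record layout, the names `RECORDS`, `USERS`, and `edge_indices`, the 8-bit index length, and the choice to hash only the quality values while withholding the time and location fields are all hypothetical and are not taken from the researchers' system.

```python
import random

# Hypothetical raw records held at one edge server:
# (user, service, observed quality, timestamp, location).
RECORDS = [
    ("u1", "s1", 0.8, "08:00", "district-A"),
    ("u2", "s1", 0.6, "08:05", "district-A"),
    ("u1", "s2", 0.7, "09:00", "district-B"),
    ("u3", "s2", 0.9, "09:10", "district-B"),
]
USERS = ["u1", "u2", "u3"]

def edge_indices(records, users, n_bits=8, seed=42):
    """Turn raw user-service records into one short bit-string index per service."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in users] for _ in range(n_bits)]

    # One quality vector per service; unobserved entries default to 0.0.
    # The time and location fields are read but never stored or forwarded.
    vectors = {}
    for user, service, quality, _time, _place in records:
        row = vectors.setdefault(service, [0.0] * len(users))
        row[users.index(user)] = quality

    # Sign of the projection onto each random hyperplane gives one bit.
    payload = {}
    for service, vec in vectors.items():
        bits = ("1" if sum(q * p for q, p in zip(vec, plane)) >= 0 else "0"
                for plane in planes)
        payload[service] = "".join(bits)
    return payload

# This dictionary is the only thing the edge server would upload to the cloud.
payload = edge_indices(RECORDS, USERS)
print(payload)
```

The uploaded payload holds one short index per service instead of every raw record, which is where both the privacy protection and the lighter data load described above would come from.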

At the cloud layer, the central cloud platform gathers the indices produced by the edge servers and uses a recommender system to discover similar items, which are then used for analysis, prediction, and recommendation. The accuracy of prediction and recommendation is maintained because all of the less-sensitive item indices are included in cloud decision-making.
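A rough sketch of the cloud-layer step might look like the following. The sample indices, the function names (`fuse`, `similar_items`, `predict`), the rule that items count as similar when their fused indices match exactly, and the averaging step are assumptions made for illustration; the article does not describe the recommender system at this level of detail.

```python
from collections import defaultdict

# Hypothetical indices uploaded by two edge servers (one bit string per item).
EDGE_A = {"s1": "1011", "s2": "1011", "s3": "0100"}
EDGE_B = {"s1": "0110", "s2": "0110", "s3": "0111"}

def fuse(*edge_payloads):
    """Concatenate each item's indices across edge servers into one fused key."""
    items = set.intersection(*(set(p) for p in edge_payloads))
    return {item: "".join(p[item] for p in edge_payloads) for item in sorted(items)}

def similar_items(fused, target):
    """Return the items that share the target's hash bucket (identical fused key)."""
    buckets = defaultdict(list)
    for item, key in fused.items():
        buckets[key].append(item)
    return [item for item in buckets[fused[target]] if item != target]

def predict(target, fused, known_quality):
    """Estimate the target's quality as the mean over similar items with known values."""
    values = [known_quality[i] for i in similar_items(fused, target) if i in known_quality]
    return sum(values) / len(values) if values else None

fused = fuse(EDGE_A, EDGE_B)
neighbours = similar_items(fused, "s1")
estimate = predict("s1", fused, {"s2": 0.75, "s3": 0.2})
print(neighbours, estimate)
```

In this toy data only `s2` shares a fused index with `s1`, so the estimate for `s1` is taken from `s2` alone. A real system would likely use several hash tables or a distance threshold to avoid missing near neighbours.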

In experiments on a real-world dataset, the researchers compared their approach with three related approaches and found that theirs outperformed the other three in time cost and accuracy while keeping users’ private information secure.

“We believe the proposed privacy-aware data fusion method in this article could benefit big data analyses and applications in a smart city, such as medical records fusion and cross-platform social network data mining,” Qi said. “In the future, we will further refine our data fusion and prediction model by introducing more context factors besides time and location (e.g., network bandwidth) as well as the corresponding compensation strategy if privacy is breached.”
