Graph Database Use Cases
“Big data” grows bigger every year, but today’s enterprise leaders need to do more than manage larger volumes of data: they critically need to generate insight from the data they already have. Businesses need to stop merely collecting data points and start connecting them. In other words, the relationships between data points matter almost more than the individual points themselves. To leverage those data relationships, your organization needs a database technology that stores relationship information as a first-class entity. That technology is a graph database.
While traditional relational databases have served the industry well in enabling service and process models that deal with these complexities, in most deployments they still demand significant overhead and expert-level administration to adapt to change. Relational databases require cumbersome indexing when faced with the non-hierarchical relationships that are becoming ever more prevalent in complex IT ecosystems involving partners, suppliers and service providers, as well as in the more dynamic infrastructures associated with cloud and agile.
Unlike relational databases, graph databases are designed to store interconnected data that’s not purely hierarchical. They make it easier to make sense of that data by not forcing intermediate indexing at every turn, and they make it easier to evolve models of real-world infrastructures, business services, social relationships, or business behaviors that are both fluid and multi-dimensional.
Use Case: Master Data Management
The world of master data is changing. Data architects and application developers are swapping their relational databases with graph databases to store their master data. This switch enables them to use a data store optimized to discover new insights in existing data, provide a 360-degree view of master data and answer questions about data relationships in real time.
Your Master Data Management (MDM) program likely uses the same database technology as your transactional application: a mature, highly-tuned, relational database (RDBMS). You excel at relational databases because you have many years of experience working with them and most of your data live there, so it makes sense to keep master data there. Traditionally, MDM has included Customer, Product, Accounts, Vendor, Partners and any other highly shareable data in an enterprise.
Because master data is, by definition, highly shared, any struggle to manage it tends to cost business agility in a way that ripples throughout the organization. Our architectures have focused on getting data to fit a single definition of the truth, something most of us realize is not feasible in the long run.
The Future of Master Data Management
MDM programs that attempt to physically persist all data in a single location continue to struggle with the realities of modern information technology. Most enterprise organizations use vendor applications: customer relationship management (CRM) systems, work management systems, accounts payable, accounts receivable, point-of-sale systems, etc. Because of this, it’s not always feasible to move all master data to a single location. Even with a CRM system in place, we typically end up with customer information maintained in several systems. The same goes for product and accounting data.
The most successful programs will not strive to find a single physical location for all data but will provide the standards, tools, and services necessary to provide a consistent vision of enterprise data. There will be data we can store in one place, using the technologies that best fit its data story. Data will also likely be found in multiple physical systems due to the increasing use of packaged applications, as well as for performance and geographically distributed processing needs. Once we understand our environment, we can architect solutions that build upon those needs.
The future of master data management will derive value from data and its relationships to other data. MDM will be about supplying consistent, meaningful views of master data. In many cases, we will be able to unify data into one location, especially to optimize for query performance and data fit. Graph databases offer exactly that type of data/performance fit, as we will see below. In this paper, we discuss why your master data is a graph and how graph databases like Neo4j are the best technologies for master data.
Today’s enterprises are drowning in “big data” – most of which is mission-critical master data – and managing its complex relationships can be quite a challenge. Here are some of the most difficult hurdles in MDM that enterprises must face:
Complex and hierarchical data sets
Master data, such as organizational and product data, has deep hierarchies with top-down, lateral and diagonal connections. Managing such data models with a relational database results in complex and unwieldy code that is slow to run, expensive to build and time-consuming to maintain.
Real-time query performance
Master data systems must integrate with and provide data to a host of applications within the enterprise – sometimes in real time. However, traversing a complex and highly interconnected dataset to provide real-time information is a challenge.
Master data is highly dynamic with constant addition and re-organization of nodes, making it harder for your developers to design systems that accommodate both current and future requirements.
The best data-driven business decisions aren’t based on stale information silos. Instead, you need real-time master data with information about data relationships. Graph databases are built from the ground up to support data relationships. With more efficient modeling and querying, organizing your master data in a graph yields relevant answers faster and with more flexibility than ever before.
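To illustrate the deep hierarchies with lateral connections described above, here is a minimal Python sketch that treats master data as a graph and walks it to arbitrary depth. The company, division and product names, the relationship labels, and the helper functions are all hypothetical, and the in-memory dictionary merely stands in for a real graph database.

```python
from collections import deque

# Hypothetical master data: each edge is (source, relationship, target).
EDGES = [
    ("Acme Corp", "HAS_DIVISION", "Acme Retail"),
    ("Acme Corp", "HAS_DIVISION", "Acme Logistics"),
    ("Acme Retail", "SELLS", "Widget"),
    ("Acme Logistics", "SHIPS", "Widget"),  # lateral link across divisions
    ("Widget", "SUPPLIED_BY", "Parts Inc"),
]


def build_adjacency(edges):
    """Index outgoing edges by source node."""
    adjacency = {}
    for source, relationship, target in edges:
        adjacency.setdefault(source, []).append((relationship, target))
    return adjacency


def reachable(adjacency, start):
    """Breadth-first walk: map every node reachable from start to its depth."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _relationship, target in adjacency.get(node, []):
            if target not in depths:
                depths[target] = depths[node] + 1
                queue.append(target)
    return depths
```

Calling `reachable(build_adjacency(EDGES), "Acme Corp")` visits every connected record without the caller having to know how deep the hierarchy goes, which is the kind of question that tends to need one join per level in a tabular model.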
Use Case: Network and IT Operations
By their nature, networks are graphs. Graph databases are, therefore, an excellent fit for modeling, storing and querying network and IT operational data no matter which side of the firewall your business is on – whether it’s a communications network or a data center. Today, graph databases are being successfully employed in the areas of telecommunications, network management, impact analysis, cloud platform management and data center and IT asset management. In all of these domains, graph databases store configuration information to alert operators in real time to potential shared failure modes in the infrastructure and to reduce problem analysis and resolution times from hours to seconds.
Network analysts and data center professionals face greater challenges than ever before as the volume of data and the size of networks continue to grow. Here are just a few of their most difficult challenges:
Highly interrelated elements
Whether you’re managing a major network change; providing more efficient security-related access; or optimizing a network, application infrastructure or data center – the physical and human interdependencies are extremely complex and challenging to manage.
Non-linear and non-hierarchical relationships
Relationships among the various nodes in your network are neither purely linear nor hierarchical, making them difficult to model in a traditional RDBMS. In addition, when two or more systems are brought together, these relationships become even more complex to describe.
Growing physical and virtual nodes
With the rapid growth in network sizes and both the number and types of elements added to support new network users and services, your IT organization must develop systems that accommodate both current and future requirements.
As with master data, a graph database is used to bring together information from disparate inventory systems, providing a single view of the network and its consumers – from the smallest network element all the way to the applications, services and customers who use them. A graph representation of a network enables IT managers to catalog assets, visualize their deployment and identify the dependencies between the two. The graph’s connected structure enables network managers to conduct sophisticated impact analyses, answering questions like:
• Which parts of the network – which applications, services, virtual machines, physical machines, data centers, routers, switches and fiber – do particular customers depend on? (Top-down analysis)
• Conversely, which applications and services, and ultimately, customers in the network will be affected if a particular network element – such as a router or switch – fails? (Bottom-up analysis)
• Is there redundancy throughout the network for the most important customers?
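The top-down and bottom-up questions above can be sketched as two traversals over the same dependency graph. This is a minimal illustration under assumptions: the element names and the `DEPENDS_ON` mapping are hypothetical, and a plain dictionary stands in for the graph database.

```python
# Hypothetical inventory: each key depends on every element in its list.
DEPENDS_ON = {
    "customer-a": ["billing-service"],
    "customer-b": ["search-service"],
    "billing-service": ["vm-1"],
    "search-service": ["vm-2"],
    "vm-1": ["host-1"],
    "vm-2": ["host-2"],
    "host-1": ["switch-1"],
    "host-2": ["switch-1"],
}


def dependencies_of(element, depends_on=DEPENDS_ON):
    """Top-down: everything the element relies on, directly or transitively."""
    seen = set()
    stack = [element]
    while stack:
        current = stack.pop()
        for dependency in depends_on.get(current, []):
            if dependency not in seen:
                seen.add(dependency)
                stack.append(dependency)
    return seen


def impacted_by(element, depends_on=DEPENDS_ON):
    """Bottom-up: everything that relies on the element, directly or transitively."""
    # Invert the edges once, then reuse the same walk in the other direction.
    reverse = {}
    for source, targets in depends_on.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return dependencies_of(element, reverse)
```

In this toy inventory, `impacted_by("switch-1")` reaches both customers, which also flags that neither customer has a redundant path around that switch.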
A graph database representation of the network can also be used to enrich operational intelligence based on event correlations. Whenever an event correlation engine (such as a Complex Event Processor) infers a complex event from a stream of low-level network events, it assesses the impact of that event against the graph model and triggers any necessary compensating or mitigating actions.
Discovering, capturing and making sense of complex interdependencies is central to effectively running network and IT operations, which are a critical part of running an enterprise. Whether it’s optimizing a network or application infrastructure or providing more efficient security-related access – these problems involve a complex set of physical and human interdependencies that are a challenge to manage. The relationships between network and infrastructure elements are rarely linear or purely hierarchical. Graph databases are designed to store that interconnected data, making it easy to translate network and IT data into actionable insights.
Use Case: Real-Time Recommendation Engine
Graph databases have revolutionized the way people discover new products, services, information, and people. Recommendation engines powered by graph databases help companies personalize products, content, and services by leveraging the connections between data — all in real time.
Whether your enterprise operates in the retail, social, services or media sector, offering your users highly targeted, real-time recommendations is essential to maximizing customer value and staying competitive. Unlike other business data, recommendations must be inductive and contextual to be considered relevant by your end consumers.
With a graph database, you capture a customer’s browsing behavior and demographics and combine those with their buying history to instantly analyze their current choices and then immediately provide relevant recommendations – all before a potential customer clicks to a competitor’s website.
With so much data to track and process in a short amount of time, creating a recommendation engine capable of relevant, real-time suggestions isn’t easy. Here are some of the biggest challenges involved:
Process large amounts of data and relationships for context
Collaborative and content-based filtering algorithms rely on rapid traversal of a continually growing and highly interconnected dataset.
Offering relevant recommendations in real time
The power of a suggestion system lies in its ability to make a recommendation in real time using immediate purchase or browsing history.
Accommodate new data and relationships continually
The rapid growth in the size and number of data elements means your suggestion system needs to accommodate both current and future requirements.
Real-time recommendation engines provide a key differentiating capability for enterprises in retail, logistics, recruitment, media, sentiment analysis, search and knowledge management.
The key technology in enabling real-time recommendations is the graph database. Graph databases also outclass other database technologies for connecting masses of buyer and product data (or connected data in general).
Making effective real-time recommendations depends on a database that understands the relationships between entities, as well as the quality and strength of those connections.
Only a graph database efficiently tracks these relationships according to user purchase history, interactions and reviews to give you the most meaningful insight into customer needs and product trends.
Graph-powered recommendation engines can take two major approaches:
· Identifying resources of interest to individuals
· Identifying individuals likely to be interested in a given resource
With either approach, graph databases make the necessary correlations and connections to serve up the most relevant results for the individual or resource in question.
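The two approaches above can be sketched as traversals over a customer-to-product purchase graph. This is a simple collaborative-filtering illustration, not a production recommender: the customers, products, scoring rule and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical purchase history: customer -> set of products bought.
PURCHASES = {
    "alice": {"laptop", "mouse"},
    "bob": {"laptop", "mouse", "keyboard"},
    "carol": {"laptop", "keyboard", "monitor"},
    "dave": {"novel"},
}


def _ranked(scores):
    """Highest score first, ties broken alphabetically; zero scores dropped."""
    ordered = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, score in ordered if score > 0]


def recommend_products(customer, purchases=PURCHASES):
    """Resources of interest to an individual.

    Walks customer -> product -> other customer -> product and scores each
    new product by how many purchases the two customers have in common.
    """
    owned = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        overlap = len(owned & items)
        for item in items - owned:
            scores[item] += overlap
    return _ranked(scores)


def likely_buyers(product, purchases=PURCHASES):
    """Individuals likely to be interested in a given resource.

    Scores each non-buyer by how much their history overlaps with buyers'.
    """
    buyers = {c for c, items in purchases.items() if product in items}
    scores = Counter()
    for candidate, items in purchases.items():
        if candidate in buyers:
            continue
        for buyer in buyers:
            scores[candidate] += len(items & purchases[buyer])
    return _ranked(scores)
```

Because both functions compute their answer from the current purchase data at call time, a new purchase changes the next recommendation immediately, with no batch recalculation step.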
Storing and querying recommendation data using a graph database allows your application to provide real-time results rather than precalculated, stale data. As consumer expectations increase – and their patience decreases – providing these sorts of relevant, real-time suggestions will become a greater competitive advantage than ever before.
Use Case: Fraud Detection
Banks and insurance companies lose billions of dollars every year to fraud.
Traditional methods of fraud detection fail to minimize these losses since they perform discrete analyses that are susceptible to false positives (and false negatives). Knowing this, increasingly sophisticated fraudsters have developed a variety of ways to exploit the weaknesses of discrete analysis.
Graph databases, on the other hand, offer new methods of uncovering fraud rings and other complex scams with a high level of accuracy through advanced contextual link analysis, and they are capable of stopping advanced fraud scenarios in real time.
Between the enormous amounts of data available for analysis and today’s experienced fraud rings (and solo fraudsters), fraud detection professionals are beset with challenges. Here are some of their biggest:
Complex link analysis to discover fraud patterns
Uncovering fraud rings requires you to traverse data relationships with high computational complexity – a problem that’s exacerbated as a fraud ring grows.
Detect and prevent fraud as it happens
To prevent a fraud ring, you need real-time link analysis on an interconnected dataset, from the time a false account is created to when a fraudulent transaction occurs.
Evolving and dynamic fraud rings
Fraud rings are continuously changing in shape and size, and your application needs to detect these fraud patterns in a highly dynamic and emerging environment.
While no fraud prevention measures are perfect, significant improvements occur when you look beyond individual data points to the connections that link them. Understanding the relationships between data and deriving meaning from these links doesn’t necessarily mean gathering new data. You can draw significant insights from your existing data simply by reframing the problem in a new way: as a graph.
Unlike most other ways of looking at data, graphs are designed to express relatedness. Graph databases uncover patterns that are difficult to detect using traditional representations such as tables. An increasing number of companies use graph databases to solve a variety of connected data problems, including fraud detection.
Now let’s consider how graph databases can help solve this problem. Uncovering rings with traditional relational database technologies requires modeling the graph as a set of tables and columns and then carrying out a series of complex joins and self-joins. Such queries are very complex to build and expensive to run. Scaling them in a way that supports real-time access poses significant technical challenges, with performance becoming exponentially worse not only as the size of the ring increases but also as the total data set grows.
Graph databases have emerged as an ideal tool for overcoming these hurdles. Languages like Cypher provide simple semantics for detecting rings in the graph, navigating connections in memory, in real time.
The graph data model in Diagram 4 below represents how the data looks to the graph database and illustrates how one can find rings by simply walking the graph.
Augmenting one’s existing fraud detection infrastructure to support ring detection can be done by running appropriate entity link analysis queries using a graph database, and running checks during critical stages in the customer & account lifecycle, such as:
1. at the time the account is created,
2. during an investigation,
3. as soon as a credit balance threshold is hit, or
4. when a check is bounced.
Real-time graph traversals tied to the right kinds of events can help banks identify probable fraud rings during, or even before, the bust-out occurs.
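As a rough illustration of finding rings by walking the graph, the sketch below links account holders through the identifiers they share and reports each connected group above a size threshold. The assumption that ring members reuse phone numbers or addresses, the sample records, the `min_size` default and the function name are all hypothetical choices for this example, not details taken from a particular fraud system.

```python
from collections import defaultdict

# Hypothetical account records: holder -> identifiers supplied at sign-up.
ACCOUNTS = {
    "holder-1": {"phone:555-0100", "addr:1 Main St"},
    "holder-2": {"phone:555-0100", "addr:9 Oak Ave"},
    "holder-3": {"phone:555-0199", "addr:9 Oak Ave"},
    "holder-4": {"phone:555-0142", "addr:4 Elm Rd"},
}


def find_rings(accounts, min_size=3):
    """Group holders connected through shared identifiers.

    Returns each connected group with at least min_size members.
    """
    holders_by_identifier = defaultdict(set)
    for holder, identifiers in accounts.items():
        for identifier in identifiers:
            holders_by_identifier[identifier].add(holder)

    seen = set()
    rings = []
    for start in accounts:
        if start in seen:
            continue
        # Walk holder -> identifier -> holder until the group is exhausted.
        component = set()
        stack = [start]
        while stack:
            holder = stack.pop()
            if holder in component:
                continue
            component.add(holder)
            for identifier in accounts[holder]:
                stack.extend(holders_by_identifier[identifier] - component)
        seen |= component
        if len(component) >= min_size:
            rings.append(sorted(component))
    return rings
```

A check like this could be run at each of the lifecycle events listed above, for example by adding the new account's identifiers and calling `find_rings` again before the account is approved.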
Whether it is bank fraud, insurance fraud, e-commerce fraud, or another type of fraud, two points are very clear. The first is the importance of detecting fraud as quickly as possible so that criminals can be stopped before they have an opportunity to do too much damage. As business processes become faster and more automated, the time margins for detecting fraud are becoming narrower and narrower, increasing the call for real-time solutions.
The second is the value of connected analysis. Sophisticated criminals have learned to attack systems where they are weak. Traditional technologies, while still suitable and necessary for certain types of prevention, are not designed to detect elaborate fraud rings. This is where graph databases can add value.
Graph databases are the ideal enabler for efficient and manageable fraud detection solutions. From fraud rings and collusive groups to professional criminals operating on their own, graph databases provide a unique ability to uncover a variety of significant fraud patterns in real time. Collusions that were previously hidden become evident when viewed through a system designed to manage connected data, with real-time graph queries serving as a powerful tool for detecting a variety of highly impactful fraud scenarios.
Use Case: Social Network
Whether you’re leveraging declared social connections or inferring relationships based on activity, graph databases offer a world of fresh possibility when it comes to creating innovative social networks.
Here are some of the biggest challenges:
Highly dynamic networks
Social networks change and evolve quickly, so your application must be able to detect early trends and adapt accordingly.
High density of connections
Social networks are densely connected and become more so over time, requiring you to parse this relationship data quickly for better business insights.
Relationships are equally important
When you’re striving to understand user behavior in social networks, relationships between users are as important as the individual users themselves. Your social network application must be able to process data relationships as quickly as it processes individual data entities.
Navigating a social graph and understanding both individuals and their relationships requires complex and deep queries. These particular queries bring most relational databases to their knees. Likewise, other types of NoSQL databases struggle to handle high degrees of relatedness. Graph databases traverse relationships easily and quickly and return query results in near real time, making them an ideal choice for your social application.
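A typical deep query on a social graph is "friends of friends". The sketch below is a minimal illustration with hypothetical people and a hypothetical `friends_of_friends` helper; a dictionary of sets stands in for the graph store.

```python
# Hypothetical friendship graph: person -> set of direct friends (symmetric).
FRIENDS = {
    "ana": {"ben", "cy"},
    "ben": {"ana", "dee"},
    "cy": {"ana", "dee", "eli"},
    "dee": {"ben", "cy"},
    "eli": {"cy"},
}


def friends_of_friends(person, friends=FRIENDS):
    """People exactly two hops away, ranked by number of mutual friends."""
    direct = friends[person]
    mutual_counts = {}
    for friend in direct:
        for candidate in friends[friend]:
            # Skip the person themselves and anyone already a direct friend.
            if candidate != person and candidate not in direct:
                mutual_counts[candidate] = mutual_counts.get(candidate, 0) + 1
    return sorted(mutual_counts.items(), key=lambda pair: (-pair[1], pair[0]))
```

Each extra hop is just another nested walk over adjacent nodes, whereas the equivalent tabular query would add another self-join on the friendship table.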
When exploring relational database options, it became clear that a graph database was a better and safer choice for this project. One important factor was the so-called impedance mismatch. The data and queries were clearly graph-oriented, and it was clear that “bending” the data into a tabular format would incur significant project cost and performance overhead. A graph database solution was able to meet both operational and analytic requirements. Graph databases were a technology that fit the use case much better than relational databases because they are a natural fit for the social domain.
Use Case: Identity and Access Management
Identity and access management (IAM) solutions store information about parties (e.g., administrators, business units, end-users) and resources (e.g., files, shares, network devices, products, agreements), along with the rules governing access to those resources. IAM solutions apply these rules to determine who can or can’t access or manipulate a resource. Traditionally, identity and access management has been implemented either by using directory services or by building a custom solution inside an application’s backend. Hierarchical directory structures, however, can’t cope with the complex dependency structures found in multi-party distributed supply chains. Custom solutions that use non-graph databases to store identity and access data become slow and unresponsive as their datasets grow.
A graph database can store complex, densely connected access control structures spanning billions of parties and resources. Its richly and variably structured data model supports both hierarchical and non-hierarchical structures, while its extensible property model allows for capturing rich metadata regarding every element in the system.
With a query engine that can traverse millions of relationships per second, graph database access lookups over large, complex structures execute in milliseconds, not minutes or hours.
As with network and IT operations, a graph database access control solution allows for both top-down and bottom-up queries:
• Which resources – company structures, products, services, agreements and end users – can a particular administrator manage? (Top-down)
• Given a particular resource, who can modify its access settings? (Bottom-up)
• Which resources can an end-user access?
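The top-down and bottom-up access questions above can be sketched as two walks over a small access graph. This is an illustration under assumptions: the administrators, groups, resources, the `MANAGES` and `CONTAINS` relationships, and both function names are hypothetical.

```python
# Hypothetical access graph: administrators manage groups,
# and groups contain subgroups or resources.
MANAGES = {"admin-1": ["emea"], "admin-2": ["emea-sales"]}
CONTAINS = {
    "emea": ["emea-sales", "emea-support"],
    "emea-sales": ["price-list"],
    "emea-support": ["ticket-queue"],
}


def manageable_resources(admin):
    """Top-down: everything reachable from the groups an administrator manages."""
    found = set()
    stack = list(MANAGES.get(admin, []))
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(CONTAINS.get(node, []))
    return found


def who_can_manage(resource):
    """Bottom-up: administrators of the resource or of any group containing it."""
    parents = {}
    for group, members in CONTAINS.items():
        for member in members:
            parents.setdefault(member, []).append(group)
    managers_of = {}
    for admin, groups in MANAGES.items():
        for group in groups:
            managers_of.setdefault(group, set()).add(admin)

    admins, seen, stack = set(), set(), [resource]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        admins |= managers_of.get(node, set())
        stack.extend(parents.get(node, []))
    return admins
```

Both directions use the same stored relationships, so adding a new group or delegation is a matter of adding edges rather than reworking a schema.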
Access control and authorization solutions powered by graph databases are particularly applicable in the areas of content management, federated authorization services, social networking preferences and software as a service (SaaS) offerings, where they realize minutes-to-milliseconds improvements in performance over their relational database predecessors.
Today’s enterprise data professionals face greater challenges than ever before when it comes to storing and managing user identities and authorization. Not only must data architects deal with user access fraud, but they also must manage all of these changing relationships in real time. Here are some of their biggest challenges:
Highly interconnected identity and access permissions data
To verify an accurate identity and its access permissions, the system needs to traverse through a highly interconnected dataset that is growing in size and complexity.
Productivity and customer satisfaction
As users, products and permissions grow, traditional systems no longer deliver responsive query performance, resulting in diminished user experience and frustration for users.
Dynamic structure and environment
With the rapid growth in the number of users and their associated metadata, your application needs to accommodate both current and future identity management requirements.
For your enterprise organization, managing multiple changing roles, groups, products and authorizations is an increasingly complex task. Relational databases simply aren’t up to the task of managing your identity and access needs, as queries are far too slow and unresponsive. Using a graph database, you seamlessly track all of your identity and access relationships with real-time results, connecting your data along intuitive relationships. With an interconnected view of your data, you have better insights and controls than ever before.
Use Case: Graph-Based Search
Managing your organization’s growing library of digital assets requires a more robust search solution. With graph-based search tools, your queries return more accurate and relevant real-time results. Graph-based search is a new approach to data and digital asset management originally pioneered by Facebook and Google.
Search powered by a graph database delivers relevant information that you may not have specifically asked for – offering a more proactive and targeted search experience, allowing you to triangulate the data points of the greatest interest quickly.
The key to this enhanced search capability is that on the very first query, a graph-based search engine takes into account the entire structure of available connected data. And because graph systems understand how data is related, they return much richer and more precise results.
Think of graph-based search more as a “conversation” with your data, rather than a series of one-off searches. It’s search and discovery, rather than search and retrieval.
As a cutting-edge technology, graph-based search is beset with challenges. Here are some of the biggest:
The size and connectedness of asset metadata
The usefulness of a digital asset increases with the associated rich metadata describing the asset and its connections. However, adding more metadata increases the complexity of managing and searching for an asset.
Real-time query performance
The power of a graph-based search application lies in its ability to search and retrieve data in real time. Yet, traversing such complex and highly interconnected data in real time is a significant challenge.
Growing number of data nodes
With the rapid growth in the size of assets and their associated metadata, your application needs to be able to accommodate both current and future requirements.
Graph-based search would be impossible without a graph database to power it. In essence, graph-based search is intelligent: You can ask much more precise and useful questions and get back the most relevant and meaningful information, whereas traditional keyword-based search delivers results that are more random, diluted and low-quality.
With graph-based search, you can easily query all of your connected data in real time, then focus on the answers provided and launch new real-time searches prompted by the insights you’ve discovered.
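The difference between search-and-retrieval and search-and-discovery can be sketched as a keyword match followed by one hop along asset connections. The asset catalog, tags, links and the `search` function below are hypothetical and only illustrate the idea of returning connected results the user did not explicitly ask for.

```python
# Hypothetical digital asset catalog: asset id -> metadata.
ASSETS = {
    "img-1": {"title": "Red running shoe", "tags": {"shoe", "running"}},
    "img-2": {"title": "Trail sock", "tags": {"sock"}},
    "doc-1": {"title": "Shoe care guide", "tags": {"guide", "shoe"}},
    "vid-1": {"title": "Marathon training", "tags": {"running", "video"}},
}

# Hypothetical undirected "related to" links between assets.
LINKS = [("img-1", "img-2"), ("img-1", "vid-1")]


def search(keyword, assets=ASSETS, links=LINKS):
    """Return (direct matches, connected assets one hop from a match)."""
    related = {}
    for left, right in links:
        related.setdefault(left, set()).add(right)
        related.setdefault(right, set()).add(left)

    direct = sorted(a for a, meta in assets.items() if keyword in meta["tags"])
    connected = set()
    for asset in direct:
        connected |= related.get(asset, set())
    connected -= set(direct)  # do not repeat assets that already matched
    return direct, sorted(connected)
```

A follow-up call seeded with one of the connected assets is what turns a one-off lookup into the "conversation" with the data described earlier.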
Graph databases make advanced search-and-discovery possible because:
• Enterprises can structure their data exactly as it occurs and carry out searches based on that data’s inherent structure. Graph databases provide the model and query language to support the natural structure of data.
• Users receive fast, accurate search results in real time. With a graph database, a variety of rich metadata is assigned to all content for rapid search and discovery.
• Data architects and developers can easily change their data and its structure as well as add a wide variety of new data. The built-in flexibility of a graph database model allows for agile changes to search capabilities.
In contrast, information held in a relational database is much less flexible in the face of future change: If you want to add new kinds of content or make structural changes, you are forced to rework the relational model in a way that you don’t need to with the graph model.
The graph model is much more easily extensible and over 1,000 times faster than a relational database when working with connected data.
For businesses that have huge volumes of products, content or digital assets, graph-based search provides a better way to make this data available to users, as corporate giants Google and Facebook have clearly demonstrated.
The valuable uses of graph-based search in the enterprise are endless: customer support portals, product catalogs, content portals and social networks are just a few.
Graph-based search offers numerous competitive advantages, including better customer experience, more targeted content and increased revenue opportunities. Enterprises that tap into the power of graph-based search today will be well ahead of their peers tomorrow.