1 Introduction

Traditional martial arts, known as kung fu in Chinese, represent significant systems of mind-body practice and treasures of human knowledge. Globally renowned, partly thanks to the flourishing of kung fu movies in past decades, these long-lived traditional practices have multiple registrations on UNESCO’s lists of Intangible Cultural Heritage (ICH). Even so, kung fu practices have gradually lost their public appeal, and the number of practitioners keeps decreasing. There is an urgent need to rekindle public interest, especially among the younger generations, who have grown up as digital natives, and to engage them in learning about and exploring martial arts knowledge.

To this end, semantic web technologies and ontology engineering have enabled various means to preserve ICH, creating public access to digitised and born-digital cultural records through united data pathways. While models that conceptualise material and object-based cultural aspects have reached a certain degree of maturity, few efforts have probed into the roles that individual people have played throughout knowledge development. This humanistic dimension is particularly relevant to martial arts, a distinct practice incorporating embodied, performative and ideological understandings, wherein knowledge transmission heavily relies on in-person contact. Consequently, numerous practitioners have contributed to the development and re-creation of martial arts knowledge. Between the practitioners, there exist certain explicit or implicit relations likely to have influenced stylistic development.

To address the human-centred perspective above, we constructed a knowledge graph (KG) representing the inter-relations between distinguished practitioners, namely the masters, and investigated its application to facilitate knowledge access as well as to study (cultural) contacts through the various relations between individuals in history. Specifically, this effort focuses on Chinese martial arts and introduces the MA2KG (Martial Arts MAsters Knowledge Graph), an ontology-based data resource representing the networks of historical Chinese kung fu masters. The computational pipeline (Figure 1) is built upon the data model of the Martial Art Ontology (MAon) (). It combines ontological linkage with rule-based inference to gather data from diverse sources into a consistent graph structure. The data sources include online structured databases of Wikipedia and its Chinese alternative Baidu Baike, in addition to manual transcription of the texts in the Hong Kong Martial Arts Living Archive, thereby establishing a bilingual resource in English and Chinese.

Figure 1 

Illustration of the engineering pipeline in building the MA2KG.

The MA2KG dataset has been published online with script examples for replication in similar agent-mediated contexts. In this article, we present our methods of building the KG, results, data application and inspection, which are followed by a further discussion of findings and concerns regarding the employment of linked open data (LOD) strategies.

2 Related work

As an increasing number of cultural heritage (CH) materials transition to the digital realm, challenges have emerged, especially for the GLAM (galleries, libraries, archives, and museums) sectors, in their efforts to assemble heterogeneous data into an operable resource that can be searched, studied, and presented (). Conventional approaches to this end have largely centred on cataloguing. However, their effectiveness is contingent on users understanding the logic behind data tagging, which seldom holds for the general public.

Seeking ways to operate and interoperate such various cultural objects has led to the rising use of semantic web technologies. Notably, LOD and KG have emerged as promising solutions to tackle the shortcomings of manual cataloguing. A common approach involves creating a network-like knowledge description model, known as domain ontology, and its application to connect data sources through explicit relationships or rule-based inferences. This new path has shown promise in facilitating public access to cultural collections, enabling casual users to initiate queries and explore data through linked connections without prior knowledge of the content. Achieving this in practice requires effective KG engineering, which involves structuring data elements into a formal conceptual framework and establishing connections between the data through techniques such as named entity recognition and data classification for querying ().

KG is not a new approach. Formed as a network composed of nodes, edges, and labels (or properties), a KG illustrates real-world concepts by linking, relating, and analysing entities in massive datasets through semantic relationships (). Interdisciplinary researchers have embraced this concept to enrich CH and ICH studies by conceptualising cultural entities using semantic standards. Concurrently, ontological engineering has been increasingly used to organise cultural materials into programmatic structures, enabling data representations that relate tangible and intangible identities detected from textual (), visual (), iconographical () and audiovisual () features in diverse contexts.

Despite these existing efforts, humans have not been adequately represented as informative nodes, even though they consistently play a traceable role in interconnecting knowledge exchanges and communications. An exception is ArCo, which establishes a comprehensive ontology model describing Italian CH, incorporating a context description module that delineates humanistic information such as authors, collectors, copyright holders and inventories in relation to a CH entity (). Likewise, the WarSampo KG provides a semantic infrastructure spanning dimensions such as Persons (soldiers), Army Units, Places, and Events, facilitating the coherent presentation of distributed data sources related to the Second World War ().

At the same time, LOD has fostered new prosopography and social network analysis methods. Such formal network approaches are important for examining the relationships between intellectuals and the evolution of communities, a phenomenon researchers interpret as a small-world effect resulting from entangled webs of influences, interdependencies, and inspirations (; ). In comparison to conventional network construction from a single literature or scholarly documentation, knowledge engineering allows researchers to extract data from a broader range of sources to build more complex networks. For example, the China Biographical Database (CBDB) integrates approximately 491,000 individuals (as of May 2021) whose lifespans range from the seventh through nineteenth centuries, along with over 228,000 biographical articles (; ). This forms a valuable resource for further studies, such as analysing kinship and themes based on trivial items of news or facts (). When these relational patterns of pivotal concepts, including eponyms, individuals, places, things, and times, are extrapolated appropriately, they serve as helpful tools for historical examination and supporting or disputing historical arguments (; ). These technical and knowledge advances have laid the foundation for our investigation.

3 Dataset description

  • Object name: MA2KG (Martial Arts MAsters Knowledge Graph).
  • Format names and versions: RDF data (TTL syntax).
  • Creation dates: March to August 2022.
  • Dataset creators: Yumeng Hou, Lin Yuan.
  • Language: English, Chinese.
  • Repository name: Zenodo, https://zenodo.org/record/8211203.
  • Publication date: 2023-08-03 (first publish date on GitHub: 2022-10-09).

4 Methods and results

As depicted in Figure 1, our workflow involves data acquisition and knowledge generation to build a KG for both machine operation and human-led analysis. The computational process begins by extracting domain-specific entities and properties from LOD and importing the structured RDF triples into a graph database. Subsequently, we integrate knowledge elements from manual annotations into a property graph, forging new links through explicit relationships and rule-based inference. In the final stage, visual computation is applied to examine the use of data to aid in analytics. This also involves assigning visual attributes to facilitate the study of each entity or broader patterns.

4.1 The data model

The data rationale underpinning these steps relies on the Martial Art Ontology (MAon), an ontology model that conceptualises the epistemic aspects of traditional martial arts. As illustrated in Figure 2, the schema adopts a human-centred perspective and pivots on the class MA_master to associate different classes and properties in describing martial arts masters. The network structure also makes it possible to illustrate interpersonal contact and single out implications arising from exchanges between individuals, techniques, or styles. More specifically, explicit relations, such as is_master_of, is_student_of and is_clan_member_of, can be extracted from LOD materials and transformed into ontological data to complement manual annotations. Implicit relations, i.e. share_place and share_time, can be constructed through rule-based inference based on relevant attributes. The rationale behind establishing such “share” connections is that two masters may have interacted if they lived in the same area during the same time period, potentially resulting in martial arts exchanges that influenced a specific style.

Figure 2 

Schematic illustration of the Master module instantiated by the MA2KG.

4.1.1 Principal entities in the MA2KG

The following paragraphs outline the principal classes and properties in the MA2KG schema and elucidate the design motivation through typical instances.

class MA_master. This class denotes the distinguished martial art practitioners who have attained impressive martial arts skills and hold significance in transmitting knowledge to a group of students, often referred to as a clan. The honorary titles of masters primarily confer public recognition, rather than rank, upon these practitioners for their accomplishments within their respective regions, styles (or systems), and communities. Instances of this class include master:Ip Man, master:Bruce Lee, and master:Lam Sai Wing, to name a few.

The class inherits the assertions of the CIDOC-CRM E21.Person to link with other datasets and incorporate additional descriptions about a person. Moreover, a range of properties has been devised to link up the entities of class MA_master, representing both explicit and inferred clues about interpersonal exchanges between the masters. These comprise explicit lineage and master-disciple relations, such as [master:Ip Man] – (is_master_of) – [master:Bruce Lee], and conversely, [master:Bruce Lee] – (is_student_of) – [master:Ip Man]. Implicit relations can be inferred from the persons’ overlap in place and lifetime, such as [master:Ip Man] – (share_place) – [master:Wong Fei-hung] and [master:Ip Man] – (share_time) – [master:Huo Yuanjia].

class MA_style. A martial art style signifies a compilation of principles, techniques and training methods that differentiate one style from another. Each instance of this class denotes a specific martial art style, usually characterised by shared kinesthetic qualities, norms, and principles. In some cases, a symbolic element also acts as an inspiration for, or descriptor of, its stylistic characteristics. Examples include style:Fujian White Crane (also known as style:Yong Chun White Crane), style:Hung Gar Kuen, style:Wing Chun, and many more. When the data source identifies a practitioner as having practised a particular style or styles, a practises property is established, for example, [master:Ip Man] – (practises) – [style:Wing Chun].

Certain relationships also exist between the martial art styles and systems, for which we have devised a series of properties. Respectively, belongs_to indicates a martial art style is part of a martial art system, whilst is_descendant_of implies that a certain style can be considered as descending from another. For instance, [style:Lam Family Hung Gar] – (is_descendant_of) – [style:Hung Gar] and [style:Hung Gar] – (belongs_to) – [system:Southern Chinese Styles]. The relational property is_subgroup_of applies to the hierarchical interpretation between the systems, such as [system:Southern Shaolin Styles] – (is_subgroup_of) – [system:Southern Chinese Styles].

class Martial_art_system. Martial art styles can be categorised under a common theme according to factors like geographic origins, ethnic groups, documented lineage, or other criteria. This is conveyed through the multilayered conceptualisation of Martial_art_system, with instances such as system:Southern Chinese martial arts, system:Shaolin kung fu, and system:Aikido. Styles within the same system typically adhere to a uniform technical and philosophical framework.

class Place. This class is an instantiation of the CIDOC-CRM class E53.Place, which denotes a specific geographical location. For example, it could be the Blue House, situated on Stone Nullah Lane, Hong Kong, where Lam Sai Wing once operated a school alongside a clinic. Other examples include the city of Foshan in China and Hong Kong. This class provides information about the locations where a master’s major activities have taken place and complements records of their birthplace (via relation born_in), death place (died_in), residence (resides_in) and citizenship (is_citizen_of). For example, [master:Ip Man] – (born_in) – [place:Foshan] and [master:Ip Man] – (is_citizen_of) – [place:Hong Kong], [master:Bruce Lee] – (resides_in) – [place:Seattle] and [master:Bruce Lee] – (died_in) – [place:Kowloon Tong].
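The bracketed statements in this section map directly onto subject-predicate-object triples. The following is a minimal sketch in Python, assuming plain tuples rather than the RDF serialisation of the published dataset. The helper names `add_inverses` and `objects`, and the decision to treat is_master_of and is_student_of as inverse properties, are illustrative assumptions rather than part of MAon.

```python
# Illustrative only: MA2KG-style statements as (subject, predicate, object) tuples.
# Entity and property names are copied from the examples in this section.
TRIPLES = {
    ("master:Ip Man", "is_master_of", "master:Bruce Lee"),
    ("master:Ip Man", "practises", "style:Wing Chun"),
    ("master:Ip Man", "born_in", "place:Foshan"),
    ("style:Lam Family Hung Gar", "is_descendant_of", "style:Hung Gar"),
    ("style:Hung Gar", "belongs_to", "system:Southern Chinese Styles"),
}

# Assumption: the two master-disciple properties are inverses of each other.
INVERSE = {"is_master_of": "is_student_of", "is_student_of": "is_master_of"}


def add_inverses(triples):
    """Return the triples plus the inverse statement of each invertible one."""
    closed = set(triples)
    for s, p, o in triples:
        if p in INVERSE:
            closed.add((o, INVERSE[p], s))
    return closed


def objects(triples, subject, predicate):
    """All objects linked to `subject` via `predicate`, sorted for stable output."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


graph = add_inverses(TRIPLES)
print(objects(graph, "master:Bruce Lee", "is_student_of"))
```

Storing only one direction and deriving the other keeps the explicit data small, at the cost of one extra pass when the graph is loaded.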

4.2 Open data acquisition and KG construction

The MA2KG has been created through a combination of data sources, including structured data obtained from Wikidata, Wikipedia pages, and manual annotations provided by the authors in both English and Chinese. These annotations were based on the content of books and exhibition panel texts produced from the Hong Kong Martial Arts Living Archive (; ; ). In this section, we describe the workflow considerations and implementation steps. Additionally, the source code, core ontologies, RDF resources and scripts have been made publicly available on GitHub.

4.2.1 Open data acquisition

Valuable data for CH representation can exist in a wide range of formats, including open knowledge bases, historical documents, audio-visual archives, and social media records. Given the diverse nature and scale of the data, different approaches to data acquisition may apply. In our attempt to construct a bilingual KG of kung fu masters, our objective was to acquire structured data from open knowledge bases and integrate manual additions into a unified structure.

The acquisition of open data relies primarily on the Wikidata Query Service, which exposes the collaborative knowledge graph Wikidata through a SPARQL endpoint that developers can use to retrieve RDF triples with semantic queries.

To automate the process, we created a SPARQL template for executing queries that extract a chain of Person entities (see Listing 1). A fundamental consideration guiding the code design is that a relationship may plausibly have existed between any pair of masters historically. In cases of indirect relationships, such as shared membership in a specific lineage of practitioners, a multi-step connection could be found through overlap in location or time. Accordingly, the query starts from a specified Person entity, denoted as Qx, and then searches for other Person entities, denoted as Px, which have at least one relationship to or from the Person Qx. This process incorporates data fields that are crucial for identifying a master, such as art name, birth and death year, citizenship and occupation, as properties within the entity graph.

Listing 1 

Excerpt of the SPARQL template to construct a chain of Person entities.
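As an illustration of the idea only, a query of this kind could be generated as sketched below in Python. This is a hypothetical template, not the authors’ Listing 1: the selection of fields, the function name `build_person_chain_query` and the placeholder identifier are assumptions, while the identifiers P31, Q5, P569, P570, P27 and P106 are Wikidata’s own for instance of, human, date of birth, date of death, country of citizenship and occupation.

```python
import re
from string import Template

# Hypothetical template (an assumption, not the published Listing 1).
# It finds humans (?px) directly linked to or from a seed person (wd:$qid)
# and pulls a few identifying fields where they exist.
PERSON_CHAIN_TEMPLATE = Template("""
SELECT DISTINCT ?px ?pxLabel ?rel ?birth ?death ?citizenship ?occupation WHERE {
  { wd:$qid ?rel ?px . } UNION { ?px ?rel wd:$qid . }
  ?px wdt:P31 wd:Q5 .
  OPTIONAL { ?px wdt:P569 ?birth . }
  OPTIONAL { ?px wdt:P570 ?death . }
  OPTIONAL { ?px wdt:P27 ?citizenship . }
  OPTIONAL { ?px wdt:P106 ?occupation . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,zh" . }
}
""")

# Wikidata item identifiers are the letter Q followed by a positive integer.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def build_person_chain_query(qid: str) -> str:
    """Fill the template for one seed person, rejecting malformed identifiers."""
    if not QID_PATTERN.fullmatch(qid):
        raise ValueError(f"not a Wikidata item identifier: {qid!r}")
    return PERSON_CHAIN_TEMPLATE.substitute(qid=qid).strip()


# Placeholder identifier for illustration; substitute the seed master's real QID.
print(build_person_chain_query("Q12345"))
```

The resulting string can be submitted to the Wikidata Query Service, and running the function once per newly found Px would extend the chain step by step.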

4.2.2 Knowledge integration and KG construction

Knowledge inference operates on the property graph acquired through the steps described in Section 4.2.1. The objective is to deduce implicit relationships from explicit ones, leveraging the existing entity-relationship triples and inference rules. For instance, the inferred relationships named “Share Time” and “Share Place” (share_time and share_place in the schema) signify overlaps in years and locations between two masters, which indicate a potential interaction such as competition or friendship. When the same place-related entity is detected in the entity network of two masters, e.g., Ip Man and Wong Fei-hung both have the relationship (Is Citizen Of)-[Hong Kong], the Share Place relationship will be established between them. Likewise, the Share Time relationship is created to denote overlapping lifetimes.
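The two inference rules can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Cypher used to build the dataset: the `Master` record, the function `infer_shared_relations` and the three toy masters (A, B, C, with invented years and places) are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class Master:
    name: str
    born: int  # birth year
    died: int  # death year
    places: frozenset = field(default_factory=frozenset)


def infer_shared_relations(masters):
    """Yield (a, relation, b) for each pair with overlapping lifetimes or places."""
    for a, b in combinations(masters, 2):
        # Two closed year intervals overlap when each starts before the other ends.
        if a.born <= b.died and b.born <= a.died:
            yield (a.name, "share_time", b.name)
        # Any common place entity (birthplace, residence, citizenship, ...) counts.
        if a.places & b.places:
            yield (a.name, "share_place", b.name)


# Toy records with invented values, used only to exercise the rules.
MASTERS = [
    Master("master:A", 1850, 1925, frozenset({"place:Foshan", "place:Hong Kong"})),
    Master("master:B", 1893, 1972, frozenset({"place:Hong Kong"})),
    Master("master:C", 1930, 2000, frozenset({"place:Seattle"})),
]

for triple in infer_shared_relations(MASTERS):
    print(triple)
```

Both relations are symmetric, so each pair is visited once. The rules are deliberately permissive (a shared decade or city does not prove contact), which is why inferred links are candidates for review rather than established facts.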

However, inference computations may generate false relationships that lead to inaccurate knowledge, which makes knowledge validation necessary. This process primarily involves consultation with domain experts to ensure the credibility of the knowledge graph by referencing the original data sources during the workflow process. Subsequently, we sent the pre-checked dataset to a scholar specialising in Southern Chinese traditional martial arts based in Hong Kong for review and potential amendments.

Table 1 lists key metrics for the final MA2KG. The graph comprises 594 nodes and 14,289 relationships, a representation that reasonably aligns with the statistics of Chinese kung fu masters identifiable from relevant documentation and online sources. It is well connected, as indicated by the sizes of its largest weakly connected component (WCC) and largest strongly connected component (SCC). Moreover, 98.9% of the edges (relations) within the MA2KG are connected to a master, as the primary goal of data integration was to enhance contextual knowledge about the masters, guided by a human-centric rationale.

Table 1

Essential structural metrics of the MA2KG dataset.

Metric                      Count (proportion of total)
Master nodes                241 (0.406)
Master-related edges        14,132 (0.989)
Nodes in the largest WCC    448 (0.754)
Nodes in the largest SCC    279 (0.470)
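For readers unfamiliar with the two connectivity metrics, the sketch below computes the size of the largest WCC and SCC on a toy directed graph in plain Python. It is an illustration under stated assumptions: the article reports using the Neo4j Graph Data Science Library, whereas the functions `largest_wcc` and `largest_scc` and the five-node graph here are hypothetical, and the reachability-intersection approach is chosen for brevity rather than speed.

```python
from collections import defaultdict


def reachable(start, adj):
    """Set of nodes reachable from `start` by following `adj` (includes start)."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def largest_wcc(nodes, edges):
    """Largest weakly connected component: edge direction is ignored."""
    undirected = defaultdict(set)
    for u, v in edges:
        undirected[u].add(v)
        undirected[v].add(u)
    best, assigned = 0, set()
    for n in nodes:
        if n not in assigned:
            component = reachable(n, undirected)
            assigned |= component
            best = max(best, len(component))
    return best


def largest_scc(nodes, edges):
    """Largest strongly connected component: every node reaches every other."""
    forward, backward = defaultdict(set), defaultdict(set)
    for u, v in edges:
        forward[u].add(v)
        backward[v].add(u)
    best, assigned = 0, set()
    for n in nodes:
        if n not in assigned:
            # Nodes that n can reach and that can also reach n.
            component = reachable(n, forward) & reachable(n, backward)
            assigned |= component
            best = max(best, len(component))
    return best


# Toy graph: a and b point at each other, b points at c, d points at e.
NODES = ["a", "b", "c", "d", "e"]
EDGES = [("a", "b"), ("b", "a"), ("b", "c"), ("d", "e")]

print(largest_wcc(NODES, EDGES) / len(NODES))
print(largest_scc(NODES, EDGES) / len(NODES))
```

The SCC is never larger than the WCC because it additionally requires paths in both directions, which is consistent with the gap between the two proportions reported for the MA2KG.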

5 Application and inspection

For the visual analysis of the MA2KG, we implemented a graph database using Neo4j, a graph data management system equipped with a range of analytical capabilities. Specifically, we employed Cypher (Neo4j’s declarative query language, comparable to SQL) for data query, inference and integration, the Neo4j Graph Data Science Library for computing relevant graph metrics, and Neo4j Bloom for building an interactive visualisation interface, as demonstrated in Figure 3.

Figure 3 

Visual implementation of MA2KG in Neo4j Bloom, where (1) the Search Bar enables natural-language keyword search using entity names, pre-scripted tags or query blocks; (2) the Node Inspection panel presents descriptive information about a selected node and is extensible to accommodate complete ontological annotations; (3) the Graph Panel visualises the queried sub-graph; and (4) the Legend Panel provides flexibility to adjust the styling features according to visual attributes.

5.1 Lineage analysis

A commonly explored theme in the study of historical exchanges, lineage analysis typically examines the relationships between people of different generations or martial art styles. In this context, Figure 4 shows the lineage diagram of the prominent practitioners of Wing Chun, a style that gained global recognition chiefly through kung fu movies. The diagram visually portrays the intricately connected lineages and styles of the masters. Notably, the two most eye-catching nodes, Master Ip Man and his student Bruce Lee, hold the highest centrality scores in the graph. These visual outcomes are consistent with the widely held perception of these masters’ significance. For example, Ip Man is well known as a legendary Wing Chun master whose numerous students went on to develop new styles or sub-styles of martial arts. Bruce Lee, arguably the most famous disciple of Ip Man, established Jeet Kune Do, a style known for inheriting the fundamental concept of Wing Chun, which emphasises efficiency in both time and movement via single fluid motions that attack while defending ().

Figure 4 

The master-style lineage diagram of the Wing Chun style, as computed from the MA2KG.

As the sample suggests, individual data sources often provide only partial information, and a more comprehensive understanding can be achieved by supplementing this information with related data from other sources. However, open data sources such as Wikidata are not always scholarly or accurate, owing to factors such as a lack of historical documentation, the interference of language and political events, and the private systems of knowledge transmission within certain clans. Nevertheless, by integrating and cross-referencing data from different sources (in English and Chinese, online and offline), the MA2KG demonstrates its potential to discover implicit associations and spot errors, thus helping researchers uncover and validate certain facts. Such network features allow users with little prior knowledge to start exploring a database and to keep exploring it. They also hold promise for recommendation systems that serve the public dissemination of cultural information.

5.2 Influence and centrality

In detecting influential masters, we utilised PageRank and Degree centrality algorithms to assess the centrality of masters’ nodes and determine their significance. Specifically, PageRank evaluates the importance of nodes based on the number of incoming relationships, taking into account the importance of the corresponding source nodes. Degree centrality, on the other hand, measures the connectivity by counting the number of links connected to a node, whether they are incoming, outgoing, or both. In addition, Wikidata’s Pageview stats were collected to supplement the measurement of masters’ popularity. These were calculated from the visit statistics of each Wikipedia page bearing a master’s name.
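The difference between the two graph metrics can be seen on a toy example. The sketch below is a plain-Python illustration, not the Neo4j Graph Data Science implementation used in the article: the function names, the damping factor of 0.85, the fixed 50 iterations and the five hypothetical nodes are all assumptions. Edges point from student to teacher, so that teaching many students produces many incoming links.

```python
def degree_centrality(nodes, edges):
    """Count the links touching each node, incoming and outgoing alike."""
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Power-iteration PageRank; scores are normalised to sum to one."""
    n = len(nodes)
    out = {u: [] for u in nodes}
    for u, v in edges:
        out[u].append(v)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread over all nodes.
        dangling = sum(rank[u] for u in nodes if not out[u])
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank


# Hypothetical lineage: three students learn from m, and m learned from g.
NODES = ["g", "m", "s1", "s2", "s3"]
EDGES = [("s1", "m"), ("s2", "m"), ("s3", "m"), ("m", "g")]

degree = degree_centrality(NODES, EDGES)
rank = pagerank(NODES, EDGES)
print(max(degree, key=degree.get), max(rank, key=rank.get))
```

Here m has the most links, yet g receives the highest PageRank because its single incoming link comes from a well-connected node. The two metrics can therefore rank the same masters differently. Because this sketch normalises scores to sum to one, its scale need not match scores produced by other implementations.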

The results, shown in Table 2, reveal variations in the ranking of “central” masters based on different metrics. For instance, Wong Fei-hung, a master of Hung Gar (or Hung Kuen), attains the highest Degree centrality score, likely due to his extensive teaching and interaction with many masters throughout his life. Meanwhile, Tung Ying-Chieh, a disciple of Yang Cheng-Fu and a renowned teacher of Yang-style Tai Chi, tops the PageRank metric, which reflects the cumulative importance of a node and its neighbouring connections. According to Wikidata’s Pageview stats, Bruce Lee appears as the most significant martial artist, probably due to his impact on the media and movie sectors.

Table 2

The most influential masters in the MA2KG based on three distinct metrics.

Master             Degree centrality    PageRank    Pageviews
Wong Fei-Hung      173                  0.16        155,375
Leung Sheung       165                  0.24        5,062
Lam Sai-Wing       158                  0.18        15,214
Ip Man             144                  0.35        1,306,847
Bruce Lee          127                  0.44        6,238,349
Tung Ying-Chieh    22                   2.67        2,830

These findings suggest a bias in the data, possibly arising from different ways of measuring “influence” in digital social records and from inherent biases within the data sources. For instance, Lam Cho, a notable master in the history of Hung Gar who refined the Lam Family Hung Gar lineage, did not stand out in any of the three measures, possibly because his online profile is not as popular. This underscores the importance of involving domain expertise in graph construction. Scholars from diverse backgrounds should be involved to foster the KG’s inclusiveness and integrity.

5.3 The no-names

Public attention tends to gravitate towards well-documented narratives and known figures. In contrast, individuals who have historically played a role in the transmission or evolution of martial arts may remain relatively obscure. To enhance the visibility of lesser-known masters, we harnessed Dijkstra’s shortest path algorithm () to reveal the potential linkages between a given master and all other nodes in the graph of MA2KG.

Figure 5 illustrates a network featuring three masters: Wong Fei-hung, Wong Shun Leung, and Barry Pang. The latter two, although not widely recognised, came to our attention when we explored the various paths within a length of three from the significant figure, Master Wong Fei-hung. In this example, implicit relationships provide valuable insights that can unveil intriguing connections between entities that might otherwise hardly be observable. For instance, Master Wong Shun Leung, who was active during the era of Master Wong Fei-hung, became visible. Similarly, through the master-disciple relationship and stylistic influence, Master Barry Pang emerges as a notable figure, credited with making substantial contributions to the development of martial arts in Australia.

Figure 5 

Illustration of explicit and implicit relationships connected to Master Wong Shun Leung.
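The exploration step can be sketched with a standard-library implementation of Dijkstra’s algorithm, followed by a cut-off at path length three. This is an illustration under stated assumptions: the function names, the unit edge weights, the undirected reading of relationships and the toy node names are hypothetical, and the article’s own analysis was run in Neo4j.

```python
import heapq


def dijkstra(adj, source):
    """Shortest distance and predecessor for every node reachable from source."""
    dist, prev = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry, a shorter route was already found
        for v, weight in adj.get(u, []):
            candidate = d + weight
            if candidate < dist.get(v, float("inf")):
                dist[v], prev[v] = candidate, u
                heapq.heappush(heap, (candidate, v))
    return dist, prev


def path_to(prev, source, target):
    """Rebuild the node sequence from source to target using predecessors."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Toy relationships; every edge has weight 1 and can be walked in both directions.
EDGES = [("hub", "a"), ("a", "b"), ("b", "c"), ("c", "d"), ("hub", "e")]
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, []).append((v, 1))
    ADJ.setdefault(v, []).append((u, 1))

dist, prev = dijkstra(ADJ, "hub")
within_three = sorted(n for n, d in dist.items() if 0 < d <= 3)
print(within_three)
print(path_to(prev, "hub", "c"))
```

With unit weights the result equals a breadth-first search. Weights become useful if, for example, inferred relations are to count as longer steps than explicit ones.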

6 Conclusion

To achieve the goal of organising heterogeneous materials into an informative and interoperable model, this article presents an effort that automatically acquires martial arts knowledge from dispersed data sources in different languages. This is accomplished by creating the MA2KG, a knowledge graph encompassing 241 Chinese kung fu masters based on an extended ontology of martial arts. Additionally, a visual exploration interface is implemented, allowing casual users to interact with the conceptual entities of kung fu masters.

In addition to introducing the MA2KG graph dataset and engineering methods, this work offers a fresh perspective on curating cultural data that places a strong emphasis on individuals and their interconnections. The approach responds to the frustration concerning “ways of access” that cultural collections face today. With this effort, we aim to highlight the pivotal roles that people play in the (re)creation and transmission of ICH knowledge and make their contributions resonate with people today.

Nonetheless, the reported work leaves room for future improvement. During the data application and inspection stage, as we dig into new interpretations enabled by the creation of LOD datasets, concerns about the reliability of these interpretations grow. Depending solely on publicly crowd-sourced data instead of scholarly materials inevitably raises credibility issues. A potential solution may involve integrating human-led academic validation and machine-operated cross-referencing checks into the data integration workflow. This points to another limitation in the current data acquisition process: taking data only from structured data pools like Wikidata is insufficient. Extending the approach to extract knowledge from unstructured data is therefore imperative, and a combination of named entity recognition and text mining holds promise here.