Abstract: Chest diseases are a subgroup of respiratory system diseases. We gathered information about fourteen diseases that attack the chest, together with their symptoms and investigations. In this paper, we present an Arabic ontology-based approach for chest disease diagnosis. It can help physicians and other users determine which chest disease a patient is suffering from and which investigations should be applied.
Keywords: Semantic Web, Knowledge Representation, Ontology Development, SPARQL Query.

1. Introduction

The respiratory system is one of the most important systems in the human body, since it supplies the body with oxygen. Any serious failure of its function can therefore be life-threatening. Several diseases can affect the respiratory system; each of them attacks at least one organ and has a set of signs and symptoms. Therefore, the early detection of any defect in respiratory system function stands at the top of a doctor’s tasks.
However, diagnosis is not an easy process, owing to the complexity of the human body and overlapping phenotypes. For that reason, computer science can support physicians in the diagnostic process. In particular, building an ontology can ease the process of chest disease diagnosis.
To our knowledge, no existing research uses ontologies to serve chest disease diagnosis in Arabic. Creating such an ontology will facilitate diagnosis for doctors: the doctor provides the system with the patient’s symptoms and signs, and the system queries the ontology and returns the expected disease. The rest of this paper is organized as follows. Section 2 introduces the medical background of chest diseases.
Section 3 gives background on ontology development and disease diagnosis. Section 4 presents the methodology of ontology development. Section 5 contains the implementation. Section 6 discusses the evaluation of the ontology. The last section, Section 7, contains the conclusion and future work.

2. Medical Background

Respiratory system diseases are a major cause of morbidity and sudden death. Moreover, the most high-profile conditions in world health terms include diseases such as tuberculosis, pandemic influenza and pneumonia.
In addition, the overall burden of chronic disease in the community is increased by the prevalence of allergy, asthma and chronic obstructive pulmonary disease. The initiative to deal with respiratory diseases embraces the basis of medical science and covers a breadth of pathologies. Recent advances have improved the lives of many patients with obstructive lung disease, cystic fibrosis and pulmonary hypertension, but the outlook remains poor for lung and other respiratory cancers, and for some of the fibrosing lung conditions [1].

3. Related Works

There are many works in the domain of ontology development and disease diagnosis.
They are discussed in this section.

In [2], the researchers suggest a system methodology that contains three basic modules: the diagnostic module, the staging module and the treatment recommendation module. To detect the disease, the patient provides his/her signs and symptoms to the diagnostic module, which detects what type of cancer the patient is suffering from. Once the type of cancer is determined, the staging module finds the current stage of the cancer based on the cancer type and on the signs and symptoms provided by the patient.
Based on the determined cancer type and cancer stage, the treatment recommendation module can recommend a specific treatment for the case at hand.

In [3], the researchers discuss the problem of knowledge acquisition, which is considered a bottleneck in the process of developing such systems. The researchers propose an inference method according to the case at hand. The system was designed to utilize simplified medical knowledge, taking the ontology and the symptoms as input.
In the first phase, the system returns a set of diseases. If only one disease is returned, the diagnosis process is finished. However, if more than one disease is returned, the differential diagnosis process begins: the system retrieves similar cases semantically, depending on a knowledge model.

In [4], Cristina Romero Tris proposed a decision support system built on a knowledge base to help physicians in the diagnosis process and to verify the diagnosis made by the doctor. If the doctor inputs a disease that is not related to the symptoms, the system notifies the doctor of the inconsistency. Moreover, it suggests the disease that best fits the symptoms entered by the doctor. The proposed system aims to personalize existing knowledge through the extraction of a partial ontology, which contains the medical terms that belong to the patient.
In [5], Lakshman Jayaratne proposes a decision support system based on ontology. The proposed system is composed of two components: a genotype component and a phenotype component. The initial input of the system is the genetic sequences of a patient. First, the system identifies whether the input gene is mutated with respect to the reference gene sequences. Then, according to those mutations, a corresponding common list of phenotypes is identified and shown to the physician. The phenotypes are listed according to the frequency and probability with which they might occur.
The physician must first make sure that the phenotypes given by the system are really present in the patient, and can then get the diagnostic report after choosing the matching phenotype from those listed by the system.

In [6], the researchers proposed a framework composed of three phases. In the first phase, a text analysis technique is applied to medical records. In the second phase, a semantic analysis process stores the information extracted in the first phase in a knowledge base. The last phase involves querying the knowledge base to get the related medical information. Here, they use medical rules to infer additional medical facts about the patients and to generate a rich knowledge base of patient facts.

In [7], the researchers proposed an ontology-based approach to diagnose disease and suggest appropriate treatment by identifying anomalous observations on the parts of the tree. The approach consists of three interrelated modules: a knowledge base, a reasoning engine and a server-side application.
The knowledge base is built using an OWL ontology and contains knowledge related to date palm diseases and insect pests. The reasoning engine accepts user input queries and responds through the I/O interface by analysing the acquired dynamic information together with the static knowledge stored in the knowledge base. The web application works as an interface to the system, where the user enters queries and gets the system’s feedback and answers. The system was evaluated by a human expert in plant diseases, by comparing his disease diagnoses to those of the system; the system’s results were 83.5% accurate compared to documented scientific answers. In addition, the ontology was evaluated using the task-based framework, which indicated an accuracy of 100% and 97.6% when using the precision and recall method.

4. Ontology Development

In this section, we present the steps to develop an ontology for chest disease diagnosis (CDDOnto) using the Protégé development environment. The proposed CDDOnto is intended to support the diagnosis of chest diseases. The ontology content relates to the medical domain and was gathered from [8] and from a domain expert who helped to identify concepts and the relationships between them.

There is a wide range of tools available for creating ontologies, such as Protégé, SWOOP and OntoTrack.
We chose Protégé because, according to [9], it is the most domain-independent tool.

According to [10], building an ontology consists of the following steps:

· Determining the Domain of the Ontology

We cannot build an ontology without a purpose. Defining the ontology domain requires answering some questions:

o What is the domain that the ontology will cover?
The domain of the ontology is diagnosing chest diseases.

o Why use the ontology?
To provide a knowledge base of chest diseases, symptoms and investigations.
It will be used in a system to diagnose chest diseases and determine the disease.

o What questions should the ontology answer?
§ What are the symptoms of a given disease?
§ What are the investigations of a given disease?
§ What is the disease for given symptoms?

o Who will use the ontology?
The ontology will be available to users including patients, physicians and students in the medical field.

· Reusing Existing Ontologies

We have built CDDOnto from scratch since, to our knowledge, no such ontology exists.

· Overview of the Ontology

We identified fourteen diseases, together with their symptoms, their investigations and the data needed in the process of diagnosing chest diseases. Figure 1 illustrates the core classes of CDDOnto as well as the relationships among them. It has eight classes, four object properties, and three data properties.

· Enumerating Terms in CDDOnto

We added terms and properties to the ontology by studying the science of disease diagnosis and by analysing the structure of diseases.
The following questions guided our activities to determine the terms:

1) What are the main terms that we want to talk about?
The main terms are disease (???), symptom (???) and investigation (???????).

2) What are the properties of these terms? What needs to be said about those terms?
§ The disease (???) has the properties has symptom (??_???) and diagnosed by (????_??).
§ The symptom (???) has the property symptom of (???_??).
§ The investigation (???????) has the property diagnoses (????).

· Defining the Class Hierarchy of CDDOnto

This step starts by defining the classes, which were identified in the earlier steps. Table 1 shows all classes in CDDOnto in English and Arabic.

· Defining the Properties of the Classes

There are two types of properties: object properties and data properties. An object property links an object to another object, while a data property links an object to a data value typed by an XML Schema datatype.
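The four object properties listed above can be read as two pairs that point in opposite directions (has symptom / symptom of, diagnosed by / diagnoses). The following is a minimal sketch of that reading, not the CDDOnto implementation: it assumes the pairs are inverses of each other, uses hypothetical English identifiers in place of the Arabic names, and uses illustrative facts that are not taken from the ontology.

```python
# Hypothetical English stand-ins for the Arabic property names in CDDOnto.
# Assumption: each pair of properties is treated as mutually inverse.
INVERSE_OF = {
    "has_symptom": "symptom_of",
    "diagnosed_by": "diagnoses",
}


def with_inverses(triples):
    """Return the given triples plus the inverse of every listed property."""
    closed = set(triples)
    for subject, prop, obj in triples:
        inverse = INVERSE_OF.get(prop)
        if inverse is not None:
            closed.add((obj, inverse, subject))
    return closed


# Illustrative facts only; the real individuals are defined in the ontology.
facts = {
    ("cystic_fibrosis", "has_symptom", "chronic_cough"),
    ("cystic_fibrosis", "diagnosed_by", "sweat_test"),
}

closed_facts = with_inverses(facts)
for triple in sorted(closed_facts):
    print(triple)
```

Storing only one direction and deriving the other keeps the two views of the same relation from drifting apart, which is the kind of derived relation an ontology reasoner can supply when the properties are declared as inverses.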
Once we had defined the classes, we specified their internal structure. Table 2 illustrates the properties of the classes.

Figure 2 shows the main classes in the ontology and the relations (object properties) between them. Figure 3 shows all symptoms and investigations of the cystic fibrosis disease (i.e., all of its object properties).

We added data properties such as ???, ?????, ??? to the ontology. They are used to give a value to an instance of a class.
For example, the “???” (cause) of “????????????” (primary tuberculosis) is “????????? ????????” (Mycobacterium tuberculosis). The data property is applicable to each instance of the class. Table 3 illustrates the data properties of CDDOnto.

· Defining the Facets of the Slots

Slots have different facets that describe the value type, the allowed values and the cardinality of the values that slots can take. In our case, all of the slot values are strings encoded in UTF-8 (Arabic). For example, the value type of the “???” property is string.
1) Value type: this describes the types of values a property can have. The property “???” has the value type string.
2) Allowed values: this represents the values allowed for a property. The allowed values for the data property “???” are ???? ???? ???????.
3) Cardinality: this defines how many values a property can have. The property “???” has multiple cardinality.
It allows at least one value.

Figure 4 shows the value type and cardinality of some of the properties.

· Creating Instances of CDDOnto

In this step, we add instances (individuals) of the classes to the ontology. We used the ontology to organize sets of instances. The creation of individuals allows all the properties of the classes to be recorded.
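As a rough illustration of how the value type and cardinality facets constrain the data recorded for an individual, the sketch below checks a record against a small facet table. It is only a sketch under stated assumptions: the property names `name`, `cause` and `description` are English stand-ins for the Arabic data properties, the minimum cardinality of one is assumed for all three, and the helper `facet_violations` is hypothetical rather than part of Protégé.

```python
# Hypothetical facet table: property -> (value type, minimum number of values).
# Assumption: all three data properties are strings with at least one value.
FACETS = {
    "name": (str, 1),
    "cause": (str, 1),
    "description": (str, 1),
}


def facet_violations(individual):
    """List facet violations for one individual's data-property values."""
    problems = []
    for prop, (value_type, min_count) in FACETS.items():
        values = individual.get(prop, [])
        if len(values) < min_count:
            problems.append(f"{prop}: expected at least {min_count} value(s)")
        for value in values:
            if not isinstance(value, value_type):
                problems.append(f"{prop}: {value!r} is not {value_type.__name__}")
    return problems


# A record that satisfies the facets (the description text is a placeholder).
complete = {
    "name": ["primary tuberculosis"],
    "cause": ["Mycobacterium tuberculosis"],
    "description": ["placeholder description"],
}

# A record with a wrongly typed cause and no description.
incomplete = {"name": ["primary tuberculosis"], "cause": [42]}

print(facet_violations(complete))
print(facet_violations(incomplete))
```

In an ontology editor these constraints are declared on the property itself, so the check happens when the individual is created rather than in separate code.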
We took the information about the individuals from [8]. In CDDOnto, we defined 101 instances that represent all ontology concepts, including diseases, symptoms and investigations.

· Applying an Ontology Reasoner

After creating the instances, we applied an ontology reasoner (e.g., the HermiT reasoner). This is necessary to check that the ontology is consistent and to infer new relations from existing ones.

5. CDDOnto Implementation in Protégé

This section describes the development of CDDOnto in Protégé as an OWL ontology.

5.1. Classes and Subclasses

Classes are the domain concepts and the building blocks of an ontology.
In CDDOnto, disease (???), symptom (???) and investigation (???????) are subclasses of the class Thing. Figure 5 shows the main classes in the ontology, whereas Figure 6 shows all of them.

5.2. Instances, Properties and Facets

In CDDOnto, individuals were defined with their data properties, in addition to the object properties between them. Figure 7 shows the data taxonomy, such as the “??????????????” (pulmonary embolism) class, which contains three instances. Pulmonary embolism is a blockage in the pulmonary artery, which supplies blood to the lungs.

The data properties are shown in Figure 8, which contains three properties. These properties are explained as follows:
1- ??? “name”: the ??? refers to the name under which the disease is commonly known worldwide.
2- ??? “cause”: expresses an event in the human body that causes a disease.
3- ??? “description”: describes either a disease or an investigation.

6. Evaluation of CDDOnto

In this section, we evaluate the quality of the created ontology in representing all terms, properties, and relations, through disease examples and through ontology querying using description logic queries and SPARQL queries.

6.1. Quality Evaluation through Disease Examples

To evaluate the quality of CDDOnto, we chose a disease example to check whether the ontology represents the terms, properties and relations of a disease sample.
In Figure 3, we note that cystic fibrosis (?????? ??????) is an individual of the inherited (?????) class, which is a subclass of the disease (???) class, and that it has eight symptoms and five investigations. The above example shows that the ontology represents all needed symptoms and investigations. Many similar examples can be cited that show a complete representation of the domain.
6.2. Quality Evaluation through Ontology Querying

We used the Description Logic Query (DL-Query) tab, a standard Protégé plugin, to verify and validate the ontology against the competency questions. We present three querying examples that answer the main questions asked in the development process of the ontology.

Example 1:
· The question: What are the diseases that have the symptom ‘?????’?
· DL-Query: ??? and ??_??? value ?????
· Figure 9 shows the result of the DL-Query and illustrates the individuals of the disease class.

Example 2:
· The question: What are the symptoms of cystic fibrosis disease?
· DL-Query: ??? and ???_?? value ??????_??????
· Figure 10 shows the result of the DL-Query and illustrates the individuals of the symptom class.

Example 3:
· The question: What are the investigations of cystic fibrosis disease?
· DL-Query: ??????? and ???? value ??????_??????
· Figure 11 shows the result of the DL-Query, which is the set of investigations that the doctor should request to confirm that the disease is cystic fibrosis.
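The three DL queries above share one pattern, "Class and property value individual", which selects the members of a class that are linked to a given individual through a given property. A minimal sketch of that pattern in plain Python follows. It does not call Protégé or a reasoner; the function `dl_value_query`, the English identifiers and the sample facts are all hypothetical stand-ins for the Arabic terms and the real ontology content, and class membership is assumed to be already inferred.

```python
def dl_value_query(class_name, prop, value, types, triples):
    """Individuals of class_name that are linked to value through prop."""
    return sorted(
        individual
        for individual, classes in types.items()
        if class_name in classes and (individual, prop, value) in triples
    )


# Illustrative class memberships (superclasses already included).
types = {
    "cystic_fibrosis": {"Disease", "Inherited"},
    "asthma": {"Disease"},
    "chronic_cough": {"Symptom"},
    "wheezing": {"Symptom"},
    "sweat_test": {"Investigation"},
}

# Illustrative object-property assertions, stored in both directions.
triples = {
    ("cystic_fibrosis", "has_symptom", "chronic_cough"),
    ("asthma", "has_symptom", "chronic_cough"),
    ("asthma", "has_symptom", "wheezing"),
    ("chronic_cough", "symptom_of", "cystic_fibrosis"),
    ("chronic_cough", "symptom_of", "asthma"),
    ("wheezing", "symptom_of", "asthma"),
    ("sweat_test", "diagnoses", "cystic_fibrosis"),
}

# Example 1 pattern: diseases that have a given symptom.
diseases = dl_value_query("Disease", "has_symptom", "chronic_cough", types, triples)
# Example 2 pattern: symptoms of a given disease.
symptoms = dl_value_query("Symptom", "symptom_of", "cystic_fibrosis", types, triples)
# Example 3 pattern: investigations of a given disease.
investigations = dl_value_query("Investigation", "diagnoses", "cystic_fibrosis", types, triples)

print(diseases, symptoms, investigations)
```

Each competency question thus maps to one class, one property and one individual, which is why the same query shape covers all three examples.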
Example 4:
In this example, we show a SPARQL query.
· The question: What are the symptoms of cystic fibrosis disease?
· Figure 12 shows the SPARQL query, and Figure 13 shows its result, which lists the symptom individuals. All of these are symptoms of cystic fibrosis.

7. Conclusion

In this paper, we proposed an Arabic ontology-based approach for diagnosing chest diseases. We focused on the process of building a chest disease knowledge base. The ontology content relates to the medical domain and was gathered from [8] and from a domain expert. The ontology provides a knowledge base of diseases, symptoms, and investigations.
It will be used in a system to diagnose chest diseases and determine the disease.

In the future, we intend to build a query engine based on an NLP model that enables users to write their queries in Arabic natural language, since the target users may not know how to write queries in SPARQL.