China Pharmacy Website Completes Its First Full Redesign. Staff report: The China Pharmacy website (http://www.china-pharmacy.com.cn or http://www.china-pharmacy.net), launched at the end of 1998, underwent its first redesign in December 2000 and plans a formal relaunch in April 2001. Built on the electronic edition of the journal China Pharmacy, the site is a comprehensive pharmaceutical portal planned and produced by Chongqing Guangzheng Medical Network Development Co., Ltd., a company controlled by the China Pharmacy journal publisher. The site serves pharmaceutical professionals nationwide, while also addressing physicians and the broader health audience. It aims to provide wide-ranging and in-depth professional pharmaceutical content for pharmacists, physicians, and institutions engaged in drug regulation, research, manufacturing, and distribution, and at the same time to offer drug-selection and usage guidance to the general drug-using public. As an extension of the journal China Pharmacy, the site's goal is not only to become a professional source of drug-related information, but also to provide a platform where pharmaceutical manufacturers and distributors can exchange information about their products with pharmacists, physicians, and consumers, guiding patients toward rational and economical medication use. Its current main sections are: 1. Medical news: real-time medical news and major social news; 2. Health laws and regulations: searchable laws and regulations in four categories: medical administration, drug administration, health and epidemic prevention, and medical insurance; 3. Seeking medical advice: seven sections (Longevity Garden, Women's World, Men's Health, Early Life, Doctor's Notes, Sexual Well-being, and Famous Physicians' Prescriptions) presenting health knowledge for men, women, young, and old, the diagnosis and treatment of common diseases, scientifically grounded discussion of sexual health, and selected prescriptions from noted physicians; 4. Drug information: knowledge of prescription drugs, over-the-counter drugs, and new drugs; 5. Medical resources: convenient lookup of pharmaceutical manufacturers, medical institutions, medical schools, and drug distributors; 6. Medical journals: search and links for professional medical and pharmaceutical journals; 7. Electronic edition: full-text articles of China Pharmacy from 2000 onward; 8. Chat room and BBS: an information-exchange forum for medical professionals, patients, and visitors; 9. Medical guidance desk: guidance on seeking medical care for patients and visitors. The site is backed by an experienced team, including an advisory board of leading domestic medical and pharmaceutical experts and a seasoned technical staff; these strong human and medical resources lay a solid foundation for its development. It is expected that the China Pharmacy website will soon become a leading comprehensive medical and pharmaceutical website in China.
OBJECTIVE: The purpose of this project was to suggest guidelines for the management of low-grade squamous intraepithelial lesion (LSIL) by evaluating the natural course of LSIL of the uterine cervix. MATERIAL & METHODS: Among the women who visited Korea University Kuro Hospital from Jan. 1993 to Oct. 1998, 158 patients diagnosed with LSIL by colposcopy-directed biopsy were followed up by colposcopy and/or cytology and HPV DNA testing every 3 months. RESULTS: Of the 158 patients diagnosed with LSIL, colposcopic examination confirmed progression to high-grade squamous intraepithelial lesion (HSIL) in 17 patients (10.7%), persistence of LSIL in 87 patients (55%), and regression to normal in 54 patients (34.2%) during the 3-year follow-up period. In predicting the LSIL subgroup, the abnormal Pap test rate was 39%, 64%, and 71%, and the abnormal HPV test rate was 16%, 29%, and 65% in the regression, persistence, and progression groups, respectively. The shortest time of transition from a minor lesion (LSIL) to a high-grade lesion was 12 months (range, 12-51). CONCLUSION: Of the patients diagnosed with LSIL and monitored by colposcopy for 60 months, 34% had disease that regressed, 55% had persistent disease, and 11% had progressive disease. The HPV DNA test (p=0.002) is more informative than the Pap test (p=0.567) in predicting disease progression.
Author information： (1)firstname.lastname@example.org
START/END YEAR: 1993 - 1997. PUB TYPE: Journal/Periodical/Series (unknown review procedure, 52 item(s) per year). ISSN: 1068-6576. SUBJECT(S): Workers League (US); Periodicals; Socialism; Labor movement. DISCIPLINE: No discipline assigned. LC NUMBER: HX1 .I58
Local estimates for the gradients of solutions to the simplest regularization of a class of nonuniformly elliptic equations.
Psychiatric Annals | Behavioral agitation is common in patients with moderate to severe dementia, present in up to 90% of patients depending on how broadly it is defined.1-23 For example, in a chart review of 57 patients with Alzheimer's disease, Reisberg et al reported that 33 (58%) patients had "significant behavioral symptomatology," including delusions, nonspecific agitation, and diurnal rhythm disturbances.6
Predicting Breast Cancer Survivability Using Data Mining Techniques
Omead Ibraheem Hussain
omead2007@gmail.com

ABSTRACT
This study concentrates on predicting breast cancer survivability using data mining, comparing three main predictive modeling tools. Precisely, we used three popular data mining methods: two from machine learning (artificial neural network and decision trees) and one from statistics (logistic regression), and aimed to choose the best model according to the efficiency of each model, the variables most effective in these models, and the most common important predictor. We defined the three main modeling aims and uses by demonstrating the purpose of the modeling. By using data mining, we can begin to characterize and describe trends and patterns that reside in data and information. The preprocessed data set contained 87 variables and a total of 457,389 records, which became 93 variables with 90,308 records for each variable; these data were drawn from the SEER database. We investigated a number of data mining techniques and ultimately chose to focus on artificial neural networks, decision trees, and logistic regression, using SAS Enterprise Miner 5.2, which in our view is the most suitable system given the facilities it offers and the results it returns. Several experiments were conducted using these algorithms; the prediction implementations were evaluated by comparison-based techniques. We found that the neural network performed much better than the other two techniques.
Finally, we can say that the model we chose has the highest accuracy, which specialists in the breast cancer field can use and depend on.

Data Understanding

1 Introduction
In their worldwide End-User Business Analytics Forecast, IDC, a world leader in the provision of market information, divided the market and differentiated between "core" and "predictive" analytics (IDC, 2004). Breast cancer is the cancer that forms in breast tissue; it is classed as a malignant tumour when cells in the breast tissue divide and grow without the normal controls on cell death and cell division. We know from looking at breast structure that it contains ducts (tubes that carry milk to the nipple) and lobules (glands that make milk) (Breast, 2008). Breast cancer can occur in both men and women, although breast cancer in men is rarer. Breast cancer is one of the most common types of cancer and a major cause of death in women in the UK. In the last ten years, breast cancer rates in the UK have increased by 12%. In 2004 there were 44,659 new cases of breast cancer diagnosed in the UK: 44,335 (99%) in women and 324 (1%) in men. Breast cancer risk in the UK is strongly related to age, with more than 80% of cases occurring in women over 50 years old. The highest number of breast cancer cases is diagnosed in the 50-64 age group. Although very few cases of breast cancer occur in women in their teens or early 20s, breast cancer is the most commonly diagnosed cancer in women under 35, and by the age of 35-39 almost 1,500 women are diagnosed each year. Breast cancer incidence rates continue to increase with age, with the greatest rate of increase prior to the menopause. As the incidence of breast cancer is high and five-year survival rates are over 75%, many women are alive who have been diagnosed with breast cancer (Breast, 2008). The most recent estimate suggests around 172,000 women are alive in the UK having had a diagnosis of breast cancer.
Even though, in the last couple of decades, the increased emphasis on cancer-related research has produced new and innovative methods of detection and early treatment which help to reduce cancer-related mortality (Edwards BK, Howe HL, Ries Lynn AG, 1973-1999), cancer in general, and breast cancer specifically, is still a major cause of concern in the United Kingdom. Although cancer research is in general clinical and/or biological in nature, data-driven statistical research is becoming a widespread complement in medical areas where data- and statistics-driven research is successfully applied. For health outcome data, the explanation of model results becomes really important, as the intent of such studies is to gain knowledge about the underlying mechanisms. Problems with the data or models may indicate that the common understanding of the issues involved is contradictory. Commonly used models, such as the logistic regression model, are interpretable, though we may question the interpretation when the datasets used for prediction are inadequate. Artificial neural networks have proven to produce good prediction results in classification and regression problems. This has motivated the use of artificial neural networks (ANN) on data that relates to health outcomes such as death from breast cancer or its diagnosis. In such studies, the dependent variable of interest is a class label, and the set of possible explanatory predictor variables (the inputs to the ANN) may be binary or continuous. Predicting the outcome of an illness is one of the most interesting and challenging tasks in which to develop data mining applications.
Survival analysis is a branch of medical research that deals with the application of various methods to historical data in order to predict the survival of a specific patient suffering from a disease over a particular time period. With the rising use of information technology, powered by automated tools enabling the storage and retrieval of large volumes of medical data, such data are being collected and made available to the medical research community interested in developing prediction models for survivability.

1.1 Background
We review here some research studies carried out regarding the prediction of breast cancer survivability.

The first paper is "Predicting breast cancer survivability: a comparison of three data mining methods" (Delen, Walker, and Kadam, 2004). They used three data mining techniques: decision tree (C5), artificial neural networks, and logistic regression. They used the data contained in the SEER Cancer Incidence Public-Use Database for the years 1973-2000, and obtained their results using raw data uploaded into an MS Access database, the SPSS statistical analysis tool, a statistical data miner, and the Clementine data mining toolkit; these software packages were used to explore and manipulate the data. The following section describes the surface complexities and the structure of the data. The results indicated that the decision tree (C5) was the best predictor, with an accuracy of 93.6%, better than the artificial neural network with an accuracy of about 91.2%; the logistic regression model was the worst of the three, with 89.2% accuracy. The models in that study were built and evaluated on accuracy, sensitivity, and specificity, and these results were achieved using 10-fold cross-validation for each model.
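The accuracy, sensitivity, and specificity used to evaluate these models all derive from a 2x2 confusion matrix. A minimal sketch (the counts below are made up for illustration, not taken from any of the studies discussed):

```python
# Accuracy, sensitivity, and specificity from a 2x2 confusion matrix.
# The counts used in the example call are illustrative only.

def confusion_metrics(tp, fn, fp, tn):
    """Return (accuracy, sensitivity, specificity) for binary predictions."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total      # fraction classified correctly
    sensitivity = tp / (tp + fn)      # true-positive rate
    specificity = tn / (tn + fp)      # true-negative rate
    return accuracy, sensitivity, specificity

# Hypothetical hold-out fold: 90 true positives, 10 false negatives,
# 15 false positives, 85 true negatives.
acc, sens, spec = confusion_metrics(tp=90, fn=10, fp=15, tn=85)
```

In a 10-fold cross-validation, these three numbers would be computed on each held-out fold and averaged, which is how figures such as 0.9362/0.9602/0.9066 are reported below.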
They found, according to the comparison between the three models, that the decision tree (C5) performed the best of the three models evaluated, achieving a classification accuracy of 0.9362 with a sensitivity of 0.9602 and a specificity of 0.9066. The ANN model achieved an accuracy of 0.9121 with a sensitivity of 0.9437 and a specificity of 0.8748. The logistic regression model achieved a classification accuracy of 0.8920 with a sensitivity of 0.9017 and a specificity of 0.8786. The detailed prediction results on the validation datasets were presented in the form of confusion matrices.

The second research study was "Predicting breast cancer survivability using data mining techniques" (Bellaachia and Guven, 2005). In this research they used three data mining techniques: Naïve Bayes, the back-propagated neural network, and the C4.5 decision tree algorithm (Huang, Lu and Ling, 2003). Their data source was the SEER data (period 1973-2000, 433,272 records, named Breast.txt), pre-classified into two groups of "survived" (93,273) and "not survived" (109,659) depending on the Survived Time Records (STR) field. They calculated their results using the Weka toolkit. The conclusion of the study was based on calculations of specificity and sensitivity. They found that the decision tree (C4.5) was the best model with an accuracy of 0.867, then the ANN with an accuracy of 0.865, and finally Naïve Bayes with an accuracy of 0.845. Their analysis did not include records with missing data; our research does include the missing data, and this is one of the advances we make compared to previous research.

The third research study was "Artificial Neural Networks Improve the Accuracy of Cancer Survival Prediction" (Burke HB, Goodman PH, Rosen DB, Henson DE, Weinstein JN, Harrell Jr FE, Marks JR, Winchester DP, Bostwick DG, 1997).
They focused on the ANN and the TNM (Tumor, Nodes, Metastasis) staging system, and used the same SEER dataset, but for new cases collected from 1977-1982. Based on this research study, the extent-of-disease variables in the SEER data set were comparable to the TNM variables but not always identical to them. Regarding accuracy, they found that when the prognostic score is unrelated to survival the score is 0.5, and the further the score lies from 0.5, the better, on average, the prediction model is at predicting which of two patients will be alive.

The fourth research study was "Prospects for clinical decision support in breast cancer based on neural network analysis of clinical survival data" (Kates R, Harbeck N, Schmitt M, 2000). This study used a dataset of patients with primary breast cancer enrolled between 1987 and 1991 in a prospective study at the Department of Obstetrics and Gynecology of the Technische Universität München, Germany. They used two models (a neural network and a multivariate linear Cox model). According to their conclusion, the neural network on this dataset does not prove that neural nets are always better than Cox models; however, the neural environment used there tests weights for significance, and removing too many weights usually reduces the neural representation to a linear model and removes any performance advantage over conventional linear statistical models.

1.2 Research Aims & Objectives
The objective of the present study is to significantly enhance the accuracy of the three models we chose.
Considering the justification of the high efficiency of the models, it was decided to embark on this research study with the intended outcome of creating an accurate modeling tool that could calculate and depict the variables of the overall modeling, increase the accuracy of these models, and assess the significance of the variables. For the purposes of this study, we decided to study each attribute individually, and to determine the significance of the variables that are strongly built into the models. Also, for the first iteration of our simulation for choosing the best model (Intrator, O and Intrator, N, 2001), we decided to focus on only the three data mining techniques mentioned previously. Having chosen to work exclusively with SAS systems, we also felt it would be advantageous to work with SAS rather than other software, since this system is the most flexible. After duly considering feasibility and time constraints, we set ourselves the following study objectives:
(a) Propose and implement the three selected models, with their parameters calibrated to optimal values, to measure and predict the target variable (0 for not survived and 1 for survived).
(b) Propose and implement the best model to measure and predict the target variable (0 for not survived and 1 for survived).
(c) To be able to analyse the models and to see which variables have the most effect upon the target variable.
(e) To visualize the aforementioned target attributes through simple graphical artifacts.
(f) Build models that appear to have high quality from a data analysis perspective.

1.3 Activities
The steps taken to achieve the above objectives can be summarised as below.
As mentioned, the study consisted of building the model with the highest accuracy and analyzing the three models we chose. Points (a) and (b) relate to the data preparation of the study, points (c) and (d) relate to the building of the models, and points (e) through (g) relate to the analysis of the models:
(a) To characterise and describe trends and patterns that reside in the data, and information about the data.
(b) To choose the records, and to evaluate the transformation and cleaning of the data for the modeling tools. Cleaning of data includes estimating missing data by modeling (mean, mode, etc.).
(c) Selecting modeling techniques, applying their parameters and their requirements on the form of the data, and applying them to the dataset of our choosing.
(d) Evaluation of the model, and review of the steps executed to construct the model, to achieve the business objectives.
(e) To be able to analyse the models and to see which variables are most applicable to the target variable.
(f) To decide how the decision on the use of the data mining results should be reached.
(g) To use SAS software to obtain the best results and analyse the variables that are most significant to the target variable.

1.4 Methodology and Scope
1.4.1 Data source
We decided to use a data set compatible with our aim; the data mining task we decided on was the classification task. One of the key components of predictive accuracy is the amount and quality of the data (Burke HB, Goodman PH, Rosen DB, 1997). We used the data set contained in the SEER Cancer Incidence Public-Use Database for the years 1973-2001. SEER is the Surveillance, Epidemiology, and End Results data files, which were requested through the web site (http://www.seer.cancer.gov).
The SEER Program is part of the Surveillance Research Program (SRP) at the National Cancer Institute (NCI) and is responsible for collecting incidence and survival data from the twelve participating registries (Item Number 01 in the SEER user file on the cancer web site), and for deploying these datasets (with descriptive information about the data itself) to institutions and laboratories for the purpose of conducting analytical research (SEER Cancer). The SEER Public Use Data contains nine text files, each containing data related to cancer of specific anatomical sites (i.e., breast, rectum, female genital, colon, lymphoma, other digestive, urinary, leukemia, respiratory, and all other sites). Each file has 93 variables (the original dataset before our changes, which became 33 variables), and each record in a file relates to a specific incidence of cancer. The data in the file are collected from twelve different registries (i.e., geographic areas); these registries cover a population that is representative of the different racial/ethnic groups living in the United States. Each variable of the file contains 457,389 records (observations), but we made some changes to the set of variables, adding some extra variables according to the variable requirements in the SEER file. For instance, variable number 20 (extent of disease) contains 12 digits; its field description is denoted SSSEELPNEXPE, and we decode those letters as follows: SSS is the size of the tumor, EE is the clinical extension of the tumor, L is the lymph node involvement, PN is the number of positive nodes examined, EX is the number of nodes examined, and PE is the pathological extension (for 1995+ prostate cases only).
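The 12-digit extent-of-disease field described above can be decoded mechanically. A small sketch (the function name and the example value are ours, not part of the SEER tooling), assuming the SSS/EE/L/PN/EX/PE layout just given:

```python
# Decode the 12-digit SEER extent-of-disease field (layout SSSEELPNEXPE).
# The field layout follows the description in the text; the example value
# passed in below is made up for illustration.

def parse_extent_of_disease(code: str) -> dict:
    """Split a 12-digit extent-of-disease string into its six components."""
    if len(code) != 12 or not code.isdigit():
        raise ValueError("expected a 12-digit string")
    return {
        "tumor_size": int(code[0:3]),            # SSS
        "clinical_extension": int(code[3:5]),    # EE
        "lymph_node_involvement": int(code[5]),  # L
        "positive_nodes": int(code[6:8]),        # PN
        "nodes_examined": int(code[8:10]),       # EX
        "pathologic_extension": int(code[10:12]),# PE
    }
```

Splitting the packed field out into named columns like this is what turns one SEER variable into the several extra variables mentioned above.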
We had some problems when we converted the data into SAS datasets, but we recognized that the problem lay with the names of some variables. For instance, the variables "Primary_Site" and "Recode_ICD_O_I" are actually character variables: they therefore need to be read in using a "$" sign to indicate that the variable is text; we also read in the variable "Extent_of_Disease" this way. There are two types of variables in the data set: categorical variables and continuous variables.

Afterwards, we explored, prepared, and cleansed the dataset. The final dataset contained 93 variables: 92 predictor variables and the dependent variable. The dependent variable is a binary categorical variable with two categories, 0 and 1, where 0 represents "did not survive" and 1 represents "survived". The types of the variables are as follows.

The categorical variables are: 1. Race (28 unique values), 2. Marital Status (6 values), 3. Primary Site Code (9 values), 4. Histology (123 values), 5. Behaviour (2 values), 6. Sex (2 values), 7. Grade (5 values), 8. Extent of Disease (36 values), 9. Lymph node involvement (10 values), 10. Radiation (10 values), 11. Stage of Cancer (5 values), 12. Site specific surgery code (11 values).

The continuous variables are: 1. Age, 2. Tumor Size, 3. Number of Positive Nodes, 4. Number of Nodes, 5. Number of Primaries.

The dataset is divided into two sets: a training set and a testing set. The training set is used to construct the model, and the testing set is employed to determine the accuracy of the model built.

The position of the tumor in the breast may be described by positions on a clock, as shown in figure 1 (Coding Guidelines Breast, 2007).

Figure 1. O'clock positions and topography codes (C50.0-C50.5) for the quadrants of the right and left breasts
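The preparation steps described here (imputing missing values by mean or mode during cleansing, then splitting records into training and testing sets) can be sketched in a few lines. This is an illustration only: the records, field names, and split fraction below are made up, and the real work was done in SAS Enterprise Miner.

```python
# Sketch of the data preparation described in the text: impute missing values
# (mean for continuous fields, mode for categorical ones), then split the
# records into training and testing sets. All data here are illustrative.
import random
from statistics import mean, mode

records = [
    {"Age": 52, "Grade": "3", "Alive": 1},
    {"Age": None, "Grade": "2", "Alive": 0},
    {"Age": 61, "Grade": None, "Alive": 1},
    {"Age": 70, "Grade": "3", "Alive": 0},
]

def impute(records, continuous, categorical):
    """Fill missing continuous fields with the mean, categorical with the mode."""
    for field in continuous:
        seen = [r[field] for r in records if r[field] is not None]
        fill = mean(seen)
        for r in records:
            if r[field] is None:
                r[field] = fill
    for field in categorical:
        seen = [r[field] for r in records if r[field] is not None]
        fill = mode(seen)
        for r in records:
            if r[field] is None:
                r[field] = fill
    return records

def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle reproducibly and return (training set, testing set)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

clean = impute(records, continuous=["Age"], categorical=["Grade"])
train, test = train_test_split(clean)
```

The model would then be fitted on `train` and its accuracy measured on `test`, exactly as the text describes for the SEER dataset.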
Figure 2 shows breast cancer survival rates by state.

Figure 2. Breast Cancer Survival Rates by State

Data Mining Techniques

2. Background
2.1 Data mining: what is data mining, and why use it?
Nowadays, data mining is the process of extracting hidden knowledge from large volumes of raw data. Data mining is a central issue at the moment; the main problem these days is how to forecast from any kind of data so as to find the best predictive result for our information. Unfortunately, many studies fail to consider alternative forecasting techniques, the relevance of input variables, or the performance of the models under different strategies. The concept of data mining is often defined as the process of discovering patterns in large databases. The data is largely opportunistic, in the sense that it was not necessarily collected for the purpose of statistical inference. A significant part of a data mining exercise is spent in an iterative cycle of data investigation, cleansing, aggregation, transformation, and modeling. Another implication is that models are often built on data with large numbers of observations and/or variables. Statistical methods must be able to execute the entire model formula on separately acquired data, sometimes in a separate environment, a process referred to as scoring. Powerful systems for collecting data and managing it in large databases are in place in all large and mid-range companies. However, the bottleneck in turning this data into valuable information is the difficulty of extracting knowledge about the system studied from the collected data. Data mining automates the process of finding relationships and patterns in raw data and delivers results that can be either utilized in an automated decision support system or assessed by a human analyst (Witten & Frank, 2005). The following figure shows the data mining process model:
Figure 3. Data mining process model

Data mining is a practical topic and involves learning in a practical, not theoretical, sense (Witten & Frank, 2005). Data mining involves the systematic analysis of large data sets using automated methods. By probing data in this manner, it is possible to prove or disprove existing hypotheses or ideas regarding the data or information, while discovering new or previously unknown information. In particular, unique or valuable relationships between and within the data can be identified and used proactively to categorize or anticipate additional data (McCue, 2007). People use data mining to gain knowledge, not just predictions; gaining knowledge from data certainly sounds like a good idea if we can do it.

2.2 Classification
Classification is a key data mining technique whereby database tuples, acting as training samples, are analysed in order to produce a model of the given data. We used it to predict group outcomes for dataset instances; in our project, to predict whether the patient will be alive or not. Classification predicts categorical class labels: it classifies data (constructs a model) based on the training set and the values (class labels) of a classifying attribute, and uses the model to classify new data. The models' predictions are continuous-valued functions, which means they can predict unknown or missing values (Chen, 2007). In classification, each list of values is supposed to belong to a predefined class, determined by one of the attributes, called the classifying attribute. Once derived, the classification model can be used to categorize future data samples and also to provide a better understanding of the database contents. Classification has numerous applications, including credit approval, product marketing, and medical diagnosis.

Testing and Results

4. Testing and Results
Table 1 shows some statistical information about the interval variables.
Table 1. Interval variables

Obs  Name                    Mean     Std     Skewness  Kurtosis
1    Age_recodeless          12.67    2.909   -0.08295  -0.7114
2    Decade_at_Diagnosis     55.95    14.989  -0.00715  -0.5608
3    Decade_of_Birth         1919.47  16.077  0.13791   -0.369
4    Num_Nodes_Examined_New  11.81    6.768   3.45426   15.0212
5    Num_Pos_Nodes_New       40.24    5.521   0.45785   -1.7509
6    Number_of_primaries     1.21     0.464   2.22851   5.2614
7    Size_of_Tumor_New       92.42    30.732  3.61947   11.2935

As we know, SAS Enterprise Miner does all the necessary imputation and transformation of the data set, so we do not need to be very worried if the data is not normally distributed, as we said before.

Figure 4. A 3-D vertical bar chart of 'Laterality', with a series variable of 'Grade', a subgroup variable of 'Alive', and a frequency value; the details of the values are shown by clicking the arrow on the chart.

Figure 5. Chi-Square Plot

Table 2 shows the variables important to Alive (the target variable):

Table 2. Chi-Square and Cramer's V

Input                            Cramer's V  Chi-Square  DF
SEER_historic_stage_A            0.2872      7447.801    4
Clinical_Ext_of_Tumor_New        0.2808      2445.714    26
Site_specific_surgery_I          0.2445      5164.87     23
Reason_no_surgery                0.2106      4004.076    6
Tumor_Marker_I                   0.2005      3631.661    5
Conversion_flag_I                0.1991      3581.374    5
Tumor_Marker_II                  0.1982      3549.276    5
Sequence_number                  0.1707      2630.806    6
Lymph_Node_Involvement_New       0.1551      617.9745    8
Grade                            0.1525      2099.662    4
Histologic_Type_II               0.1502      2037.371    79
Diagnostic_confirmation          0.112       1132.812    7
Recode_I                         0.1012      921.4222    17
Marital_status_at_diagnosis      0.0986      877.4522    5
PS_Number                        0.0841      639.0324    8
Race_ethnicity                   0.0841      638.2951    23
Radiation                        0.0791      564.559     9
Birthplace                       0.0784      555.4019    198
ICD_Number                       0.0675      411.894     5
Laterality                       0.0576      300.0729    4
Behaviour_recode_for_Analysis    0.0526      250.0972    1
Radiation_sequence_with_surgery  0.0385      133.5344    5
First_malignant_prim_ind         0.0097      8.479       1

The Cramer's V for SEER historic stage A is 0.29, which means the association between SEER historic stage A and Alive is 0.29, so there is a relation between
them; the association between Clinical_Ext_of_Tumor_New and Alive is 0.28, and so on. The association between Alive and First_malignant_prim_ind, however, is almost non-existent, because it is close to 0.

From the basic analysis of the dataset, we see that the most important variable for the target variable (Alive) is SEER historic stage A (stages 0, 1, 2, 4 or 8); for instance, stage 1 means the localized stage: an invasive neoplasm confined entirely to the organ of origin.

Table 3. Class Variable Summary Statistics

Variable                         Number of unique values
Behaviour_recode_for_Analysis    2
Birthplace                       199
Clinical_Ext_of_Tumor_New        28
Conversion_flag_I                6
Diagnostic_confirmation          8
First_malignant_prim_ind         2
Grade                            5
Histologic_Type_II               80
ICD_Number                       6
Laterality                       5
Lymph_Node_Involvement_New       10
Marital_status_at_diagnosis      6
PS_Number                        9
Race_ethnicity                   24
Radiation                        10
Radiation_sequence_with_surgery  6
Reason_no_surgery                7
Recode_I                         19
SEER_historic_stage_A            5
Sequence_number                  7
Site_specific_surgery_I          25
Tumor_Marker_I                   6
Tumor_Marker_II                  6
Alive                            2

Table 4. Interval Variable Summary Statistics

Variable                Mean     StdDev  Min   Median  Max
Age_recodeless          12.67    2.909   4     13      18
Decade_at_Diagnosis     55.95    14.989  10    60      100
Decade_of_Birth         1919.47  16.077  1870  1920    1970
Num_Nodes_Examined_New  11.81    6.768   0     10      98
Num_Pos_Nodes_New       40.24    5.521   0     9       98
Number_of_primaries     1.21     0.464   1     1       6
Size_of_Tumor_New       92.42    30.732  0     30      998

4.2 The Artificial Neural Network
From the results, figure 6 displays the iteration plot with the Average Squared Error at each iteration for the training and validation data sets. The estimation process required 100 iterations. The weights from the 98th iteration were selected; around the 98th iteration, the Average Squared Error flattened out in the validation data set (the red line), although it continued to drop in the training data set (the green line).

Figure 6. Iteration plot with Average Squared Error

Figure 7. Score Rankings Overlay: Alive (Gain Chart)

As we know, the objective function is the Average Error. The best model is the model that gives the smallest average error for the validation data.
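Several of the fit statistics reported in the next table are simple functions of the predicted probabilities: the Average Squared Error (ASE) averages the squared differences between target and prediction, RASE is its square root, and the misclassification rate counts predictions on the wrong side of 0.5. A small sketch with made-up predictions (this is not the SAS Enterprise Miner output, only the arithmetic behind it):

```python
# ASE, RASE, misclassification rate, and wrong-classification count for a
# binary target, computed from predicted probabilities. Data are made up.
import math

def fit_statistics(y_true, p_pred):
    """Return a dict of simple fit statistics for binary targets."""
    n = len(y_true)
    ase = sum((y - p) ** 2 for y, p in zip(y_true, p_pred)) / n
    rase = math.sqrt(ase)                      # root average squared error
    wrong = sum(1 for y, p in zip(y_true, p_pred)
                if (p >= 0.5) != (y == 1))     # prediction on wrong side of 0.5
    return {"ASE": ase, "RASE": rase, "MISC": wrong / n, "WRONG": wrong}

stats = fit_statistics([1, 0, 1, 1, 0], [0.9, 0.2, 0.4, 0.8, 0.6])
```

The same identities can be used to cross-check a reported table: RASE squared should reproduce ASE, and MISC times the number of observations should reproduce the wrong-classification count.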
The following table shows some fit statistics; both targets are range-normalized, so values lie between 0 and 1. The root mean square error for target 1 is about 43.5%, and the mean squared error is 18.9%, as the table shows:

Table 5. Fitted Statistics (target: Alive)

Fit statistic  Statistics Label                 Train     Validation  Test
_DFT_          Total Degrees of Freedom         30167
_DFE_          Degrees of Freedom for Error     29831
_DFM_          Model Degrees of Freedom         336
_NW_           Number of Estimated Weights      336
_AIC_          Akaike's Information Criterion   33753.85
_SBC_          Schwarz's Bayesian Criterion     36547.52
_ASE_          Average Squared Error            0.187201  0.1868483   0.187468
_MAX_          Maximum Absolute Error           0.98751   0.9952505   0.9907213
_DIV_          Divisor for ASE                  60334     45190       45048
_NOBS_         Sum of Frequencies               30167     22595       22524
_RASE_         Root Average Squared Error       0.432667  0.4322595   0.432976
_SSE_          Sum of Squared Errors            11294.58  8443.6747   8445.057
_SUMW_         Sum of Case Weights Times Freq   60334     45190       45048
_FPE_          Final Prediction Error           0.191418  NaN         NaN
_MSE_          Mean Squared Error               0.18931   0.1868483   0.187468
_RFPE_         Root Final Prediction Error      0.437513  NaN         NaN
_RMSE_         Root Mean Squared Error          0.435097  0.4322595   0.432976
_AVERR_        Average Error Function           0.548312  0.5487667   0.550722
_ERR_          Error Function                   33081.85  24798.766   24808.94
_MISC_         Misclassification Rate           0.30066   0.2916132   0.293598
_WRONG_        Number of Wrong Classifications  9070      6589        6613

4.3 The Decision Trees
The decision tree technique repeatedly separates observations into branches to construct a tree, with the aim of improving prediction accuracy. It uses mathematical algorithms (the Gini index, information gain, or the chi-square test) to identify a variable, and a corresponding threshold for that variable, that divides the input values into two or more subgroups.
This step is repeated at each leaf node until the complete tree is created (Neville, 1999). The aim of the splitting algorithm is to identify a variable-threshold pair that maximizes the homogeneity of the two or more resulting subgroups of samples. The mathematical algorithms most used for splitting include entropy-based information gain (used in C4.5, ID3, and C5), the Gini index (used in CART), and the chi-squared test (used in CHAID). We used the entropy technique and summarized the results according to the most common variables, in order to choose the most important predictor variables. In appendix (4), the Decision Tree property criterion is Entropy; one example of the results is: if Site_specific_surgery_I = 09 and SEER_historic_stage_A = 4 and Lymph_Node_Involvement_New = 0 and Clinical_Ext_of_Tumor_New = 0, then node: 140, N (number of values in the node): 1518, not survived (0): 94.8%, survived (1): 5.2%. If the Decision Tree property criterion is Gini, one example is: if Site_specific_surgery_I = 90 and SEER_historic_stage_A = 4 and Lymph_Node_Involvement_New = 0 and Clinical_Ext_of_Tumor_New = 0, then node: 130, N: 1272, survived: 85.4% and not survived: 14.6%.
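The splitting criteria named above reduce to simple impurity formulas. A stdlib-only sketch (ours, not the SAS Enterprise Miner implementation) of entropy, Gini impurity, and the information gain of a candidate split:

```python
# Entropy and Gini impurity, the node-impurity measures behind the splitting
# criteria named in the text (entropy for C4.5/ID3/C5, Gini for CART), plus
# the information gain of a candidate split.
import math

def entropy(counts):
    """Entropy in bits of a class-count list, e.g. [survived, not_survived]."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def gini(counts):
    """Gini impurity of a class-count list."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def information_gain(parent, children):
    """Entropy reduction from splitting `parent` counts into `children`."""
    n = sum(parent)
    weighted = sum(sum(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted
```

For example, splitting a balanced node of 100 cases into two 50-case children with 80/20 class mixes yields an information gain of about 0.278 bits; the splitting algorithm chooses the variable-threshold pair that maximizes this gain.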
Finally, with the criterion set to ProbChisq, one example rule is:

IF Grade is one of 9 or 2 AND Sequence_number is one of 00, 02 or 03 AND Reason_no_surgery is one of 0 or 8 AND SEER_historic_stage_A = 4 THEN node: 76, N: 2310, survived: 86.3%, not survived: 13.7%.

The most important variables, contributing the largest numbers of observations toward the target variable when Entropy is used, are: Clinical_Ext_of_Tumor_New, Site_specific_surgery_I, Histologic_Type_I, Size_of_Tumor_New, Grade, Lymph_Node_Involvement_New, Sequence_number, SEER_historic_stage_A, Age_recodeless, Conversion_flag_I and Decade_of_Birth. Overall, we can say the most important variables for the target variable are: Grade, Size_of_Tumor_New, SEER_historic_stage_A, Clinical_Ext_of_Tumor_New, Lymph_Node_Involvement_New, Histologic_Type_II, Sequence_number, Age_recodeless, Decade_of_Birth and Conversion_flag_I. Table 6 displays the variables in order of their importance in the tree.

Table 6. The most important variables using the Entropy criterion

These results come from the Autonomous Decision Tree node used with the interactive property. The table shows that the prognostic factor "SEER historic stage A" is by far the most important predictor, which is not consistent with previous research, in which "Grade" was the most important predictor and "Stage of cancer" came second. From our table, the second most important factor is "Clinical Extension of Tumor New", followed by "Decade (Age) at diagnosis" and "Grade". We also note that the size of the tumor ranks only eighth.

4.4 The Logistic Regression

First, consider the logistic regression figure:

Figure 8. Bar chart of the logistic regression model: the intercept and the parameter estimates.
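The intercept and coefficients of such a model act on the log-odds scale. A minimal sketch of how estimates of this kind map to a predicted probability (the values -1.5206 and -1.38 are the intercept and the SEER_historic_stage_A = 4 effect reported in this section; the function name is my own):

```python
import math

def prob_alive(intercept, coef, x):
    """Logistic model: P(Alive=1) = 1 / (1 + exp(-(intercept + coef * x)))."""
    logit = intercept + coef * x
    return 1.0 / (1.0 + math.exp(-logit))

# Estimates reported in the text (intercept for Alive=1; SEER_historic_stage_A = 4).
b0, b1 = -1.5206, -1.38

p_base = prob_alive(b0, b1, 0)    # indicator off: logit = -1.5206
p_stage4 = prob_alive(b0, b1, 1)  # indicator on: logit drops by a further 1.38
```

With the indicator on, the log-odds fall by 1.38, so the predicted survival probability drops from roughly 0.18 to roughly 0.05, which is the sense in which a negative coefficient "changes Alive" in the discussion that follows.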
Bar 1 represents the intercept, with value -1.520597; bar 2 the parameter for the variable SEER_historic_stage_A, with value -1.378877; and so on. The following table explains the regression model. It is clear in this model that SEER_historic_stage_A is one of the most important variables for the target. The intercept for Alive=1 equals -1.5206, the baseline amount for the target variable (Alive=1) on the log-odds scale. The coefficient of SEER_historic_stage_A is -1.38, meaning that this level of the variable changes the log-odds of Alive by -1.38. The t-test assesses the significance of the independent variable with respect to the target: t = -28.66 means that SEER_historic_stage_A = 4 is highly significant, since |t| far exceeds the critical value at the 0.05 significance level, so we reject the null hypothesis and accept the alternative. Which hypotheses are compared depends on what we want to test, for instance H0: beta = 0 against H1: beta != 0, or H0: beta = 1 against H1: beta != 1. For another level, SEER_historic_stage_A = 0, the t value is +9.31, and that level is likewise significant with respect to the target variable.

Table 7. Regression: most important variables

Variable | Level | Effect | Effect Label
Intercept | 1 | Intercept | Intercept: Alive=1
SEER_historic_stage_A | 4 | SEER_historic_stage_A4 | SEER_historic_stage_A 4
IMP_Site_specific_surgery_I | 2 | IMP_Site_specific_surgery_I02 | Imputed Site_specific_surgery_I 02
IMP_Site_specific_surgery_I | 0 | IMP_Site_specific_surgery_I00 | Imputed Site_specific_surgery_I 00
IMP_Site_specific_surgery_I | 9 | IMP_Site_specific_surgery_I09 | Imputed Site_specific_surgery_I 09
Tumor_Marker_I | 2 | Tumor_Marker_I2 | Tumor_Marker_I 2
Grade | 3 | Grade3 | Grade 3
Tumor_Marker_I | 8 | Tumor_Marker_I8 | Tumor_Marker_I 8
Sequence_number | 0 | Sequence_number00 | Sequence_number 00
Grade | 4 | Grade4 | Grade 4
Tumor_Marker_I | 0 | Tumor_Marker_I0 | Tumor_Marker_I 0
IMP_Site_specific_surgery_I | 40 | IMP_Site_specific_surgery_I40 | Imputed Site_specific_surgery_I 40
SEER_historic_stage_A | 2 | SEER_historic_stage_A2 | SEER_historic_stage_A 2
IMP_Site_specific_surgery_I | 58 | IMP_Site_specific_surgery_I58 | Imputed Site_specific_surgery_I 58
SEER_historic_stage_A | 0 | SEER_historic_stage_A0 | SEER_historic_stage_A 0
IMP_Site_specific_surgery_I | 20 | IMP_Site_specific_surgery_I20 | Imputed Site_specific_surgery_I 20

4.6 Model Comparison using SAS

The Model Comparison node belongs to the assessment category in the SAS data mining process of sample, explore, modify, model, and assess (SEMMA). The node enables us to compare models and predictions from the modeling nodes using various criteria. A common criterion for all modeling and predictive tools is a comparison of the expected survival or non-survival against the actual survival or non-survival obtained from the model results. This criterion enables cross-model comparisons and assessments, independent of all other factors (such as sample size, modeling node, and so on).

When a modeling node is trained, assessment statistics are computed on the train (and validation) data. The Model Comparison node calculates the same statistics for the test set when present. The node can also be used to modify the number of deciles and/or bins and to recompute the assessment statistics used in the score ranking and score distribution charts for the train (and validation) data sets (Intrator and Intrator, 2001). In addition, for binary targets it computes the Gini, Kolmogorov-Smirnov and Bin-Best Two-Way Kolmogorov-Smirnov statistics and generates receiver operating characteristic (ROC) charts for all models using the train (validation and test) data sets.

We used the program to compute the accuracy, sensitivity and specificity of the neural network, the decision trees and the logistic regression (stepwise, backward and forward). The steps are as follows.
First, we run the model comparison to obtain the event classification table:

Table 8. Event classification

Obs | MODEL | FN | TN | FP | TP
1 | Step.Reg TRAI | 5867 | 16131 | 3224 | 4945
2 | Step.Reg VALI | 4368 | 12174 | 2470 | 3583
3 | Back.Reg TRAI | 6624 | 16490 | 2865 | 4188
4 | Back.Reg VALI | 4815 | 12564 | 2080 | 3136
5 | Forw.Reg TRAI | 6624 | 16490 | 2865 | 4188
6 | Forw.Reg VALI | 4815 | 12564 | 2080 | 3136
7 | Neural TR | 6124 | 16409 | 2946 | 4688
8 | Neural VA | 4375 | 12430 | 2214 | 3576
9 | Tree TRAI | 7469 | 20477 | 3270 | 4907
10 | Tree VALI | 5527 | 15491 | 2485 | 3589

We then feed this results table into program (10), using SAS code, to obtain the confusion matrix. The following table shows the results of the event classification together with the confusion matrix.

Table 9. Confusion matrix

Obs | MODEL | FN | TN | FP | TP | Accuracy | Sensitivity | Specificity
1 | Step.Reg TRAI | 5867 | 16131 | 3224 | 4945 | 0.69864 | 0.45736 | 0.83343
2 | Step.Reg VALI | 4368 | 12174 | 2470 | 3583 | 0.69737 | 0.45064 | 0.83133
3 | Back.Reg TRAI | 6624 | 16490 | 2865 | 4188 | 0.68545 | 0.38735 | 0.85198
4 | Back.Reg VALI | 4815 | 12564 | 2080 | 3136 | 0.69484 | 0.39442 | 0.85796
5 | Forw.Reg TRAI | 6624 | 16490 | 2865 | 4188 | 0.68545 | 0.38735 | 0.85198
6 | Forw.Reg VALI | 4815 | 12564 | 2080 | 3136 | 0.69484 | 0.39442 | 0.85796
7 | Neural TR | 6124 | 16409 | 2946 | 4688 | 0.69934 | 0.43359 | 0.84779
8 | Neural VA | 4375 | 12430 | 2214 | 3576 | 0.70839 | 0.44975 | 0.84881
9 | Tree TRAI | 7469 | 20477 | 3270 | 4907 | 0.70271 | 0.39649 | 0.8623
10 | Tree VALI | 5527 | 15491 | 2485 | 3589 | 0.70427 | 0.3937 | 0.86176

The table shows that the neural network is the best model: its accuracy on the validation data, 0.70839 (error rate 1 - 0.70839 = 0.29161), is the highest of all the models, with sensitivity 0.44975 and specificity 0.84881. The second-best model is the decision tree, with validation accuracy 0.70427 (error rate 0.29573), sensitivity 0.3937 and specificity 0.86176, and the third is the logistic regression (stepwise), with validation accuracy 0.69737 (error rate 0.30263), sensitivity 0.45064 and specificity 0.83133; the backward and forward regressions follow similarly.

Figure 9. Model Comparison Chart

Figure 10.
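The derived measures in the confusion matrix follow directly from the event-classification counts. A minimal sketch (the function name is mine; the counts are the neural network validation row of the event classification table):

```python
def classification_metrics(fn, tn, fp, tp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = fn + tn + fp + tp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    return accuracy, sensitivity, specificity

# Neural network, validation data (Obs 8 of the event classification table).
acc, sens, spec = classification_metrics(fn=4375, tn=12430, fp=2214, tp=3576)
```

Rounded to five decimals these counts reproduce the tabulated 0.70839, 0.44975 and 0.84881, which is a useful cross-check that the two tables are consistent.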
Score Rankings Overlay: Alive (Cumulative Lift)

Figure 11. Score Rankings Overlay: Alive (Lift)

Figure 12. Score Rankings Overlay: Alive (Gain)

The following table shows the results of the k-fold cross-validation:

Table 10. K-fold cross-validation results

First fold
Obs | MODEL | FN | TN | FP | TP | Accuracy | Sensitivity | Specificity
1 | Tree TRAI | 6569 | 18498 | 3007 | 4437 | 0.70545 | 0.40314 | 0.86017
2 | Tree VALI | 5084 | 13926 | 2247 | 3126 | 0.69934 | 0.38076 | 0.86106
3 | Neural TR | 4988 | 13875 | 2565 | 4268 | 0.70606 | 0.46111 | 0.84398
4 | Neural VA | 3845 | 10508 | 1919 | 3012 | 0.7011 | 0.43926 | 0.84558
5 | Step.Reg TRAI | 5374 | 13777 | 2663 | 3882 | 0.68723 | 0.4194 | 0.83802
6 | Step.Reg VALI | 4016 | 10383 | 2044 | 2841 | 0.68575 | 0.41432 | 0.83552

Second fold
1 | Tree4 TRA | 7127 | 18818 | 2690 | 3876 | 0.69804 | 0.35227 | 0.87493
2 | Tree4 VAL | 5327 | 14030 | 2085 | 2941 | 0.69602 | 0.35571 | 0.87062
3 | Neural4 TR | 5393 | 14242 | 2646 | 4024 | 0.69439 | 0.42731 | 0.84332
4 | Neural4 VA | 4078 | 10469 | 2029 | 3057 | 0.68894 | 0.42845 | 0.83765
5 | Step.Reg TRAI | 6230 | 14742 | 2146 | 3187 | 0.68158 | 0.33843 | 0.87293
6 | Step.Reg VALI | 4698 | 10876 | 1622 | 2437 | 0.67809 | 0.34156 | 0.87022

Third fold
1 | Neural6 TR | 5316 | 14482 | 2593 | 4158 | 0.7021 | 0.43889 | 0.84814
2 | Neural6 VA | 4009 | 10638 | 1961 | 3193 | 0.6985 | 0.44335 | 0.84435
3 | Step.Reg TRAI | 5886 | 14709 | 2366 | 3588 | 0.68918 | 0.37872 | 0.86143
4 | Step.Reg VALI | 4397 | 10823 | 1776 | 2805 | 0.68825 | 0.38948 | 0.85904
5 | Tree6
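The fold-wise tables above come from repeating the train/validate cycle on different partitions of the data. A minimal sketch of how such fold indices can be generated (function name is mine; SAS performs this partitioning internally):

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k consecutive folds; yield (train, validation) pairs."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, validation
        start += size

# Example: three folds over ten observations, matching the three-fold layout of Table 10.
folds = list(kfold_indices(10, 3))
```

Each observation appears in exactly one validation fold, so averaging the per-fold accuracy, sensitivity and specificity gives an estimate that uses every record for both training and validation.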
Epitope analysis of HLA-DR-restricted helper T-cell responses to Der p II, a major allergen molecule of Dermatophagoides pteronyssinus
T-cell epitopes of Der p II, a major allergen of Dermatophagoides pteronyssinus, were analyzed by using T-cell clones. We tested 38 cloned T cells from two Japanese allergic patients, and identified at least two peptides (K33-T47 and I58-C73) as helper T-cell epitopes. The former was shown to be restricted by DRB1*1502, and the latter by DRB1*0405, both of which are typical Japanese HLA-DR alleles, suggesting that those T-cell epitopes might be important for the onset of mite allergy in the Japanese population. We prepared 15 analog peptides of the DRB1*1502-restricted 15-mer epitope. Of those 15 residues, five, including F35, A39, F41, and E42, were critical for the activity, and three residues (F35, A39, and E42) seemed to be included in anchor motifs for DRB1*1502. The epitope was also recognized by T cells from DRB1*1502-positive healthy donors; however, only T cells from allergic donors showed Th2 functions. Antigen-presenting cells of nonallergic donors were able to activate allergic T cells to express Th2 function. This seemed to suggest that antigen recognition by T cells, as well as additional unknown factors that promote Th2, rather than Th1, responses, might be important for the onset of mite allergy.
A randomized controlled study of 117 patients with cirrhosis was conducted to evaluate the roles of paracentesis and diuretics in the treatment of ascites. Group I (58 patients) was treated with paracentesis (4-6 L of ascitic fluid removed per day) plus intravenous albumin. Group II (59 patients) was treated with furosemide (40-240 mg/d) and spironolactone (200-400 mg/d). Patients who did not respond to high-dose diuretics underwent peritoneovenous shunting.
Sir, Regardless of the eventual resting place of the Silurian-Devonian junction in South Wales and the Welsh Borders, whether it be accepted by everyone either at the base of the Ludlow Bone Bed or at the Downtonian-Dittonian junction, the recent contributions on the subject by Westoll et al. (1971) and Turner (1971) raise important questions regarding the positioning of the upper of the two possible system boundaries. Currently, the most favoured position for the Downtonian-Dittonian boundary is at the somewhat imprecise base of the 'Psammosteus' Limestone Group in the Brown Clee Hill area of Shropshire, which is marked by a fairly abrupt change from estuarine to fluviatile conditions, and is coincident with the presently known incoming of the Traquairaspis fauna in that area. Westoll et al. (1971, p. 286), aware of the doubtful value of fish faunas in determining this junction, comment that it would not be unreasonable to suppose that Traquairaspis might be found below this position in the Welsh Borders, and indeed this is already the case, for, in Alteryn Quarry near Newport (Mon.), Traquairaspis pococki has been found (Squirrell & Downing 1969, pp. 38 and 45) in 2 ft (0.6 m) of micaceous greyish-green sandstone at 520 ft (158 m) below the 'Psammosteus' Limestone. This is about 420 ft (128 m) below the base of the 'Psammosteus' Limestone Beds of Newport, which approximately equate with the 'Psammosteus' Limestone Group of the Brown Clee Hill area. This occurrence of Traquairaspis, much lower than the previously known first appearance ...