电竞平台

当我们聊数据质量的时候,我们在聊些什么?

2021-02-05 11:38:16 dms@root 9

导读:

SUIZHEDASHUJUXINGYEDESHENRUFAZHAN,SHUJUZHILIANGYUELAIYUECHENGWEIYIGERAOBUKAIDEHUATI,NEIDANGDAJIAZAILIAOSHUJUZHILIANGDESHIHOU,TONGCHANGHUILIAOSHENMENI?CONGSHENMESHISHUJUZHILIANGKAISHI。

一、什么是数据质量

数据质量:一个评估规则维度提供一种测量与管理信息和数据的方式。

QUFENGUIZEWEIDUYOUZHUYU:

  • JIANGWEIDUYUYEWUXUQIUXIANGPIPEI,BINGQIEHUAFENPINGGUDEXIANHOUSHUNXU;

  • LEJIECONGMEIYIWEIDUDEPINGGUZHONGNENGGOU/BUNENGGOUDEDAOSHENME;

  • ZAISHIJIANHEZIYUANYOUXIANDEQINGKUANGXIA,GENGHAODIDINGYIHEGUANLIXIANGMUJIHUAZHONGDEXINGDONGSHUNXU。

数据质量检核主要分为以下规则维度:

  • WANZHENGXING(Completeness):YONGLAIMIAOSHUXINXIDEWANZHENGCHENGDU。

  • WEIYIXING(Uniqueness):YONGLAIMIAOSHUSHUJUSHIFOUCUNZAIZHONGFUJILU,MEIYOUSHITIDUOYUCHUXIANYICI。

  • YOUXIAOXING(Validity):YONGLAIMIAOSHUMOXINGHUOSHUJUSHIFOUMANZUYONGHUDINGYIDETIAOJIAN。TONGCHANGCONGMINGMING、SHUJULEIXING、ZHANGDU、ZHIYU、QUZHIFANWEI、NEIRONGGUIFANDENGFANGMIANJINXINGYUESHU。

  • YIZHIXING(Consistency):YONGLAIMIAOSHUTONGYIXINXIZHUTIZAIBUTONGDESHUJUJIZHONGXINXISHUXINGSHIFOUXIANGTONG,GESHITI、SHUXINGSHIFOUFUHEYIZHIXINGYUESHUGUANXI。

  • ZHUNQUEXING(Accuracy):YONGLAIMIAOSHUSHUJUSHIFOUYUQIDUIYINGDEKEGUANSHITIDETEZHENGXIANGYIZHI(XUYAOYIGEQUEDINGDEHEKEFANGWENDEQUANWEICANKAOYUAN)。

  • JISHIXING(Timeless):YONGLAIMIAOSHUCONGYEWUFASHENGDAODUIYINGSHUJUZHENGQUECUNCHUBINGKEZHENGCHANGCHAKANDESHIJIANJIANGECHENGDU,YEJIAOSHUJUDEYANSHISHIZHANG,SHUJUZAIJISHIXINGSHANGYINGNENGJINKENENGTIEHEYEWUSHIJIFASHENGSHIDIAN。

  • KEXINXING(credibility):YONGLAIMIAOSHUSHUJUFASHENGSHIFOUFUHEKEGUANGUILV。

MEIYIGUIZEWEIDUKENENGXUYAOBUTONGDEDULIANGFANGFA、SHIJIHELIUCHENG。ZHEIJIUDAOZHILEWANCHENGJIANHEPINGGUSUOXUYAODESHIJIAN、JINQIANHERENLIZIYUANHUICHENGXIANCHUCHAYI。SHUJUZHILIANGDETISHENGBUSHIYICUERJIUDE,ZAIQINGCHULEJIEPINGGUMEIYIWEIDUSUOXUGONGZUODEQINGKUANGXIA,XUANZENEIXIEDANGQIANJIAOWEIPOQIEDEJIANHEWEIDUHEGUIZE,CONGYIDAONAN、YOUQIANRUSHENDIZHUBUTUIDONGSHUJUZHILIANGDEQUANMIANGUANLIYUTISHENG。GUIZEWEIDUDECHUBUPINGGUJIEGUOSHIQUEDINGJIXIAN,QIYUPINGGUZEZUOWEIJIXUJIANCEHEXINXIGAIJINDEYIBUFEN,ZUOWEIYEWUCAOZUOLIUCHENGDEYIBUFEN。

二、数据完整性

数据完整性维度大类下可细分为以下维度小类:

  • 非空约束:描述检核对象是否存在数据值为空的情况。如客户开户时,客户名称是必填项,不能出现为空的情况。

FEIKONGYUESHUBIJIAORONGYILIJIE,JIANDANDEJIANGJIUSHIZIDUANBUNENGWEIKONG,JIANCHAFANGSHIYEBIJIAORONGYI,ZHIXUYAOSHEDINGXUYAOJIANCHADEZIDUAN,TONGGUO sql CHAXUNLIEZHIBUNENGWEIKONGJIKE。JIANGWEIKONGDESHUJUCHAXUNCHULAIJINXINGZHENGGAI。

DANGRANFEIKONGYUESHUKEYITONGGUOSHEZHIFEIKONGYUESHUDEFANGSHIXIANZHISHUJUWUFAXIERUSHUJUKU,RUGUOZHICHIZHEIZHONGFANGSHIKEYIBIMIANSHIHOUDESHUJUFEIKONGJIANCHA。

三、数据唯一性

数据唯一性维度大类下可细分为以下维度小类:

  • WEIYIXINGYUESHU:MIAOSHUTONGYIKEGUANSHITIZAIBUTONGYEWUSHUJUJIZHONGDEXINXI,JINGZHENGHEHOUSHIWEIYIDE,ZHENDUIMUBIAOTONGCHANGSHIDANYIZHUJIANHUOLIANHEZHUJIAN,RUZHENGJIANLEIXING+ZHENGJIANHAOMA+XINGMINGXIANGTONG,ZEQIKEHUBIANHAOYINGWEIYI。

JUGEJIANDANDELIZI,WEIYIXINGYUESHUZAIJISHUSHANGYIBANJUBEIWEIYIDEBIAOSHIZIDUANKEYIPANDUANQIWEIYIXING,ZAIYEWUSHANGKEYITONGGUOJIGEGUANLIANDEYEWUSHUXINGDUIQUEDINGWEIYIYEWUSHITI。RUOZAIZHEIZHONGQINGKUANGCHUXIANSHUJUZHONGFUDEWENTI,JIWEIFANLEWEIYIXINGYUESHU。ZHEIZHONGQINGKUANGRUGUOSHIDANYIDEYEWUZHUJIAN,KEYITONGGUODUIZHUJIANFENZUQUZHONGDEFANGSHIJIANCHA,RUGUOSHIYEWULIANHESHUXINGPANDUANWEIYISHITIDEQINGKUANG,ZHINENGYEWURENYUANJINXINGSHOUDONGJIANCHA。

四、数据有效性

数据有效性维度大类下可细分为以下维度小类:

  • DAIMAZHIYUYUESHU:MIAOSHUJIANHEDUIXIANGDEDAIMAZHISHIFOUZAIDUIYINGDEDAIMABIAONEI。RUYEWUGUIZEDINGYI“XINGBIE”DEQUZHIYINGGAISHI“1-WEIZHIDEXINGBIE”、“2-NANXING”、“3-NVXING”、“4-WEISHUOMINGDEXINGBIE”,RUGUOCHUXIAN“A”、“B”ZHEIYANGDEQUZHI,ZERENWEI“XINGBIE”DEDAIMAZHIYUCUNZAIWENTI;

  • ZHANGDUYUESHU:MIAOSHUJIANHEDUIXIANGDEZHANGDUSHIFOUMANZUZHANGDUYUESHU。RU“JINRONGJIGOUBIANMA”ZAI《RENMINYINXINGJINRONGJIGOUBIANMAGUIFAN》ZHONGGUIDINGZHANGDUWEI14WEI,RUGUOCHUXIANFEI14WEIDEZHI,ZEPANDINGWEIBUMANZUZHANGDUYUESHU,BUSHIYIGEYOUXIAODE“JINRONGJIGOUBIANMA”;

  • NEIRONGGUIFANYUESHU:MIAOSHUJIANHEDUIXIANGDEZHISHIFOUANZHAOYIDINGDEYAOQIUHEGUIFANJINXINGSHUJUDELURUYUCUNCHU。RU“CUNKUANZHANGHAO”YINGJINHANSHUZI,RUGUOCHUXIANZIMUHUOQITAFEIFAZIFU,ZEBUSHIYIGEYOUXIAODE“CUNKUANZHANGHAO”,BUMANZUNEIRONGGUIFANYUESHU;

  • QUZHIFANWEIYUESHU:MIAOSHUJIANHEDUIXIANGDEQUZHISHIFOUZAIYUDINGYIDEFANWEINEI。RU“SHOUXINEDU”QUZHIFANWEIYINGDAYUDENGYU 0,RUGUOCHUXIANXIAOYU 0 DEQINGKUANG,ZECHAOCHULEQUZHIFANWEIDEYUESHU,BUSHIYIGEYOUXIAODE“SHOUXINEDU”。

代码值域约束

MIAOSHUJIANHEDUIXIANGDEZHISHIFOUANZHAOYIDINGDEYAOQIUHEGUIFANJINXINGSHUJUDELURUYUCUNCHU。

LI 1 : YIYEWUGUIZEXINGBIEZHIYOU “0:NAN” ,”1:NV”,ZEXINGBIEZIDUANZHIYINGCHUXIAN0HUO1。

LI 2 : HUOBIDAIMA (CURCODE) ZHIYINGYOURMBHUOSHIUSDZHI。

SHUJUZHILIANGZHONGDAIMAZHIYUSHOUXIANYAOZHIDINGQIYEJIDETONGYIBIANMABIAO,RANHOUANZHAODUIZHAOGUANXIJINXING etl ZHUANHUAN,ZHIYUCHUBAOGAOZHIXUYAOTONGGUO sql CHAXUNBUZAIFANWEINEIDESHUZHIJIUKEYILE。

长度约束

MIAOSHUJIANHEDUIXIANGDEZHANGDUSHIFOUMANZUZHANGDUYUESHU。

LIRUSHENFENZHENGHAOSHI 18 WEI。

ZHANGDUYUESHUKEYITONGGUOJIANBIAOSHIZHIDINGZIFUZHANGDUQUXIANZHI,RUGUOYEWUXITONGZUICHUMEIYOUZUOXIANZHI,ZHINENGTONGGUO sql PANDUANZHANGDUDEFANGSHIHUOQUYICHANGZHIZAIJINXINGCHULI。

内容规范约束

描述检核对象的值是否按照一定的要求和规范进行数据的录入与存储。

例如:余额或者日期等一般都会按照固定类型存储,如果最初设计为字符型后续应按照对应类型调整。
首先这种情况最好一开始就建立好统一规范,按照业务含义去指定技术类型。如果最初做的不好,可以通过类型进行数据探查,对数据统一格式化。

取值范围约束

描述检核对象的取值是否在预定义的范围内。

例如:余额不能为负数,日期不能为负数等等。
如果业务初始没有做限制,只能通过 sql 去对数据过滤查询,对有问题数据集中 etl 处理。

五、数据一致性

数据一致性维度大类下可细分为以下维度小类:

  • DENGZHIYIZHIXINGYILAIYUESHU:MIAOSHUJIANHEDUIXIANGZHIJIANSHUJUQUZHIDEYUESHUGUIZE。YIGEJIANHEDUIXIANGSHUJUQUZHIBIXUYULINGYIGEHUODUOGEJIANHEDUIXIANGZAIYIDINGGUIZEXIAXIANGDENG。

  • CUNZAIYIZHIXINGYILAIYUESHU:MIAOSHUJIANHEDUIXIANGZHIJIANSHUJUZHICUNZAIGUANXIDEYUESHUGUIZE。YIGEJIANHEDUIXIANGDESHUJUZHIBIXUZAILINGYIGEJIANHEDUIXIANGMANZUMOUYITIAOJIANSHICUNZAI。

  • LUOJIYIZHIXINGYILAIYUESHU:MIAOSHUJIANHEDUIXIANGZHIJIANSHUJUZHILUOJIGUANXIDEYUESHUGUIZE。YIGEJIANHEDUIXIANGSHANGDESHUJUZHIBIXUYULINGYIGEJIANHEDUIXIANGDESHUJUZHIMANZUMOUZHONGLUOJIGUANXI(RUDAYU、XIAOYUDENG)。

等值一致性依赖约束

YIBANZHIWAIJIANGUANLIANDECHANGJING。LIRU:BAODANBIAO,LIPEIBIAODEBAODANHAOCUNZAIBAODANZHUBIAO,TONGYIZHANGBIAO,LIANGGEZIDUANZHIJIANDEGUANLIANGUANXI。

存在一致性依赖约束

主要是强调业务的关联性,一个状态发生了则某个值一定会如何。
例如:投保状态为已投保,则投保日期不应为空;

逻辑一致性依赖约束

ZHUYAOQIANGDIAODESHIZIDUANJIANDEHUXIANGYUESHUGUANXI。

LIRU:TOUBAOKAISHISHIJIANXIAOYUDENGYUTOUBAOJIESHUSHIJIAN。

六、数据准确性

数据准确性主要是指取值的准确性,描述该检核对象是否与其对应的客观实体的特征相一致。

LIRU:TOUBAORENDEXINGBIEDAIMAWEI0-NVXING,SUIRANMANZUDAIMAZHIYUYUESHU,DANQUEBUMANZUQUZHIZHUNQUEXINGYUESHU,YINWEIGAIRENWEINANXING,QIXINGBIEDAIMAYINGWEI1-NANXING;

ZAIRU:GUOJIBAOHANYEWUDESHOUXUFEIYINGLURUWEIGUOJIDANBAOSHOUXUFEISHOURU,QUELURUCHENGGUONEIDANBAOSHOUXUFEISHOURU。

准确性要求不仅数据的取值范围和内容规范满足有效性的要求,其值也是客观真实世界的数据。由此可见,有效的数据未必是准确的,反之成立。
准确性通常需要业务人员或其他当事人手工核查。

DUIDAIZHEIZHONGQINGKUANG,SHUJUZHILIANGGUIZEMEIBANFAZHIJIETONGYICHULI,ZHINENGTONGGUOJISHICHAXUNDEFANGSHIDUISHUJUJIEGUOJINXINGXIANGXIHECHA。

七、数据及时性

及时性约束:描述检核数据能否及时反映其对应的实际业务的时点状态。

LIRU:XITONGZHONGDAIKUANWUJIFENLEIDEFENLEIBISHIJIZHONGDEYANCHIJITIANBIANHUA;ZAIRULICAIYEWUZAILICAIXITONGZHONGSHICHENGGONGZHUANGTAI,DANZAIHEXINXITONGZHONGQUEYINTONGXINDEYUANYINERMEIYOURUZHANG。

JISHIXINGYOUYUDUOGEXITONG、TONGXINDENGYUANYINERZAOCHENG,TONGCHANGXUYAOYEWURENYUANHUOXITONGRENYUANSHOUGONGHECHA。

YIBANLAISHUOSHUJUTONGBUDOUSHIJIYUYEWUXITONGDELUOBIAOJISHUZIDUAN(BIRU:CREATE_DT),ERZHENSHIYEWUFASHENGDESHIJIANKENENGYUGAIZIDUANCUNZAISHIJIANJIANGE。KEYITONGGUOJIANDANDEsqlDUILIANGGESHIJIANBIJIAO,PANDUANSHUJUDEJISHIXINGSHIFOUFUHEXUQIU。

八、数据可信性

数据可信性约束:描述在数据同步中每日/月增量数据是否符合理论的经验值。

LIRU:BAODANSHUJUDEMEIRIFENQUSHUJUJIAOQIANRIYIBANYOU 10% ZENGZHANG,TURANSHUJUZENGZHANGBIANWEI200%,ZHEIZHONGQINGKUANGYOUKENENGSHISHUJUTONGBUCHUXIANWENTI。

ZAIRU:MEIYUEDEYINGSHOUZONGEYIBANDOUANYIDINGGUILVSHANGZHANG,TURANSHUJUBODONGJIAODAZEYIBANDOUKENENGCHUXIANWENTI。

KEXINXINGYAOQIUSHUJUDEZONGLIANGBODONGFUHEJIBENKEGUANGUILV,YIBANTONGGUODUI 7、15、30 RISHUJUJINXINGBIJIAO,RUGUOCHUXIANCHAJUJIAODAZEJINXINGXIANGXIDEWENTITANCHA。

GUANYUDAMEISHENG:

BEIJINGDAMEISHENGRUANJIANGUFENYOUXIANGONGSI(YIXIAJIANCHENG“DAMEISHENG”,GUPIAODAIMA:430311)SHIYIJIAKUAPINGTAIZICHANQUANSHOUQISHUJUGUANLI(ALIM)PINGTAITIGONGSHANG,ZHILIYUTONGGUOZIZHUKESHIHUA、QINGLIANGHUAHEXINJISHU,JIYUGONGCHENGHEYUNWEIYITIHUASHUJU,WEIKEHUGOUJIAN“SHUZILUANSHENG(Digital Twin)”,DAZAOQUANSHOUQIZICHANGUANLIYUJIAZHITISHENGJIEJUEFANGAN。

DAMEISHENGTONGGUOSHIDUONIANJISHUDAMOHEJINGYANLEIJI,CHUANGLI“SANWEIYITI”CHANPINHEFUWU:

“1”GEYINQING:KUAPINGTAIKESHIHUAYINQINGeZWalker;

“2”GEPINGTAI:GONGCHANGKESHIHUADASHUJUGUANLIPINGTAIPIMCenter/eZWalker TeslaKESHIHUAKAIFAPINGTAI;

“3”GEYINGYONG:GONGCHENGXIANGMUGUANLIXITONG PIMCenter PPM/YUNXIETONGJIYIJIAOXITONG PIMCenter HO /ZICHANGUANLIXITONG PIMCenter APM。

达美盛为国家高新技术企业、中关村高新技术企业、《工程建设标准化》理事会理事单位, ISO15926国内首家发起成员,CFIHOS组织成员单位,并通过了ISO9001、ISO14001、OHSAS18001等质量体系认证;荣获2016年度最佳数字化工厂服务商、2016年度“基础设 施可视化资产管理最佳应用奖”;其全资子公司四川达美盛工程设计有限公司具备石油天然气设计资质。


标签:  数据质量 数据完整性 数据 数据有效性 数据一致性