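Xinzhiyuan (新智元) report
Editor: 犀牛

[Xinzhiyuan digest] A team of Chinese researchers from NVIDIA and UIUC proposes an efficient training recipe that extends LLM context length from 128K to 4 million tokens, a new SOTA record. Their UltraLong-8B model, built on Llama3.1-Instruct, performs strongly on long-context benchmarks while remaining highly competitive on standard tasks.

Large language models (LLMs) already perform impressively on text and multimodal tasks. The top-tier text and code performance of the recent Gemini 2.5 Pro and the native image generation of GPT-4o both illustrate this.

However, many real-world applications, such as long-document and video understanding, in-context learning, and inference-time scaling, require models to process extremely long token sequences. In these settings a limited context window often becomes a major bottleneck, because key information scattered across a long document can be missed.

To address this, researchers from NVIDIA and UIUC propose an efficient training recipe. It starts from an existing instruction-tuned model and builds an ultra-long-context LLM, pushing the context length as far as 4 million tokens.

Paper: https://arxiv.org/pdf/2504.06214

The UltraLong-8B models trained with this recipe reach state-of-the-art results on long-context tasks while staying competitive on standard tasks.

Main contributions:

An efficient and scalable training recipe.

Key technical innovations: the researchers introduce special document separators and YaRN-based position-encoding scaling, and show through ablation studies that both are critical for long-context modeling.

An efficient one-step pretraining strategy: the researchers find that one-step continued pretraining extends context more efficiently than multi-step extension, and that it performs consistently well on both synthetic and real-world long-context benchmarks.

Comprehensive experimental validation: the researchers run extensive experiments on RULER, LV-Eval, InfiniteBench, MMLU, MMLU-Pro, MATH, GSM-8K, and HumanEval, showing that the UltraLong-8B models outperform existing baselines on both long-context and standard tasks.

Method

As shown in Figure 1, the method has two stages: continued pretraining and instruction tuning.

Starting from Llama 3.1-8B-Instruct, the continued pretraining stage extends the model's context window to the target length (for example 1M, 2M, or 4M tokens). The instruction tuning stage then refines the model's instruction-following and reasoning abilities.

Together, the two stages let the model process ultra-long inputs efficiently while still performing well on both long- and short-context tasks.

Figure 1. The first stage extends the model's context window through continued pretraining, using special document separators and YaRN-based scaling to handle ultra-long sequences. The second stage performs instruction tuning on carefully curated datasets to improve instruction following and reasoning.

Continued pretraining: extending the context length

In the first stage, the researchers extend the context window of Llama-3.1-8B-Instruct to the target length through continued pretraining.

They downsample short documents with fewer than 4,000 tokens and upsample long documents with more than 8,000 tokens, producing a corpus of 1 billion tokens.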
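The length-based resampling just described can be illustrated with a small sketch. The article gives only the two thresholds (4,000 and 8,000 tokens); the keep probability for short documents and the repeat count for long documents below are hypothetical values chosen for illustration, and the function name `resample_by_length` is not from the paper.

```python
import random

def resample_by_length(docs, short_max=4000, long_min=8000,
                       keep_short=0.5, repeat_long=2, seed=0):
    """Downsample short documents and upsample long ones.

    docs is a list of token lists. keep_short and repeat_long are
    HYPOTHETICAL rates; the article does not report the actual ones.
    """
    rng = random.Random(seed)
    out = []
    for tokens in docs:
        n = len(tokens)
        if n < short_max:
            # Short document: keep it only with probability keep_short.
            if rng.random() < keep_short:
                out.append(tokens)
        elif n > long_min:
            # Long document: repeat it repeat_long times.
            out.extend([tokens] * repeat_long)
        else:
            # Mid-length document: keep exactly once.
            out.append(tokens)
    return out
```

With `keep_short=0.0` every document under 4,000 tokens is dropped, and with `repeat_long=2` every document over 8,000 tokens appears twice, which makes the effect on the length distribution easy to inspect on a toy corpus.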
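The documents in this corpus are concatenated into longer sequences that match the target context length (for example 1M, 2M, or 4M tokens). During concatenation, a special separator character marks the boundary between documents, instead of the reserved begin-of-sequence and end-of-sequence tokens.

In addition, the researchers do not apply a cross-document attention mask during continued pretraining, so the model can attend to the entire input sequence.

To support ultra-long contexts, the researchers adopt YaRN-based scaling rather than the NTK-aware scaling strategy common in earlier work. They fix the hyperparameters α=1 and β=4 and compute the scaling factor s from the target context length.

The performance of the Llama-3.1 model degrades as the input length approaches its maximum limit. To counter this, they use a larger scaling factor for the RoPE embeddings so that the model adapts better to ultra-long sequences.

The researchers build long-context models for three context lengths (1M, 2M, and 4M tokens), with RoPE scaling factors of 128, 256, and 512 respectively.

Each model is trained for one epoch on the 1-billion-token corpus with a learning rate of 3×10⁻⁵.

For training scalability they use the Megatron-LM framework, with tensor parallelism and context parallelism to handle the ultra-long input sequences.

Training runs on 256 NVIDIA H100 GPUs. The 1M, 2M, and 4M models take roughly 5, 6, and 13 hours respectively.

Instruction tuning

In the second stage, the researchers use supervised fine-tuning (SFT) on carefully curated datasets to improve the long-context models' instruction-following and reasoning abilities.

They combine and refine several open-source SFT datasets covering three key domains: general, math, and code.

To further raise the quality of the SFT data, they use GPT-4o and 4o-mini to refine the responses in these datasets.

Notably, the SFT dataset contains only the short-context data described above (samples shorter than 8,000 tokens), with no synthetic long-context instruction data added. They find that short-context data alone is enough to achieve strong results, which matches observations in earlier work.

The final SFT dataset contains 100,000 samples. For the model at each target context length, they use a batch size of 128 and a learning rate of 5×10⁻⁶.

Training again uses the Megatron-LM framework on 256 NVIDIA H100 GPUs, with tensor parallelism set to tp=8. Each training run takes about 30 minutes.

Baselines and evaluation benchmarks

The researchers compare their models against state-of-the-art (SOTA) long-context models from the Llama family, to keep the evaluation of the training recipe fair and controlled.

Llama-3.1 (Llama-3.1-8B-Instruct): their base model, which supports a 128K context window.

ProLong (Llama-3-8B-ProLong-512k-Instruct): a long-context model built on Llama-3, with a 512K context window.

Gradient (Llama-3-8B-Instruct-Gradient-1048k): another Llama-based long-context model, supporting a context window of up to 1M.

Restricting the comparison to the Llama family shows the effect of their context-extension recipe more clearly, while also confirming that performance on standard tasks remains competitive.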
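As an aside on the YaRN-based scaling used in the continued-pretraining stage, the sketch below shows how a YaRN-style ramp with α=1 and β=4 blends interpolated and original RoPE frequencies. It is a minimal illustration under stated assumptions, not the authors' Megatron-LM code. The head dimension (128), the RoPE base (500000), and the original context length (8,192) are illustrative assumptions. The reported factors 128, 256, and 512 happen to equal the target length divided by 8,192 if 1M is read as 2^20 tokens; that base length is inferred from the numbers and is not stated in the article. YaRN's attention-temperature adjustment is omitted.

```python
import math

def scale_factor(target_len: int, base_len: int = 8192) -> int:
    """Scaling factor s as target length over an ASSUMED base of 8,192 tokens."""
    return target_len // base_len

def yarn_inv_freqs(head_dim: int, rope_base: float, s: float,
                   orig_ctx: int, alpha: float = 1.0, beta: float = 4.0):
    """Return per-dimension rotary frequencies after YaRN-style blending."""
    out = []
    for i in range(0, head_dim, 2):
        theta = rope_base ** (-i / head_dim)          # original frequency
        # Number of full rotations this dimension completes over orig_ctx.
        rotations = orig_ctx * theta / (2 * math.pi)
        # Ramp: 0 below alpha (fully interpolate), 1 above beta (keep as is).
        gamma = min(1.0, max(0.0, (rotations - alpha) / (beta - alpha)))
        out.append((1.0 - gamma) * theta / s + gamma * theta)
    return out

if __name__ == "__main__":
    for target in (1 << 20, 2 << 20, 4 << 20):
        s = scale_factor(target)
        freqs = yarn_inv_freqs(128, 500000.0, s, 8192)
        print(target, s, freqs[0], freqs[-1])
```

High-frequency dimensions (many rotations within the original context) are left unchanged, while low-frequency dimensions are divided by s, which is the interpolation that stretches positions to the longer window.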
They evaluate the models' long-context ability on the following benchmarks:

RULER: a benchmark built specifically for long-context language models. It generates synthetic samples at different sequence lengths and covers four task categories.

LV-Eval: a long-context benchmark with five length levels up to 256K tokens, focused on two tasks: single-hop QA and multi-hop QA.

InfiniteBench: a long-context benchmark with an average input length of about 200K tokens and a maximum length above 2M tokens, containing both synthetic and real-world tasks.

Results

The researchers begin with the Needle in a Haystack (NIAH) test and then turn to the long-context and standard benchmark evaluations.

They use the NIAH passkey retrieval test to assess long-context retrieval. In this task the model must find a simple passkey, such as a random six-digit number, inside a long stretch of meaningless text.

To quantify retrieval accuracy, they test 40 different input sequence lengths. For each length, the passkey is inserted at 10 uniformly distributed document depths.

The results are shown in Figure 2. Their own models are tested at input lengths up to 1M, 2M, and 4M tokens, while the baseline models are tested only up to 1M tokens.

As Figures 2a to 2c show, among the baselines only Llama-3-8B-Instruct-Gradient-1048k passes the NIAH test, while Llama-3.1-8B-Instruct and Llama-3-8B-ProLong-512k-Instruct make errors even within their claimed context lengths.

In contrast, as Figures 2d to 2f show, the UltraLong models reach 100% accuracy at every input length and depth, demonstrating strong long-context retrieval.

The evaluation results on RULER, LV-Eval, and InfiniteBench are given in Table 1. Bold numbers indicate performance that exceeds all baseline models.

Overall, the three models achieve the highest score in most cases. On RULER, the UltraLong models perform best at input lengths of 512K and 1M tokens. On LV-Eval, their models have the highest average F1 score within the 128K and 256K token lengths. They also achieve the best results on InfiniteBench.

These results indicate that the training recipe effectively extends the context window to ultra-long inputs while preserving performance at the original input lengths.

Among the baselines, Llama-3.1 is designed for 128K inputs, and its performance drops sharply once the input exceeds 128K tokens. ProLong is designed for a 512K context, yet even though it was trained on far more tokens (41 billion versus 1 billion), it underperforms their models at 512K.

Gradient supports the longest context among the baselines (1M tokens), but it does poorly on LV-Eval and InfiniteBench, which suggests its design may lean too heavily toward synthetic tasks at the expense of real-world ones.

The UltraLong models, by contrast, keep consistently higher scores on both the synthetic benchmark (RULER) and the mixed benchmarks (LV-Eval and InfiniteBench), underscoring the efficiency and scalability of the recipe.
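The NIAH passkey setup described earlier in this section (40 input lengths, 10 evenly spaced depths, a random six-digit passkey) can be sketched as follows. This is an illustrative harness under stated assumptions, not the authors' evaluation code: lengths are assumed to be evenly spaced up to the maximum, depths are assumed to run from 0% to 100% inclusive, the filler and needle sentences are made up, and haystack size is counted in filler sentences rather than tokens.

```python
import random

def niah_grid(max_len: int, n_lengths: int = 40, n_depths: int = 10):
    """Return every (input_length, depth_fraction) pair to evaluate."""
    lengths = [max_len * (i + 1) // n_lengths for i in range(n_lengths)]
    depths = [d / (n_depths - 1) for d in range(n_depths)]
    return [(length, depth) for length in lengths for depth in depths]

def build_case(n_filler: int, depth: float, rng: random.Random,
               filler: str = "The grass is green."):
    """Build one haystack with a six-digit passkey inserted at `depth`."""
    passkey = str(rng.randint(100000, 999999))
    needle = f"The pass key is {passkey}."   # hypothetical needle wording
    sentences = [filler] * n_filler
    sentences.insert(round(depth * n_filler), needle)
    return " ".join(sentences), passkey

def is_correct(model_output: str, passkey: str) -> bool:
    """Score a response as correct if it contains the exact passkey."""
    return passkey in model_output

if __name__ == "__main__":
    grid = niah_grid(1_000_000)
    text, key = build_case(20, 0.5, random.Random(0))
    print(len(grid), "cases; sample passkey:", key)
```

Accuracy for one cell of the heatmap would then be the fraction of cases at that (length, depth) pair for which `is_correct` returns true.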
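The researchers also evaluate the models on standard benchmarks in the general, math, and code domains, to check that extending the context length does not hurt short-context performance.

As Table 2 shows, their models perform on par with or better than the base model Llama-3.1-8B-Instruct, with average scores of 62.47, 61.06, and 60.95 against 61.45 for Llama-3.1-8B-Instruct.

In particular, their models show clear gains on MMLU and MATH, while remaining highly competitive on other benchmarks such as GSM8K and HumanEval.

The baseline long-context models Gradient and ProLong, in contrast, drop sharply on these standard tasks, with average scores of only 37.36 and 40.81.

These results indicate that the recipe not only extends the context window effectively but also preserves, and in places improves, the model's general-task ability. The marked degradation of Llama-3-8B-Instruct-Gradient-1048k and Llama-3-8B-ProLong-512k-Instruct suggests that their ultra-long-context approaches may have limitations.

Conclusion

In this work, the researchers present an efficient and systematic training recipe for ultra-long-context language models, extending the context window to 1M, 2M, and 4M tokens while staying competitive on standard benchmarks.

The combination of efficient continued pretraining and instruction tuning improves both the model's long-context understanding and its instruction following.

The framework sets a new reference point for scalable long-context modeling and opens the way for future research on improving long-context performance in real applications.

About the authors

Chejian Xu
PhD student in computer science at the University of Illinois Urbana-Champaign (UIUC), advised by Professor Bo Li. He received his bachelor's degree in computer science and technology from Zhejiang University, where he studied in the CKC Honors College and was advised by Professor Shouling Ji and Professor Siliang Tang. His work focuses on improving the safety, reliability, and alignment of foundation models, including LLMs, multimodal models, and LLM-based agents.

Wei Ping
Senior research scientist on NVIDIA's Applied Deep Learning Research team, working on large language models and generative models. He holds a PhD in machine learning from the University of California, Irvine, and is interested in building cutting-edge generative models for text, audio, and multimodal data. He previously led the text-to-speech team at Baidu's Silicon Valley AI Lab (founded by Andrew Ng).

References:
https://arxiv.org/abs/2504.06214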